AutoRM: An effective approach for automatic Web data record mining

Shi, Shengsheng; Liu, Chengfei; Shen, Yi; Yuan, Chunfeng; Huang, Yihua

doi:10.1016/j.knosys.2015.07.012

Cited by 18 publications

(11 citation statements)

References 43 publications


“…"Page content mining", which operates on the pages ranked by search engines. The Web Crawler technique includes an extraction technique called "Web scraping" [14] [15]. This study focuses in particular on the Web Scraping technique.…”

Section: Data Mining and Web Scraping (unclassified)

Extracción de datos de perfiles en Google Scholar utilizando un algoritmo en el lenguaje R para hacer minería de datos [Extracting profile data from Google Scholar using an algorithm in the R language for data mining]

2018


ABSTRACT-The purpose of this article is to use the Web Scraping technique to extract data from Google Scholar (GS) through several methods. Web Scraping is a form of unstructured data mining that makes it possible to extract information from web pages, scan their HTML code, and generate data extraction patterns. In addition, to allow a deeper analysis, an algorithm was written in the R language to compare the speed of data extraction and the efficiency of the output data format. The article presents the tests performed on these methods to measure extraction speed and to find the best way to extract data from GS in structured form. Keywords-Web Scraping, Google Scholar, data mining, R language, data analysis.


“…Previous studies on region and review extraction search similar data records through the DOM structure. They have a string similarity function [41] of which the complexity is O(n^2), where n is the total number of nodes that are the tags of the DOM tree. The learning stage of the algorithm reduces complexity to O(n) using shallow text features with decision tree learning.…”

Section: Algorithm Overview (mentioning)

confidence: 99%

A novel algorithm for extracting the user reviews from web pages

Uçar; Uzun; Tüfekçi

2016

Journal of Information Science


Extracting user reviews from websites such as forums, blogs, newspapers, commerce and travel sites is crucial for text processing applications (e.g. sentiment analysis, trend detection/monitoring and recommendation systems) that need to work with structured data. Traditional algorithms have three processes: Document Object Model (DOM) tree creation, extraction of features obtained from this tree, and machine learning. However, these algorithms increase the time complexity of the extraction process. This study proposes a novel algorithm that involves two complementary stages. The first stage determines which HTML tags correspond to the review layout for a web domain by using the DOM tree, its features, and decision tree learning. The second stage extracts the review layout for web pages in a web domain using the tags found in the first stage. This stage is more time-efficient, being approximately 21 times faster than the first stage. Moreover, it achieves a relatively high accuracy of 96.67% in our experiments on review block extraction.


“…Ortiz-Servin, Cadenas, Pelta, Castillo, and Montes-Tadeo (2015) showed that data mining methods can be used to predict fuel lattices operations. Shi, Liu, Shen, Yuan, and Huang (2015) proposed an effective approach to mine web data records, which can extract information from semi-structured web data objects. It is useful for meta search, comparative shopping, etc., therefore it can predict customers preference.…”

Section: Planning and Prediction (mentioning)

confidence: 99%

Integrating Data Mining Into Managerial Accounting System: Challenges and Opportunities

Wang¹; Wang²

2016

CBR


Data mining involves extracting information from large data sets, discovering hidden relationships and unknown dependencies, and supporting strategic decision-making tasks. Aligning data mining with business would bring benefits to the organization's management. The study investigated the adoption of data mining technologies in managerial accounting systems, concentrating on the challenges and opportunities. The research showed that, with the adoption of this technology, managerial functions could be improved and current information systems could be upgraded. Since technical progress is reshaping the world of business and accountancy, it is important for accountants and finance professionals to exploit information technologies.
