Advanced Techniques in Web Data Pre-processing and Cleaning

Román, Pablo E.; Dell, Robert F.; Velásquez, Juan D.

doi:10.1007/978-3-642-14461-5_2

Cited by 2 publications

(3 citation statements)

References 83 publications


“…The first step was to ask the Web developer to group the HTML elements of each Web page into Web objects [37]. The resulting set and the respective components of each Web Object were stored in a database, including its cumulative size and position on the page layout.…”

Section: Web Data Preprocessing (mentioning)

confidence: 99%

Combining eye tracking and pupillary dilation analysis to identify Website Key Objects

et al. 2015

Self Cite


“…Particularly the authors [2,5,10,14,16,17,23] have paid attention on deriving functional web usage patterns from big web log data, still, the researchers expressed the need of more attention on data preparation stage in the overall process of big data analytics. In the same line, some of the research works [21,22,25,29,34] paid attention on data cleaning, one of the important stage of big data preparation. They find-out the issues and approaches that are involved in cleaning the complex and noise web log data to tackle efficiency and scalability of analytics.…”

Section: Related Work (mentioning)

confidence: 99%

“…They find-out the issues and approaches that are involved in cleaning the complex and noise web log data to tackle efficiency and scalability of analytics. The other approaches [29,30,32,33] are devised to address the remaining individual stages of data preprocessing, procedures of parsing of weblog entries [19,25,29], feature identification [13, 19 ,25], feature selection [13,19,25,29,34], user identification [32,35,36], sessionization [26,32,33] and path completion [32,33,36] etc. In addition to that, very few studies [23,31,32,33] identified the necessity of discarding the transactions that are performed by web robots or crawlers, although the investigators strongly recommend the efficient learning algorithms to differentiate humans from web crawlers.…”

Section: Related Work (mentioning)

confidence: 99%

Enriched Big Data Pre-Processing Model With Machine Learning Approach to Investigate Web User Usage Behaviour

Silpa, Rao

2021

INDJCSE


Today the web is an environment in which users with high aspirations live, learn, entertain themselves, and socialize, individually or in groups, through digital platforms. As a result, investigating web user behaviour remains an active research area and demands renewed innovation in analytics to provide reliable, high-quality customized solutions. The weblog is the primary data source for this work, and it poses tremendous challenges for web researchers because of the complex sequence of processing steps it requires and the abundance of information it holds. Further, limited distributed storage models, partially parallel computing techniques, and the identification of appropriate attributes in weblog analysis call for high-performance models that characterize web users effectively. Pre-processing is critical to the entire process of weblog analysis, yet although it is popular among researchers, studies of it are limited. In addition, existing pre-processing studies address elicitation, reduction, and transformation of web usage data individually rather than comprehensively. To address this, the present paper proposes the Enriched Pre-processing Model (EPPM), which covers all stages of weblog pre-processing within the Apache Spark framework. EPPM processes real-time streaming data along with batch data, because keeping the user behaviour extracted from historical data valid also requires a strategy for processing real-time streaming data. In addition to all pre-processing steps, EPPM integrates a machine learning approach to discard search-engine-accessed entries from the weblog, since they are extraneous to observing web user behaviour. The performance of EPPM is validated through a series of experiments on server-side weblog data in a standard execution environment. The experimental results are also included.
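The stages named in the abstract and in the citation statements above (parsing, cleaning, robot removal, user identification, sessionization) can be illustrated with a minimal Python sketch. This is not EPPM: it uses no Spark and no learned model. It assumes log lines in the Apache combined log format, swaps in a hypothetical user-agent keyword heuristic (`BOT_HINTS`) in place of the machine learning filter, identifies a user by the (IP, user agent) pair, and uses an assumed 30-minute inactivity timeout. All function and constant names are illustrative.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Apache combined log format (IP, timestamp, request, status,
# size, referrer, user agent).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
BOT_HINTS = ("bot", "crawler", "spider")  # hypothetical heuristic, not a learned model
SESSION_TIMEOUT = timedelta(minutes=30)   # assumed inactivity threshold


def parse(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def is_robot(record):
    """Flag a request whose user agent contains a crawler keyword."""
    agent = record["agent"].lower()
    return any(hint in agent for hint in BOT_HINTS)


def sessionize(records):
    """Group requests per (IP, user agent), splitting on long inactivity gaps."""
    sessions = []
    last_seen = {}  # user key -> (time of last request, index into sessions)
    for record in sorted(records, key=lambda r: r["time"]):
        user = (record["ip"], record["agent"])
        previous = last_seen.get(user)
        if previous is None or record["time"] - previous[0] > SESSION_TIMEOUT:
            sessions.append({"user": user, "urls": []})
            index = len(sessions) - 1
        else:
            index = previous[1]
        sessions[index]["urls"].append(record["url"])
        last_seen[user] = (record["time"], index)
    return sessions


def preprocess(lines):
    """Parse, drop malformed and robot requests, then build sessions."""
    parsed = (parse(line) for line in lines)
    clean = [r for r in parsed if r is not None and not is_robot(r)]
    return sessionize(clean)
```

Calling `preprocess` on an iterable of raw log lines returns a list of sessions, each holding the user key and the ordered URLs visited. Path completion, which the citation statements also mention, is left out of this sketch.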
