2010
DOI: 10.1007/978-3-642-14461-5_2
Advanced Techniques in Web Data Pre-processing and Cleaning

Cited by 2 publications (3 citation statements)
References 83 publications
“…The first step was to ask the Web developer to group the HTML elements of each Web page into Web objects [37]. The resulting set and the respective components of each Web Object were stored in a database, including its cumulative size and position on the page layout.…”
Section: Web Data Preprocessing (mentioning)
confidence: 99%
“…Particularly the authors [2,5,10,14,16,17,23] have paid attention on deriving functional web usage patterns from big web log data, still, the researchers expressed the need of more attention on data preparation stage in the overall process of big data analytics. In the same line, some of the research works [21,22,25,29,34] paid attention on data cleaning, one of the important stage of big data preparation. They find-out the issues and approaches that are involved in cleaning the complex and noise web log data to tackle efficiency and scalability of analytics.…”
Section: Related Work (mentioning)
confidence: 99%
“…They find-out the issues and approaches that are involved in cleaning the complex and noise web log data to tackle efficiency and scalability of analytics. The other approaches [29,30,32,33] are devised to address the remaining individual stages of data preprocessing, procedures of parsing of weblog entries [19,25,29], feature identification [13, 19 ,25], feature selection [13,19,25,29,34], user identification [32,35,36], sessionization [26,32,33] and path completion [32,33,36] etc. In addition to that, very few studies [23,31,32,33] identified the necessity of discarding the transactions that are performed by web robots or crawlers, although the investigators strongly recommend the efficient learning algorithms to differentiate humans from web crawlers.…”
Section: Related Work (mentioning)
confidence: 99%
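
The preprocessing stages named in the statement above (robot filtering, user identification, sessionization) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited or citing papers: the record layout, the `is_robot` and `sessionize` helper names, the substring heuristic for spotting crawlers, the use of IP address plus user agent as a user key, and the 30-minute inactivity timeout.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified log records: (client IP, user agent, timestamp, URL).
ROBOT_MARKERS = ("bot", "crawler", "spider")  # assumed heuristic, not from the papers
SESSION_TIMEOUT = timedelta(minutes=30)       # assumed inactivity threshold


def is_robot(user_agent, url):
    """Flag requests whose agent string or target URL suggests a crawler."""
    agent = user_agent.lower()
    return url == "/robots.txt" or any(marker in agent for marker in ROBOT_MARKERS)


def sessionize(records):
    """Group non-robot requests per (IP, agent) user into time-gapped sessions."""
    sessions = {}
    # Process requests in chronological order so gaps can be measured.
    for ip, agent, ts, url in sorted(records, key=lambda r: r[2]):
        if is_robot(agent, url):
            continue  # data cleaning: discard crawler traffic
        user_sessions = sessions.setdefault((ip, agent), [])
        # Extend the current session if the gap since the last request is short.
        if user_sessions and ts - user_sessions[-1][-1][0] <= SESSION_TIMEOUT:
            user_sessions[-1].append((ts, url))
        else:
            user_sessions.append([(ts, url)])  # otherwise start a new session
    return sessions
```

A substring match on the user agent is only a rough stand-in for the learning-based robot detection the statement recommends, and path completion is not covered here.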