2003
DOI: 10.1613/jair.1145

Wrapper Maintenance: A Machine Learning Approach

Abstract: The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data…

Cited by 109 publications (113 citation statements)
References 27 publications
“…On the topic of wrapper breakage, which occurs when a wrapper can no longer retrieve data because the HTML document is changed structurally, there has been less research [13][14][15][16][17]. These approaches discuss ways to identify data in labeled documents, but do not offer a solution for text analysis or entity recognition.…”
Section: Related Work (mentioning)
confidence: 99%
“…It is possible to use Machine Learning algorithms to learn automatically the wrappers [22,23,26,27,31]. The automatic wrapper induction [22,31] Each landmark automaton is specialized in extracting an attribute.…”
Section: Machine Learning (mentioning)
confidence: 99%
“…Maintaining wrappers is related to two different issues: on the one hand, to detect when a wrapper is not retrieving correctly the data (wrapper verification). On the other hand, to automatically recover the wrapper generating a new wrapper that takes into account the possible changes in the Web source (wrapper reinduction) [23,26,27]. …”
Section: Machine Learning (mentioning)
confidence: 99%
“…Thus, efficient maintenance of Web information systems is a crucial task, which can benefit much from mining techniques. This topic has only recently received some attention, with several works on using learning techniques to detect and repair wrappers [54,59].…”
Section: Querying With Information Processing Systems (mentioning)
confidence: 99%
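
The citing statements above split wrapper maintenance into verification (detecting that a wrapper no longer extracts data correctly) and reinduction (building a new wrapper). As a rough illustration of the verification half only, the Python sketch below learns coarse token-class signatures from values a wrapper extracted while it was known to work, then checks whether new extractions still fit them. It is not the algorithm from the paper, whose description is truncated in the abstract above; the token classes, the function names (`token_class`, `signature`, `learn_signatures`, `verify`), the two-token signature width, and the 0.8 threshold are all assumptions made for this example.

```python
import re


def token_class(token: str) -> str:
    """Map a token to a coarse syntactic class (hypothetical class set)."""
    if token.isdigit():
        return "NUM"
    if token.isalpha():
        return "CAP" if token[0].isupper() else "LOW"
    if token.isalnum():
        return "ALNUM"
    return "PUNCT"


def signature(value: str, width: int = 2) -> tuple:
    """Return the classes of the first `width` tokens of an extracted value."""
    tokens = re.findall(r"\w+|[^\w\s]", value)
    return tuple(token_class(t) for t in tokens[:width])


def learn_signatures(examples):
    """Collect the signatures seen in known-good extractions."""
    return {signature(v) for v in examples}


def verify(extracted, known, threshold: float = 0.8) -> bool:
    """Report whether enough new extractions match a known signature."""
    if not extracted:
        return False  # an empty result is treated as a failed extraction
    hits = sum(signature(v) in known for v in extracted)
    return hits / len(extracted) >= threshold


# Signatures learned from street addresses extracted before the site changed.
known = learn_signatures(["4676 Admiralty Way", "220 Lincoln Blvd"])

print(verify(["12 Main Street", "901 Ocean Ave"], known))
print(verify(["<td>12 Main", "Click here"], known))
```

In this sketch the first call passes because both values still start with a number followed by a capitalized word, while the second fails because the extracted strings now begin with markup or unrelated text. A failed check would be the trigger for reinduction, which is not shown here.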