2014
DOI: 10.1007/978-3-319-08326-1_37

Automatic Extraction of Logical Web Lists

Abstract: Recently, there has been increased interest in the extraction of structured data from the Web (both the "Surface" Web and the "Hidden" Web). In this paper we focus on the automatic extraction of Web lists. Although this task has been studied extensively, existing approaches assume that lists are wholly contained in a single Web page. They do not consider that many websites spread their listings across several Web pages and show only a partial view on each of them. Similar to databases, where a vi…

Citations: cited by 5 publications (4 citation statements)
References: 13 publications
“…Here, researchers and practitioners have used the hyperlink structure to organize Web pages for many years. The basic idea of Web structure mining algorithms is that if there is a hyperlink between two pages, then some semantic relation may exist between them [5,22,27]. A Web structure mining naïve solution for sitemap generation is the application of the simple breadth search algorithm.…”
Section: Sitemap Extraction
Citation type: mentioning
confidence: 99%
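
The naïve breadth-first approach to sitemap generation mentioned in the statement above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the cited or the citing paper: the function name `bfs_sitemap`, the `links` adjacency dictionary, and the toy URLs are all hypothetical, and a real crawler would fetch and parse pages rather than read an in-memory graph.

```python
from collections import deque


def bfs_sitemap(links, root):
    """Return {page: depth} for every page reachable from `root`.

    `links` is a hypothetical adjacency dict mapping a page to the pages
    it links to. Pages are discovered in breadth-first order, so each
    depth is the minimum number of clicks needed to reach the page.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # skip pages already discovered
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Toy link graph (assumed for illustration only).
toy_links = {
    "/": ["/products", "/about"],
    "/products": ["/products/p1", "/products/p2", "/"],
    "/about": ["/"],
}

sitemap = bfs_sitemap(toy_links, "/")
for page, level in sitemap.items():
    print("  " * level + page)
```

Because every hyperlink is followed regardless of its role on the page, a traversal like this treats navigation links and noisy links the same way, which is the limitation that link-filtering approaches aim to address.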
“…Several works in the field of Web mining exploit Web pages taking advantage of the structural and visual information embedded in the HTML tags. In [5,22,26] collections of hyperlinks having similar visual and/or structural properties are used to filter noisy links and collect Web pages belonging to same semantic type. In [5,26] the aim is to exploit Web lists for the task of Web page clustering.…”
Section: Automatic Extraction Of Web Lists
Citation type: mentioning
confidence: 99%