2015
DOI: 10.1007/978-3-319-19890-3_24

A Quantitative Comparison of Semantic Web Page Segmentation Approaches

Abstract: This paper explores the effectiveness of different semantic web page segmentation algorithms on modern websites. We compare three known algorithms, each serving as an example of a particular approach to the problem, and one self-developed algorithm, WebTerrain, that combines two of the approaches. With our testing framework we have compared the performance of four algorithms for a large benchmark we have constructed. We have examined each algorithm for a total of eight different configurations (varying datasets…

Cited by 7 publications
(12 citation statements)
References 13 publications
“…Some provide access to the tools but it is not the general case. An interesting experience is the work of [11]. They present a method for quantitative comparison of semantic Web page segmentation algorithms.…”
Section: Segmentation Correctness Evaluation (mentioning)
confidence: 98%
“…Algorithms use structural features based on the DOM tree and the textual content, and visual features extracted from renderings of individual nodes as well as the entire web page. Most algorithms use the DOM tree structure in some way, for example to identify headings [26], block nodes [1,6], or regularities [16], and to compute the tree depth [24] or the tree distance [18] of nodes. Other algorithms use the text density [22] or visual appearance of DOM nodes when rendered (e.g., their size or color; Baluja [4], Zeleny et al [37]).…”
Section: Related Work (mentioning)
confidence: 99%
“…Nine of the 19 publications listed in Table 1 give, explicitly or implicitly, a definition of what a web page segment is. The most common one (though used in only four publications) is that of a visual "block" with coherent content [9,24,26,37]. Other definitions characterize segments by their edges [12,13], as being semantically self-contained [16], as distinct [30], or as labeled with a heading [28].…”
Section: Concept Formation: Page Segment (mentioning)
confidence: 99%
“…et al [26] proposed an algorithm which combines a plain structural approach with a rendering-based approach. They make use of DOM tree with addition to the information of visibility and dimensions of each of the tree.…”
Section: Hybrid Approach (mentioning)
confidence: 99%