2017
DOI: 10.1016/j.ipm.2017.02.002

Box clustering segmentation: A new method for vision-based web page preprocessing

citations
Cited by 29 publications
(26 citation statements)
references
References 15 publications
“…Most algorithms use the DOM tree structure in some way, for example to identify headings [26], block nodes [1,6], or regularities [16], and to compute the tree depth [24] or the tree distance [18] of nodes. Other algorithms use the text density [22] or visual appearance of DOM nodes when rendered (e.g., their size or color; Baluja [4], Zeleny et al [37]). Few algorithms exclusively exploit visual cues, e.g., using edge detection on screenshots [8,12].…”
Section: Related Work (mentioning)
confidence: 99%
“…Few algorithms exclusively exploit visual cues, e.g., using edge detection on screenshots [8,12]. Indeed, recent publications argue that only visual features provide for the necessary robustness for a generalizable algorithm [12,37], but this claim has not been verified. Our dataset provides the resources required by all the various approaches, enabling a fair and comprehensive comparison.…”
Section: Related Work (mentioning)
confidence: 99%
“…[24] proposes a method based on analyzing the Document Object Model (DOM) of a web page. [4] and [31] uses a visual approach by analyzing the page rendition in a browser to extract areas. Other techniques may be based on image processing [5], semantic structures or graph resolution [13].…”
Section: Software Architecture and Evaluations (mentioning)
confidence: 99%