2003
DOI: 10.1007/3-540-36901-5_42

Extracting Content Structure for Web Pages Based on Visual Representation

Abstract: A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure based on his visual perception. Compared to other existing techniques, our approach is independent of the underlying documentation…

Cited by 249 publications (166 citation statements)
References: 16 publications
“…Vision-based Page Segmentation, or VIPS [8] for short, is intended to find all of the regions of which a document is composed. It builds on the hypothesis that web designers provide visual cues that help people recognize the different regions of which a document is composed, e.g., horizontal or vertical rules, boxes, colored panels, special fonts, or background images.…”
Section: Vips [8] (mentioning)
confidence: 99%
“…VIPS algorithm is a popular algorithm for web page segmentation based on the vision of a human and hence useful in many contexts like information extraction, information retrieval and automatic understanding of a web page (Cai, Yu, Wen, & Ma, 2003). VIPS algorithm makes use of the HTML DOM (Document Object Model) tree and extracts suitable blocks from it.…”
Section: Snippet Extraction Using Vips Algorithm (mentioning)
confidence: 99%
“…Users discover a lot of loaded hyper structure as the growth on the web is massive [4], [9], [13]. Updating incoming data and extracting relevant information without redundancy from the web quickly and efficiently becomes a growing concern among web mining research communities [6], [8].…”
Section: Introduction (mentioning)
confidence: 99%