2007
DOI: 10.1109/icdar.2007.4376990

Layout Based Information Extraction from HTML Documents

Cited by 16 publications (7 citation statements)
References 7 publications

“…These methods also have some limitations, for example: these methods may falsely separate closely related contents and combine unrelated contents together. Some other heuristics-based approaches rely on visual cues from browser renderings [2], [5]-[7], [10]. Most of them focus on the location, size or font cues of web pages.…”
Section: Copyright © 2014 the Institute of Electronics Information A (mentioning, confidence 99%)

“…The early techniques of web page segmentation are mainly based on machine learning algorithms [1], [4], [8], [9] and rule-based heuristics [2], [3], [5]-[7], [10]-[12], [15]. Because of the small scale training data set, machine-learning-based methods can only be applied in some certain fields of web pages.…”
Section: Introduction (mentioning, confidence 99%)

“…Our segmentation algorithm (Burget, 2007), works in a bottom-up manner. Each block is represented by the coordinates of top-left and bottom-right point of a box.…”
Section: Phase I - Visual Block Classification (mentioning, confidence 99%)

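The statement above only says that blocks are built bottom-up and that each block is stored as the top-left and bottom-right corners of a box. The following is a minimal, hypothetical sketch of that representation. The `Box` class, the `merge_bottom_up` function, and the rule of merging vertically adjacent boxes whose gap is at most `max_gap` are illustrative assumptions, not the algorithm described in Burget (2007).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """A visual block stored as its top-left (x1, y1) and bottom-right (x2, y2) corners."""
    x1: int
    y1: int
    x2: int
    y2: int

    def union(self, other: "Box") -> "Box":
        # Smallest box that encloses both blocks.
        return Box(min(self.x1, other.x1), min(self.y1, other.y1),
                   max(self.x2, other.x2), max(self.y2, other.y2))

    def vertical_gap(self, other: "Box") -> int:
        # Empty space between the two boxes along the y axis (0 if they overlap vertically).
        return max(0, max(self.y1, other.y1) - min(self.y2, other.y2))


def merge_bottom_up(leaves: List[Box], max_gap: int) -> List[Box]:
    """Hypothetical bottom-up grouping: visit leaf boxes from top to bottom and
    grow the current group while the next box lies within max_gap of it."""
    groups: List[Box] = []
    for box in sorted(leaves, key=lambda b: (b.y1, b.x1)):
        if groups and groups[-1].vertical_gap(box) <= max_gap:
            groups[-1] = groups[-1].union(box)  # absorb the leaf into the current group
        else:
            groups.append(box)  # gap too large: start a new group
    return groups
```

For example, `merge_bottom_up([Box(0, 0, 100, 10), Box(0, 12, 100, 22), Box(0, 60, 100, 70)], max_gap=5)` joins the first two boxes (gap of 2) into `Box(0, 0, 100, 22)` and keeps the third (gap of 38) as a separate group. Storing only two corner points keeps the union and gap computations to a few comparisons.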
“…Processing of digital documents not having a strict layout is also faced by other works, such as Web pages (Feng, 2005; Burget, 2007; Guo, 2007) or email messages (Chao, 2005).…”
Section: Related Work (mentioning, confidence 99%)