2017
DOI: 10.1587/transinf.2016edp7375
|View full text |Cite
|
Sign up to set email alerts
|

LTDE: A Layout Tree Based Approach for Deep Page Data Extraction

Abstract: SUMMARYContent extraction from deep Web pages has received great attention in recent years. However, the increasingly complicated HTML structure of Web documents makes it more difficult to recognize the data records by only analyzing the HTML source code. In this paper, we propose a method named LTDE to extract data records from a deep Web page. Instead of analyzing the HTML source code, LTDE utilizes the visual features of data records in deep Web pages. A Web page is considered as a finite set of visual bloc… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2019
2019
2019
2019

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

citations
Cited by 1 publication
references
References 29 publications
0
0
0
Order By: Relevance