2013
DOI: 10.1007/s11280-013-0248-y

Information extraction for deep web using repetitive subject pattern

Cited by 16 publications (17 citation statements). References 32 publications.

“…It uses a domain classification technique for web page retrieval based on the user query. Information extraction from deep web using Repetitive Subject Pattern [36] is based on the hypothesis that the information in a web page is about a subject item and that the repetitive pattern around the subject items can be used to identify boundaries. The limitation of this approach is that it cannot be used for detail pages that have a single subject item.…”
Section: Related Work (mentioning; confidence: 99%)

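The repetitive-subject-pattern hypothesis quoted above can be illustrated with a small sketch. It assumes the page has already been flattened into a token list and that the positions of the subject items are already known; `common_left_context` and `split_records` are hypothetical names, and the boundary rule used here (the longest token run that immediately precedes every subject item) is a simplification chosen for illustration, not the method of [36].

```python
from typing import List, Sequence


def common_left_context(tokens: Sequence[str], subjects: Sequence[int]) -> int:
    """Length of the longest token run that immediately precedes every subject item.

    `subjects` holds the sorted token indices of the subject items; how those
    items are found is assumed and outside the scope of this sketch.
    """
    if not subjects:
        raise ValueError("need at least one subject item")
    # Bound k so the context of the first item stays inside the token list and
    # the context of each later item never reaches back to the previous item.
    limits = [subjects[0]] + [
        subjects[i] - subjects[i - 1] - 1 for i in range(1, len(subjects))
    ]
    k = 0
    while k < min(limits):
        candidate = tokens[subjects[0] - k - 1]
        if all(tokens[s - k - 1] == candidate for s in subjects):
            k += 1
        else:
            break
    return k


def split_records(tokens: Sequence[str], subjects: Sequence[int]) -> List[List[str]]:
    """Cut the token stream into one record per subject item.

    Each record starts where the shared left pattern begins and ends where the
    next record starts; the last record runs to the end of the stream.
    """
    k = common_left_context(tokens, subjects)
    starts = [s - k for s in subjects]
    ends = starts[1:] + [len(tokens)]
    return [list(tokens[a:b]) for a, b in zip(starts, ends)]
```

For the tokens `<ul> <li> <b> Camera A </b> $199 </li> <li> <b> Camera B </b> $249 </li> </ul>` with subject items at indices 3 and 9, the shared left pattern is `<li> <b>`, so each record starts at its `<li>`; the closing `</ul>` ends up in the last record, a simplification a fuller implementation would have to handle. With a single subject item there is nothing to compare against and the whole stream becomes one record, which echoes the limitation stated in the quotation.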
“…(STEM [11] can also detect the data records from multiple pages.) We roughly divide these approaches into two groups: HTML-based approaches [5], [7], [10], [11], [13], [15] and vision-based approaches [3], [4], [8], [12], [19], [22]. Our method, named LTDE, is a vision-based method; thus, vision-based approaches are discussed in more detail in this section.…”
Section: Related Work (mentioning; confidence: 99%)

“…Lines 1-5 show the main body of the algorithm; the input is a visual block and the output is a sequence of split lines. Lines 6-20 show the "separate" function, which is the core of this algorithm. The "separate" function is a recursive function whose parameters are a rectangular region and a sequence of leaf blocks.…”
Section: Determination of Split Lines (mentioning; confidence: 99%)

“…Many of the developed approaches aim to detect the schema of a web site, which can be used with the generated wrapper for data extraction. Examples of these wrapper induction systems are EXALG [1], FiVaTech [3], RoadRunner [4], DeLa [5], DEPTA [6], ViPER [7], and others [8], [9], [10]. FiVaTech, EXALG, and RoadRunner are designed to solve the page-level extraction task, while DeLa, DEPTA, and ViPER are designed for the record-level extraction task.…”
Section: Related Work (mentioning; confidence: 99%)

“…10 ). The 4-tuple type includes five basic types (4)(5)(6)(7)(8)(9), where the last two are optional. The optional tuple () 10 has two basic types (11)(12).…”
Section: Introduction (mentioning; confidence: 99%)