2006
DOI: 10.1007/s10115-006-0014-x

Visual information extraction

Cited by 22 publications
(9 citation statements)
References 17 publications
“…In particular, HTML tables and HTML lists are known to contain relational data. - Decoration, visual appearance [Aumann et al 2006; Meng et al 2003; Yoshinaga and Torisawa 2007]: Sometimes the structure of a document is easier to learn through its visual aspects, especially when a pattern in terms of tags is difficult to define or learn. - PMI and search hits [Church and Hanks 1989; Turney 2001;: Pointwise Mutual Information (PMI) is a statistical measure that indicates possible correlation between two expressions.…”
Section: Relation Retrieval: How To Acquire Relations? (mentioning)
confidence: 99%
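
The citation statement above mentions PMI computed from search hits. As a minimal sketch of that idea, assuming the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) with probabilities estimated from hit counts: the function name `pmi_from_hits`, its parameters, and the example counts are hypothetical and do not come from the cited papers.

```python
import math


def pmi_from_hits(hits_xy: int, hits_x: int, hits_y: int, total: int) -> float:
    """Estimate PMI (in bits) from raw counts.

    hits_xy: number of documents containing both expressions (assumed input)
    hits_x, hits_y: number of documents containing each expression alone
    total: size of the document collection (assumed known or estimated)
    """
    if min(hits_xy, hits_x, hits_y, total) <= 0:
        raise ValueError("all counts must be positive")
    # p(x, y) / (p(x) * p(y)) simplifies to hits_xy * total / (hits_x * hits_y)
    ratio = (hits_xy * total) / (hits_x * hits_y)
    return math.log2(ratio)


# Hypothetical counts: the pair co-occurs five times more often than
# independence would predict, so the score is positive.
score = pmi_from_hits(hits_xy=50, hits_x=100, hits_y=100, total=1000)
```

A score near zero suggests the two expressions co-occur about as often as chance would predict; a clearly positive score suggests a possible association.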
“…Zhao et al [41], Zhai and Liu [40] and Simon and Lausen [28] describe different approaches for detecting repetitive patterns on web pages, which are predominantly source-code based and enhanced with visual cues. In contrast, Aumann et al [3] describe a system that works only on a hierarchical structure of the visual representation (experiments are performed with PDF documents) and learns to recognize text fields such as author or title from manually tagged training sets of documents. Conversely, our approach does not attempt to find individual text fields, but rather, larger structures, does not require training sets and neither imposes a tree structure on web pages.…”
Section: Related Work (mentioning)
confidence: 99%