Visual information extraction

Aumann, Yonatan; Feldman, Ronen; Liberzon, Yair; Rosenfeld, Benjamin; Schler, Jonathan

doi:10.1007/s10115-006-0014-x

Cited by 22 publications

(9 citation statements)

References 17 publications

“…In particular, HTML tables and HTML lists are known to contain relational data.
- Decoration, visual appearance [Aumann et al 2006; Meng et al 2003; Yoshinaga and Torisawa 2007]: Sometimes the structure of a document is easier to learn through its visual aspects, especially when a pattern in terms of tags is difficult to define or learn.
- PMI and search hits [Church and Hanks 1989; Turney 2001]: Pointwise Mutual Information (PMI) is a statistical measure that indicates possible correlation between two expressions.…”

Section: Relation Retrieval: How To Acquire Relations? (mentioning)

confidence: 99%

Aggregated search

2014

Traditional search engines return ranked lists of search results. It is up to the user to scroll this list, scan within different documents, and assemble information that fulfills his/her information need. Aggregated search represents a new class of approaches where the information is not only retrieved but also assembled. This is the current evolution in Web search, where diverse content (images, videos, etc.) and relational content (similar entities, features) are included in search results. In this survey, we propose a simple analysis framework for aggregated search and an overview of existing work. We start with related work in related domains such as federated search, natural language generation, and question answering. Then we focus on more recent trends, namely cross vertical aggregated search and relational aggregated search, which are already present in current Web search.

“…Zhao et al [41], Zhai and Liu [40] and Simon and Lausen [28] describe different approaches for detecting repetitive patterns on web pages, which are predominantly source-code based and enhanced with visual cues. In contrast, Aumann et al [3] describe a system that works only on a hierarchical structure of the visual representation (experiments are performed with PDF documents) and learns to recognize text fields such as author or title from manually tagged training sets of documents. Conversely, our approach does not attempt to find individual text fields, but rather, larger structures, does not require training sets and neither imposes a tree structure on web pages.…”

Section: Related Work (mentioning)

confidence: 99%

Towards domain-independent information extraction from web tables

Gatterbauer, Bohunsky, Herzog, et al. 2007

Proceedings of the 16th International Conference on World Wide Web

Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A multitude of different HTML implementations of web tables make these approaches difficult to scale. In this paper, we approach the problem of domain-independent information extraction from web tables by shifting our attention from the tree-based representation of web pages to a variation of the two-dimensional visual box model used by web browsers to display the information on the screen. The thereby obtained topological and style information allows us to fill the gap created by missing domain-specific knowledge about content and table templates. We believe that, in a future step, this approach can become the basis for a new way of large-scale knowledge acquisition from the current "Visual Web."

“…natural form by discovering and interpreting patterns in data [2]. It can be applied to textual data [37,42], numeric data [40,41], spatial data [43,45], web data [39] and visual data [38] as well.…”

Section: Introduction (mentioning)

confidence: 99%

The Soar of cognitive architectures

Butt, Mazhar, et al. 2013

2013 International Conference on Current Trends in Information Technology (CTIT)

This paper presents a review of "How AI, cognitive science and DM are combined to develop intelligent agents", and how the paradigm first shifted from AI to data mining and then towards a combination of data mining and artificial intelligence. The paper also provides a state-of-the-art account of cognitive architectures and gives a detailed comparative study of all the architectures discussed. The whole survey of data mining and cognitive architectures is done w.r.t. multi-agent systems. Therefore, the paper will also provide a bird's-eye view of MAS! ABMS.
