“…In particular, HTML tables and HTML lists are known to contain relational data. - Decoration, visual appearance [Aumann et al 2006;Meng et al 2003;Yoshinaga and Torisawa 2007]: Sometimes the structure of a document is easier to learn through its visual aspects, especially when a pattern in terms of tags is difficult to define or learn. - PMI and search hits [Church and Hanks 1989;Turney 2001;: Pointwise Mutual Information (PMI) is a statistical measure that indicates possible correlation between two expressions.…”
Section: Relation Retrieval: How To Acquire Relations?
Traditional search engines return ranked lists of search results. It is up to the user to scroll through this list, scan the individual documents, and assemble the information that fulfills their information need. Aggregated search represents a new class of approaches in which information is not only retrieved but also assembled. This reflects the current evolution of Web search, where diverse content (images, videos, etc.) and relational content (similar entities, features) are included in search results. In this survey, we propose a simple analysis framework for aggregated search and give an overview of existing work. We start with work in related domains such as federated search, natural language generation, and question answering. We then focus on more recent trends, namely cross-vertical aggregated search and relational aggregated search, which are already present in current Web search.
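The excerpt above pairs PMI with search hits without giving the formula. As a minimal sketch, assuming the usual definition PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ) and assuming each probability is estimated as a hit count divided by a total document count N, the estimate becomes PMI ≈ log2( hits(x, y) · N / (hits(x) · hits(y)) ). The function name `pmi_from_hits` and the hit counts in the example are hypothetical illustrations, not real search-engine data; in practice N is often unknown and engine hit counts are themselves rough estimates.

```python
import math


def pmi_from_hits(hits_x, hits_y, hits_xy, total_docs):
    """Estimate pointwise mutual information (in bits) from document hit counts.

    Assumes p(x) ~ hits_x / total_docs, p(y) ~ hits_y / total_docs and
    p(x, y) ~ hits_xy / total_docs (hypothetical estimator for illustration).
    """
    if min(hits_x, hits_y, hits_xy) <= 0:
        # log2(0) is undefined: the pair never co-occurs, or a term is unseen
        raise ValueError("PMI is undefined when any hit count is zero")
    return math.log2(hits_xy * total_docs / (hits_x * hits_y))


# Hypothetical counts: 1024 hits for x, 2048 for y, 256 for both, 2**20 documents.
print(pmi_from_hits(1024, 2048, 256, 2**20))  # 7.0
```

A positive value means the two expressions co-occur more often than independence would predict (here 128 times more often), a value of zero means they co-occur exactly as often as chance predicts, and a negative value means they co-occur less often.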
“…Zhao et al. [41], Zhai and Liu [40], and Simon and Lausen [28] describe different approaches for detecting repetitive patterns on web pages; these approaches are predominantly source-code based and enhanced with visual cues. In contrast, Aumann et al. [3] describe a system that works only on a hierarchical structure of the visual representation (experiments are performed with PDF documents) and learns to recognize text fields such as author or title from manually tagged training sets of documents. Our approach, by comparison, does not attempt to find individual text fields but rather larger structures; it requires no training sets and does not impose a tree structure on web pages.…”
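To make "source-code based" detection of repetitive patterns concrete, here is a deliberately naive sketch. It is not the method of Zhao et al., Zhai and Liu, or Simon and Lausen; it is a hypothetical heuristic that parses HTML with Python's standard `html.parser`, summarizes each element by a shallow tag signature, and reports parents whose children repeat the same signature at least `min_repeats` times. The names `repetitive_regions`, `signature`, and `min_repeats` are illustrative assumptions.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have an end tag, so the builder must not descend into them.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree (text content is ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; unmatched end tags are ignored.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent


def signature(node, depth=2):
    """Shallow structural signature, e.g. 'li(a)' for <li><a>..</a></li>."""
    if depth == 0 or not node.children:
        return node.tag
    return node.tag + "(" + ",".join(signature(c, depth - 1) for c in node.children) + ")"


def repetitive_regions(html, min_repeats=3):
    """Return (parent_tag, child_signature, count) for repeated sibling structures."""
    builder = TreeBuilder()
    builder.feed(html)
    found = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        counts = Counter(signature(c) for c in node.children)
        for sig, n in counts.items():
            if n >= min_repeats:
                found.append((node.tag, sig, n))
        stack.extend(node.children)
    return found


print(repetitive_regions("<ul><li><a>x</a></li><li><a>y</a></li><li><a>z</a></li></ul>"))
# [('ul', 'li(a)', 3)]
```

Relying purely on tag structure like this is exactly what becomes fragile across the many HTML implementations of the same visual layout, which is the motivation for the visual approach described in this passage.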
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. The multitude of different HTML implementations of web tables makes these approaches difficult to scale. In this paper, we approach the problem of domain-independent information extraction from web tables by shifting our attention from the tree-based representation of web pages to a variation of the two-dimensional visual box model that web browsers use to display information on the screen. The topological and style information obtained in this way allows us to fill the gap created by missing domain-specific knowledge about content and table templates. We believe that, in a future step, this approach can become the basis for a new way of large-scale knowledge acquisition from the current "Visual Web."
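The shift from the tag tree to the visual box model can be illustrated with a small, hypothetical sketch: it assumes the rendered boxes (their left and top coordinates in pixels, plus text) have already been obtained from a browser's layout engine, a step not shown here. It groups boxes into rows by their top edge and into columns by their left edge, using a pixel tolerance `tol`, and reads off a grid. This is simple coordinate clustering for illustration, not the algorithm of the paper; `Box`, `cluster`, `boxes_to_grid`, and `tol` are assumed names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge of the rendered box, in pixels
    y: float  # top edge of the rendered box, in pixels
    text: str


def cluster(values, tol):
    """Group sorted coordinates into bands; a value joins the current band
    if it lies within tol of that band's first value (prevents chaining)."""
    bands = []
    for v in sorted(set(values)):
        if bands and v - bands[-1][0] <= tol:
            bands[-1].append(v)
        else:
            bands.append([v])
    return bands


def band_index(v, bands):
    for i, band in enumerate(bands):
        if v in band:
            return i
    raise ValueError(v)


def boxes_to_grid(boxes, tol=3.0):
    """Arrange boxes into a row/column grid, or return None if two boxes
    fall into the same cell (i.e. the layout is not a simple grid)."""
    row_bands = cluster([b.y for b in boxes], tol)
    col_bands = cluster([b.x for b in boxes], tol)
    grid = [[None] * len(col_bands) for _ in row_bands]
    for b in boxes:
        r, c = band_index(b.y, row_bands), band_index(b.x, col_bands)
        if grid[r][c] is not None:
            return None
        grid[r][c] = b.text
    return grid


# Hypothetical boxes of a small price table with slight pixel misalignment.
boxes = [Box(10, 10, "Name"), Box(100, 11, "Price"),
         Box(10, 30, "Apple"), Box(101, 30, "1.20"),
         Box(10, 50, "Pear"), Box(100, 52, "0.90")]
print(boxes_to_grid(boxes))
# [['Name', 'Price'], ['Apple', '1.20'], ['Pear', '0.90']]
```

Because the grid is derived only from positions, the same result is obtained whether the page used <table>, nested <div> elements, or CSS positioning. Cells with no box remain None, so spanning or empty cells would need additional handling in a fuller treatment.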
“…natural form by discovering and interpreting patterns in data [2]. It can be applied to textual data [37,42], numeric data [40,41], spatial data [43,45], web data [39] and visual data [38] as well.…”
This paper presents a review of "How AI, cognitive science and DM are combined to develop intelligent agents" and describes how the paradigm first shifted from AI to data mining (DM) and then toward a combination of data mining and artificial intelligence. The paper also provides a state-of-the-art account of cognitive architectures, together with a detailed comparative study of all the architectures it discusses. The entire survey of data mining and cognitive architectures is conducted with respect to multi-agent systems (MAS). Therefore, the paper also provides a bird's-eye view of MAS/ABMS.