Automatic hidden-web table interpretation, conceptualization, and semantic annotation

Lecture Notes in Computer Science

Zitzelberger

2010

Self Cite

Abstract. The current web is a web of linked pages. Frustrated users search for facts by guessing which keywords or keyword phrases might lead them to pages where they can find facts. Can we make it possible for users to search directly for facts embedded in web pages? Instead of a web of human-readable pages containing machine-inaccessible facts, can the web be a web of machine-accessible facts superimposed over a web of human-readable pages? Ultimately, can the web be a web of knowledge that can provide direct answers to factual questions and support these answers by referencing and highlighting relevant base facts embedded in source pages? Answers to these questions call for distilling knowledge from the web's wealth of heterogeneous digital data into a web of knowledge. But how? Or, even more fundamentally, what, precisely, is this web of knowledge, and what is required to enable it? To answer these questions, we proffer a theoretical foundation for a web of knowledge: We formally define a computational view of knowledge in a way that enables practical construction and use of a web of knowledge.

Section: Results (mentioning)

Section: Theorem 2: Let S Be a Nested Table With a Single Label Path (mentioning)

Theoretical Foundations for Enabling a Web of Knowledge

Lecture Notes in Computer Science

Zitzelberger

2010

Self Cite

“…Input tables were matched with known conceptualizations in an attempt to interpret them in [56]. Information extraction from sibling tables with identical headers was demonstrated in [57]. A taxonomy of tables based on the geometric relationship of tabular structures to isothetic tessellations and to X-Y trees was proposed in [58], a machine learning approach to segmentation of grid tables in [59], and algorithms for turning web tables into relational tables by recovering and factoring header paths in [60].…”

Section: Our Earlier Work (mentioning)

Converting heterogeneous statistical tables on the web to searchable databases

Krishnamoorthy

Nagy

et al. 2016

IJDAR

Self Cite

Much of the world's quantitative data resides in scattered web tables. For a meaningful role in Big Data analytics, the facts reported in these tables must be brought into a uniform framework. Based on a formalization of header-indexed tables, we proffer an algorithmic solution to end-to-end table processing for a large class of human-readable tables. The proposed algorithms transform header-indexed tables to a category table format that maps easily to a variety of industry-standard data stores for query processing. The algorithms segment table regions based on the unique indexing of the data region by header paths, classify table cells, and factor header category structures of two-dimensional as well as the less common multidimensional tables. Experimental evaluations substantiate the algorithmic approach to processing heterogeneous tables. As demonstrable results, the algorithms generate queryable relational database tables and semantic-web triple stores. Application of our algorithms to 400 web tables randomly selected from diverse sources shows that the algorithmic solution automates end-to-end table processing.

Keywords: document analysis · table segmentation · table analysis · table header factoring · end-to-end table processing · table headers · queries over table data
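The abstract describes transforming header-indexed tables into a category table in which every data cell is indexed by its row and column header paths. The following is a minimal Python sketch of that indexing idea only, not the authors' algorithm. It assumes the table is already segmented (the caller supplies the number of column-header rows and stub columns), models spanning header cells by forward-filling blanks, and does not factor header paths into separate categories. The names `fill_forward` and `to_category_table` and the toy values are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[str]]


def fill_forward(cells: List[str]) -> List[str]:
    """Replace blank cells with the nearest non-blank cell before them.

    Assumption: a blank header cell continues a spanning header to its
    left (or above it, for stub columns).
    """
    filled: List[str] = []
    last = ""
    for cell in cells:
        if cell.strip():
            last = cell
        filled.append(last)
    return filled


def to_category_table(grid: Grid, n_head_rows: int, n_stub_cols: int) -> List[Tuple[str, ...]]:
    """Emit one record per data cell: row header path + column header path + value."""
    if n_head_rows < 1 or n_stub_cols < 1:
        raise ValueError("need at least one header row and one stub column")
    # Column header path of each data column, read top to bottom.
    col_rows = [fill_forward(row[n_stub_cols:]) for row in grid[:n_head_rows]]
    col_paths = list(zip(*col_rows))
    # Row header path of each data row, read left to right across the stub.
    body = grid[n_head_rows:]
    stub_cols = [fill_forward([row[c] for row in body]) for c in range(n_stub_cols)]
    row_paths = list(zip(*stub_cols))
    records: List[Tuple[str, ...]] = []
    for r, row in enumerate(body):
        for c, value in enumerate(row[n_stub_cols:]):
            records.append(row_paths[r] + col_paths[c] + (value,))
    return records


# Toy table with a two-level column header (illustrative values only).
grid = [
    ["", "Sales", ""],
    ["", "2015", "2016"],
    ["Region A", "10", "12"],
    ["Region B", "7", "9"],
]
for record in to_category_table(grid, n_head_rows=2, n_stub_cols=1):
    print(record)  # first line: ('Region A', 'Sales', '2015', '10')
```

Each record is one row of a flat, relational-style table, which is why this format maps readily onto database tables or triple stores; recovering which header levels form distinct categories is the harder step the sketch leaves out.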

“…A user can then modify the form, if desired, and use it to harvest information. We have implemented this reverse-engineering of tables into FOCIH forms based on a system called TISP (Table Interpretation for Sibling Pages) [29,30]. TISP converts tables from sites, such as hidden-web sites, that have machine-generated sibling pages into FOCIH forms and thus into FOCIH-generated ontologies.…”
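The reliance on machine-generated sibling pages can be illustrated with a small comparison: positions where two sibling tables hold the same text are likely labels, and positions that differ are likely values. The sketch below is a simplification, not the published TISP algorithm. It assumes both tables are already parsed into grids of identical shape and that a label precedes its values in the same row; `classify_cells` and `label_value_pairs` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[str]]


def classify_cells(page_a: Grid, page_b: Grid) -> Dict[Tuple[int, int], str]:
    """Mark each (row, col) 'label' if two sibling tables agree there, else 'value'.

    Assumption: both tables were parsed into grids of the same shape.
    """
    if len(page_a) != len(page_b):
        raise ValueError("sibling tables must have the same number of rows")
    kinds: Dict[Tuple[int, int], str] = {}
    for r, (row_a, row_b) in enumerate(zip(page_a, page_b)):
        if len(row_a) != len(row_b):
            raise ValueError(f"row {r} differs in length")
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            kinds[(r, c)] = "label" if a.strip() == b.strip() else "value"
    return kinds


def label_value_pairs(page: Grid, kinds: Dict[Tuple[int, int], str]) -> List[Tuple[str, str]]:
    """Pair each value with the nearest label to its left in the same row."""
    pairs: List[Tuple[str, str]] = []
    for r, row in enumerate(page):
        label: Optional[str] = None
        for c, cell in enumerate(row):
            if kinds[(r, c)] == "label":
                label = cell
            elif label is not None:
                pairs.append((label, cell))
    return pairs


# Two hypothetical sibling pages generated from the same template.
page_1 = [["Name", "Widget"], ["Price", "$5"]]
page_2 = [["Name", "Gadget"], ["Price", "$7"]]
kinds = classify_cells(page_1, page_2)
print(label_value_pairs(page_1, kinds))  # [('Name', 'Widget'), ('Price', '$5')]
```

Under this rule, a value that happens to coincide on both pages would be misread as a label; comparing more than two siblings reduces that risk. The resulting label/value pairs suggest the fields of a form.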

Section: Further Reduction of Labor-Intensive Tasks (mentioning)

FOCIH: Form-Based Ontology Creation and Information Harvesting

Tao

Conceptual Modeling - ER 2009

Liddle

2009

Self Cite

Abstract. Creating an ontology and populating it with data are both labor-intensive tasks requiring a high degree of expertise. Thus, scaling ontology creation and population to the size of the web in an effort to create a web of data (which some see as Web 3.0) is prohibitive. Can we find ways to streamline these tasks and lower the barrier enough to enable Web 3.0? Toward this end we offer a form-based approach to ontology creation that provides a way to create Web 3.0 ontologies without the need for specialized training. And we offer a way to semi-automatically harvest data from the current web of pages for a Web 3.0 ontology. In addition to harvesting information with respect to an ontology, the approach also annotates web pages and links facts in web pages to ontological concepts, resulting in a web of data superimposed over the web of pages. Experience with our prototype system shows that mappings between conceptual-model-based ontologies and forms are sufficient for creating the kind of ontologies needed for Web 3.0, and experiments with our prototype system show that automatic harvesting, automatic annotation, and automatic superimposition of a web of data over a web of pages work well. Keywords: ontology generation from forms, information harvesting from the web, automatic annotation of web pages, web of data, Web 3.0.
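The abstract relies on a mapping between forms and conceptual-model-based ontologies. One plausible reading of such a mapping, assumed here rather than taken from the paper, is sketched below in Python: the form title becomes a concept, each field becomes a concept linked to it by a relationship, single-value fields yield functional relationships, and nested forms recurse. All class and function names (`FormField`, `Form`, `Relationship`, `form_to_ontology`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FormField:
    name: str
    multi_valued: bool = False          # single-value vs. multiple-value field
    subform: Optional["Form"] = None    # nested form, if any


@dataclass
class Form:
    title: str
    fields: List[FormField] = field(default_factory=list)


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    functional: bool  # assumed: True when each source object has at most one target


def form_to_ontology(form: Form) -> Tuple[List[str], List[Relationship]]:
    """Hypothetical mapping: title and fields become concepts, field links become relationships."""
    concepts: List[str] = [form.title]
    relationships: List[Relationship] = []
    for f in form.fields:
        target = f.subform.title if f.subform is not None else f.name
        relationships.append(Relationship(form.title, target, functional=not f.multi_valued))
        if f.subform is not None:
            # A nested form contributes its own concepts and relationships.
            sub_concepts, sub_relationships = form_to_ontology(f.subform)
            concepts.extend(sub_concepts)
            relationships.extend(sub_relationships)
        else:
            concepts.append(f.name)
    return concepts, relationships


country = Form("Country", [
    FormField("Name"),
    FormField("Religion", multi_valued=True),
    FormField("Capital", subform=Form("City", [FormField("Population")])),
])
concepts, relationships = form_to_ontology(country)
print(concepts)  # ['Country', 'Name', 'Religion', 'City', 'Population']
```

Treating the capital as a nested City form shows how nesting would yield a chain of relationships; a fuller mapping would also have to handle value types and constraints that this sketch ignores.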