2009
DOI: 10.1016/j.datak.2009.02.010

Automatic hidden-web table interpretation, conceptualization, and semantic annotation

Abstract: The longstanding problem of automatic table interpretation still eludes us. Its solution would not only be an aid to table processing applications such as large volume table conversion, but would also be an aid in solving related problems such as information extraction, semantic annotation, and semi-structured data management. In this paper, we offer a solution for the common special case in which so-called sibling pages are available. The sibling pages we consider are pages on the hidden web, commonly genera…

Cited by 35 publications (21 citation statements)
References 36 publications
“…We have also done some work on automated extraction-ontology construction [32,24,33,34] and some work on free-form query processing [36,2]. We nevertheless still have much work to do, even on fundamental WoK components such as creating a sharable data-frame library, constructing data frames for relationship sets, finding ways to more easily produce instance recognizers, reverse-engineering of many genres of semi-structured sources to extraction ontologies, enhancing query processing, incorporating reasoning, and addressing performance scalability.…”
Section: Results (mentioning)
confidence: 99%
“…We have shown elsewhere that we can automatically construct an ontology for the site (and any other site with sibling tables) and extract the information in all the tables to populate the ontology [32]. Third we can create extraction ontologies automatically (although they likely need some enhancement) [32]. Fourth we can turn the process around and let users specify ontologies via nested forms [33].…”
Section: Theorem 2 Let S Be a Nested Table With A Single Label Path (mentioning)
confidence: 99%
“…Input tables were matched with known conceptualizations in an attempt to interpret them in [56]. Information extraction from sibling tables with identical headers was demonstrated in [57]. A taxonomy of tables based on the geometric relationship of tabular structures to isothetic tessellations and to X-Y trees was proposed in [58], a machine learning approach to segmentation of grid tables in [59], and algorithms for turning web tables into relational tables by recovering and factoring header paths in [60].…”
Section: Our Earlier Work (mentioning)
confidence: 99%
“…A user can then modify the form, if desired, and use it to harvest information. We have implemented this reverse-engineering of tables into FOCIH forms based on a system called TISP (Table Interpretation for Sibling Pages) [29,30]. TISP converts tables from sites like hidden-web sites that have machine-generated sibling pages into FOCIH forms and thus into FOCIH-generated ontologies.…”
Section: Further Reduction Of Labor-intensive Tasks (mentioning)
confidence: 99%