Discriminating Meaningful Web Tables from Decorative Tables Using a Composite Kernel

Son, Jeong-Woo; Lee, Jaean; Park, Seong-Bae; Song, He Sun; Lee, Sang-Jo

doi:10.1109/wiiat.2008.241

Cited by 11 publications

(15 citation statements)

References 6 publications

“…However, this is the obstacle of their approach. Jeong-Woo Son et al [5] have proposed an approach to discriminate web tables using a composite kernel which combines a parse tree kernel and a linear kernel. They proposed three kinds of features to capture both kinds of web table information which is composed of structural and content ones.…”

Section: Structure-based (mentioning)

confidence: 99%

Information extraction from web tables

Shaker

Ibrahim

Abdullah

2009

Proceedings of the 11th International Conference on Information Integration and Web-Based Applications & Services

Nowadays, many users rely on web search engines to find and gather information, and they face a growing number of web pages as information sources. Correlating, integrating, and presenting related information to users therefore becomes important. When a user queries a search engine such as Yahoo or Google for specific information, the results indicate not only whether the desired information is available but also which other pages mention it. Extracting information from web pages is likewise important, yet the massive, growing, and diverse set of web sources and the variety of page designs make information extraction from the web a challenging problem. This paper proposes an approach for extracting information from web tables based on standard classifications. The proposed approach consists of four main phases: (i) pre-processing, (ii) extraction, (iii) classification, and (iv) simplification. The approach is evaluated through experiments on a number of web pages from the Nokia products domain because, to the best of the authors' knowledge, this is the only product that has complete and complex standard classifiers.

“…Figure 1 illustrates the taxonomy of Information Extraction which consists of different type of data as input and the approaches that have been proposed for extracting information from semistructured data. The web tables provide more organized information, summarized information, and conciseness in expressing knowledge (Jeong-Woo Son et al 2008). Therefore, focus is given more on the structure-based which is the main focus of this chapter.…”

Section: Concepts of Information Extraction (IE) (mentioning)

confidence: 99%

“…They argued that there is a need to divide a web page into information blocks or several segments before organizing the content into hierarchical groups and during this process (partition a web page) some of the attribute labels of values may be missing. Structure-based: The structure based approaches employ assumptions about the general structure of tables (i.e., <TABLE> tags) on the web pages (Wolfgang Gatterbauer et al 2007;Jeong-Woo Son et al 2008). Wolfgang Gatterbauer et al (2007) have proposed an approach for extracting information from web tables.…”

Section: Semantic-based (mentioning)

confidence: 99%

“…The task of extracting web tables is formulated as the task of (i) finding all frames for a given web page, (ii) discerning those which adhere to the definition of tables where a 2-D grid is semantically significant from lists and other frames intended for nonrelational layout purposes, (iii) transferring the content into a topological grid description in which logical cells are flush with neighboring cells and their spatial relations are explicit. Jeong-Woo Son et al (2008) have proposed an approach to discriminate web tables using a composite kernel which combines a parse tree kernel and a linear kernel. They proposed three kinds of features to capture both kinds of web table information which is composed of structural and content ones.…”

Section: Semantic-based (mentioning)

confidence: 99%

A Framework for Extracting Information from Semi-Structured Web Data Sources

Shaker

Ibrahim

Abdullah

2008

2008 Third International Conference on Convergence and Hybrid Information Technology
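
The citation statements above describe the cited paper's classifier as a composite kernel that combines a parse tree kernel with a linear kernel. As a rough illustration only, the Python sketch below builds one such combination. Everything specific in it is an assumption rather than the authors' method: the `(label, children)` tree encoding, the subtree-counting kernel in the style of Collins and Duffy, the `decay` parameter, and the convex weighting `alpha` are hypothetical choices, since the statements do not say how the two kernels are defined or combined.

```python
from itertools import product


def nodes(tree):
    """Yield every node of a (label, children) tree in pre-order."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def production(node):
    """Return the node's label together with the labels of its children."""
    return (node[0], tuple(child[0] for child in node[1]))


def common_subtrees(n1, n2, decay):
    """Weighted count of common subtrees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    # Productions match, so both nodes have the same number of children.
    for c1, c2 in zip(n1[1], n2[1]):
        score *= 1.0 + common_subtrees(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=0.5):
    """Sum the common-subtree scores over all node pairs (assumed kernel)."""
    return sum(common_subtrees(a, b, decay)
               for a, b in product(nodes(t1), nodes(t2)))


def linear_kernel(x, y):
    """Plain dot product of two equal-length feature vectors."""
    return float(sum(a * b for a, b in zip(x, y)))


def composite_kernel(t1, x1, t2, x2, alpha=0.5, decay=0.5):
    """Convex combination of the two kernels; alpha is an assumed weight."""
    return (alpha * tree_kernel(t1, t2, decay)
            + (1.0 - alpha) * linear_kernel(x1, x2))


if __name__ == "__main__":
    # Hypothetical inputs: a tag tree plus a small content-feature vector.
    grid = ("table", [("tr", [("td", []), ("td", [])])])
    layout = ("table", [("tr", [("td", [("table", [])])])])
    print(composite_kernel(grid, [1.0, 2.0], layout, [3.0, 4.0]))
```

A convex combination of two valid kernels is itself a valid kernel, so a function like `composite_kernel` could in principle fill a precomputed Gram matrix for an SVM.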

“…For example, table tags exist in HTML, but they are often used for formatting web page layout. Previous work focused on detecting tables from PDF, HTML and ASCII documents using Optical Character Recognition [13], machine learning algorithms such as C4.5 decision trees [17] or SVM [22,19], and heuristics [26].…”

Section: Introduction (mentioning)

confidence: 99%

Disentangling the Structure of Tables in Scientific Literature

Milošević

Gregson

Hernandez

et al. 2016

Lecture Notes in Computer Science

Abstract. Within the scientific literature, tables are commonly used to present factual and statistical information in a compact way that is easy for readers to digest. The ability to "understand" the structure of tables is key for information extraction in many domains. However, the complexity and variety of presentation layouts and value formats make it difficult to automatically extract the roles and relationships of table cells. In this paper, we present a model that structures tables in a machine-readable way and a methodology to automatically disentangle and transform tables into the modelled data structure. The method was tested in the domain of clinical trials: it achieved an F-score of 94.26% for cell function identification and 94.84% for identification of inter-cell relationships.
