2018
DOI: 10.48550/arxiv.1802.06290
Preprint

TabVec: Table Vectors for Classification of Web Tables

Abstract: There are hundreds of millions of tables in Web pages that contain useful information for many applications. Leveraging data within these tables is difficult because of the wide variety of structures, formats and data encoded in these tables. TabVec is an unsupervised method to embed tables into a vector space to support classification of tables into categories (entity, relational, matrix, list, and nondata) with minimal user intervention. TabVec deploys syntax and semantics of table cells, and embeds the structu…

Cited by 2 publications (3 citation statements)
References 19 publications
“…By using cost-sensitive classification and assign larger costs to positive class than to negative, it is possible to improve these results. We managed to improve the F1-scores by almost 10% (see Tables 11,12,13).…”
Section: Table 11 (mentioning)
confidence: 98%
“…Vector representations that would involve both context and structure of the article element should be explored in the future. The first attempt to do so was performed by [13]; however, this approach is limited to classification of table clusters, with no relation to other tasks where vector representation may be helpful. Also, the performance of information extraction using recurrent neural networks in combination with the mentioned representation model should be further explored in the future.…”
Section: Explore Table Representations For Deep Learning (mentioning)
confidence: 99%
“…Milosevic et al [39] tested methods for extracting numerical (number of patients, age, gender distribution) and textual (adverse reactions) information from tables identified by the < table > tag in the clinical literature as XML articles. Further, another line of work examined the classification of tables from HTML pages as entity, relational, matrix, list, and nondata leveraging specialized table embeddings called TabVec [21]. Wei et al [46] defined a question answering task with data in Table cells as the answers over two different datasets, i.e.…”
Section: Digitalization Based On Table Mining (mentioning)
confidence: 99%