Proceedings of the Web Conference 2021
DOI: 10.1145/3442381.3450090

TCN: Table Convolutional Network for Web Table Interpretation

Abstract: Information extraction from semi-structured webpages provides valuable long-tailed facts for augmenting knowledge graphs. Relational Web tables are a critical component, containing additional entities and attributes of rich and diverse knowledge. However, extracting knowledge from relational tables is challenging because of sparse contextual information. Existing work linearizes table cells and relies heavily on modifying deep language models such as BERT, which only captures information from related cells in the same table.
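
To make the abstract's criticism concrete, below is a minimal sketch of what "linearizing" a relational Web table means: the 2-D grid is flattened into a single token sequence for a BERT-style encoder. The " | " and "[SEP]" separators are illustrative choices, not the actual preprocessing of TCN or any cited model.

```python
# Minimal sketch of table linearization for a BERT-style encoder.
# Separator tokens here are illustrative assumptions only.

def linearize_table(header, rows):
    """Flatten a table into one string: header row first, then data rows."""
    parts = [" | ".join(header)]
    for row in rows:
        parts.append(" | ".join(row))
    return " [SEP] ".join(parts)

header = ["City", "Country", "Population"]
rows = [
    ["Oslo", "Norway", "709,037"],
    ["Bergen", "Norway", "291,940"],
]
print(linearize_table(header, rows))
# City | Country | Population [SEP] Oslo | Norway | 709,037 [SEP] Bergen | Norway | 291,940
```

Once flattened this way, the encoder only attends over cells within the one serialized table, which is the limited-context problem the abstract points at.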

Citations: cited by 22 publications (11 citation statements).
References: 42 publications (36 reference statements).

“…The Cell Value Recovery (CVR) objective used in TaBERT [101] applies a span-based prediction objective to deal with multiple value tokens. In TCN [93], each token represents a cell, so 10% of table cells are randomly masked beforehand and recovered from the set of all cell values. • Cloze: the cell-level cloze used in TUTA [92] samples cell strings based on the bi-tree structure as candidate choices and, at each blanked position, encourages the model to retrieve the corresponding cell string.…”
Section: Token-level (Most Pre-training Models Use Token MLM)
confidence: 99%
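
The statement above describes a cell-level masking objective. A minimal sketch of that setup is shown below, where 10% of cells are replaced by a mask symbol and recorded as recovery targets; the data layout, mask token, and helper name are hypothetical, not TCN's actual implementation.

```python
import random

MASK = "[CELL_MASK]"  # hypothetical mask symbol

def mask_cells(cells, ratio=0.10, seed=0):
    """Randomly mask `ratio` of the cells; return the masked table plus
    (position, original value) recovery targets for pre-training."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(cells) * ratio))
    positions = rng.sample(range(len(cells)), n_mask)
    masked = list(cells)
    targets = []
    for pos in positions:
        targets.append((pos, masked[pos]))
        masked[pos] = MASK
    return masked, targets

cells = ["Oslo", "Norway", "709,037", "Bergen", "Norway", "291,940"]
masked, targets = mask_cells(cells)
print(masked)   # one cell replaced by [CELL_MASK]
print(targets)  # e.g. [(pos, original_value)]
```

Per the statement, recovery is then framed as retrieving the original value from the set of all cell values, rather than token-by-token masked language modeling.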
“…For now, they show much more promising performance than RNN/LSTM-based models. However, except for a few models, the feature extractors of the others are still limited by the old stereotype that the input sequence length must be kept invariant throughout the whole network [2], [5], [7]. Even those that do shorten the input sequence length make some 'compensations' for this behavior [6].…”
Section: Invariance of Input Sequence Length as Features?
confidence: 99%
“…The baselines we select contain end-to-end models, contrastive-learning-based models (CoST [18], TS2Vec [22], TNC [24], MoCo [51], Triplet [52], CPC [53], TST [54], TCC [55]) and a feature-engineered model (the TSFresh package). End-to-end models include traditional time series forecasting models (ARIMA [20], [21], Prophet [56], N-BEATS [17]), CNNs (SCINet [1], TCN [2]), RNNs (LSTNet [5], DeepAR [4], LSTMa [57]), Transformers (LogTrans [7], Informer [6], Reformer [39]) and GNNs (StemGNN [9]). Most of the results are taken from other papers [6], [18], [22], and the rest were obtained by us using unified settings for a fair comparison.…”
Section: Comparison Experiments
confidence: 99%
“…Chen et al. [9] formulate the web information extraction problem as structural reading comprehension and build a BERT [15]-based model to extract structured fields from web documents. It is worth mentioning that there are also methods that work on multimodal information extraction [44, 45, 48, 55], which focus on extracting field information from the visual layout or the rendered HTML of web documents.…”
Section: Related Work 2.1 Information Extraction
confidence: 99%
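
To illustrate the reading-comprehension formulation described in that statement, here is a hedged sketch in which each target field becomes a question over the serialized page text, answered by an off-the-shelf extractive QA model. The checkpoint, page text, and question templates are assumptions for illustration, not Chen et al.'s actual model or data.

```python
# Field extraction posed as reading comprehension: each field is a
# question over the serialized web document. Checkpoint and templates
# below are illustrative assumptions only.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

# Hypothetical text serialized from a product page.
context = ("Acme UltraWidget 3000. Price: $49.99. "
           "Ships from Oslo, Norway. In stock: 12 units.")

questions = {
    "price": "What is the price of the product?",
    "origin": "Where does the product ship from?",
}
for field, question in questions.items():
    result = qa(question=question, context=context)
    print(f"{field}: {result['answer']} (score={result['score']:.2f})")
```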