Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
DOI: 10.18653/v1/2021.emnlp-main.343

WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Abstract: Web search is an essential way for humans to obtain information, but it is still a great challenge for machines to understand the contents of web pages. In this paper, we introduce the task of structural reading comprehension (SRC) on the web. Given a web page and a question about it, the task is to find the answer from the web page. This task requires a system to understand not only the semantics of the text but also the structure of the web page. Moreover, we propose WebSRC, a novel Web-based Structural Reading Co…

Cited by 26 publications (25 citation statements)
References 43 publications
“…This section provides discussion that connects WebFormer with previous methods as well as the limitations of our model. If we treat HTML tags as additional text tokens, and combine with the text into a single sequence without the H2H, H2T and T2H attentions, our model architecture degenerates to the sequence modeling approaches [9,51] that serialize the HTML layout. If we further trim the HTML from the sequence, our model is regressed to the sequence model [47] that only uses the text information.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
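The serialization idea mentioned in the statement above (HTML tags treated as additional tokens in a single sequence, with a text-only variant obtained by trimming the tags) can be illustrated with a minimal sketch. This is not the WebFormer or WebSRC implementation: the names `HtmlSerializer`, `serialize_html`, and `text_only` are hypothetical, the sample page is made up, and whitespace splitting stands in for a real subword tokenizer.

```python
from html.parser import HTMLParser


class HtmlSerializer(HTMLParser):
    """Flatten HTML into one sequence of (kind, token) pairs.

    Hypothetical helper for illustration only; attributes are ignored.
    """

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Opening tags become tokens of their own.
        self.tokens.append(("tag", f"<{tag}>"))

    def handle_endtag(self, tag):
        # Closing tags are kept too, so the nesting stays recoverable.
        self.tokens.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        # Assumption: whitespace splitting replaces a real tokenizer.
        for word in data.split():
            self.tokens.append(("text", word))


def serialize_html(html):
    """Return the page as a single tag-plus-text token sequence."""
    parser = HtmlSerializer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def text_only(tokens):
    """Trim the HTML tokens, leaving only the text tokens."""
    return [tok for kind, tok in tokens if kind == "text"]


page = "<div><h1>Phone X</h1><span>Price: $499</span></div>"
sequence = serialize_html(page)
serialized = [tok for _, tok in sequence]  # tags and text interleaved
plain = text_only(sequence)                # text information only
```

Keeping the token kind next to each token means the text-only view is a simple filter, and text that happens to look like a tag is never dropped by mistake.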
“…Recently, there has been an increasing number of works that develop natural language models with sequence modeling [9,20,26,30,34,61] for web information extraction. Zheng et al. [59] develop an end-to-end tagging model utilizing BiLSTM, CRF, and attention mechanism without any dictionary.…”
Section: Related Work, 2.1 Information Extraction
Citation type: mentioning
confidence: 99%