2021
DOI: 10.48550/arxiv.2110.04845
Preprint

What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study

Abstract: The degree of semantic relatedness (or, closeness in meaning) of two units of language has long been considered fundamental to understanding meaning. Automatically determining relatedness has many applications such as question answering and summarization. However, prior NLP work has largely focused on semantic similarity (a subset of relatedness), because of a lack of relatedness datasets. Here for the first time, we introduce a dataset of semantic relatedness for sentence pairs. This dataset, STR-2021, has 5,…

Cited by 2 publications (5 citation statements)
References 18 publications

“…Interestingly, according to our results, even though STR evaluation does not correlate well with downstream tasks, the positive pairs collected from STR have better quality than STS-B. It also confirms the argument that STR improves the dataset collection process (Abdalla et al., 2021)…”
Section: Results and Analysis (supporting)
confidence: 77%

“…Reimers et al. (2016), Eger et al. (2019), and Zhelezniak et al. (2019) state that the current evaluation paradigm for Semantic Textual Similarity (STS) tasks is not ideal. One recent work (Abdalla et al., 2021) questions the data collection process of STS datasets and creates a new semantic relatedness dataset (STR) via comparative annotation (Louviere and Woodworth, 1991)…”
Section: Related Work (mentioning)
confidence: 99%

“…We ran pilots to obtain importance annotations, on a 0-5 scale, for each sentence in a contract as well as for pairs of sentences, taking inspiration from prior work (Sakaguchi et al., 2014; Sakaguchi and Van Durme, 2018), but found they had poor agreement (see details in A.1). Thus, following Abdalla et al. (2021), we use Best-Worst Scaling (BWS), a comparative annotation schema that builds on pairwise comparisons and does not require N² labels. Annotators are presented with n=4 sentences from a contract and a party, and are instructed to choose the best (i.e., most important) and worst (i.e., least important) sentence…”
Section: Dataset Curation (mentioning)
confidence: 99%
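
The BWS aggregation step mentioned in that excerpt can be made concrete with a short sketch. The Python snippet below is a hypothetical illustration, not code from either paper: it applies the standard counting procedure used with Best-Worst Scaling, score(item) = (#times chosen best − #times chosen worst) / #times shown, to best/worst judgments over n=4 tuples. The record layout and sentence IDs are invented for the example.

from collections import Counter

def bws_scores(annotations):
    # Each annotation is (best_item, worst_item, tuple_of_items_shown).
    # Returns a score in [-1, 1] per item; higher = more often judged best.
    best, worst, shown = Counter(), Counter(), Counter()
    for best_item, worst_item, items in annotations:
        best[best_item] += 1
        worst[worst_item] += 1
        for item in items:
            shown[item] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Hypothetical judgments over a single 4-tuple of sentence IDs (n=4).
annotations = [
    ("s1", "s3", ("s1", "s2", "s3", "s4")),
    ("s1", "s4", ("s1", "s2", "s3", "s4")),
    ("s2", "s3", ("s1", "s2", "s3", "s4")),
]
print({k: round(v, 3) for k, v in bws_scores(annotations).items()})
# {'s1': 0.667, 's2': 0.333, 's3': -0.667, 's4': -0.333}

Because each annotation tuple yields one best and one worst choice, reliable real-valued scores emerge from far fewer judgments than exhaustive pairwise comparison, which is why BWS avoids the N² labels noted in the quoted passage.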