2021
DOI: 10.48550/arxiv.2110.04845
Preprint

What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study

Abstract: The degree of semantic relatedness (or, closeness in meaning) of two units of language has long been considered fundamental to understanding meaning. Automatically determining relatedness has many applications such as question answering and summarization. However, prior NLP work has largely focused on semantic similarity (a subset of relatedness), because of a lack of relatedness datasets. Here for the first time, we introduce a dataset of semantic relatedness for sentence pairs. This dataset, STR-2021, has 5,…

Cited by 2 publications (5 citation statements)
References 18 publications

“…Interestingly, according to our results, even though STR evaluation does not correlate well with downstream tasks, the positive pairs collected from STR have better quality than STS-B. It also confirms the argument that STR improves the dataset collection process (Abdalla et al., 2021)…”
Section: Results and Analysis (supporting)
confidence: 77%

“…Reimers et al. (2016), Eger et al. (2019), and Zhelezniak et al. (2019) state that the current evaluation paradigm for Semantic Textual Similarity (STS) tasks is not ideal. One recent work (Abdalla et al., 2021) questions the data collection process of STS datasets and creates a new semantic relatedness dataset (STR) via comparative annotation (Louviere and Woodworth, 1991)…”
Section: Related Work (mentioning)
confidence: 99%

“…We ran pilots to obtain importance annotations, on a 0-5 scale, for each sentence in a contract as well as for pairs of sentences, taking inspiration from prior work (Sakaguchi et al., 2014; Sakaguchi and Van Durme, 2018), but found they had poor agreement (see details in A.1). Thus, following Abdalla et al. (2021), we use Best-Worst Scaling (BWS), a comparative annotation schema that builds on pairwise comparisons and does not require N² labels. Annotators are presented with n=4 sentences from a contract and a party, and are instructed to choose the best (i.e., most important) and worst (i.e., least important) sentence…”
Section: Dataset Curation (mentioning)
confidence: 99%
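
The BWS aggregation step mentioned in that excerpt can be made concrete with a short sketch. The Python snippet below is a hypothetical illustration, not code from either paper: it applies the standard counting procedure used with Best-Worst Scaling, score(item) = (#times chosen best − #times chosen worst) / #times shown, to best/worst judgments over n=4 tuples. The record layout and sentence IDs are invented for the example.

from collections import Counter

def bws_scores(annotations):
    # Each annotation is (best_item, worst_item, tuple_of_items_shown).
    # Returns a score in [-1, 1] per item; higher = more often judged best.
    best, worst, shown = Counter(), Counter(), Counter()
    for best_item, worst_item, items in annotations:
        best[best_item] += 1
        worst[worst_item] += 1
        for item in items:
            shown[item] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Hypothetical judgments over a single 4-tuple of sentence IDs (n=4).
annotations = [
    ("s1", "s3", ("s1", "s2", "s3", "s4")),
    ("s1", "s4", ("s1", "s2", "s3", "s4")),
    ("s2", "s3", ("s1", "s2", "s3", "s4")),
]
print({k: round(v, 3) for k, v in bws_scores(annotations).items()})
# {'s1': 0.667, 's2': 0.333, 's3': -0.667, 's4': -0.333}

Because each annotation tuple yields one best and one worst choice, reliable real-valued scores emerge from far fewer judgments than exhaustive pairwise comparison, which is why BWS avoids the N² labels noted in the quoted passage.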