Why Don’t You Do It Right? Analysing Annotators’ Disagreement in Subjective Tasks

Sandri, Marta; Leonardelli, Elisa; Tonelli, Sara; Jezek, Elisabetta

doi:10.18653/v1/2023.eacl-main.178

Cited by 3 publications

(1 citation statement)

References 38 publications

“…Other domain tasks are transferable to NLI. Our work can be expanded to test LLMs on other NLP applications (Plank, 2022) such as Question Answering (De Marneffe et al., 2019), Fact Verification (Thorne et al., 2018), and Toxic Language Detection (Schmidt and Wiegand, 2017; Sandri et al., 2023). Further, our method can be applied for tasks that contain disagreements since they are easily transferable to NLI tasks (Dagan et al., 2006) like the QNLI dataset from Table 2, for example, instead of directly asking controversial questions (e.g., abortion) to the model (Santurkar et al., 2023), the question format can be modified into a declarative statement in the premise and place a possible answer in the hypothesis with a binary True/False label (Dagan et al., 2006).…”

Section: Discussion (mentioning)

confidence: 99%

Can Large Language Models Capture Dissenting Human Voices?

Lee, An, Thorne

2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have shown impressive achievements in solving a broad range of tasks. Augmented by instruction fine-tuning, LLMs have also been shown to generalize in zero-shot settings as well. However, whether LLMs closely align with the human disagreement distribution has not been well-studied, especially within the scope of natural language inference (NLI). In this paper, we evaluate the performance and alignment of LLM distribution with humans using two different techniques to estimate the multinomial distribution: Monte Carlo Estimation (MCE) and Log Probability Estimation (LPE). As a result, we show LLMs exhibit limited ability in solving NLI tasks and simultaneously fail to capture human disagreement distribution. The inference and human alignment performances plunge even further on data samples with high human disagreement levels, raising concerns about their natural language understanding (NLU) ability and their representativeness to a larger human population.
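The abstract names Monte Carlo estimation of a multinomial label distribution but does not describe it in detail. As a rough illustration only, and not the authors' implementation, the general idea can be sketched as counting how often each label appears among repeated samples and comparing the resulting frequencies with the distribution of human annotations. The label names, the sample lists, the helper names `estimate_distribution` and `total_variation`, and the choice of total variation distance as the comparison measure are all assumptions made for this sketch.

```python
from collections import Counter

# Assumed three-way NLI label set; the listing above does not specify it.
LABELS = ("entailment", "neutral", "contradiction")


def estimate_distribution(samples, labels=LABELS):
    """Estimate a multinomial distribution as relative label frequencies."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    return {label: counts[label] / len(samples) for label in labels}


def total_variation(p, q):
    """Total variation distance between two distributions over the same labels."""
    return 0.5 * sum(abs(p[label] - q[label]) for label in p)


# Hypothetical data: ten sampled model outputs and ten human annotations
# for a single premise/hypothesis pair.
model_samples = ["entailment"] * 8 + ["neutral"] * 2
human_votes = ["entailment"] * 5 + ["neutral"] * 4 + ["contradiction"] * 1

model_dist = estimate_distribution(model_samples)
human_dist = estimate_distribution(human_votes)
distance = total_variation(model_dist, human_dist)

print(model_dist)
print(human_dist)
print(round(distance, 3))
```

A distance of 0 would mean the sampled outputs reproduce the human label distribution exactly, and larger values indicate a larger gap. In practice the samples would come from repeated model queries rather than a hard-coded list.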

Unraveling Disagreement Constituents in Hateful Speech

Rizzi, Astorino, Rosso et al.

2024

Lecture Notes in Computer Science

Federated Learning for Exploiting Annotators’ Disagreements in Natural Language Processing

Rodríguez-Barroso, Cámara, Collados et al.

2024

Transactions of the Association for Computational Linguistics

The annotation of ambiguous or subjective NLP tasks is usually addressed by various annotators. In most datasets, these annotations are aggregated into a single ground truth. However, this omits divergent opinions of annotators, hence missing individual perspectives. We propose FLEAD (Federated Learning for Exploiting Annotators’ Disagreements), a methodology built upon federated learning to independently learn from the opinions of all the annotators, thereby leveraging all their underlying information without relying on a single ground truth. We conduct an extensive experimental study and analysis in diverse text classification tasks to show the contribution of our approach with respect to mainstream approaches based on majority voting and other recent methodologies that also learn from annotator disagreements.
