Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.316
Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution

Abstract: This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencie…
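
The dynamic conflict-resolution idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the function names (`dacr_label`, `annotate`), the thresholds, and the stopping rule are assumptions chosen only to show the general pattern of requesting additional annotations when the initial labels for an item conflict, rather than fixing the number of annotators per item in advance.

```python
import random
from collections import Counter

def dacr_label(item, annotators, init_k=3, agreement=3, max_k=7):
    """Illustrative dynamic conflict-resolution loop (hypothetical, not the
    published DACR algorithm): keep requesting annotations for an item until
    one label reaches the required number of agreeing annotators, or the
    per-item annotation budget max_k is exhausted."""
    pool = list(annotators)
    random.shuffle(pool)          # pick annotators in a random order
    votes = Counter()
    for i, annotate in enumerate(pool[:max_k], start=1):
        votes[annotate(item)] += 1
        label, count = votes.most_common(1)[0]
        # Resolve as soon as the initial batch has been collected and the
        # leading label has enough agreement behind it.
        if i >= init_k and count >= agreement:
            return label, i
    # No label reached the agreement threshold: fall back to a plurality vote.
    return votes.most_common(1)[0][0], min(max_k, len(pool))
```

Under this sketch, items on which annotators agree quickly consume only `init_k` judgments, while ambiguous items draw extra judgments up to `max_k`, which is one way the abstract's claimed error reduction without a ground-truth set could be realized.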

Cited by 9 publications (5 citation statements) · References 23 publications · Citing publications from 2021 and 2024

Citation statements (ordered by relevance):

“…In [26], several important aspects of data labeling are discussed. The debate surrounding expert annotators versus non-expert annotators is highlighted, with arguments for the effectiveness of both approaches.…”
Section: Relevant Research (mentioning)
confidence: 99%
“…Finally, there has been considerable work on measuring and rectifying inaccuracies in human annotation (Sun et al., 2020; Wei and Jia, 2021; Gladkoff et al., 2021; Paun et al., 2018). We sidestep this issue by aiming to predict the performance of a single human rater, assuming that if this can be done accurately, conflicts among raters can be resolved in a post-processing step.…”
Section: Chaganty et al. (2018) Pioneered Control Variates (mentioning)
confidence: 99%
“…Collecting ground truth data ("gold" datasets) is time consuming and expensive, and sometimes involves heavy engineering efforts (Sun et al., 2020). The confidence score generated by our model offers the potential to perform large-scale evaluations of annotation tasks.…”
Section: Introduction (mentioning)
confidence: 99%