2022
DOI: 10.48550/arxiv.2212.10450
Preprint

Is GPT-3 a Good Data Annotator?

Cited by 16 publications (20 citation statements)
References: 0 publications
“…Some recent research proposes text annotation or label generation using Generative Pretrained Transformers (GPT). GPT-based methods have brought breakthrough changes to automatic data labeling for supervised machine learning tasks [28][29][30]. With the launch of ChatGPT in 2022, GPT became popular for various natural language processing (NLP) tasks.…”
Section: Limitations Of Various Topic Labeling Methods
confidence: 99%
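The annotation workflow these citing works describe boils down to prompting a GPT-style model to pick a label from a fixed set for each unlabeled example and collecting the responses as training data. Below is a minimal, hypothetical Python sketch of that idea; the query_llm placeholder, the sentiment label set, and the prompt wording are illustrative assumptions, not the setup used in the paper or the citing works.

# Minimal sketch: using an LLM as a data annotator (hypothetical setup).
# `query_llm` stands in for whatever LLM API is used; here it returns a
# dummy label so the sketch runs end to end without network access.

LABELS = ["positive", "negative", "neutral"]

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; replace with an actual API request."""
    return "positive"  # dummy response

def build_prompt(text: str) -> str:
    # Zero-shot labeling prompt: ask the model to choose one label from a fixed set.
    return (
        "Classify the sentiment of the following sentence as one of "
        f"{', '.join(LABELS)}.\n\nSentence: {text}\nLabel:"
    )

def annotate(unlabeled_texts: list[str]) -> list[tuple[str, str]]:
    dataset = []
    for text in unlabeled_texts:
        raw = query_llm(build_prompt(text)).strip().lower()
        label = raw if raw in LABELS else "neutral"  # fall back on unexpected output
        dataset.append((text, label))
    return dataset

if __name__ == "__main__":
    print(annotate(["The movie was a delight.", "Service was slow and rude."]))

The resulting (text, label) pairs would then serve as training data for a conventional supervised classifier, which is the use case the cited works evaluate.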
“…Limited only to customer complaint data and faces issues of data availability. [28] GPT-3-based approaches have been utilized and evaluated for sequence- and token-level NLP tasks.…”
Section: Paper Proposed Methods Limitation [3]
confidence: 99%
“…Research exploring GLLMs for data labelling. The research community has explored GLLMs for data labelling in a variety of NLP tasks, such as stance detection [373], [376], political tweets classification [375], sentiment analysis [376], [379], [380], hate speech detection [376], [377], bot detection [376], toxic comments detection [377], offensive comments detection [377], adverse drug reaction extraction [378], text entailment [379], topic classification [379], text generation [379], answer type classification [379], question generation [379], relation extraction [380], named entity recognition [380], [381], text summarization [382], and radiology text simplification [324]. Most of these works focused on English datasets, except for a few targeting other languages such as French [381], Spanish [381], Italian [381] and Basque [381].…”
Section: Data Labelling and Data Augmentation Abilities Of GLLMs
confidence: 99%
“…In the past, the primary research focus was on developing specialized frameworks for specific tasks (Chiu and Nichols, 2016; Liu et al., 2016; Ding et al., 2020; Qin and Joty, 2022a). In recent years, there has been a significant shift in approach towards utilizing powerful, general-purpose language models that can be fine-tuned or prompt-tuned for a wide range of applications (Devlin et al., 2019; Yang et al., 2019; Raffel et al., 2019; Lewis et al., 2019; Brown et al., 2020; Ding et al., 2022b; Qin et al., 2023a). Through pre-training on a large-scale unlabeled corpus, pretrained language models have shown significant improvement in a wide range of NLP tasks (He et al., 2021b; Ding et al., 2022a; Qin et al., 2023b; Zhou et al., 2023).…”
Section: Retrieval-augmented Generation
confidence: 99%