2023
DOI: 10.2139/ssrn.4593895

A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4

Katikapalli Subramanyam Kalyan

Cited by 12 publications (6 citation statements)
References 223 publications

“…Additionally, the sheer number of parameters means that the model can potentially generate biased or inappropriate language, depending on the training data and prompts used. Moreover, GPT-3 is a proprietary model developed by OpenAI and is not currently available for download (Kalyan, 2023). Therefore, it should be used with caution and with careful prompt optimization to ensure accurate and unbiased results.…”
Section: Methods
confidence: 99%
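
The caution above about proprietary access and prompt optimization is concrete in practice: GPT-3-family weights cannot be downloaded, so every use goes through OpenAI's hosted API, and the prompt plus decoding settings are the main levers for controlling output. Below is a minimal sketch assuming the official openai Python client; the model name, system prompt, and temperature are illustrative choices, not a prescribed recipe.

```python
# Minimal sketch of hosted-API access to a GPT-3-class model; the weights
# are proprietary, so there is nothing to download. Model name, prompt
# wording, and temperature are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",   # illustrative choice of hosted chat model
    temperature=0,           # near-deterministic decoding reduces variance
    messages=[
        # Constrain the task in the prompt; bias still depends on training
        # data, so outputs should be audited rather than trusted blindly.
        {"role": "system",
         "content": "Answer neutrally and concisely. If a question invites "
                    "a biased generalization, say so instead of answering."},
        {"role": "user",
         "content": "Summarize the following review in two neutral "
                    "sentences:\n<review text>"},
    ],
)
print(response.choices[0].message.content)
```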
“…The GPT-3 architecture is based on the Transformer model, which was first introduced by Vaswani et al. (2017). The Transformer model uses a self-attention mechanism that allows it to process input sequences in parallel, rather than sequentially (Kalyan, 2023). This makes it well suited for processing long sequences of text, which is important for many NLP tasks, including language generation and sentiment analysis, as shown in Figure 4.…”
Section: C) GPT-3
confidence: 99%
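
The parallelism this statement refers to is easy to see in code: scaled dot-product self-attention contextualizes every position with a few batched matrix products and a softmax, with no token-by-token recurrence. The following NumPy sketch of a single attention head is a toy restatement of Vaswani et al. (2017), with illustrative dimensions and random weights, not code from the survey.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project all positions at once
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (seq_len, seq_len) similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)         # row-wise softmax
    return w @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 16, 8, 5               # illustrative sizes
X = rng.normal(size=(seq_len, d_model))
out = self_attention(X,
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)))
print(out.shape)  # (5, 8): one contextualized vector per input position
```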
“…In the past, traditional RNN-based Seq2Seq frameworks for generative models did not exhibit significant advantages in accuracy or efficiency over extractive models. It was not until the recent widespread adoption of generative pre-trained models such as UniLM [29], BART [30], T5 [31], and GPT [32] that effective generative information extraction gradually emerged as a forefront research direction. Extractive models are more susceptible to schema limitations, while generative models exhibit greater strength in transferability and scalability.…”
Section: Current Methods of Information Extraction
confidence: 99%
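
The generative formulation mentioned here casts extraction as text-to-text: the schema lives in the prompt and the output is free text, so adding a new relation type needs no new classification head. A minimal sketch with Hugging Face transformers follows; the t5-small checkpoint and the prompt format are illustrative assumptions, and an off-the-shelf checkpoint would need fine-tuning on extraction data before its outputs were useful. This demonstrates only the interface, not the cited papers' exact setups.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Cast information extraction as generation: ask the model to emit
# (entity, relation, entity) triples as plain text.
prompt = ("extract triples: Marie Curie won the Nobel Prize in Physics "
          "in 1903.")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```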
“…Fine-tuning GPT-3 can produce powerful LLMs, and fine-tuned GPT-3 models have achieved very good performance in NLP [15], [16]. However, compared with the BERT model, GPT-3 lacks bidirectional context modeling.…”
Section: Introduction
confidence: 99%
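
The contrast this statement draws comes down to the attention mask: BERT-style encoders let every token attend in both directions, while GPT-style decoders apply a causal mask so each token sees only earlier positions. A small NumPy sketch of the two masks (sizes illustrative):

```python
import numpy as np

seq_len = 5
bidirectional_mask = np.ones((seq_len, seq_len), dtype=int)    # BERT: full view
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=int))  # GPT: left-only

print(causal_mask)
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```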