2020 IEEE International Conference on Software Maintenance and Evolution (ICSME)
DOI: 10.1109/icsme46990.2020.00017
Sentiment Analysis for Software Engineering: How Far Can Pre-trained Transformer Models Go?

Cited by 80 publications (89 citation statements)
References 35 publications
“…However, not all the solutions support retraining, as in the case of the lexicon-based tools DEVA and SentiStrength-SE. Based on the results of our analysis on the agreement of tools (see Section 5) and in line with previous evidence (Jongeling et al 2017; Zhang et al 2020), we suggest implementing an ensemble of tools with a majority voting system as a possible way to increase the agreement with manual labels when the retraining of the selected solution is not an option.…”
Section: Sentiment Analysis Tools Should Be Retrained If Possible Rather Than Used Off the Shelf (supporting)
confidence: 83%
“…As for disagreement, the proportion of severe disagreement rates drops to 2% and 3% for GitHub and Stack Overflow, respectively, which is comparable to the disagreement between human raters (see Table 11). More recently, similar findings were presented by Zhang et al (2020). In their study leveraging deep learning for sentiment analysis, they provided evidence on how the composition of different classifiers may boost performance.…”
Section: Follow-up Analysis On Majority Voting (supporting)
confidence: 65%
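Both excerpts above describe combining several sentiment tools through majority voting. A minimal sketch of that scheme follows; the tool names and label sequences are hypothetical illustrations, not data from the cited studies, and the fallback to the first tool's label on a tie is one possible design choice.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-document labels from several sentiment tools.

    predictions: list of label lists, one per tool, aligned by document.
    Falls back to the first tool's label when no strict majority exists.
    """
    combined = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        combined.append(label if count > len(labels) // 2 else labels[0])
    return combined

# Hypothetical outputs from three tools on four documents
tool_a = ["positive", "negative", "neutral", "negative"]
tool_b = ["positive", "neutral", "neutral", "negative"]
tool_c = ["negative", "negative", "neutral", "positive"]

print(majority_vote([tool_a, tool_b, tool_c]))
# → ['positive', 'negative', 'neutral', 'negative']
```

With an odd number of tools a strict majority exists unless all tools disagree, which is why ensembles of three or five classifiers are a common choice.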
“…For each API review aspect, we evaluate the performance in terms of five evaluation metrics (i.e., P, R, F1, MCC, and AUC) as introduced in Section III-E. RQ1: Can pre-trained transformer-based models achieve better performance than the state-of-the-art approach, which is based on traditional machine learning models? Motivation: Previous studies have shown the great potential of pre-trained transformer-based models on many software engineering tasks, e.g., sentiment analysis for software data [9] and code summarization [12]. However, the efficacy of the pre-trained transformer-based models for various types of […] For the summative result in Table IV, we calculate the arithmetic average of the used evaluation metrics of each approach across all the aspects as avg.…”
Section: Discussion (mentioning)
confidence: 99%
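Of the five metrics named in the excerpt, P, R, F1, and MCC can all be derived from a binary confusion matrix (AUC additionally requires ranked scores, so it is omitted here). A minimal sketch with hypothetical labels, not data from the cited study:

```python
import math

def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and Matthews correlation from binary labels."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, f1, mcc

# Hypothetical ground truth and predictions for one aspect
p, r, f1, mcc = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
# → p = r = f1 = 2/3, mcc = 1/3
```

Unlike F1, MCC accounts for true negatives as well, which is why studies on imbalanced software-engineering datasets often report both.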
“…Pre-trained Transformer-based Approaches: In this study, we consider four popular and state-of-the-art pre-trained transformer-based models which have been utilized in many other software tasks [9], [24], [25]: BERT, RoBERTa, ALBERT, and XLNet. We also apply two PTM variants: BERTOverflow [18], which is pre-trained on software engineering in-domain data, and CostSensBERT [19], which is designed to handle imbalanced data.…”
Section: Implementations (mentioning)
confidence: 99%