On negative results when using sentiment analysis tools for software engineering research

Jongeling, Robbert; Sarkar, Proshanta; Datta, Subhajit; Serebrenik, Alexander

doi:10.1007/s10664-016-9493-x

Cited by 146 publications

(113 citation statements)

References 81 publications

(117 reference statements)

Supporting

Mentioning

103

Contrasting


“…Still, many of these tools have not been designed or trained for handling the technical content typical of the software domain [127]. For instance, Jongeling et al [128] have compared various sentiment analysis tools used in previous studies in software engineering and found that they can disagree with the manual labeling of corpora performed by individuals as well as with each other. Therefore, we advocate caution when drawing conclusions from NLP tools not specifically trained for the specific purpose and lexicon, and we acknowledge this as a potential threat to instrumentation validity.…”

Section: Limitations (mentioning)

confidence: 99%

A large-scale, in-depth analysis of developers’ personalities in the Apache ecosystem

Calefato, Lanubile, Vasilescu

2019

Information and Software Technology


Context: Large-scale distributed projects are typically the result of collective efforts performed by multiple developers with heterogeneous personalities. Objective: We aim to find evidence that personalities can explain developers' behavior in large-scale distributed projects. For example, the propensity to trust others, a critical factor for the success of global software engineering, has been found to positively influence the result of code reviews in distributed projects. Method: In this paper, we perform a quantitative analysis of ecosystem-level data from the code commits and email messages contributed by the developers working on the Apache Software Foundation (ASF) projects, as representative of large-scale distributed projects. Results: We find that there are three common types of personality profiles among Apache developers, characterized in particular by their level of Agreeableness and Neuroticism. We also confirm that developers' personality is stable over time. Moreover, personality traits do not vary with their role, membership, and extent of contribution to the projects. We also find evidence that more open developers are more likely to become contributors to Apache projects. Conclusion: Overall, our findings reinforce the need for future studies on human factors in software engineering to use psychometric tools to control for differences in developers' personalities.


“…However, some researchers noticed unreliable results when directly employing such tools for SE tasks [38,41]. Jongeling et al. [38] observed the disagreement among these existing tools on the datasets in SE and found that the results of several SE studies involving these sentiment analysis tools cannot be confirmed when a different tool is used. To investigate the challenges in sentiment analysis in SE, Islam and Zibran [35] applied the most popular SentiStrength to some labeled issue comments extracted from JIRA issue tracking system and conducted an in-depth qualitative study to uncover twelve difficulties in identifying the sentiments of SE-related texts by analyzing the misclassified samples.…”

Section: Sentiment Analysis In SE (mentioning)

confidence: 99%

“…Previous studies found that sentiment analysis tools trained on non-technical texts are not adequate for SE tasks [35,38] and a lack of domain-specific knowledge is the main reason [35]. Since then, many studies focused on how to use labeled SE-related texts to train SE-customized sentiment classifiers [8,14,35].…”

Section: Lessons Learned and Implications (mentioning)

confidence: 99%

“…It aims to identify the affective states and subjective opinions in texts. Many out-of-the-box sentiment analysis tools (e.g., SentiStrength [63]) not designed for SE-related texts have been applied to SE tasks, but recent work has indicated that they cannot produce reliable results on SE tasks [38]. Furthermore, Islam and Zibran [35] applied SentiStrength to an SE-related dataset and found that misunderstanding of domain-specific meanings of words (namely technical jargon in the rest of this paper) accounts for the most misclassifications.…”

Section: Introduction (mentioning)

confidence: 99%


SEntiMoji: an emoji-powered learning approach for sentiment analysis in software engineering

Chen, Cao, Lü, et al.

2019

Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of


Sentiment analysis has various application scenarios in software engineering (SE), such as detecting developers' emotions in commit messages and identifying their opinions on Q&A forums. However, commonly used out-of-the-box sentiment analysis tools cannot obtain reliable results on SE tasks and the misunderstanding of technical jargon is demonstrated to be the main reason. Then, researchers have to utilize labeled SE-related texts to customize sentiment analysis for SE tasks via a variety of algorithms. However, the scarce labeled data can cover only very limited expressions and thus cannot guarantee the analysis quality. To address such a problem, we turn to the easily available emoji usage data for help. More specifically, we employ emotional emojis as noisy labels of sentiments and propose a representation learning approach that uses both Tweets and GitHub posts containing emojis to learn sentiment-aware representations for SE-related texts. These emoji-labeled posts can not only supply the technical jargon, but also incorporate more general sentiment patterns shared across domains. They as well as labeled data are used to learn the final sentiment classifier. Compared to the existing sentiment analysis methods used in SE, the proposed approach can achieve significant improvement on representative benchmark datasets. By further contrast experiments, we find that the Tweets make a key contribution to the power of our approach. This finding informs future research not to unilaterally pursue the domain-specific resource, but try to transform knowledge from the open domain through ubiquitous signals such as emojis.


“…Because of the poor accuracy of existing sentiment analysis tools trained with general sentiment expressions [4], recent studies have tried to customize such tools with software engineering datasets [5]. However, it is reported that no tool is ready to accurately classify sentences to negative, neutral, or positive, even if tools are specifically customized for certain software engineering tasks [5].…”

Section: Introduction (mentioning)

confidence: 99%

Sentiment Classification Using N-Gram Inverse Document Frequency and Automated Machine Learning

2019


We propose a sentiment classification method with a general machine learning framework. For feature representation, n-gram IDF is used to extract software-engineering-related, dataset-specific, positive, neutral, and negative n-gram expressions. For classifiers, an automated machine learning tool is used. In the comparison using publicly available datasets, our method achieved the highest F1 values in positive and negative sentences on all datasets.
