2022
DOI: 10.1109/tse.2020.3023177

Predicting Defective Lines Using a Model-Agnostic Technique

Abstract: Defect prediction models are proposed to help a team prioritize the source code files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste unnecessary effort on a whole file while only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of the lines of a file are defective. Hence, in this work, we propose a novel framework (called LINE-DP) to identify defective lines using a model-agnostic technique…
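To make the abstract's idea concrete, the following is a minimal, hypothetical sketch of a LINE-DP-style pipeline in Python, assuming a bag-of-tokens file-level classifier (scikit-learn) and the lime package; the toy data, the variable names, and the simple line-scoring step are illustrative assumptions, not the authors' implementation.

# Minimal sketch: file-level defect model + model-agnostic explanation (LIME)
# used to rank lines. Toy data and names are assumptions, not the LINE-DP code.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1) File-level defect model: bag-of-tokens -> defective (1) / clean (0).
train_files = ["int a = b / 0 ;", "return x + 1 ;"]   # toy "source files"
train_labels = [1, 0]
clf = make_pipeline(CountVectorizer(token_pattern=r"\S+"),
                    LogisticRegression())
clf.fit(train_files, train_labels)

# 2) Model-agnostic explanation: which tokens push this file towards "defective"?
explainer = LimeTextExplainer(class_names=["clean", "defective"])
test_file = "int a = b / 0 ;\nreturn x + 1 ;"
exp = explainer.explain_instance(test_file, clf.predict_proba, num_features=10)
risky_tokens = {token for token, weight in exp.as_list() if weight > 0}

# 3) Line-level ranking: score each line by the risky tokens it contains.
scores = [(i + 1, sum(tok in risky_tokens for tok in line.split()))
          for i, line in enumerate(test_file.splitlines())]
for lineno, score in sorted(scores, key=lambda s: -s[1]):
    print(f"line {lineno}: risk score {score}")

A real implementation would train on a labelled defect dataset and evaluate the line ranking (e.g., with recall at the top-ranked lines), but the three steps above are the core of the idea.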

Cited by 75 publications (33 citation statements)
References 89 publications (163 reference statements)

Citation statements:
“…Yan et al. [43] proposed a two-phase approach, i.e., an ML model trained on software metrics (e.g., #added lines) is first used to identify which commits are the most risky, and then an N-gram model trained on textual features is used to localise the riskiest lines. On the other hand, recent work by Wattanakriengkrai et al. [42] pointed out that a machine learning approach outperforms the n-gram approach. However, their experiment focused solely on file-level defect prediction, not Just-In-Time defect prediction.…”
Section: Related Work and Research Questions
confidence: 99%
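As a rough illustration of the two-phase idea described in the statement above, the sketch below pairs a commit-level risk classifier over change metrics with an add-one-smoothed bigram token model that ranks the lines of a risky commit by how unnatural they look; the data, names, and scoring are assumptions made for this example, not Yan et al.'s implementation.

import math
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

# Phase 1: commit-level risk model over software metrics (#added lines, #files).
train_metrics = [[120, 5], [3, 1], [400, 12], [8, 2]]   # toy commits
train_buggy = [1, 0, 1, 0]
risk_model = RandomForestClassifier(random_state=0).fit(train_metrics, train_buggy)

# Phase 2: bigram token model trained on a "clean" code corpus; lines of a
# risky commit are ranked by how surprising (unnatural) they look under it.
corpus = ["if x is None : return", "return x + 1", "for i in range ( n ) :"]
unigrams, bigrams = Counter(), Counter()
for line in corpus:
    tokens = ["<s>"] + line.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def line_entropy(line, alpha=1.0):
    """Average negative log-probability of a line under the smoothed bigram model."""
    tokens = ["<s>"] + line.split()
    logps = [math.log((bigrams[(a, b)] + alpha) /
                      (unigrams[a] + alpha * len(unigrams)))
             for a, b in zip(tokens, tokens[1:])]
    return -sum(logps) / max(len(logps), 1)

commit_metrics = [150, 6]
commit_lines = ["if x is None : return", "retrun x ++ 1"]   # one suspicious line
if risk_model.predict([commit_metrics])[0] == 1:             # phase 1: risky commit
    for line in sorted(commit_lines, key=line_entropy, reverse=True):
        print(f"{line_entropy(line):.2f}  {line}")           # most surprising first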
“…In our studied projects, we found that the average size of a commit varies from 73 to 140 changed lines, but the average ratio of actual defective lines is as low as 51%-53%. Thus, developers still spend unnecessary effort on locating the actual defective lines of that commit [42]. To address…”
Section: Related Work and Research Questions
confidence: 99%
“…Besides helping to understand the behaviour of deep learning models, interpretability methods have also been used to directly improve the software engineering process. For example, [43], [44] used LIME [45], a model-agnostic explainability method, to pinpoint the defective lines of code from defect detection models.…”
Section: Model Inspection
confidence: 99%
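As a further hedged illustration of the model-agnostic pattern mentioned in this statement, the sketch below applies LIME's tabular explainer to a metrics-based file-level defect model, a different model type than the text-based sketch earlier; the feature names, data, and model choice are assumptions made for the example.

import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

# Toy metrics-based defect model: file metrics -> defective (1) / clean (0).
feature_names = ["loc", "cyclomatic_complexity", "num_authors"]
X = np.array([[500, 30, 6], [40, 3, 1], [800, 45, 9], [60, 4, 2]], dtype=float)
y = np.array([1, 0, 1, 0])
model = RandomForestClassifier(random_state=0).fit(X, y)

# Model-agnostic explanation for one prediction: which metrics drive "defective"?
explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["clean", "defective"],
                                 mode="classification")
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=3)
for feature, weight in exp.as_list():
    print(f"{feature:>35}: {weight:+.3f}")   # positive weight pushes towards "defective"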
“…Interpretable or explainable deep-learning-based Android malware analysis is also an interesting future topic [60,160]. Recently, researchers have focused on conducting empirical studies to highlight the need for explainable AI/ML models for software engineering [75] and on developing novel approaches for explainable AI/ML models for software engineering [75,83,125,125,129,135,176]. Although existing studies have attempted to employ local explanation approaches to provide explanations based on Android characteristic-based features for each unknown sample [179], there are still several issues requiring further exploration.…”
Section: RQ2.2: What Deep Learning Architectures Are Used?
confidence: 99%