Ming Tan scite author profile

Abstract: Many defect prediction techniques have been proposed to improve software reliability. Change classification predicts defects at the change level, where a change is the set of modifications to one file in a commit. In this paper, we conduct the first study of applying change classification in practice. We identify two issues in the prediction process, both of which contribute to low prediction performance. First, the data are imbalanced: there are far fewer buggy changes than clean changes. Second, the commonly used cross-validation approach is inappropriate for evaluating the performance of change classification. To address these challenges, we apply and adapt online change classification, resampling, and updatable classification techniques to improve the classification performance. We evaluate the improved change classification techniques on one proprietary and six open source projects. Our results show that these techniques improve the precision of change classification by 12.2-89.5% or 6.4-34.8 percentage points (pp.) on the seven projects. In addition, we integrate change classification in the development process of the proprietary project. We have learned the following lessons: 1) new solutions are needed to convince developers to use and believe prediction results, and prediction results need to be actionable, 2) new and improved classification algorithms are needed to explain the prediction results, and insensible and unactionable explanations need to be filtered or refined, and 3) new techniques are needed to improve the relatively low precision.
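The two issues named in this abstract (class imbalance and evaluation by ordinary cross-validation) can be illustrated with a small sketch. It is not the authors' implementation: the abstract only says "resampling" and "online change classification", so random undersampling and a single chronological train/test split are assumptions chosen for illustration, and the `time` and `buggy` field names are hypothetical.

```python
import random

def chronological_split(changes, train_fraction=0.7):
    """Train on the earliest changes and test on the later ones (no shuffling)."""
    ordered = sorted(changes, key=lambda c: c["time"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def undersample(train, seed=0):
    """Randomly drop clean changes so they do not outnumber buggy ones."""
    rng = random.Random(seed)
    buggy = [c for c in train if c["buggy"]]
    clean = [c for c in train if not c["buggy"]]
    if len(clean) > len(buggy):
        clean = rng.sample(clean, len(buggy))
    return sorted(buggy + clean, key=lambda c: c["time"])

# Toy history (assumed data): 20 changes, every fifth one buggy.
history = [{"time": t, "buggy": t % 5 == 0} for t in range(20)]
train, test = chronological_split(history)
balanced = undersample(train)  # only the training set is resampled
```

Splitting by time keeps future changes out of the training set, which shuffled cross-validation does not guarantee. The test set is left at its natural class ratio so that measured precision reflects the imbalance seen in practice.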

Improved Representation Learning for Question Answer Matching

Tan, Santos, Xiang et al. 2016

213

155

Passage-level question answer matching is a challenging task since it requires effective representations that capture the complex semantic relations between questions and answers. In this work, we propose a series of deep learning models to address passage answer selection. To match passage answers to questions while accommodating their complex semantic relations, and unlike most previous work that uses a single deep learning structure, we develop hybrid models that process the text using both convolutional and recurrent neural networks, combining the merits of both structures in extracting linguistic information. We also develop a simple but effective attention mechanism that constructs better answer representations according to the input question, which is imperative for modeling long answer sequences. The results on two public benchmark datasets, InsuranceQA and TREC-QA, show that our proposed models outperform a variety of strong baselines.
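The idea of building an answer representation "according to the input question" can be sketched as question-conditioned attention pooling. The abstract does not give the scoring function, so the dot-product score below is an assumption, as are the function name `attention_pool` and the toy vectors; this is a minimal illustration, not the paper's model.

```python
import math

def attention_pool(question_vec, answer_states):
    """Weight each answer position by its softmax-normalised dot product with the question."""
    scores = [sum(q * h for q, h in zip(question_vec, state)) for state in answer_states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(question_vec)
    # Weighted sum of the answer states, one value per dimension.
    pooled = [sum(w * state[i] for w, state in zip(weights, answer_states)) for i in range(dim)]
    return weights, pooled

# Toy inputs (assumed): a 2-d question vector and three answer positions.
question = [1.0, 0.0]
answer = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
weights, pooled = attention_pool(question, answer)
```

Positions whose states align with the question receive larger weights, so a long answer is summarised mainly by its question-relevant parts.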

Out-of-Domain Detection for Low-Resource Text Classification Tasks

Tan, Wang et al. 2019

Out-of-domain (OOD) detection for low-resource text classification is a realistic but understudied task. The goal is to detect OOD cases with limited in-domain (ID) training data, since we observe that training data is often insufficient in machine learning applications. In this work, we propose an OOD-resistant Prototypical Network to tackle this zero-shot OOD detection and few-shot ID classification task. Evaluation on real-world datasets shows that the proposed solution outperforms state-of-the-art methods on the zero-shot OOD detection task, while maintaining competitive performance on the ID classification task.

Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers

Wang, Tan, Yu et al. 2019

The state-of-the-art solutions for extracting multiple entity-relations from an input paragraph always require multiple-pass encoding of the input. This paper proposes a new solution that can complete the multiple entity-relations extraction task with only one-pass encoding of the input corpus, and achieves new state-of-the-art accuracy, as demonstrated on the ACE 2005 benchmark. Our solution is built on top of pre-trained self-attentive models (Transformer). Since our method uses a single pass to compute all relations at once, it scales to larger datasets easily, which makes it more usable in real-world applications.

The reliability of the Akaike information criterion method in cosmological model selection

Tan, Biswas 2011

The Akaike information criterion (AIC) has been used as a statistical criterion to compare the appropriateness of different dark energy candidate models underlying a particular data set. Under suitable conditions, the AIC is an indirect estimate of the Kullback–Leibler divergence D(T∥A) of a candidate model A with respect to the truth T. Thus, a dark energy model with a smaller AIC is ranked as a better model, since it has a smaller Kullback–Leibler discrepancy with T. In this paper, we explore the impact of statistical errors in estimating the AIC during model comparison. Using a parametric bootstrap technique, we study the distribution of AIC differences between a set of candidate models due to different realizations of noise in the data and show that the shape and spread of this distribution can be quite varied. We also study the rate of success of the AIC procedure for different values of a threshold parameter popularly used in the literature. For plausible choices of true dark energy models, our studies suggest that investigating such distributions of AIC differences in addition to the threshold is useful in correctly interpreting comparisons of dark energy models using the AIC technique.
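For reference, the quantities discussed in this abstract can be written out using the standard definition of the AIC. The symbols below are notation assumed here rather than taken from the paper: k is the number of free parameters of a model, L_max its maximized likelihood, and Δ_thr the threshold parameter the abstract mentions.

```latex
\mathrm{AIC} = -2 \ln \mathcal{L}_{\max} + 2k,
\qquad
\Delta\mathrm{AIC}_{AB} = \mathrm{AIC}_A - \mathrm{AIC}_B .
% Model A is ranked above model B when \Delta\mathrm{AIC}_{AB} < 0;
% a threshold rule treats the preference as significant only when
% |\Delta\mathrm{AIC}_{AB}| > \Delta_{\mathrm{thr}} .
```

Because L_max is estimated from noisy data, ΔAIC is itself a random quantity, which is why the abstract examines its distribution over noise realizations rather than a single value.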

Online Defect Prediction for Imbalanced Data