Huy Tu scite author profile

Huy Tu

5 Publications

138 Citation Statements Received

268 Citation Statements Given


Affiliations

Meta (United States), North Carolina State University, University of Vermont

Publications


Better Data Labelling With EMBLEM (and how that Impacts Defect Prediction)

2022

IEEE Trans. Software Eng.


Standard automatic methods for recognizing problematic development commits can be greatly improved via the incremental application of human+artificial expertise. In this approach, called EMBLEM, an AI tool first explores the software development process to label the commits that are most problematic. Humans then apply their expertise to check those labels (perhaps resulting in the AI updating the support vectors within its SVM learner). We recommend this human+AI partnership for several reasons. When a new domain is encountered, EMBLEM can learn better ways to label which comments refer to real problems. Also, in studies with 9 open source software projects, labelling via EMBLEM's incremental application of human+AI is about eight times cheaper than existing methods, close to an order of magnitude. Further, EMBLEM is very effective. For the data sets explored here, EMBLEM's better labelling methods significantly improved Popt20 and G-score performance in nearly all the projects studied.

Table 1 (caption only; table not shown): This paper argues against using keywords like these as a method for labelling a commit as "buggy".


Identifying Self-Admitted Technical Debts With Jitterbug: A Two-Step Approach

Fahid

2022

IEEE Trans. Software Eng.


FRUGAL: Unlocking Semi-Supervised Learning for Software Analytics

2021


Mining Workflows for Anomalous Data Transfers

Papadimitriou, Kiran, et al.

2021


Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies is immensely important for reliable and performant scientific workflows. Since these workflows rely heavily on high-performance network transfers that require strict QoS constraints, accurately detecting anomalous network performance is crucial to ensure reliable and efficient workflow execution. To address this challenge, we have developed X-FLASH, a network anomaly detection tool for faulty TCP workflow transfers. X-FLASH incorporates novel hyperparameter tuning and data mining approaches that improve the performance of machine learning algorithms in accurately classifying anomalous TCP packets. X-FLASH uses XGBoost as an ensemble model and couples it with a sequential optimizer, FLASH, borrowed from search-based software engineering, to learn the optimal model parameters. X-FLASH found configurations that outperformed the existing approach by up to 28%, 29%, and 40% (relative) in F-measure, G-score, and recall, respectively, in fewer than 30 evaluations. Given (1) the large improvement and (2) the simplicity of the tuning, we recommend that future research adopt an additional tuning study as a new standard, at least in the area of scientific workflow anomaly detection.


Better Data Labelling with EMBLEM (and how that Impacts Defect Prediction)

Tu, Yu

2019

Preprint

