Jianglin Huang scite author profile

Jianglin Huang

11 Publications

203 Citation Statements Received

393 Citation Statements Given

Affiliations

City University of Hong Kong

Publications

An empirical analysis of data preprocessing for machine learning-based software cost estimation

2015

Information and Software Technology

Context: Due to the complex nature of the software development process, traditional parametric models and statistical methods often appear inadequate to model the increasingly complicated relationship between project development cost and the project features (or cost drivers). Machine learning (ML) methods, with several reported successful applications, have gained popularity for software cost estimation in recent years. Many researchers have described data preprocessing as a fundamental stage of ML methods; however, very few works have focused on the effects of data preprocessing techniques. Objective: This study aims to systematically assess the effectiveness of data preprocessing techniques on ML methods in the context of software cost estimation. Method: In this work, we first conduct a literature survey of recent publications using data preprocessing techniques, followed by a systematic empirical study to analyze the strengths and weaknesses of individual data preprocessing techniques as well as their combinations. Results: Our results indicate that data preprocessing techniques may significantly influence the final prediction, and that they sometimes have a negative impact on the prediction performance of ML methods. Conclusion: To reduce prediction errors and improve efficiency, data preprocessing techniques must be selected carefully according to the characteristics of the machine learning methods and of the datasets used for software cost estimation.

Cross-validation based K nearest neighbor imputation for software quality datasets: An empirical study

et al. 2017

Journal of Systems and Software

Being able to predict software quality is essential, but it also poses significant challenges in software engineering. Historical software project datasets are often used together with various machine learning algorithms for fault-proneness classification. Unfortunately, missing values in these datasets have a negative impact on estimation accuracy and could therefore lead to inconsistent results. As a method for handling missing data, K nearest neighbor (KNN) imputation has gradually gained acceptance in empirical studies because of its exemplary performance and simplicity. To date, researchers still call for optimized parameter settings for KNN imputation to further improve its performance. In this work, we develop a novel incomplete-instance based KNN imputation technique, which uses a cross-validation scheme to optimize the parameters for each missing value. An experimental assessment is conducted on eight quality datasets under various missingness scenarios. The study also compares the proposed imputation approach with mean imputation and three other KNN imputation approaches. The results show that our proposed approach is superior to the others in general. The relatively optimal fixed parameter settings for KNN imputation for software quality data are also determined. It is observed that classification accuracy is improved, or at least maintained, by using our approach for missing data imputation.

Cross-Project Defect Prediction Using a Credibility Theory Based Naive Bayes Classifier

et al. 2017

An Empirical Analysis of Three-Stage Data-Preprocessing for Analogy-Based Software Effort Estimation on the ISBSG Data

et al. 2017

An empirical study of the impact of project factors on software economics

2015

Modularity's impact on the quality and productivity of embedded software development: a case study in a Hong Kong company

et al. 2014

Total Quality Management & Business Excellence

Grey Relational Analysis Based k Nearest Neighbor Missing Data Imputation for Software Quality Datasets

2016

Analyzing time pressure for software economics

et al. 2019

Purpose: Research on people and project factors is extensive in general but not specific to software engineering. In addition, existing research has not concentrated on the effect of the communication and time complexity of teams on software economics. The purpose of this paper is to develop a model to investigate and quantify the impact of time pressure (TP) on software economics through the communication influence of software team sizes (TS). Design/methodology/approach: A research model and five hypotheses are developed based on the gaps in the literature. The data set from the International Software Benchmarking Standards Group repository is used for testing the hypotheses. Findings: Important findings include: a smaller TS tends to exert less TP on average; TP is directly proportional to software economics; however, TP does not affect the productivity required for the software. Research limitations/implications: The study has the following implications: selecting an appropriate TS for project completion that ensures minimum pressure on team members, and maximizing software outcomes in a stress-free environment. Practical implications: This work is useful for organizations carrying out software projects with teamwork. Project managers can benefit from the results when planning the team factors for achieving the project goals. Social implications: The results support not exerting pressure on the team, as it affects not only the timely completion of the project but also the well-being of employees. Originality/value: This paper is the first to propose estimating TP using TS and communication complexity. This proposal and the empirical evaluation of the impact of TP on four major aspects of software economics are the key contributions of this research work.
