Yan‐Fu Li scite author profile

An empirical analysis of data preprocessing for machine learning-based software cost estimation

Context: Because the software development process is complex, traditional parametric models and statistical methods often appear inadequate for modeling the increasingly complicated relationship between project development cost and project features (or cost drivers). Machine learning (ML) methods, with several reported successful applications, have gained popularity for software cost estimation in recent years. Many researchers regard data preprocessing as a fundamental stage of ML methods; however, very few works have focused on the effects of data preprocessing techniques. Objective: This study aims to systematically assess the effectiveness of data preprocessing techniques on ML methods in the context of software cost estimation. Method: We first conduct a literature survey of recent publications that use data preprocessing techniques, followed by a systematic empirical study that analyzes the strengths and weaknesses of individual data preprocessing techniques as well as their combinations. Results: Our results indicate that data preprocessing techniques may significantly influence the final prediction, and that in some cases they can degrade the prediction performance of ML methods. Conclusion: To reduce prediction errors and improve efficiency, data preprocessing techniques should be selected carefully according to the characteristics of the machine learning methods and of the datasets used for software cost estimation.

A systematic comparison of metamodeling techniques for simulation optimization in Decision Support Systems

Xie et al. 2010

Applied Soft Computing

158

Reinforcement learning for microgrid energy management

Kuznetsova, Li, Ruiz et al. 2013

Energy

199

An integrated framework of agent-based modelling and robust optimization for microgrid energy management

et al. 2014

A multi-state model for the reliability assessment of a distributed generation system via universal generating function

Li, Zio 2012

Reliability Engineering & System Safety

132

The current and future developments of electric power systems are pushing the boundaries of reliability assessment to consider distribution networks with renewable generators. Given the stochastic features of these elements, most modeling approaches rely on Monte Carlo simulation. The computational costs associated with the simulation approach mostly restrict it to small systems, i.e. systems with a limited number of lumped components of a given renewable technology (e.g. wind or solar) whose behavior is described by a binary state, working or failed. In this paper, we propose an analytical multi-state modeling approach for the reliability assessment of distributed generation (DG). The approach can account for a number of diverse energy generation technologies distributed across the system. Multiple states are used to describe the randomness in the generation units, due to the stochastic nature of the generation sources and of the mechanical degradation/failure behavior of the generation systems. The universal generating function (UGF) technique is used for the multi-state modeling of individual components. A multiplication-type composition operator is introduced to combine the UGFs for the mechanical degradation states and the renewable generation source states into the UGF of the renewable generator power output. The overall multi-state DG system UGF is then constructed, and classical reliability indices (e.g. loss of load expectation (LOLE) and expected energy not supplied (EENS)) are computed from the DG system generation and load UGFs. An application of the model is shown on a DG system adapted from the IEEE 34-node distribution test feeder.
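
The abstract describes building a generator's UGF by composing a mechanical-degradation UGF with a renewable-source UGF through a multiplication-type operator, then computing LOLE and EENS from generation and load UGFs. The following is a minimal Python sketch of that idea, not the paper's model. It assumes all state variables are mutually independent, that generators feed a single lumped bus (no network constraints, so system output is a plain sum), and an annual horizon of 8760 h. The function names `compose` and `lole_eens` and every number are hypothetical illustrations.

```python
import operator
from collections import defaultdict
from typing import Callable, Dict, Tuple

# A UGF is stored as {performance level: probability}, i.e. u(z) = sum_i p_i * z**g_i.
UGF = Dict[float, float]


def compose(u1: UGF, u2: UGF, op: Callable[[float, float], float]) -> UGF:
    """Combine two independent UGFs: every pair of states (g1, g2) yields level
    op(g1, g2) with probability p1 * p2; equal levels are merged by summing."""
    result: Dict[float, float] = defaultdict(float)
    for g1, p1 in u1.items():
        for g2, p2 in u2.items():
            # Rounding guards against float noise splitting equal levels.
            result[round(op(g1, g2), 9)] += p1 * p2
    return dict(result)


def lole_eens(gen: UGF, load: UGF, hours: float) -> Tuple[float, float]:
    """LOLE = T * P(G < L) and EENS = T * E[max(L - G, 0)], assuming generation
    and load are independent and lumped at a single bus (no network limits)."""
    lolp = 0.0
    expected_deficit = 0.0
    for g, pg in gen.items():
        for load_level, pl in load.items():
            if g < load_level:
                lolp += pg * pl
                expected_deficit += pg * pl * (load_level - g)
    return lolp * hours, expected_deficit * hours


# Hypothetical numbers for illustration only (not taken from the paper).
degradation = {1.0: 0.90, 0.5: 0.08, 0.0: 0.02}  # fraction of rated capacity available
wind_source = {0.0: 0.3, 1.0: 0.5, 2.0: 0.2}     # MW the wind resource can deliver
solar_unit = {0.0: 0.5, 1.0: 0.5}                # MW output of a second generator
load = {1.0: 0.6, 2.0: 0.4}                      # MW demand levels

# Multiplication-type operator: generator output = available fraction * source power.
wind_turbine = compose(degradation, wind_source, operator.mul)
# Sum-type operator for generators feeding the same lumped load.
system = compose(wind_turbine, solar_unit, operator.add)

lole, eens = lole_eens(system, load, hours=8760)
print(f"LOLE = {lole:.1f} h/yr, EENS = {eens:.1f} MWh/yr")
```

Passing the composition operator as a parameter lets one routine handle both the multiplication-type step and the summation of generator outputs. Summing outputs this way ignores the distribution network, which a full DG reliability model may need to represent.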

A study of the non-linear adjustment for analogy based software cost estimation

Xie, Goh 2009

Empirical Software Engineering

Cost estimation is one of the most important but most difficult tasks in software project management, and many methods have been proposed for it. Analogy Based Estimation (ABE), which is essentially a case-based reasoning (CBR) approach, is one popular technique. To improve the accuracy of ABE, several studies have focused on adjusting the original solutions. However, most published adjustment mechanisms are linear and are restricted to numerical project features, whereas software project datasets often exhibit non-normal characteristics with large proportions of categorical features. To explore better adjustment mechanisms, this paper proposes an Artificial Neural Network (ANN) for Non-linear adjustment to ABE (NABE), which can learn to approximate complex relationships and can incorporate categorical features. The proposed NABE is validated on four real-world datasets and compared against linearly adjusted ABEs, CART, ANN and SWR. Eight artificial datasets are then generated for a systematic investigation of the relationship between model accuracy and dataset properties. The comparisons and analysis show that non-linear adjustment can generally extend ABE's flexibility on complex datasets with large numbers of categorical features and improve the accuracy of adjustment techniques.
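
ABE retrieves the past projects most similar to a new one and reuses their efforts, optionally adjusting the result. The Python sketch below shows plain ABE (min-max scaling, Euclidean distance, mean effort of the k nearest analogues) plus one simple linear size adjustment of the kind the abstract contrasts with NABE. It does not reproduce the ANN-based NABE: the function names, the size-ratio adjustment and the project data are hypothetical choices for illustration, and only numeric features are handled.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record format: (numeric feature vector, actual effort).
Project = Tuple[Sequence[float], float]


def feature_ranges(history: List[Project]) -> List[Tuple[float, float]]:
    """Per-feature (min, max) over the historical projects."""
    n_features = len(history[0][0])
    return [
        (min(p[0][j] for p in history), max(p[0][j] for p in history))
        for j in range(n_features)
    ]


def scale(x: Sequence[float], ranges: List[Tuple[float, float]]) -> List[float]:
    """Min-max scale each feature to [0, 1]; constant features map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(x, ranges)]


def abe_estimate(target: Sequence[float], history: List[Project], k: int = 2,
                 size_index: int = 0, adjust: bool = False) -> float:
    """Analogy-based estimate from the k nearest historical projects."""
    ranges = feature_ranges(history)
    t = scale(target, ranges)
    # Retrieve: rank past projects by Euclidean distance in scaled feature space.
    ranked = sorted(history, key=lambda p: math.dist(t, scale(p[0], ranges)))
    analogues = ranked[:k]
    if not adjust:
        # Reuse: the unadjusted solution is the mean effort of the analogues.
        return sum(effort for _, effort in analogues) / len(analogues)
    # Revise: hypothetical linear size adjustment, scaling each analogue's
    # effort by the ratio of target size to analogue size.
    return sum(
        effort * target[size_index] / feats[size_index] for feats, effort in analogues
    ) / len(analogues)


# Hypothetical projects: features = (size in KLOC, team experience in years).
history = [
    ([10, 2], 100.0),
    ([20, 3], 210.0),
    ([30, 5], 280.0),
    ([50, 4], 520.0),
]
new_project = [24, 4]
print(abe_estimate(new_project, history, k=2))               # 245.0
print(abe_estimate(new_project, history, k=2, adjust=True))  # 238.0
```

NABE departs from this sketch at the adjustment step, replacing the fixed size-ratio rule with a learned non-linear model (an ANN) that can also take categorical features into account.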

A study of mutual information based feature selection for case based reasoning in software cost estimation

Xie, Goh 2009

Expert Systems with Applications

Yan‐Fu Li

A study of project selection and feature weighting for analogy based software cost estimation

An empirical analysis of data preprocessing for machine learning-based software cost estimation

A systematic comparison of metamodeling techniques for simulation optimization in Decision Support Systems

Reinforcement learning for microgrid energy management

An integrated framework of agent-based modelling and robust optimization for microgrid energy management

A multi-state model for the reliability assessment of a distributed generation system via universal generating function

A study of the non-linear adjustment for analogy based software cost estimation

A study of mutual information based feature selection for case based reasoning in software cost estimation
