Many comparative studies on the performance of machine learning (ML) techniques for web cost estimation (WCE) have been reported in the literature. However, not much attention has been given to understanding the conceptual differences and similarities that exist in the application of these ML techniques for WCE, which could provide a credible guide for upcoming practitioners and researchers in predicting the cost of new web projects. This paper presents a comparative analysis of three prominent machine learning techniques, namely Case-Based Reasoning (CBR), Support Vector Regression (SVR), and Artificial Neural Network (ANN), in terms of performance, applicability, and their conceptual differences and similarities for WCE, using data obtained from a public dataset (www.tukutuku.com). Results from experiments show that SVR and ANN provide more accurate predictions of effort, although SVR requires fewer parameters to generate good predictions than ANN. CBR was not as accurate, but its good explanation attribute gives it a higher descriptive value. The study also outlines specific characteristics of the three ML techniques that could foster or inhibit their adoption for WCE.
Handling of Non-linear Relationships / Coarse Data: SVR and ANN are alike in their ability to handle non-linear and coarse data, and both are significantly stronger than CBR in this respect.
Selection of Training Parameters: There is a structured approach for selecting the training parameters for CBR, whereas for SVR and ANN parameter selection is experimental and based on trial and error. However, SVR requires fewer parameters to be selected during training than ANN.
Approach to Solving Regression: Both SVR and ANN use an optimization approach, while CBR uses nearest neighbours, determined by computing the degree of similarity.
Mode of Generalization: CBR generalizes locally, whereas both SVR and ANN generalize globally.
Explanation Capability: CBR has a good explanation mechanism, which facilitates understanding of the relationships among attributes and the eventual results. This enables more user involvement in performing the estimation task. In contrast, SVR and ANN return only the predicted result and provide no further details on attribute relationships, the influence of each attribute on effort, or the relevance of each attribute, any of which could aid the user's understanding.
Effectiveness with Sparse Data: Lacking a large amount of data is not a disadvantage for either SVR or ANN, but it is a weakness for CBR.
Nature of Technique: Generally, CBR is more transparent, and it is easier for the user to understand the inner workings of the system in terms of the nature of the computation and the strength of the relationships among attributes. SVR and ANN do not reveal the internal workings of the system but only return relatively accurate estimates. With SVR and ANN it is therefore more difficult to ascertain whether the model has been built correctly and to validate whether the correct model has been built.
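
The similarity-based retrieval and explanation capability attributed to CBR above can be illustrated with a minimal sketch. This is not the implementation used in the study: the feature names, the effort values, the similarity measure (inverse of a range-normalized Euclidean distance), and the choice of averaging the k = 2 closest cases are all assumptions made purely for illustration. The point is only that a CBR-style estimate comes with the retrieved analogues, which the user can inspect.

```python
import math

# Hypothetical past projects: (features, effort in person-hours).
# Feature names and all numbers are invented for illustration only.
CASE_BASE = [
    ({"pages": 20, "images": 10, "team_size": 2}, 100.0),
    ({"pages": 25, "images": 12, "team_size": 2}, 120.0),
    ({"pages": 100, "images": 80, "team_size": 5}, 600.0),
    ({"pages": 90, "images": 70, "team_size": 4}, 540.0),
]


def feature_ranges(cases):
    """Return the (min, max) of each feature, used to scale differences."""
    names = cases[0][0].keys()
    return {
        n: (min(c[0][n] for c in cases), max(c[0][n] for c in cases))
        for n in names
    }


def similarity(a, b, ranges):
    """Assumed measure: 1 / (1 + range-normalized Euclidean distance)."""
    total = 0.0
    for name, (lo, hi) in ranges.items():
        span = (hi - lo) or 1.0  # avoid division by zero for constant features
        total += ((a[name] - b[name]) / span) ** 2
    return 1.0 / (1.0 + math.sqrt(total))


def estimate_effort(new_project, cases, k=2):
    """Return (estimate, analogues) from the k most similar stored cases."""
    ranges = feature_ranges(cases)
    scored = sorted(
        ((similarity(new_project, feats, ranges), feats, effort)
         for feats, effort in cases),
        key=lambda t: t[0],
        reverse=True,
    )
    analogues = scored[:k]
    # Local generalization: only the retrieved neighbours affect the estimate.
    estimate = sum(effort for _, _, effort in analogues) / len(analogues)
    return estimate, analogues


estimate, analogues = estimate_effort(
    {"pages": 22, "images": 11, "team_size": 2}, CASE_BASE
)
print(f"Estimated effort: {estimate:.1f}")
for sim, feats, effort in analogues:
    # The retrieved cases serve as the explanation for the estimate.
    print(f"  analogue (similarity {sim:.2f}): {feats} -> {effort}")
```

Because the estimate is computed only from the retrieved neighbours, the sketch also shows local generalization: changing a distant case in the case base leaves this prediction untouched, which would not hold for a globally fitted SVR or ANN model.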