Although energy efficiency is a prominent topic in the context of global climate change, European Union directives, and national energy policies, the methodology for estimating energy efficiency still relies on standard techniques defined by experts in the field. Recent research shows the potential of machine learning methods to produce models that assess energy efficiency from previously collected data. In this paper, we analyse a real dataset of public buildings in Croatia, extract their most important features using correlation analysis and chi-square tests, cluster the buildings on three selected features, and create a prediction model of energy efficiency for each cluster using the artificial neural network (ANN) methodology. The main objective of this research was to investigate whether a clustering procedure improves the accuracy of a neural network prediction model. For that purpose, the symmetric mean absolute percentage error (SMAPE) was used to compare the accuracy of the initial prediction model obtained on the whole dataset with that of the separate models obtained on each cluster. The results show that the clustering procedure did not increase the prediction accuracy of the models. These preliminary findings can be used to set goals for future research, which can focus on estimating clusters using more features, conducting more extensive variable reduction, and testing more machine learning algorithms to obtain more accurate models that would help reduce costs in the public sector.
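The SMAPE comparison mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state which SMAPE variant was used, so the code below assumes the common form that divides each absolute error by the mean of the absolute actual and predicted values; the function name `smape` and the sample numbers are hypothetical.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed variant: mean of |F - A| / ((|A| + |F|) / 2), times 100.
    The paper's exact formula is not given in the abstract.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, predicted):
        denom = (abs(a) + abs(f)) / 2.0
        if denom == 0.0:
            # Both values are zero: treat the error for this pair as zero.
            continue
        total += abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical energy-use values for two buildings.
whole_dataset_error = smape([100.0, 100.0], [110.0, 90.0])
```

A lower value means a more accurate model, so the whole-dataset model and the per-cluster models would be ranked by comparing their SMAPE on held-out buildings.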
The paper aims to establish an efficient model for predicting company growth by leveraging the strengths of logistic regression and neural networks. A real dataset of Croatian companies was used, with the industry sector, financial ratios, income, and assets as input variables and a binary dependent variable that marks a company as high-growth if its annualized growth in assets exceeded 20% a year over a three-year period. Because of the large number of input variables, factor analysis was performed in the pre-processing stage to extract the most important input components. Building an efficient model with a high classification rate and explanatory ability required two data mining methods: logistic regression as a parametric method and neural networks as a non-parametric method. Both methods were tested on models with and without variable reduction. The classification accuracy of the models was compared using statistical tests and ROC curves. The results showed that neural networks produce a significantly higher classification accuracy when the model incorporates all available variables. The paper further discusses the advantages and disadvantages of logistic regression and neural networks in modelling company growth. The suggested model is potentially of benefit to investors and economic policy makers, as it supports the recognition of companies with growth potential, especially during times of economic downturn.
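The high-growth label described in this abstract can be sketched as follows. The abstract does not say how annualized growth was computed, so this example assumes a compound annual growth rate over the period; the function names and the asset figures are hypothetical.

```python
def annualized_growth(start_assets, end_assets, years=3):
    """Compound annual growth rate of assets (assumed definition)."""
    if start_assets <= 0 or end_assets < 0 or years <= 0:
        raise ValueError("start_assets and years must be positive, end_assets non-negative")
    return (end_assets / start_assets) ** (1.0 / years) - 1.0


def is_high_growth(start_assets, end_assets, years=3, threshold=0.20):
    """True if annualized asset growth exceeds the threshold (20% in the abstract)."""
    return annualized_growth(start_assets, end_assets, years) > threshold


# Hypothetical companies: assets at the start and end of a three-year period.
doubled = is_high_growth(100.0, 200.0)   # roughly 26% a year
modest = is_high_growth(100.0, 150.0)    # roughly 14% a year
```

Under this assumption, a company needs its assets to grow by more than about 72.8% in total over three years to be labelled high-growth.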
The major challenge in influenza vaccination is to predict vaccine efficacy. The purpose of this study was to design a model that successfully predicts the outcome of influenza vaccination from real historical medical data. A non-linear neural network approach was used, and its performance was compared to logistic regression. Three neural network algorithms were tested (multilayer perceptron, radial basis function, and probabilistic networks) in conjunction with parameter optimization and regularization techniques, in order to create an influenza vaccination model that could be used for prediction in the medical practice of primary health care physicians, where the vaccine is usually dispensed. The selection of input variables was based on a model of the vaccine strain that has frequently been changed and for which a poor influenza vaccine response is expected. The performance of the models was measured by the average hit rate of negative and positive vaccine outcomes. A 10-fold cross-validation procedure was used to test the generalization ability of the models. It revealed that the multilayer perceptron model produced the highest average hit rate among the neural network algorithms and also outperformed the logistic regression model with regard to sensitivity and specificity. Sensitivity analysis was performed on the best model, and the importance of the input variables was discussed. Further research should focus on improving the performance of the model by combining neural networks with other intelligent methods in this field.
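The performance measure named in this abstract can be sketched as below, assuming that "average hit rate of negative and positive vaccine outcomes" means the unweighted mean of sensitivity (hit rate on positive outcomes) and specificity (hit rate on negative outcomes). The function name, the 0/1 encoding, and the sample labels are hypothetical.

```python
def average_hit_rate(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # share of positive outcomes predicted correctly
    specificity = tn / (tn + fp)  # share of negative outcomes predicted correctly
    return (sensitivity + specificity) / 2.0


# Hypothetical outcomes for five patients in one cross-validation fold.
fold_score = average_hit_rate([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

In a 10-fold cross-validation, a score like this would be computed on each held-out fold and averaged across folds; unlike plain accuracy, it is not inflated when one outcome is much more frequent than the other.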