Selecting Representative Data Sets

Borovicka, Tomas; Jirina, Marcel; Kordík, Pavel

doi:10.5772/50787

Cited by 65 publications (53 citation statements)

References 103 publications

“…In [13], the authors evaluated a pool of potential input variables to predict tower top acceleration signal with a wrapper algorithm [22], which includes the predictor model to search for the variables that reduce prediction error (a posteriori approach). They found wind speed, tower acceleration and wind direction relevant.…”

Section: Methods and Results (mentioning, confidence: 99%)
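The wrapper search described in this statement can be sketched as greedy forward selection. The predictor model sits inside the search loop, and each candidate variable set is scored by its prediction error on held-out records, which is the a posteriori approach. This is a minimal sketch under stated assumptions. The k-nearest-neighbour regressor, the function names (`knn_predict`, `holdout_mse`, `forward_wrapper`) and the toy data are all hypothetical, and none of them reproduce the model or variables used in [13].

```python
import math
import random


def knn_predict(train_X, train_y, x, features, k=3):
    """Predict y for x as the mean target of its k nearest training rows,
    measuring distance only on the selected feature indices (hypothetical model)."""
    dists = sorted(
        (math.dist([row[f] for f in features], [x[f] for f in features]), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k


def holdout_mse(features, train, valid, k=3):
    """Validation mean squared error of the model restricted to `features`."""
    (train_X, train_y), (valid_X, valid_y) = train, valid
    errors = [
        (knn_predict(train_X, train_y, x, features, k) - y) ** 2
        for x, y in zip(valid_X, valid_y)
    ]
    return sum(errors) / len(errors)


def forward_wrapper(n_features, train, valid, k=3):
    """Greedy forward selection: the predictor itself scores each candidate subset."""
    selected, best_err = [], float("inf")
    remaining = set(range(n_features))
    while remaining:
        # Try each remaining feature and keep the addition with the lowest error.
        err, feature = min(
            (holdout_mse(selected + [f], train, valid, k), f) for f in remaining
        )
        if err >= best_err:
            break  # no candidate lowers the validation error any further
        selected.append(feature)
        remaining.remove(feature)
        best_err = err
    return selected, best_err


# Hypothetical toy data: y depends only on feature 0; features 1 and 2 are noise.
rng = random.Random(42)
X = [[rng.random() for _ in range(3)] for _ in range(300)]
y = [3.0 * row[0] for row in X]
train, valid = (X[:200], y[:200]), (X[200:], y[200:])

selected, err = forward_wrapper(3, train, valid)
print(selected, round(err, 4))
```

The error is measured on records the model was not fitted on. As a result, the search stops as soon as adding a variable no longer lowers that error, which typically keeps the noise features out.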

“…For example, were we in possession of abnormalities in our data set, we could have also used a classification algorithm to mark records into either one of two classes (normal, abnormal). In such a case, care must be taken to account for the imbalance of instances in one of the classes [22], and, as a consequence, we would have had to select other metrics, such as the F-measure, as recommended by the investigation of fault diagnosis in gearboxes [33].…”

Section: Comparing Both Approaches (mentioning, confidence: 99%)
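The class-imbalance problem in this statement can be made concrete. When abnormal records are rare, plain accuracy rewards a classifier that never flags anything, but the F-measure for the abnormal class does not. The sketch below uses hypothetical labels (`"normal"`, `"abnormal"`), a hypothetical 95/5 class split and made-up predictions. It is not taken from the cited gearbox study [33].

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def f_measure(y_true, y_pred, positive="abnormal", beta=1.0):
    """F-beta score for the `positive` class (beta=1 gives the usual F1)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(y_true, y_pred):
    """Share of records whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced data: 95 normal records, then 5 abnormal ones.
y_true = ["normal"] * 95 + ["abnormal"] * 5

# Classifier A always answers "normal".
pred_a = ["normal"] * 100
# Classifier B wrongly flags 2 normal records and finds 4 of the 5 abnormal ones.
pred_b = ["abnormal"] * 2 + ["normal"] * 93 + ["abnormal"] * 4 + ["normal"]

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, accuracy(y_true, pred), round(f_measure(y_true, pred), 3))
```

On this toy data, classifier A reaches 0.95 accuracy but has an F-measure of 0. Classifier B scores 0.97 accuracy and an F-measure of about 0.73 (8/11).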

“…First, we divided data from October 2014 using the hold out method [22] into sub-sets for model training (70%), validation (20%) and test (10%). Hold out was selected also at this stage [16] because of the large number of points available in October and the data held out from November.…”

Section: Neural Network: a Deterministic Approach (mentioning, confidence: 99%)
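The 70/20/10 hold-out division in this statement can be sketched as one seeded shuffle followed by two cuts. This is a minimal sketch, assuming records are addressed by integer index. `holdout_split` and its parameters are hypothetical names. The quote does not say whether the October records were shuffled. For a strongly autocorrelated 1 Hz signal, a chronological cut may be the safer choice, because shuffling lets neighbouring samples land in different subsets.

```python
import random


def holdout_split(n_records, fractions=(0.7, 0.2), seed=0):
    """Shuffle record indices once and cut them into train / validation / test.

    `fractions` gives the train and validation shares. The test set takes
    whatever remains, so the three parts cover every record exactly once.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = round(n_records * fractions[0])
    n_valid = round(n_records * fractions[1])
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


train, valid, test = holdout_split(1000)
print(len(train), len(valid), len(test))  # 700 200 100
```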

Normal Behaviour Models for Wind Turbine Vibrations: Comparison of Neural Networks and a Stochastic Approach

et al. 2017

Abstract: To monitor wind turbine vibrations, normal behaviour models are built to predict tower top accelerations and drive-train vibrations. Signal deviations from the model prediction are labelled as anomalies and are further investigated. In this paper we assess a stochastic approach to reconstruct the 1 Hz tower top acceleration signal, which was measured in a wind turbine located at the wind farm Alpha Ventus in the German North Sea. We compare the resulting data reconstruction with that of a model based on a neural network, which has been previously reported as a data-mining algorithm suitable for reconstructing this signal. Our results present evidence that the stochastic approach outperforms the neural network in the high frequency domain (1 Hz). Although the neural network retrieves accurate step-forward predictions, with low mean square errors, the stochastic approach predictions better preserve the statistics and the frequency components of the original signal, retaining high accuracy levels. The implementation of our stochastic approach is available as open source code and can easily be adapted for other situations involving stochastic data reconstruction. Based on our findings, we argue that such an approach could be implemented in signal reconstruction for monitoring purposes or for abnormal behaviour detection.
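The trade-off described in this abstract is between low step-forward error and preserving the signal's statistics. It can be illustrated on a toy autoregressive signal. This sketch is an assumption-laden illustration only, not the paper's stochastic approach or its neural network. The AR(1) model, `phi`, `sigma` and the function names are hypothetical. The conditional-mean prediction minimises mean squared error but has a visibly smaller standard deviation than the original. Adding freshly drawn noise roughly doubles the error on this toy model but restores the spread.

```python
import random
import statistics


def simulate_ar1(n, phi=0.8, sigma=1.0, seed=1):
    """Hypothetical stand-in signal: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


phi, sigma = 0.8, 1.0
signal = simulate_ar1(10000, phi, sigma)
rng = random.Random(2)

# Deterministic step-forward prediction: the conditional mean phi * x[t-1].
deterministic = [phi * prev for prev in signal[:-1]]
# Stochastic reconstruction: conditional mean plus freshly drawn noise of the same scale.
stochastic = [phi * prev + rng.gauss(0.0, sigma) for prev in signal[:-1]]
target = signal[1:]

for name, pred in [("deterministic", deterministic), ("stochastic", stochastic)]:
    print(name, round(mse(target, pred), 3), round(statistics.pstdev(pred), 3))
print("original std", round(statistics.pstdev(target), 3))
```

Only the spread (standard deviation) is compared here. Checking frequency components would need a spectral estimate, which this sketch leaves out.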

“…The development sample is used to develop the model (learning and estimating parameters of the model), while the validation sample is used to evaluate the model and for final model selection. In a later phase of model development, a third type of sample, the testing sample(s), can be used for assessing the predictive performance of the model [Borovicka et al, 2012]. If the same dataset were used for the development, validation and calibration, the estimation of the predictive ability of the model would be overly optimistic.…”

Section: Introduction (mentioning, confidence: 99%)
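The last sentence of this statement says that reusing one dataset for development and evaluation gives an overly optimistic estimate. A model that memorises its training data makes this easy to demonstrate. This is a minimal sketch with hypothetical names and synthetic data. A 1-nearest-neighbour regressor reproduces every development record exactly, so its error on the development sample is zero. Its error on the held-out testing sample stays well above zero.

```python
import random


def one_nn_predict(train_x, train_y, x):
    """1-nearest-neighbour regression on a single input variable."""
    _, y = min(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))
    return y


def mse_on(sample_x, sample_y, train_x, train_y):
    """Mean squared error on a sample, using a model built from (train_x, train_y)."""
    errors = [
        (one_nn_predict(train_x, train_y, x) - y) ** 2
        for x, y in zip(sample_x, sample_y)
    ]
    return sum(errors) / len(errors)


rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(400)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]  # hypothetical noisy linear data

dev_x, dev_y = xs[:300], ys[:300]    # development sample
test_x, test_y = xs[300:], ys[300:]  # held-out testing sample

print("error on development sample:", mse_on(dev_x, dev_y, dev_x, dev_y))
print("error on testing sample:", round(mse_on(test_x, test_y, dev_x, dev_y), 3))
```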

Data representativeness problem in credit scoring

Ditrich

2015

AOP

When building models, it is common to split the whole dataset into a development and a validation sample. In some cases, using random sampling instead of stratified sampling can lead to a loss of representativeness in the final samples. In such cases, a model built on these data gives different or unexpected results when its performance is measured on the validation sample. In the business area, a lack of representativeness can cause interpretative problems and can have a huge financial impact when a biased model is involved in the credit granting process. The aim of this paper is to examine and understand why representativeness should be checked before modelling starts. The paper deals with methods for identifying selection bias over time. It recommends using three tests as a routine part of the data preparation process.
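This abstract contrasts random and stratified sampling. For a binary good/bad target, splitting each class separately guarantees that the development and validation samples share the same bad rate. This is a minimal sketch, assuming hypothetical labels `"good"`/`"bad"` and a 70/30 split. `stratified_split` and `bad_rate` are hypothetical names. The abstract does not name the three representativeness tests the paper recommends, so they are not reproduced here. Stratifying on the target only protects the class mix. It does not address selection bias over time.

```python
import random
from collections import defaultdict


def stratified_split(labels, valid_fraction=0.3, seed=0):
    """Split record indices so each class keeps its share in both samples."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    development, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within the class only
        n_valid = round(len(indices) * valid_fraction)
        validation.extend(indices[:n_valid])
        development.extend(indices[n_valid:])
    return sorted(development), sorted(validation)


def bad_rate(labels, indices, bad="bad"):
    """Share of `bad` labels among the given record indices."""
    return sum(labels[i] == bad for i in indices) / len(indices)


# Hypothetical credit portfolio: 900 good and 100 bad loans.
labels = ["good"] * 900 + ["bad"] * 100
dev, val = stratified_split(labels)
print(len(dev), len(val), bad_rate(labels, dev), bad_rate(labels, val))
```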

“…Here the complete data set is split into a learning (70 %) and a test (30 %) set (Borovicka et al 2012). In this case, there is no need for an evaluation set, which some algorithms require and which generally accounts for around 10 % of the whole data set.…”

Section: By Creating Accumulated State Vectors Combining Individual (mentioning, confidence: 99%)

Identifying Product and Process State Drivers in Manufacturing Systems Using Supervised Machine Learning

Wuest

2015

Springer Theses
