Text Mining With Machine Learning 2019
DOI: 10.1201/9780429469275-8
Random Forest

Cited by 78 publications
(21 citation statements)
References 0 publications
“…In this paper, RF is used for performing feature selection. RF can be considered as an improved version of bagged decision trees or bootstrap aggregation [23]. Although decision trees provide ease of interpretation and inference compared with other machine learning models, they suffer from high variance and overfitting.…”
Section: B. Random Forest Feature Selection (mentioning)
confidence: 99%
“…To address this issue, the RF method uses only a random subset of features as split candidates each time a split is built in its base estimators. By doing this, the base estimators will be decorrelated significantly and their complementarity will increase, improving the accuracy of the ensemble model [23]. After implementing the RF method on our data set, the most informative features are recognized based on their prediction ability and can be used to train the SVM-based classification Fig.…”
Section: B. Random Forest Feature Selection (mentioning)
confidence: 99%
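
As a rough illustration of the pipeline the quoted passage describes (random forest importance ranking feeding an SVM classifier), a minimal scikit-learn sketch follows. The synthetic dataset, the median importance threshold, and the hyperparameters are placeholders, not the cited paper's configuration.

# Hypothetical sketch: RF-based feature selection followed by an SVM classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder data standing in for the paper's feature matrix.
X, y = make_classification(n_samples=500, n_features=50, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Rank features by the forest's impurity-based importances and keep those
# scoring above the median (an illustrative threshold).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="median",
)

# Train the SVM only on the RF-selected feature subset.
model = make_pipeline(selector, SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))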
“…Random Forest (RF) regression is one of the most widely used non-linear machine learning algorithms (Breiman and Friedman, 1997; Breiman, 2001), and has already found applications in air pollution sensor calibration as well as in other aspects of atmospheric chemistry (Keller and Evans, 2019; Nowack et al., 2018, 2019; Sherwen et al., 2019; Zimmerman et al., 2018; Malings et al., 2019). It follows the idea of ensemble learning, where multiple machine learning models together make more reliable predictions than the individual models.…”
Section: Random Forest Regression (mentioning)
confidence: 99%
“…By increasing the number of trees in the ensemble, the RF generalization error converges towards a lower limit. We here set the number of trees in all regression tasks to 200 as a compromise between model convergence and computational complexity (Breiman, 2001).…”
Section: Random Forest Regression (mentioning)
confidence: 99%
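
The quoted setup (an ensemble of 200 trees, each split drawn from a random feature subset) can be sketched as follows; the synthetic data merely stands in for the co-located sensor and atmospheric features of the cited studies, and the out-of-bag score is used here as a rough proxy for the generalization error mentioned above.

# Hypothetical sketch: RF regression with 200 trees and an OOB error estimate.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                                  # placeholder predictors
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(
    n_estimators=200,      # number of trees, as in the quoted compromise
    max_features="sqrt",   # random feature subset per split decorrelates the trees
    oob_score=True,        # out-of-bag R^2 as a rough generalization estimate
    random_state=0,
)
rf.fit(X_train, y_train)
print("OOB R^2:", rf.oob_score_)
print("test R^2:", rf.score(X_test, y_test))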
“…Random forest has been shown to be the most accurate Machine Learning (ML) model for microbiome data analysis [36]. This method has the ability to discriminate groups, while considering interrelationships in high dimensional data [37]. The trained models resulted in high cross-validation scores for the bacteria test sets (r² = 0.89 for rumen, r² = 0.84 for feces), for archaea (r² = 0.86 for rumen and r² = 0.82 for feces), but not for protozoa (r² = 0.57).…”
Section: Discrimination Between Dietary Treatment Groups With Random Forest (mentioning)
confidence: 99%
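
For context, a cross-validated random forest regression reporting r² scores in the spirit of the quoted passage could look like the sketch below; the random abundance table is only a stand-in for the microbiome data and will not reproduce the cited values.

# Hypothetical sketch: cross-validated RF regression scored by r^2.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random(size=(120, 300))                                 # samples x taxa (placeholder)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.05, size=120)     # proxy target

scores = cross_val_score(
    RandomForestRegressor(n_estimators=500, random_state=0),
    X, y, cv=5, scoring="r2",
)
print("cross-validated r^2: %.2f +/- %.2f" % (scores.mean(), scores.std()))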