Predicting siRNA potency with random forests and support vector machines

Wang, Liangjiang; Huang, Caiyan; Yang, Jack

doi:10.1186/1471-2164-11-s3-s2

Cited by 29 publications

(18 citation statements)

References 22 publications


“…SVMs models using all available traits or including five randomly selected root traits (R_5) were not able to increase the overall accuracy, which confirmed the necessity of root traits selection through RF in cultivar differentiation. This finding is in accordance with previous ML approaches in other scientific fields (Wang et al, 2010; Löw et al, 2012; Liu et al, 2014). The improved accuracy probably benefits from alleviating the ‘curse of dimensionality’ through root traits selection, removing non-informative signals (Chu et al, 2012).…”

Section: Discussion (supporting)

confidence: 93%

“…The validation accuracy was treated as final prediction accuracy of SVMs/RF classifications. Classifications with an average prediction accuracy ≥80% were regarded as a high accuracy classifications (HACCs); the 80% level was determined acceptable by previous ML studies (Wang et al, 2010; Liu et al, 2014; Shang and Chisholm, 2014; Zheng et al, 2014; Sacchet et al, 2015). The whole process – RF ranking of root traits in each cultivar pair, SVMs and RF classification of pairs using different mtrys and Timp s – was repeated three times; the average accuracy with standard error was calculated.…”

Section: Methods (mentioning)

confidence: 99%


Phenotyping: Using Machine Learning for Improved Pairwise Genotype Classification Based on Root Traits

2016


Phenotyping local crop cultivars is becoming more and more important, as they are an important genetic source for breeding – especially in regard to inherent root system architectures. Machine learning algorithms are promising tools to assist in the analysis of complex data sets; novel approaches are needed to apply them to root phenotyping data of mature plants. A greenhouse experiment was conducted in large, sand-filled columns to differentiate 16 European Pisum sativum cultivars based on 36 manually derived root traits. Through combining random forest and support vector machine models, machine learning algorithms were successfully used for unbiased identification of the most distinguishing root traits and subsequent pairwise cultivar differentiation. Up to 86% of pea cultivar pairs could be distinguished based on the top five important root traits (Timp5) – Timp5 differed widely between cultivar pairs. Selecting top important root traits (Timp) provided a significantly improved classification compared to using all available traits or randomly selected trait sets. The most frequent Timp of mature pea cultivars was total surface area of lateral roots originating from tap root segments at 0–5 cm depth. The high classification rate implies that culturing did not lead to a major loss of variability in root system architecture in the studied pea cultivars. Our results illustrate the potential of machine learning approaches for unbiased (root) trait selection and cultivar classification based on rather small, complex phenotypic data sets derived from pot experiments. Powerful statistical approaches are essential to make use of the increasing amount of (root) phenotyping information, integrating the complex trait sets describing crop cultivars.


“…It shows high predictive accuracy and is applicable even in high-dimensional problems with highly correlated variables, a situation which often occurs in bioinformatics [56]. Additionally, Random Forests is good in handling redundant features that is reported previously [57], [58]. In this study, 100 trees are utilized to construct a Random Forests classifier, and the number of selected features is set to a default value of the square root of the total number of features [52].…”

Section: Methods (mentioning)

confidence: 99%

Prediction and Analysis of Antibody Amyloidogenesis from Sequences

Liaw

Tung

2013

PLoS ONE


Antibody amyloidogenesis is the aggregation of soluble proteins into amyloid fibrils that is one of the major causes of the failures of humanized antibodies. The prediction and prevention of antibody amyloidogenesis are helpful for restoring and enhancing therapeutic effects. Due to a large number of possible germlines, the existing method, which establishes individual models for each known germline, is not practical for predicting sequences of novel germlines. This study proposes a first automatic and across-germline prediction method (named AbAmyloid) capable of predicting antibody amyloidogenesis from sequences. Since the amyloidogenesis is determined by a whole sequence of an antibody rather than germline-dependent properties such as mutated residues, this study assesses three types of germline-independent sequence features (amino acid composition, dipeptide composition and physicochemical properties). AbAmyloid using a Random Forests classifier with dipeptide composition performs well on a data set of 12 germlines. The within- and across-germline prediction accuracies are 83.10% and 83.33% using Jackknife tests, respectively, and the novel-germline prediction accuracy using a leave-one-germline-out test is 72.22%. A thorough analysis of sequence features is conducted to identify informative properties for further providing insights to antibody amyloidogenesis. Some identified informative physicochemical properties are amphiphilicity, hydrophobicity, reverse turn, helical structure, isoelectric point, net charge, mutability, coil, turn, linker, nuclear protein, etc. Additionally, the numbers of ubiquitylation sites in amyloidogenic and non-amyloidogenic antibodies are found to be significantly different. It reveals that antibodies less likely to be ubiquitylated tend to be amyloidogenic. The method AbAmyloid capable of automatically predicting antibody amyloidogenesis of novel germlines is implemented as a publicly available web server at http://iclab.life.nctu.edu.tw/abamyloid.


“…The current neglect of SML techniques in plant phenotyping is partially based on earlier studies who failed to show how plant traits reflect environmental differences (Bari et al, , ) or even produced (partially) misleading results due to a biased trait selection method (Khazaei, Street, Bari, et al, ). Furthermore, widely different classification accuracies have been deemed acceptable in previous studies (Bari et al, 2016; Liu et al, 2014; Wang, Huang, & Yang, 2010; Zheng, Yoon, & Lam, 2014), restraining “trust” in the resilience of SML-based data analysis within the scientific community. Because the classification accuracy is a result of both data and analysis method, high generalization accuracies cannot be expected per se and are also only a prerequisite for discovering an important phenotype.…”

Section: Introduction (mentioning)

confidence: 99%

Root traits of European Vicia faba cultivars-Using machine learning to explore adaptations to agroclimatic conditions

et al. 2017


Faba bean (Vicia faba L.) is an important source of protein, but breeding for increased yield stability and stress tolerance is hampered by the scarcity of phenotyping information. Because comparisons of cultivars adapted to different agroclimatic zones improve our understanding of stress tolerance mechanisms, the root architecture and morphology of 16 European faba bean cultivars were studied at maturity. Different machine learning (ML) approaches were tested for their usefulness in analysing trait variations between cultivars. A supervised, that is, hypothesis-driven, ML approach revealed that cultivars from Portugal feature greater and coarser but less frequent lateral roots at the top of the taproot, potentially enhancing water uptake from deeper soil horizons. Unsupervised clustering revealed that trait differences between northern and southern cultivars are not predominant but that two cultivar groups, independently from major and minor types, differ largely in overall root system size. Methodological guidelines on how to use powerful ML methods such as random forest models for enhancing the phenotypical exploration of plants are given.
