2010
DOI: 10.1186/1471-2164-11-s3-s2

Predicting siRNA potency with random forests and support vector machines

Abstract: Background: Short interfering RNAs (siRNAs) can be used to knock down gene expression in functional genomics. For a target gene of interest, many siRNA molecules may be designed, whereas their efficiency of expression inhibition often varies. Results: To facilitate gene functional studies, we have developed a new machine learning method to predict siRNA potency based on random forests and support vector machines. Since there were many potential sequence features, random forests were used to select the most relevant …

Cited by 29 publications (18 citation statements)
References 22 publications
“…SVMs models using all available traits or including five randomly selected root traits (R_5) were not able to increase the overall accuracy, which confirmed the necessity of root traits selection through RF in cultivar differentiation. This finding is in accordance with previous ML approaches in other scientific fields (Wang et al, 2010; Löw et al, 2012; Liu et al, 2014). The improved accuracy probably benefits from alleviating the ‘curse of dimensionality’ through root traits selection, removing non-informative signals (Chu et al, 2012).…”
Section: Discussion (supporting)
confidence: 93%
“…The validation accuracy was treated as final prediction accuracy of SVMs/RF classifications. Classifications with an average prediction accuracy ≥80% were regarded as a high accuracy classifications (HACCs); the 80% level was determined acceptable by previous ML studies (Wang et al, 2010; Liu et al, 2014; Shang and Chisholm, 2014; Zheng et al, 2014; Sacchet et al, 2015). The whole process – RF ranking of root traits in each cultivar pair, SVMs and RF classification of pairs using different mtrys and Timp s – was repeated three times; the average accuracy with standard error was calculated.…”
Section: Methods (mentioning)
confidence: 99%
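The aggregation described in the statement above (repeating the procedure three times, reporting the average accuracy with its standard error, and applying the 80% cutoff) might be sketched as follows. This is only an illustration, not the citing authors' code: the function name `summarize_runs` and the three example accuracies are hypothetical.

```python
import math
import statistics

def summarize_runs(accuracies, threshold=0.80):
    """Return (mean, standard error, passes_threshold) for repeated runs.

    `threshold` defaults to the 80% level quoted in the statement;
    everything else here is an assumption made for illustration.
    """
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.mean(accuracies)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, sem, mean >= threshold

# Hypothetical validation accuracies from three repeats of one classification.
mean, sem, is_high_accuracy = summarize_runs([0.82, 0.79, 0.85])
```

The sample standard deviation (n - 1 denominator) is used here; the quoted text does not say which estimator the authors applied.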
“…It shows high predictive accuracy and is applicable even in high-dimensional problems with highly correlated variables, a situation which often occurs in bioinformatics [56]. Additionally, Random Forests is good in handling redundant features that is reported previously [57], [58]. In this study, 100 trees are utilized to construct a Random Forests classifier, and the number of selected features is set to a default value of the square root of the total number of features [52].…”
Section: Methods (mentioning)
confidence: 99%
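The default mentioned in the statement above, where the number of features considered at each split is the square root of the total, can be illustrated with a small sketch. It assumes the square root is rounded down and floored at 1, which the quoted text does not specify; the helper names and the feature count of 50 are hypothetical.

```python
import math
import random

N_TREES = 100  # number of trees quoted in the statement above

def default_mtry(n_features):
    """Candidate features per split: floor(sqrt(n_features)), at least 1.

    Rounding down is an assumption; the quoted text only says
    "the square root of the total number of features".
    """
    if n_features < 1:
        raise ValueError("n_features must be positive")
    return max(1, math.isqrt(n_features))

def sample_split_features(n_features, rng):
    """Draw the feature indices examined at one split, without replacement."""
    return sorted(rng.sample(range(n_features), default_mtry(n_features)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidates = sample_split_features(50, rng)  # 50 features is a made-up count
```

Each of the `N_TREES` trees would repeat this draw at every split, which is what decorrelates the trees in a random forest.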
“…The current neglect of SML techniques in plant phenotyping is partially based on earlier studies who failed to show how plant traits reflect environmental differences (Bari et al, , ) or even produced (partially) misleading results due to a biased trait selection method (Khazaei, Street, Bari, et al, ). Furthermore, widely different classification accuracies have been deemed acceptable in previous studies (Bari et al, ; Liu et al, ; Wang, Huang, & Yang, ; Zheng, Yoon, & Lam, )—restraining “trust” in the resilience of SML‐based data analysis within the scientific community. Because the classification accuracy is a result of both data and analysis method, high generalization accuracies cannot be expected per se and are also only a prerequisite for discovering an important phenotype.…”
Section: Introduction (mentioning)
confidence: 99%
“…Furthermore, widely different classification accuracies have been deemed acceptable in previous studies (Bari et al, 2016; Liu et al, 2014; Wang, Huang, & Yang, 2010; Zheng, Yoon, & Lam, 2014), restraining "trust" in the resilience of SML-based data analysis within the scientific community. Because the classification accuracy is a result of both data and analysis method, high generalization accuracies cannot be expected per se and are also only a prerequisite for discovering an important phenotype.…”
mentioning
confidence: 99%