2019
DOI: 10.1016/j.csbj.2019.05.008

An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features

Abstract: The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs am…

Citations: cited by 33 publications (38 citation statements)
References: 59 publications (84 reference statements)
“…Combined with the bioinformatic prediction and prioritisation of essential genes from functional information (e.g., lethality) available for other metazoan organisms, particularly D. melanogaster, using machine learning approaches [55], RNAi-based screening of S. scabiei stages provides a powerful functional genomics tool to validate prioritised targets.…”
Section: Results | Citation type: mentioning
confidence: 99%
“…Features were selected by random subsampling from 10 to 90% of data representing ‘essential’ or ‘non-essential’ genes (in 10% stepwise increments) based on a consensus between elasticNet (alpha = 0.5) and ensemble Sparse Partial Least Squares (SPLS) methods using ‘glmnet’ and ‘enspls’ in R, respectively (26). The individual feature values were then normalized by subtracting the mean and dividing by the standard deviation calculated for each feature column.…”
Section: Methods | Citation type: mentioning
confidence: 99%
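
The per-column normalization described in the statement above (subtract the column mean, divide by the column standard deviation) can be sketched as follows. This is a minimal illustration in Python, not the citing authors' R code; the function name `zscore_columns`, the toy data, and the use of the sample standard deviation (n - 1 denominator) are assumptions, since the statement does not specify them.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each feature column: (x - column mean) / column SD."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 denominator).
    sds = [stdev(col) for col in columns]
    return [
        # A constant column has SD 0; map it to 0.0 rather than divide by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Toy feature matrix: three genes (rows) by two features (columns).
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = zscore_columns(features)
```

After this step both columns are on the same scale even though their raw ranges differ tenfold, which is the usual reason for standardizing before fitting scale-sensitive models.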
“…The individual feature values were then normalized by subtracting the mean and dividing by the standard deviation calculated for each feature column. Normalized features were used to train each of six ML-models (GBM (Gradient Boosting Machine), GLM (Generalized Linear Model), NN (Neural Network—perceptron), Random Forest (RF), SVM (Support-Vector Machine) (26) and XGB (eXtreme Gradient Boosting—xgbTree)) in the ‘caret’ R-package. During the training process, we employed parameter-tuning and 5-fold cross-validation, ultimately selecting the models with highest ROC-AUC.…”
Section: Methods | Citation type: mentioning
confidence: 99%
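
The selection procedure in the statement above (5-fold cross-validation, keeping the model with the highest ROC-AUC) can be illustrated with a small pure-Python sketch. It is not the citing authors' 'caret' workflow: the two single-feature scorers stand in for the six real model families, no parameter tuning is performed, the stratified fold assignment is an assumption (the statement says only "5-fold"), and all names and data here are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, seed=0):
    """Assign indices to k folds, class by class (assumed stratification)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(fit, X, y, k=5, seed=0):
    """Mean held-out ROC-AUC over k folds; fit(X, y) returns a scoring function."""
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores = [predict(X[i]) for i in held_out]
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return sum(aucs) / len(aucs)


def make_single_feature_fit(col):
    """Hypothetical stand-in model: score by one feature, oriented on the training set."""
    def fit(X_train, y_train):
        pos = [x[col] for x, label in zip(X_train, y_train) if label == 1]
        neg = [x[col] for x, label in zip(X_train, y_train) if label == 0]
        direction = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
        return lambda x: direction * x[col]
    return fit


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]

candidates = {
    "feature_0": make_single_feature_fit(0),
    "feature_1": make_single_feature_fit(1),
}
results = {name: cross_validated_auc(fit, X, y, k=5) for name, fit in candidates.items()}
best = max(results, key=results.get)  # the candidate with the highest mean ROC-AUC
```

Scoring only held-out folds keeps the comparison honest: each candidate is ranked on data it did not see while being fitted.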