2018
DOI: 10.1186/s13040-018-0170-z

Feature selection for gene prediction in metagenomic fragments

Abstract: Background: Computational approaches, specifically machine-learning techniques, play an important role in many metagenomic analysis algorithms, such as gene prediction. Due to the large feature space, current de novo gene prediction algorithms use different combinations of classification algorithms to distinguish between coding and non-coding sequences. Results: In this study, we apply a filter method to select relevant features from a large set of known features instead of combining them using linear classifiers o…
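A filter method scores each feature against the class label independently of any classifier and keeps the top-ranked features. The sketch below only illustrates that idea. It assumes discrete (already binned) feature values and uses mutual information as the relevance score. The function names `mutual_information` and `filter_select` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def filter_select(features, labels, k):
    """Rank feature columns by MI with the labels and return the top-k indices.

    `features` is a list of rows (one row per sequence fragment); ties are
    broken by the lower column index.
    """
    n_features = len(features[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in features]
        scores.append((mutual_information(column, labels), j))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:k]]
```

For example, with labels `[1, 1, 0, 0]` (1 = coding, 0 = non-coding, an assumed encoding), a feature column identical to the labels scores 1 bit and is ranked first, while columns independent of the labels score 0.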

Cited by 13 publications (5 citation statements). References 29 publications.
“…This can safely be done for redundant features, i.e., features that do not give meaningful information about the class, are correlated, or are derived from other features in the data set. Amani Al-Ajlan and Achraf El Allali proposed a methodology [72] for feature selection using maximum Relevance Minimum Redundancy (mRMR) to find the most relevant features. The feature extraction algorithm has shown good results in improving classification results from Support Vector Machine (SVM)-based models.…”
Section: Discussion (mentioning; confidence: 99%)
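The quoted passage describes selecting features that are relevant to the class while discarding redundant ones. As an illustration only, the sketch below implements the common greedy "difference" form of mRMR: at each step it adds the feature that maximizes its mutual information with the class minus its mean mutual information with the features already chosen. It assumes discrete feature values; the names `mrmr` and `mutual_information` are hypothetical, and this is not presented as the exact procedure of the cited work.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy.

    `features` is a list of rows; returns up to k column indices in the
    order they were selected.
    """
    columns = [list(col) for col in zip(*features)]
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]  # first pick: pure relevance
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)  # ties go to the earlier index
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what separates mRMR from a plain relevance ranking: an exact duplicate of an already selected feature has the same relevance but maximal redundancy, so it is pushed down the ranking in favour of a less correlated feature.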
“…In addition, strategies such as principal component analysis (PCA) [37], n-grams, and minimum-redundancy maximum-relevance (mRMR) [38] are widely used to select a subset of the features. Studies show that applying feature selection algorithms produces better performance than using the extracted features directly or applying a multi-layer machine learning approach [39], [40]. However, feature selection should be performed on a different dataset than the training set to avoid biases in the performance analysis during testing.…”
Section: Machine Learning (mentioning; confidence: 99%)
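The last point in the quotation, that feature selection should use data separate from training, can be made concrete by splitting the samples into three disjoint parts and computing feature scores on the selection part only. The sketch below is a minimal illustration under stated assumptions: the split fractions, the simple mean-difference filter score, and all function names are hypothetical choices, not the procedure of any cited study.

```python
import random


def three_way_split(n_samples, fractions=(0.2, 0.5, 0.3), seed=0):
    """Return disjoint index lists (selection, train, test) covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_sel = int(fractions[0] * n_samples)
    n_train = int(fractions[1] * n_samples)
    selection = indices[:n_sel]
    train = indices[n_sel:n_sel + n_train]
    test = indices[n_sel + n_train:]
    return selection, train, test


def mean_difference_scores(rows, labels):
    """Per-feature |mean(class 1) - mean(class 0)|, a simple filter score."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("selection subset must contain both classes")
    n_features = len(rows[0])
    return [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in range(n_features)
    ]


def select_on_held_out(rows, labels, k, seed=0):
    """Choose k features using only the selection subset.

    The train and test indices are returned untouched, so the later
    classifier never sees the samples that drove feature selection.
    """
    selection, train, test = three_way_split(len(rows), seed=seed)
    sel_rows = [rows[i] for i in selection]
    sel_labels = [labels[i] for i in selection]
    scores = mean_difference_scores(sel_rows, sel_labels)
    chosen = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
    return chosen, train, test
```

Keeping the selection indices out of both the training and test sets means the reported test performance is not inflated by features that were chosen while looking at the test samples.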
“…Other tools, often used for metagenomics analysis, are based on vectorization of sequence features of ORF candidates, which is an efficient conversion of a nucleotide sequence into a vector of sequence features (Al-Ajlan and El Allali, 2018; Al-Ajlan and El Allali, 2019; El Allali and Rose, 2013; Hoff et al., 2008; Trimble et al., 2012; Zhang et al., 2017). These features can be evaluated by machine learning models to select true ORFs.…”
Section: Introduction (mentioning; confidence: 99%)
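Vectorizing an ORF candidate means turning its nucleotide string into a fixed-length numeric vector that a classifier can consume. The sketch below is a hypothetical example of such a conversion: the chosen features (length, GC content, and the 64 in-frame codon frequencies) are an illustrative assumption and do not reproduce the feature set of any tool cited above.

```python
from itertools import product

# All 64 codons in a fixed order (AAA, AAC, ..., TTT) so every vector lines up.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def vectorize_orf(seq):
    """Map an in-frame ORF candidate (5'->3') to a numeric feature vector.

    Illustrative features: sequence length, GC content, then the relative
    frequency of each of the 64 codons read in frame 0.
    """
    seq = seq.upper()
    length = len(seq)
    gc = (seq.count("G") + seq.count("C")) / length if length else 0.0
    counts = dict.fromkeys(CODONS, 0)
    n_codons = 0
    for i in range(0, length - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons containing N or other ambiguity codes
            counts[codon] += 1
            n_codons += 1
    freqs = [counts[c] / n_codons if n_codons else 0.0 for c in CODONS]
    return [float(length), gc] + freqs
```

Every candidate yields a 66-element vector regardless of its length, which is what allows candidates of different sizes to be scored by the same model.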
“…A promising way to overcome the main drawbacks of existing algorithms is to select the informative parameters describing candidate ORF fragments using advanced algorithms of nucleotide sequence vectorization (Bao et al., 2014; Mao et al., 2014), and then to apply an optimal prediction algorithm and identify the most probable or true ORF sequence among candidates. This approach needs only a limited set of input features (here we used 104), significantly fewer than the 4000-5000 considered earlier (Al-Ajlan and El Allali, 2018). The use of a random forest classifier (Breiman, 2001) has several advantages over more complex deep learning techniques (Al-Ajlan and El Allali, 2019; Wen et al., 2019).…”
Section: Introduction (mentioning; confidence: 99%)
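The two-step idea in the quotation (generate candidate ORFs from a fragment, then keep the candidate a model rates as most probable) can be sketched as below. This is an assumption-laden illustration: only complete ORFs (ATG through a stop codon, at least 90 nt) on both strands and all three frames are considered, and the random forest of the cited work is replaced by a caller-supplied `score` function, so any classifier's probability output could be plugged in. All names here are hypothetical. Real metagenomic gene finders also handle ORFs truncated at fragment ends, which this sketch ignores.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_orfs(fragment, min_len=90):
    """Complete ORFs (first ATG up to and including the next in-frame stop).

    Returns (strand, frame, start, orf) tuples; for the "-" strand, `start`
    is a coordinate on the reverse-complemented fragment.
    """
    fragment = fragment.upper()
    candidates = []
    for strand, seq in (("+", fragment), ("-", reverse_complement(fragment))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    orf = seq[start:i + 3]
                    if len(orf) >= min_len:
                        candidates.append((strand, frame, start, orf))
                    start = None
    return candidates


def best_orf(fragment, score, min_len=90):
    """Return the candidate with the highest score, or None if there is none.

    `score` maps an ORF string to a number, e.g. a classifier's estimated
    probability that the ORF is a true gene (hypothetical placeholder).
    """
    cands = candidate_orfs(fragment, min_len)
    if not cands:
        return None
    return max(cands, key=lambda c: score(c[3]))
```

As a placeholder, `best_orf(fragment, score=len)` simply returns the longest complete ORF; swapping in a trained model's probability function turns the same loop into the "identify the most probable ORF among candidates" step described above.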