2022
DOI: 10.1186/s12864-022-08540-6

PhyloMissForest: a random forest framework to construct phylogenetic trees with missing data

Abstract: Background: In the pursuit of a better understanding of biodiversity, evolutionary biologists rely on the study of phylogenetic relationships to illustrate the course of evolution. The relationships among natural organisms, depicted in the shape of phylogenetic trees, not only help to understand evolutionary history but also have a wide range of additional applications in science. One of the most challenging problems that arise when building phylogenetic trees is the presence of missing biologic…

Cited by 4 publications (7 citation statements)
References 43 publications
“…Simple direct deletion of data will result in the loss of valuable information and waste of resources, so the filling of missing data is a reasonable and realistic operation. Conventional data-filling methods such as multiple interpolation and median interpolation are unable to deal with data interactions and nonlinearities of variables in the face of high dimensional and large sample data, so algorithm-based data-filling can more accurately maintain the overall characteristics of the data [28]. We used the missForest R package (hyperparameters: maxiter = 10, ntree = 1000, verbose = TRUE) to interpolate variables with a few missing data.…”
Section: Methods (mentioning)
confidence: 99%
“…Several machine learning algorithms proposed so far are inferior to the ML method and only as good as distance-based methods (Zhu & Cai, 2021;Pinheiro et al, 2022). Suvorov Here we propose supervised machine learning using NNs as an alternative to existing model selection and topology reconstruction methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…Image classification of alignments using deep CNNs has also been suggested for determining the best evolutionary model of sequence evolution (Burgstaller-Muehlbacher et al, preprint 2021). Furthermore, Pinheiro et al (2022) used a RF algorithm for predicting missing sequence regions and using this information to reconstruct phylogenetic trees. However, the accuracy of the RF method could hardly come close to that of the Neighbour Joining method.…”
Section: Introduction (mentioning)
confidence: 99%
“…The latter also removes data that might not be missing in some sequences, which can reduce phylogeny accuracies [22,25]. Machine learning-based imputation of missing data has been explored specifically for distance-based approaches [27,28]. Data imputation for character-based approaches requires a reference dataset of a comprehensive set of variants that are comparable to those in the study data to extrapolate correlations between the variants.…”
Section: Introduction (mentioning)
confidence: 99%