2021
DOI: 10.1101/2021.02.08.430199
Preprint

Tiara: Deep learning-based classification system for eukaryotic sequences

Abstract: Motivation: With a large number of metagenomic datasets becoming available, the eukaryotic metagenomics emerged as a new challenge. The proper classification of eukaryotic nuclear and organellar genomes is an essential step towards the better understanding of eukaryotic diversity. Results: We developed Tiara, a deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data sets. Its two-step classification process enables the classification of nuclear and organellar eukaryotic…

Citations: cited by 3 publications (5 citation statements)
References: 53 publications (52 reference statements)
“…Due to differences in input data for the methods examined, Eukfinder_short, and Refmapping (7) were used to examine short Illumina reads, while Eukfinder_long, EukRep (8) and Tiara (10) were used to examine assembled contigs of these reads. To ensure the comparisons were fair, we first identified the parameters for each program that maximized genome completeness and contiguity.…”
Section: Identifying Optimal Parameters For Alternative Methods (mentioning)
confidence: 99%
“…We compared the recovered genomes using EukRep with all three stringency cut-off modes and one analysis under lenient mode with the '-tie' set to prokaryotic. Tiara (10) utilizes previously trained neural networks to sort metagenomic reads into multiple categories, representing the taxonomic affiliation of each read within a given dataset. Tiara has two main parameters which can be adjusted in its workflow; the k-mer size, which adjusts the size of the DNA substrings used to compute sequence frequency, and the probability threshold, which disregards results with a probability score lower than that.…”
Section: Identifying Optimal Parameters For Alternative Methods (mentioning)
confidence: 99%
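
The two parameters named in the statement above, k-mer size and probability threshold, can be illustrated with a minimal sketch. This is not Tiara's code: the function names `kmer_frequencies` and `apply_threshold`, the class labels, and the example probabilities are all hypothetical, chosen only to show how a k-mer size shapes the frequency vector and how a probability cut-off turns a low-confidence prediction into "unknown".

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(sequence, k):
    """Relative frequency of every possible DNA k-mer in `sequence`.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    # Every one of the 4**k k-mers gets an entry, so the vector length
    # depends only on k, not on the sequence.
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def apply_threshold(class_probabilities, threshold):
    """Return the most probable label, or "unknown" if it scores below `threshold`."""
    label, probability = max(class_probabilities.items(), key=lambda item: item[1])
    return label if probability >= threshold else "unknown"


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGT", k=2)
    print(len(freqs), freqs["AC"])

    # Made-up classifier output for a single contig.
    scores = {"eukarya": 0.55, "bacteria": 0.30, "archaea": 0.15}
    print(apply_threshold(scores, threshold=0.65))
    print(apply_threshold(scores, threshold=0.50))
```

A larger k yields a longer frequency vector (4^k entries), while a higher threshold moves more sequences into the "unknown" category. That is the trade-off the citing authors describe tuning.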
“…Taxonomy prediction using the MMseqs2 taxonomy subprogram for the MetaEuk-derived proteins identified 44% as belonging to the Phylum Ciliophora and the MAG had a relatively small proportion (0.33%) of interspersed repeat elements (n = 322). A cursory analysis using Tiara (Karlicki et al, 2021) revealed that the 13.5 Mbp MAG in question consisted of DNA sequences whose origins were 28.8% eukaryotic, 28.3% bacterial, 10.2% archaeal, 7.3% prokaryotic, and 25.4% unknown. When protein prediction was performed using Prodigal v2.6.3 (-p meta) (Hyatt et al, 2012), a tool for prokaryotic gene prediction, the number of recovered putative coding sequences increased from 1,943 to 9,020, suggesting that this particular MAG represented a binning error that combined genomic content across Domains.…”
Section: Main (mentioning)
confidence: 99%