2020
DOI: 10.1101/2020.01.10.892158
Preprint

Viral Sequence Identification in Metagenomes using Natural Language Processing Techniques

Abstract: Identification of viral reads is an important step in metagenomic data analysis, as it reveals the diversity of microbial communities and the functional characteristics of microorganisms. Various tools can identify viral reads in mixed metagenomic data using similarity-based and statistical approaches. However, the limited genome diversity available to these tools is a serious limitation of existing techniques. In this work, we applied natural language processing approaches for document classif…
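The abstract frames viral read identification as a document-classification problem but is cut off before describing the method. The sketch below illustrates one common way to apply that idea, assuming each read is treated as a "document" whose "words" are overlapping k-mers, with a multinomial naive Bayes classifier standing in for whatever model the paper actually uses. The names `kmer_tokens` and `KmerNaiveBayes`, the choice k = 3, and the toy labels `viral` and `host` are hypothetical and for illustration only.

```python
import math
from collections import Counter

# Hypothetical illustration: reads are treated as documents and overlapping
# k-mers as words. This is NOT the paper's published method.


def kmer_tokens(seq, k=3):
    """Split a DNA read into overlapping k-mers (the 'words' of the read)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts, with add-alpha (Laplace) smoothing."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha

    def fit(self, reads, labels):
        label_counts = Counter(labels)
        self.classes = sorted(label_counts)
        # Log prior: fraction of training reads carrying each label.
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.kmer_counts = {c: Counter() for c in self.classes}
        vocab = set()
        for read, label in zip(reads, labels):
            tokens = kmer_tokens(read, self.k)
            self.kmer_counts[label].update(tokens)
            vocab.update(tokens)
        self.vocab_size = len(vocab)
        # Total number of k-mer tokens seen per class.
        self.total = {c: sum(self.kmer_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, read):
        """Unnormalized log P(class | read) for every class."""
        scores = {}
        for c in self.classes:
            denom = self.total[c] + self.alpha * self.vocab_size
            score = self.log_prior[c]
            for tok in kmer_tokens(read, self.k):
                # Smoothed per-class k-mer probability; unseen k-mers get alpha / denom.
                score += math.log((self.kmer_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, read):
        scores = self.log_posterior(read)
        return max(scores, key=scores.get)


# Toy training data (invented for illustration, not from the paper).
train_reads = ["ATGATGATGATGATG", "TGATGATGATGA", "GCGCGCGCGCGC", "CGCGCGCGCGCG"]
train_labels = ["viral", "viral", "host", "host"]

model = KmerNaiveBayes(k=3).fit(train_reads, train_labels)
print(model.predict("ATGATGATG"))  # viral
print(model.predict("GCGCGC"))     # host
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on long reads. Real tools typically use larger k and far larger training sets; which tokenization and classifier the paper uses is not stated in the truncated abstract.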

Cited by 7 publications (9 citation statements). References 58 publications.
“…Using this information, the tool recognizes the genomic sequence of the virus with an accuracy of 87.5%. They improved the accuracy reported in [12] by 0.5%. This tool is freely available online.…”
Section: Related Work (mentioning), confidence: 75%
“…This can be a useful clue for the clinicians to find the most effective vaccine or drug for the treatment of 'COVID-19'. The comparison of the proposed CNN model 'GenomeSimilarityPredictor' with models proposed in the literature [10][11][12][13][14][15][16][17][18][19][20][21] shows that model has reported higher accuracy and outperforms the existing techniques as shown in Fig 8. Its effectiveness in dealing with noisy data, low time complexity makes it applicable for the screening of infected genomes in the present situation of 'Global Pandemic'. The zero instance in the FP and only 1 instance in the FN increase the acceptability of this model.…”
Section: Discussion (mentioning), confidence: 93%
“…Currently, several approaches for identifying viral sequences in metagenomics data exist and have helped in supersizing viral databases of uncultivated viral genomes (UViGs) over the last few years [20][21][22]. These tools are often based on sequence similarity [23], sequence composition [24][25][26][27][28][29], and identification of viral proteins or the lack of cellular ones [28][29]. A common denominator for these tools is their per-contig/sequence virus evaluation approach that is not optimal for addressing fragmented multi-contig virus assemblies.…”
Section: Introduction (mentioning), confidence: 99%