Viral Sequence Identification in Metagenomes using Natural Language Processing Techniques

Abdelkareem, Aly; Khalil, Mahmoud; Elbehery, Ali H. A.; Abbas, Hazem M.

doi:10.1101/2020.01.10.892158

Cited by 7 publications

(9 citation statements)

References 58 publications


“…Using this information, the tool recognizes the genomic sequence of the virus with an accuracy of 87.5%. They improved the accuracy reported in [12] by 0.5%. This tool is freely available online.…”

Section: Related Work (mentioning)

confidence: 75%

“…This can be a useful clue for the clinicians to find the most effective vaccine or drug for the treatment of 'COVID-19'. The comparison of the proposed CNN model 'GenomeSimilarityPredictor' with models proposed in the literature [10][11][12][13][14][15][16][17][18][19][20][21] shows that model has reported higher accuracy and outperforms the existing techniques as shown in Fig 8. Its effectiveness in dealing with noisy data, low time complexity makes it applicable for the screening of infected genomes in the present situation of 'Global Pandemic'. The zero instance in the FP and only 1 instance in the FN increase the acceptability of this model.…”

Section: Discussion (mentioning)

confidence: 93%

“…They claimed that genomic sequence detection is a fast and reliable technique for diagnosis of the disease. The researchers in [10][11][12][13][14][15][16][17][18][19][20][21] proposed the computer-based solutions for the detection of the viral genome or predicting the genomic similarity among viruses. The work proposed in [10] employs the hidden Markov's model and identifies the viral genome from the host cell with an accuracy of 87%.…”

Section: Related Work (mentioning)

confidence: 99%

“…But, its low accuracy is a point of concern for its use in the screening of patients in real-time. The authors in [12] applied natural language processing to detect the genome of a viral sequence. They considered the genome as a string and detected the sequence of base pairs as a sub-string.…”

Section: Related Work (mentioning)

confidence: 99%
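The quotation above describes treating a genome as a string and a base-pair sequence as a substring. One common way to make that NLP framing concrete is to split the sequence into overlapping k-mers that play the role of words, and to locate a known motif with plain substring search. The sketch below assumes that approach; the names `kmer_tokens`, `kmer_counts`, and `find_occurrences` and the default `k = 3` are hypothetical, and this is not the cited authors' implementation.

```python
from collections import Counter

# Hypothetical helpers illustrating a "genome as text" view; not the cited authors' code.
VALID_BASES = set("ACGT")


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a DNA string into overlapping k-mers ("words").

    K-mers containing any character other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    tokens = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            tokens.append(kmer)
    return tokens


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count how often each k-mer "word" occurs, like a bag-of-words vector."""
    return Counter(kmer_tokens(seq, k))


def find_occurrences(genome: str, motif: str) -> list[int]:
    """Return every (possibly overlapping) start index of `motif` in `genome`."""
    if not motif:
        return []
    hits = []
    start = genome.find(motif)
    while start != -1:
        hits.append(start)
        start = genome.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    print(kmer_tokens("ATGCGTAN"))            # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA']
    print(find_occurrences("ATATAT", "ATA"))  # [0, 2]
```

Skipping k-mers that contain `N` is just one simple way to handle ambiguous bases; a real pipeline might mask or impute them instead.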


Applying Deep Learning for Genome Detection of Coronavirus

Rani, Oza, Dhaka, et al. 2020 (Preprint)


Amidst the global pandemic and catastrophe created by ‘COVID-19’, every research institution and scientist is doing their best to invent or find a vaccine or medicine for the disease. The objective of this research is to design and develop a deep learning model for finding the degree of similarity of the genome of the Severe Acute Respiratory Syndrome-Coronavirus 2 (‘SARS-CoV-2’) with a given genome. This research also aims at detecting the genome of ‘SARS-CoV-2’ in the host human beings. The experimental results on the dataset publicly available at the National Centre for Biotechnology Information show that the model is effective in predicting the similarity score of the genomic sequence of ‘SARS-CoV-2’ and other prevalent viruses such as Severe Acute Respiratory Syndrome-Coronavirus, Middle East Respiratory Syndrome Coronavirus, Human Immunodeficiency Virus, and Human T-cell Leukaemia Virus. The model is successful in detecting the genome of ‘SARS-CoV-2’ in the host genome with an accuracy of 99.27%. It may prove to be a useful tool for doctors to quickly classify infected and non-infected genomes. It can also be useful in finding the most effective drug from the available drugs for the treatment of ‘COVID-19’.


“…Currently, several approaches for identifying viral sequences in metagenomics data exist and have helped in supersizing viral databases of uncultivated viral genomes (UViGs) over the last few years [20][21][22]. These tools are often based on sequence similarity [23], sequence composition [24][25][26][27][28][29], and identification of viral proteins or the lack of cellular ones [28][29]. A common denominator for these tools is their per-contig/sequence virus evaluation approach that is not optimal for addressing fragmented multi-contig virus assemblies.…”

Section: Introduction (mentioning)

confidence: 99%
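Composition-based identification, mentioned in the quotation above, generally summarizes each sequence by how often short k-mers occur and compares those summaries without alignment. A minimal sketch follows, assuming tetranucleotide (k = 4) frequencies and cosine similarity as the comparison; both choices and the names `kmer_profile` and `cosine_similarity` are illustrative assumptions rather than the approach of any tool cited here.

```python
import math
from collections import Counter
from itertools import product

# Illustrative composition-based comparison; k, normalization, and cosine
# similarity are assumptions, not the method of any specific cited tool.


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalized frequencies of all 4**k ACGT k-mers, in lexicographic order.

    K-mers containing ambiguous bases (e.g. N) are not in the alphabet and are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    vec = [float(counts[kmer]) for kmer in alphabet]
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    contig_a = "ATGCGTACGTTAGCATGCGTAC"
    contig_b = "ATGCGTACGATAGCATGCGAAC"
    score = cosine_similarity(kmer_profile(contig_a), kmer_profile(contig_b))
    print(f"4-mer profile cosine similarity: {score:.3f}")
```

Profiles like this are cheap and alignment-free, which makes them convenient for fragmented metagenomic contigs; the trade-off is that very short sequences can produce sparse, noisy profiles.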

Genome binning of viral entities from bulk metagenomics data

Johansen, Plichta, et al. 2021 (Preprint)


Despite the accelerating number of uncultivated virus sequences discovered in metagenomics and their apparent importance for health and disease, the human gut virome and its interactions with bacteria in the gastrointestinal tract are not well understood. In addition, a paucity of whole-virome datasets from subjects with gastrointestinal diseases is preventing a deeper understanding of the virome's role in disease and in gastrointestinal ecology as a whole. By combining a deep-learning based metagenomics binning algorithm with paired metagenome and metavirome datasets, we developed the Phages from Metagenomics Binning (PHAMB) approach for binning thousands of viral genomes directly from bulk metagenomics data. Simultaneously, our methodology enables clustering of viral genomes into accurate taxonomic viral populations. We applied this methodology on the Human Microbiome Project 2 (HMP2) cohort and recovered 6,077 high-quality (HQ) genomes from 1,024 viral populations and explored viral-host interactions. We show that binning can be advantageously applied to existing and future metagenomes to illuminate viral ecological dynamics with other microbiome constituents.


Applying deep learning-based multi-modal for detection of coronavirus

et al. 2021


Amidst the global pandemic and catastrophe created by 'COVID-19', every research institution and scientist is doing their best to invent or find a vaccine or medicine for the disease. The objective of this research is to design and develop a deep learning-based multi-modal model for the screening of COVID-19 using chest radiographs and genomic sequences. The model is also effective in finding the degree of genomic similarity among the Severe Acute Respiratory Syndrome-Coronavirus 2 and other prevalent viruses such as Severe Acute Respiratory Syndrome-Coronavirus, Middle East Respiratory Syndrome-Coronavirus, Human Immunodeficiency Virus, and Human T-cell Leukaemia Virus. The experimental results on the datasets available at the National Centre for Biotechnology Information, GitHub, and Kaggle repositories show that it is successful in detecting the genome of 'SARS-CoV-2' in the host genome with an accuracy of 99.27%, and in screening chest radiographs into COVID-19, non-COVID pneumonia, and healthy classes with a sensitivity of 95.47%. Thus, it may prove to be a useful tool for doctors to quickly classify infected and non-infected genomes. It can also be useful in finding the most effective drug from the available drugs for the treatment of 'COVID-19'.
