Zurab Bzhalava scite author profile

Increasing participation in cervical screening by targeting long‐term nonattenders: Randomized health services study

Elfström, Sundström, Andersson et al. 2019, Intl Journal of Cancer

High screening participation in the population is essential for optimal prevention of cervical cancer. Offering a high-risk human papillomavirus (HPV) self-test has previously been shown to increase participation. In this randomized health services study, we evaluated four strategies with regard to participation. Women who had not attended organized cervical screening in 10 years were eligible for inclusion. This group comprised 16,437 of 413,487 resident women aged 33-60 (<4% of the screening target group). Among these 16,437 long-term nonattenders, 8,000 women were randomized to (i) an HPV self-sampling kit sent directly; (ii) an invitation to order an HPV self-sampling kit using a new open source eHealth web application; (iii) an invitation to call a coordinating midwife with questions and concerns; or (iv) the standard annual renewed invitation letter with a prebooked appointment time (routine practice). Overall participation, by arm, was (i) 18.7%; (ii) 10.7%; (iii) 1.9%; and (iv) 1.7%. Compared with Arm 4, the relative risk of participation was 11.0 (95% CI 7.8-15.5) in Arm 1, 6.3 (95% CI 4.4-8.9) in Arm 2 and 1.1 (95% CI 0.7-1.7) in Arm 3. High-risk HPV prevalence among women who returned kits in Arms 1 and 2 was 12.2%. In total, 63 women from Arms 1 and 2 were referred directly to colposcopy; of these, 43 (68.3%) attended, and 17 of the 43 (39.5%) had a high-grade cervical lesion (CIN2+) on histology. Targeting long-term nonattending women by sending self-sampling kits, or offering the opportunity to order them, further increased participation in an organized screening program.
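
The relative risks and percentages quoted in this abstract can be roughly re-derived from its own figures. The sketch below is only an arithmetic check: it works from the rounded participation percentages rather than the underlying counts (the abstract does not give per-arm sizes), so it cannot reproduce the confidence intervals. The function names are illustrative and not taken from the study.

```python
# Figures as quoted in the abstract; the percentages are already rounded there.
participation = {"arm1": 18.7, "arm2": 10.7, "arm3": 1.9, "arm4": 1.7}


def relative_risk(arm_pct, reference_pct):
    """Ratio of participation in one arm to participation in the reference arm."""
    return arm_pct / reference_pct


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Relative risk of each intervention arm against routine practice (Arm 4).
rr = {
    arm: round(relative_risk(pct, participation["arm4"]), 1)
    for arm, pct in participation.items()
    if arm != "arm4"
}

nonattender_share = percent(16437, 413487)  # long-term nonattenders in target group
attended_colposcopy = percent(43, 63)       # referred women who attended
cin2_plus = percent(17, 43)                 # attendees with CIN2+ on histology

print(rr)
print(round(nonattender_share, 2), round(attended_colposcopy, 1), round(cin2_plus, 1))
```

The rounded ratios agree with the relative risks reported above (11.0, 6.3 and 1.1), and the nonattender share comes out just under 4%.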

Machine Learning for detection of viral sequences in human metagenomic datasets

et al. 2018

Background: Detection of highly divergent or yet unknown viruses from metagenomics sequencing datasets is a major bioinformatics challenge. When human samples are sequenced, a large proportion of assembled contigs are classified as “unknown”, as conventional methods find no similarity to known sequences. We wished to explore whether machine learning algorithms using Relative Synonymous Codon Usage frequency (RSCU) could improve the detection of viral sequences in metagenomic sequencing data. Results: We trained Random Forest and Artificial Neural Network using metagenomic sequences taxonomically classified into virus and non-virus classes. The algorithms achieved accuracies well beyond chance level, with area under ROC curve 0.79. Two codons (TCG and CGC) were found to have a particularly strong discriminative capacity. Conclusion: RSCU-based machine learning techniques applied to metagenomic sequencing data can help identify a large number of putative viral sequences and provide an addition to conventional methods for taxonomic classification. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2340-x) contains supplementary material, which is available to authorized users.
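
For readers unfamiliar with RSCU, the following is a minimal sketch of how the feature is commonly computed: for each codon, its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes an in-frame coding sequence and the standard genetic code; how the paper selected reading frames or handled ambiguous bases is not stated in the abstract, and the function name `rscu` is illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption here;
# the abstract does not say which translation table was used).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(coding_seq):
    """Return {codon: RSCU} for an in-frame DNA coding sequence.

    RSCU = (number of synonymous codons) * count(codon) / total count over
    that codon's synonymous family. Families that never occur get 0.0.
    Stop codons are treated as one family of three.
    """
    seq = coding_seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Codons containing characters other than A, C, G, T are skipped.
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for codon in family:
            values[codon] = len(family) * counts[codon] / total if total else 0.0
    return values


features = rscu("TCGTCGTCT")  # three serine codons: TCG, TCG, TCT
print(features["TCG"], features["TCT"], features["TCC"])
```

The resulting 64 values could serve as a feature vector for a classifier such as a Random Forest; that downstream step is omitted here.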

Extension of the viral ecology in humans using viral profile hidden Markov models

Bzhalava, Hultin, Dillner 2018, PLoS ONE

When human samples are sequenced, many assembled contigs are “unknown”, as conventional alignments find no similarity to known sequences. Hidden Markov models (HMM) exploit the positions of specific nucleotides in protein-encoding codons in various microbes. The algorithm HMMER3 implements HMM using a reference set of sequences encoding viral proteins, “vFam”. We used HMMER3 analysis of “unknown” human sample-derived sequences and identified 510 contigs distantly related to viruses (Anelloviridae (n = 1), Baculoviridae (n = 34), Circoviridae (n = 35), Caulimoviridae (n = 3), Closteroviridae (n = 5), Geminiviridae (n = 21), Herpesviridae (n = 10), Iridoviridae (n = 12), Marseillevirus (n = 26), Mimiviridae (n = 80), Phycodnaviridae (n = 165), Poxviridae (n = 23), Retroviridae (n = 6) and 89 contigs related to described viruses not yet assigned to any taxonomic family). In summary, we find that analysis using the HMMER3 algorithm and the “vFam” database greatly extended the detection of viruses in biospecimens from humans.

ViraMiner: Deep Learning on Raw DNA Sequences for Identifying Viral Genomes in Human Samples

Tampuu, Bzhalava, Dillner et al. 2019, Preprint

Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as "unknown" since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern frequencies in raw metagenomic contigs. The training dataset included sequences obtained from 19 metagenomic experiments, which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs, ViraMiner achieves an area under the ROC curve of 0.923. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information about genome composition, and can be used as a recommendation system to further investigate sequences labeled as "unknown" by conventional alignment methods. Exploring these highly divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases.
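
The distinction between detecting a pattern and detecting its frequency can be illustrated with a toy example. The sketch below is not ViraMiner's architecture, which the abstract does not describe in detail. It assumes one-hot encoded DNA, a single hand-set convolution filter that fires only on an exact motif match, and two pooling choices: a maximum over positions (does the pattern occur anywhere?) and an average over positions (how often does it occur?). All names are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of four 0/1 values (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(encoded, kernel, bias):
    """Slide a (width x 4) kernel along the sequence and apply ReLU to each window score."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        score = bias + sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        activations.append(max(0.0, score))
    return activations


def two_branch_features(seq, motif):
    """Return (pattern, frequency) features for one motif filter.

    The kernel is the one-hot motif and the bias is -(len(motif) - 1), so only
    an exact match yields a positive activation (1.0). In a trained network the
    kernel weights and bias would be learned rather than set by hand.
    """
    kernel = one_hot(motif)
    bias = -(len(motif) - 1)
    activations = conv1d_relu(one_hot(seq), kernel, bias)
    if not activations:
        raise ValueError("sequence is shorter than the motif")
    pattern = max(activations)                       # max pooling: presence
    frequency = sum(activations) / len(activations)  # average pooling: frequency
    return pattern, frequency


pattern, frequency = two_branch_features("ATCGTCGA", "TCG")
print(pattern, frequency)
```

Here the motif matches two of the six windows, so the max-pooled feature saturates at 1.0 while the averaged feature reflects how often the motif occurs.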
