Todd J. Treangen scite author profile

RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification

To determine the role of the database in taxonomic sequence classification, we examine how changes in the database over time influence k-mer-based lowest common ancestor (LCA) taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specifically adapted for large databases.
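A toy sketch may help show why adding species faster than genera shifts k-mer-based LCA calls from the species to the genus level. Everything below is hypothetical: a two-species taxonomy, made-up sequences, and a simplified rule that assigns a read the LCA of all taxa its k-mers hit (production classifiers such as Kraken instead score root-to-leaf paths). Adding a second species from the same genus that shares k-mers with the first pushes those shared k-mers, and with them the read, up to the genus.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: each taxon maps to its parent; the root has none.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "GenusA": "root",
    "SpeciesA1": "GenusA",
    "SpeciesA2": "GenusA",
}


def lineage(taxon: str) -> List[str]:
    """Path from a taxon up to the root, inclusive."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(a: str, b: str) -> str:
    """Lowest common ancestor of two taxa."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):  # walk upward from b
        if node in ancestors_of_a:
            return node
    raise ValueError("taxa share no common ancestor")


def kmers(seq: str, k: int) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_db(genomes: Dict[str, str], k: int) -> Dict[str, str]:
    """Label each k-mer with the LCA of every genome that contains it."""
    db: Dict[str, str] = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db


def classify(read: str, db: Dict[str, str], k: int) -> Optional[str]:
    """Simplified rule: the LCA of all taxa hit by the read's k-mers."""
    hits = [db[km] for km in kmers(read, k) if km in db]
    if not hits:
        return None  # unclassified
    call = hits[0]
    for taxon in hits[1:]:
        call = lca(call, taxon)
    return call


K = 4
READ = "ACGTACGGT"
db_old = build_db({"SpeciesA1": "ACGTACGGTT"}, K)
db_new = build_db({"SpeciesA1": "ACGTACGGTT", "SpeciesA2": "ACGTACGGAA"}, K)
print(classify(READ, db_old, K))  # SpeciesA1
print(classify(READ, db_new, K))  # GenusA
```

With only SpeciesA1 in the database, the read is called at the species level. Once SpeciesA2 is added, five of the read's six k-mers are shared by both species and are relabeled GenusA, so the call moves up to the genus even though the read itself is unchanged.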

Emu: species-level microbial community profiling of full-length 16S rRNA Oxford Nanopore sequencing data

Curry, Wang, Nute, et al. 2022

Nat Methods

Current progress and open challenges for applying deep learning across the biosciences

et al. 2022

Deep learning (DL) has recently enabled unprecedented advances on one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL in five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we summarize the subject-specific and general challenges for DL across the biosciences.

Infectious SARS-CoV-2 in Exhaled Aerosols and Efficacy of Masks During Early Mild Infection

Adenaiye, Lai, Mesquita, et al. 2021

Preprint

Background: SARS-CoV-2 epidemiology implicates airborne transmission, but the efficacy of masks as source control, the impact of variants, and the infectiousness of aerosols are not well understood.

Methods: We recruited COVID-19 cases to provide blood, saliva, mid-turbinate and fomite (phone) swabs, and 30-minute breath samples while vocalizing into a Gesundheit-II sampler, with and without masks, at up to two visits two days apart. We quantified and sequenced viral RNA, cultured virus, and assayed sera for anti-spike and anti-receptor-binding-domain antibodies.

Results: We enrolled 61 participants with active infection between May 2020 and April 2021. Among 49 seronegative cases (mean 3.8 ± 2.1 days post onset), we detected SARS-CoV-2 RNA in 45% of fine (≤5 μm) and 31% of coarse (>5 μm) aerosol samples and in 65% of fomite samples overall, and in all samples from four alpha variant cases. Masks reduced viral RNA by 48% (95% confidence interval [CI], 3 to 72%) in fine aerosols and by 77% (95% CI, 51 to 89%) in coarse aerosols. The alpha variant was associated with a 43-fold (95% CI, 6.6- to 280-fold) increase in fine-aerosol viral RNA, which remained a significant 18-fold (95% CI, 3.4- to 92-fold) increase after adjusting for viral RNA in saliva and mid-turbinate swabs and for other potential confounders. Two fine-aerosol samples, collected 2 to 3 days after illness onset while participants wore masks, were culture-positive.

Conclusion: SARS-CoV-2 is evolving toward more efficient airborne transmission, and loose-fitting masks provide significant but only modest source control. Therefore, until vaccination rates are very high, continued layered controls and tight-fitting masks and respirators will be necessary.

Systematic Analysis of Mobile Genetic Elements Mediating β-Lactamase Gene Amplification in Noncarbapenemase-Producing Carbapenem-Resistant Enterobacterales Bloodstream Infections

et al. 2022

Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment

Mahmoud, Muraliraman, et al. 2021

Background: Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that focus primarily on either speed or accuracy. Widely used read mappers (minimap2 and NGMLR) implement various heuristics and scoring schemes to optimize for speed or accuracy, and their performance varies across genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to a single gap penalty across distinct mutational hot spots reduces read alignment accuracy and impedes structural variant detection.

Findings: We tested our hypothesis by implementing a read-mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan uses the normalized edit distance of reads mapped with minimap2 to identify poorly aligned reads and realigns them with the more accurate but computationally more expensive long-read mapper NGMLR. In support of our hypothesis, we show that Vulcan improves alignments of Oxford Nanopore Technologies long reads on both simulated and real datasets. These improvements, in turn, yield more accurate structural variant calls on human genome datasets than either read-mapping method alone.

Conclusions: Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes for improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan.
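The read-selection step of dual-mode alignment can be sketched roughly as follows. This is not Vulcan's code: the `Alignment` record is a hypothetical stand-in for a minimap2 SAM record, normalizing the edit distance by aligned length and using a nearest-rank percentile cutoff are assumptions, and the realignment with NGMLR itself is left out. Reads scoring above the cutoff are the ones that would be handed to the slower, more accurate mapper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alignment:
    """Hypothetical stand-in for one minimap2 alignment record."""
    read_name: str
    edit_distance: int   # e.g. the value of the SAM NM tag
    aligned_length: int  # number of aligned query bases


def normalized_edit_distance(aln: Alignment) -> float:
    # Assumption: divide by aligned length so reads of different lengths are comparable.
    return aln.edit_distance / max(aln.aligned_length, 1)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile; pct must be in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def split_for_realignment(
    alns: List[Alignment], pct: float = 90.0
) -> Tuple[List[str], List[str]]:
    """Return (reads kept from minimap2, reads to realign with a slower mapper)."""
    scores = [normalized_edit_distance(a) for a in alns]
    cutoff = percentile(scores, pct)  # hypothetical percentile threshold
    keep = [a.read_name for a, s in zip(alns, scores) if s <= cutoff]
    realign = [a.read_name for a, s in zip(alns, scores) if s > cutoff]
    return keep, realign


# Toy example with made-up values.
alns = [
    Alignment("r1", 10, 1000),
    Alignment("r2", 50, 1000),
    Alignment("r3", 300, 1000),
    Alignment("r4", 20, 2000),
]
keep, realign = split_for_realignment(alns, pct=75.0)
print(keep)     # ['r1', 'r2', 'r4']
print(realign)  # ['r3']
```

A cutoff taken from the score distribution, rather than a fixed value, keeps the fraction of reads sent to the expensive mapper roughly constant across datasets; whether Vulcan chooses its threshold this way is not stated in the abstract.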

Comprehensive analysis and accurate quantification of unintended large gene modifications induced by CRISPR-Cas9 gene editing

et al. 2022

Most genome editing analyses to date have focused on quantifying small insertions and deletions. Here, we show that CRISPR-Cas9 genome editing can induce large gene modifications, such as deletions, insertions, and complex local rearrangements, in different primary cells and cell lines. We analyzed large deletion events in hematopoietic stem and progenitor cells (HSPCs) using several methods, including clonal genotyping, droplet digital polymerase chain reaction, single-molecule real-time sequencing with unique molecular identifiers, and a long-amplicon sequencing assay. Our results show that large deletions of up to several thousand bases occur at high frequencies at the Cas9 on-target cut sites in the HBB (11.7 to 35.4%), HBG (14.3%), and BCL11A (13.2%) genes in HSPCs and in the PD-1 (15.2%) gene in T cells. Our findings have important implications for advancing genome editing technologies to treat human diseases, because unintended large gene modifications may persist, altering biological functions and reducing the number of available therapeutic alleles.

Rapid Core-Genome Alignment and Visualization for Thousands of Intraspecific Microbial Genomes

Treangen, Ondov, Koren, et al. 2014

Preprint

Though many microbial species or clades now have hundreds of sequenced genomes, existing whole-genome alignment methods do not efficiently handle comparisons at this scale. Here we present the Harvest suite of core-genome alignment and visualization tools for quickly analyzing thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Combined, they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data, we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from http://github.com/marbl/harvest.
