Samuel L. Neff scite author profile

Samuel L. Neff

5 Publications

47 Citation Statements Received

330 Citation Statements Given

How they've been cited

How they cite others

253

317

Affiliations

Dartmouth College

Publications


GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses

Koeppen, Holden, et al. 2021

mSystems


The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014 universally lack these annotations. Our algorithm GAUGE (general annotation using text/data group ensembles) automatically annotates GEO microbial data sets, including microarray and RNA sequencing studies, increasing the percentage of data sets amenable to analysis from 4% to 33%. Eighty-nine percent of GAUGE-annotated studies matched group assignments generated by human curators. To demonstrate how GAUGE annotation can lead to scientific insight, we created GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface to analyze 73 GAUGE-annotated P. aeruginosa studies, three times more than previously available. GAPE analysis revealed that PA3923, a gene of unknown function, was frequently differentially expressed in more than 50% of studies and significantly coregulated with genes involved in biofilm formation. Follow-up wet-bench experiments demonstrate that PA3923 mutants are indeed defective in biofilm formation, consistent with predictions facilitated by GAUGE and GAPE. We anticipate that GAUGE and GAPE, which we have made freely available, will make publicly available microbial transcriptomic data easier to reuse and lead to new data-driven hypotheses.

IMPORTANCE GEO archives transcriptomic data from over 5,800 microbial experiments and allows researchers to answer questions not directly addressed in published papers. However, less than 4% of the microbial data sets include the sample group annotations required for high-throughput reanalysis. This limitation blocks a considerable amount of microbial transcriptomic data from being reused easily. Here, we demonstrate that the GAUGE algorithm could make 33% of microbial data accessible to parallel mining and reanalysis. GAUGE annotations increase statistical power and, thereby, make consistent patterns of differential gene expression easier to identify. In addition, we developed GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface that performs parallel analyses on P. aeruginosa and E. coli compendia. Source code for GAUGE and GAPE is freely available and can be repurposed to create compendia for other bacterial species.
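The abstract's statement that a gene was differentially expressed in more than 50% of studies can be illustrated with a small sketch. This is not the GAPE implementation: the function name `de_frequency`, the study identifiers, and the placeholder gene names other than PA3923 are invented toy values, and each study is assumed to have already been reduced to a set of differentially expressed genes.

```python
from typing import Dict, Set


def de_frequency(de_genes_by_study: Dict[str, Set[str]], gene: str) -> float:
    """Return the fraction of studies in which `gene` is differentially expressed."""
    if not de_genes_by_study:
        raise ValueError("at least one study is required")
    hits = sum(1 for genes in de_genes_by_study.values() if gene in genes)
    return hits / len(de_genes_by_study)


# Toy input (hypothetical): study ID -> genes called differentially expressed.
studies = {
    "study_1": {"PA3923", "gene_x"},
    "study_2": {"gene_y"},
    "study_3": {"PA3923"},
    "study_4": {"PA3923", "gene_x", "gene_y"},
}

fraction = de_frequency(studies, "PA3923")
print(f"PA3923 is differentially expressed in {fraction:.0%} of the toy studies")
```

Counting across many annotated studies in this way is what makes a recurring pattern visible that no single study would reveal.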


Computationally Efficient Assembly of Pseudomonas aeruginosa Gene Expression Compendia

et al. 2023


ESKAPE Act Plus: Pathway Activation Analysis for Bacterial Pathogens

2022


ESKAPE pathogens are bacteria of concern because they develop antibiotic resistance and can cause life-threatening infections, particularly in more susceptible immunocompromised people. ESKAPE Act PLUS is a user-friendly web application that will advance research on ESKAPE and other pathogens commonly studied by the biomedical community by allowing scientists to infer biological phenotypes from the results from high-throughput bacterial gene or protein expression experiments.


Compendium-Wide Analysis of Pseudomonas aeruginosa Core and Accessory Genes Reveals Transcriptional Patterns across Strains PAO1 and PA14

et al. 2023


Computationally efficient assembly of a Pseudomonas aeruginosa gene expression compendium

Doing, Lee, Neff, et al. 2022

Preprint


Over the past two decades, thousands of RNA sequencing (RNA-seq) gene expression profiles of Pseudomonas aeruginosa have been made publicly available via the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). In the work we present here, we draw on over 2,300 P. aeruginosa transcriptomes from hundreds of studies performed by over seventy-five different research groups. We first developed a pipeline, using the Salmon pseudo-aligner and two different P. aeruginosa reference genomes (strains PAO1 and PA14), that transformed raw sequence data into uniformly processed data in the form of sample-wise normalized counts. In this workflow, P. aeruginosa RNA-seq data are filtered using technically and biologically driven criteria that are tailored to bacterial gene expression and that account for the effects of alignment to different reference genomes. The filtered data are then normalized to enable cross-experiment comparisons. Finally, annotations are programmatically collected for those samples with sufficient metadata, and expression-based metrics are used to further enhance strain assignment for each sample. Our processing and quality control methods provide a scalable framework for taking full advantage of the troves of biological information hibernating in the depths of microbial gene expression data. The re-analysis of these data in aggregate is a powerful approach for hypothesis generation and testing, and this approach can be applied to transcriptome datasets in other species.
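The filter-then-normalize steps described in the abstract can be sketched roughly as follows. This is an illustrative stand-in, not the authors' pipeline: the abstract does not state the filtering criteria or the normalization method, so the `MIN_TOTAL_COUNTS` threshold, the counts-per-million style scaling, and all function and sample names are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical threshold; the abstract does not give the actual filtering criteria.
MIN_TOTAL_COUNTS = 1000.0


def filter_samples(
    counts: Dict[str, List[float]], min_total: float = MIN_TOTAL_COUNTS
) -> Dict[str, List[float]]:
    """Keep only samples whose summed gene counts reach `min_total`."""
    return {name: values for name, values in counts.items() if sum(values) >= min_total}


def normalize_per_sample(
    counts: Dict[str, List[float]], scale: float = 1_000_000.0
) -> Dict[str, List[float]]:
    """Rescale each sample so its counts sum to `scale` (an assumed CPM-style scheme)."""
    normalized = {}
    for name, values in counts.items():
        total = sum(values)
        if total == 0:
            raise ValueError(f"sample {name!r} has no counts to normalize")
        normalized[name] = [value * scale / total for value in values]
    return normalized


# Toy count matrix (invented): sample name -> per-gene counts in a fixed gene order.
raw_counts = {
    "sample_a": [500.0, 1500.0, 3000.0],
    "sample_b": [2.0, 1.0, 0.0],  # too shallow; dropped by the filter
    "sample_c": [4000.0, 4000.0, 2000.0],
}

kept = filter_samples(raw_counts)
normalized = normalize_per_sample(kept)
print(sorted(normalized))
```

Filtering before normalizing keeps very shallow samples from being inflated to the same scale as well-sequenced ones, which is one reason the two steps are ordered this way in the sketch.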


