Rick Stevens scite author profile

In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.


The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes

Meyer et al. 2008

A communal catalogue reveals Earth’s multiscale microbial diversity

Thompson, Sanders, McDonald et al. 2017

Nature

1,902

1,615


A primary aim of microbial ecology is to determine patterns and drivers of community distribution, interaction, and assembly amidst complexity and uncertainty. Microbial community composition has been shown to change across gradients of environment, geographic distance, salinity, temperature, oxygen, nutrients, pH, day length, and biotic factors [1-6]. These patterns have been identified mostly by focusing on one sample type and region at a time, with insights extrapolated across environments and geography to produce generalized principles. To assess how microbes are distributed across environments globally (or whether microbial community dynamics follow fundamental ecological 'laws' at a planetary scale) requires either a massive monolithic cross-environment survey or a practical methodology for coordinating many independent surveys. New studies of microbial environments are rapidly accumulating; however, our ability to extract meaningful information from across datasets is outstripped by the rate of data generation. Previous meta-analyses have suggested robust general trends in community composition, including the importance of salinity [1] and animal association [2]. These findings, although derived from relatively small and uncontrolled sample sets, support the utility of meta-analysis to reveal basic patterns of microbial diversity and suggest that a scalable and accessible analytical framework is needed. The Earth Microbiome Project (EMP, http://www.earthmicrobiome.org) was founded in 2010 to sample the Earth's microbial communities at an unprecedented scale in order to advance our understanding of the organizing biogeographic principles that govern microbial community structure [7,8]. We recognized that open and collaborative science, including scientific crowdsourcing and standardized methods [8], would help to reduce technical variation among individual studies, which can overwhelm biological variation and make general trends difficult to detect [9].
Comprising around 100 studies, over half of which have yielded peer-reviewed publications (Supplementary Table 1), the EMP has now dwarfed by 100-fold the sampling and sequencing depth of earlier meta-analysis efforts [1,2]; concurrently, powerful analysis tools have been developed, opening a new and larger window into the distribution of microbial diversity on Earth. In establishing a scalable framework to catalogue microbiota globally, we provide both a resource for the exploration of myriad questions and a starting point for the guided acquisition of new data to answer them. As an example of using this

Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of r...


The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes

Overbeek, Begley, Butler et al. 2005

Nucleic Acids Research

1,742

1,613


The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.


RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

Brettin, Davis, Disz et al. 2015

Sci Rep

2,014

1,404


The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.


Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center

et al. 2016


The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by ‘virtual integration’ to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.


PATRIC, the bacterial bioinformatics database and analysis resource

Wattam, Abraham, Dalay et al. 2013

Nucl. Acids Res.

1,136

962


The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.


The RAST Server: Rapid Annotations using Subsystems Technology

The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)
