Background
High-throughput mass spectrometry (MS) proteomics data are increasingly used to complement traditional structural genome annotation methods. To keep pace with the speed of experimental data generation and to aid structural genome annotation, experimentally observed peptides need to be mapped back to their source genome location quickly and exactly. Previously, the tools available for this task have been limited to custom scripts written by individual research groups to analyze their own data. These scripts are generally not widely available and do not scale well to large eukaryotic genomes.

Results
The Proteogenomic Mapping Tool includes a Java implementation of the Aho-Corasick string searching algorithm that accepts standardized file types as input and rapidly searches experimentally observed peptides for exact matches against a given genome translated in all six reading frames. The Java implementation lets the application scale well to larger eukaryotic genomes while providing cross-platform functionality.

Conclusions
The Proteogenomic Mapping Tool is a standalone application for mapping peptides back to their source genome. It runs on a number of operating system platforms with standard desktop computer hardware and executes very rapidly for a variety of datasets. Because users can select the genetic code appropriate to their organism, researchers can easily customize the tool to their own research interests. The tool is recommended for anyone working to structurally annotate genomes using MS-derived proteomics data.
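The exact-match search described in the Results can be illustrated with a short Python sketch. This is not the Proteogenomic Mapping Tool's Java code. It is a minimal illustration that assumes the standard genetic code (NCBI translation table 1), plain DNA input, and a frame-numbering convention (+1 to +3 on the forward strand, -1 to -3 on the reverse complement) chosen here for illustration. The function names `six_frames`, `build_automaton`, `search`, and `map_peptides` are hypothetical.

```python
from collections import deque

# Assumed: NCBI standard genetic code (table 1), codons ordered by bases T, C, A, G.
# The real tool lets users choose other genetic codes; only table 1 is shown here.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate codon by codon; stops become '*', unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(genome):
    """Return {frame: protein} for frames +1..+3 and -1..-3 (hypothetical convention)."""
    genome = genome.upper()
    rev = genome.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(genome[offset:])
        frames[-(offset + 1)] = translate(rev[offset:])
    return frames


def build_automaton(peptides):
    """Build the Aho-Corasick trie, failure links (BFS), and merged output sets."""
    goto, fail, out = [{}], [0], [set()]
    for pep in peptides:
        node = 0
        for ch in pep:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].add(pep)
    queue = deque(goto[0].values())  # depth-1 nodes fail back to the root
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] |= out[fail[child]]  # inherit matches ending at the suffix
    return goto, fail, out


def search(text, automaton):
    """Yield (start, peptide) for every exact occurrence in a single pass over text."""
    goto, fail, out = automaton
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pep in out[node]:
            yield i - len(pep) + 1, pep


def map_peptides(genome, peptides):
    """Return sorted (peptide, strand, start, end) hits, 1-based inclusive, forward-strand coordinates."""
    automaton = build_automaton(peptides)
    n = len(genome)
    hits = []
    for frame, protein in six_frames(genome).items():
        offset = abs(frame) - 1
        for aa_start, pep in search(protein, automaton):
            nt_start = offset + 3 * aa_start  # 0-based on the translated strand
            nt_end = nt_start + 3 * len(pep)  # exclusive
            if frame > 0:
                hits.append((pep, "+", nt_start + 1, nt_end))
            else:  # convert reverse-complement positions to forward-strand positions
                hits.append((pep, "-", n - nt_end + 1, n - nt_start))
    return sorted(hits)
```

Building the automaton once means each translated frame is scanned in a single pass no matter how many peptides are queried, which is why Aho-Corasick suits this task better than searching for each peptide separately. For example, `map_peptides("ATGGCCAAA", ["MAK", "GH"])` returns `[('GH', '-', 1, 6), ('MAK', '+', 1, 9)]`.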
Clostridium carboxidivorans strain P7T is a strictly anaerobic acetogenic bacterium that produces acetate, ethanol, butanol, and butyrate. The C. carboxidivorans genome contains all the genes for the carbonyl branch of the Wood-Ljungdahl pathway for CO2 fixation, and it encodes enzymes for the conversion of acetyl coenzyme A into butanol and butyrate.
Clostridium carboxidivorans strain P7T (equivalent to ATCC BAA-624T and DSM 15243T) is an obligate anaerobe that can grow autotrophically with H2 and CO2 or with CO, fixing carbon via the Wood-Ljungdahl pathway, or chemoorganotrophically with simple sugars (1). Acetate, ethanol, butanol, and butyrate are its metabolic end products.

For slow-growing strict anaerobes such as C. carboxidivorans, genome sequencing provides a faster, though theoretical, characterization of metabolism than traditional methods. We isolated C. carboxidivorans genomic DNA using the Wizard genomic DNA purification kit (Promega, Madison, WI) and amplified it using the REPLI-g kit (Qiagen). A single shotgun pyrosequencing run on a Genome Sequencer FLX system (454 Life Sciences, Branford, CT) yielded 429,680 high-quality reads (mean read length, 231.6 bp), which were assembled with Newbler software (454 Life Sciences) into 225 contigs >500 bp long. Paired-end sequencing produced 111,154 reads (mean read length, 256.3 bp). Assembly of the paired-end and shotgun reads produced 73 scaffolds containing 216 large contigs with a mean sequence depth of 16.33 reads. PCR amplification and Sanger sequencing were conducted, followed by scaffold assembly using Sequencher