Angel Pizarro scite author profile

Mass spectrometry is a fundamental tool for discovery and analysis in the life sciences. With the rapid advances in mass spectrometry technology and methods, it has become imperative to provide a standard output format for mass spectrometry data that will facilitate data sharing and analysis. Initially, the efforts to develop a standard format for mass spectrometry data resulted in multiple formats, each designed with a different underlying philosophy. To resolve the issues associated with having multiple formats, vendors, researchers, and software developers convened under the banner of the HUPO PSI to develop a single standard. The new data format incorporated many of the desirable technical attributes from the previous data formats, while adding a number of improvements, including features such as a controlled vocabulary with validation tools to ensure consistent usage of the format, improved support for selected reaction monitoring data, and immediately available implementations to facilitate rapid adoption by the community. The resulting standard data format, mzML, is a well tested open-source format for mass spectrometer output files that can be readily utilized by the community and easily adapted for incremental advances in mass spectrometry technology.

Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM)

Grant et al. 2011

CircaDB: a database of mammalian circadian gene expression profiles

et al. 2012

CircaDB (http://circadb.org) is a new database of circadian transcriptional profiles from time course expression experiments from mice and humans. Each transcript’s expression was evaluated by three separate algorithms, JTK_Cycle, Lomb Scargle and DeLichtenberg. Users can query the gene annotations using simple and powerful full text search terms, restrict results to specific data sets and provide probability thresholds for each algorithm. Visualizations of the data are intuitive charts that convey profile information more effectively than a table of probabilities. The CircaDB web application is open source and available at http://github.com/itmat/circadb.
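The query model described above (restrict to a data set, then apply a probability threshold per algorithm) can be sketched as a plain in-memory filter. This is a minimal illustration, not CircaDB's actual code or schema: the `ProbeResult` class, its field names, the `filter_rhythmic` function and the toy records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical record: one transcript scored by three algorithms."""
    gene: str
    dataset: str
    jtk_p: float             # JTK_Cycle probability (assumed field name)
    lomb_scargle_p: float    # Lomb Scargle probability (assumed field name)
    de_lichtenberg_p: float  # DeLichtenberg probability (assumed field name)


def filter_rhythmic(
    results: Iterable[ProbeResult],
    dataset: Optional[str] = None,
    jtk_max: float = 1.0,
    ls_max: float = 1.0,
    dl_max: float = 1.0,
) -> List[ProbeResult]:
    """Keep records in the chosen data set that pass every threshold."""
    return [
        r for r in results
        if (dataset is None or r.dataset == dataset)
        and r.jtk_p <= jtk_max
        and r.lomb_scargle_p <= ls_max
        and r.de_lichtenberg_p <= dl_max
    ]


# Toy records, invented purely for illustration.
toy = [
    ProbeResult("GeneA", "liver", 0.001, 0.010, 0.020),
    ProbeResult("GeneB", "liver", 0.200, 0.004, 0.030),
    ProbeResult("GeneC", "brain", 0.001, 0.002, 0.003),
]

hits = filter_rhythmic(toy, dataset="liver", jtk_max=0.05, ls_max=0.05)
print([r.gene for r in hits])
```

Leaving a threshold at its default of 1.0 disables that algorithm's filter, which mirrors the idea that each threshold is optional.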

The mzIdentML Data Standard for Mass Spectrometry-Based Proteomics Results

Jones, Eisenacher, Mayer et al. 2012

Molecular & Cellular Proteomics

182

172

We report the release of mzIdentML, an exchange standard for peptide and protein identification data, designed by the Proteomics Standards Initiative. The format was developed by the Proteomics Standards Initiative in collaboration with instrument and software vendors, and the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start working with the standard for exchanging and publishing data sets in support of publications and they provide a stable platform for bioinformatics groups and commercial software vendors to work with a single file format for identification data.

Design and implementation of microarray gene expression markup language (MAGE-ML)

et al. 2002

GA4GH: International policies and standards for data sharing across genomic research and healthcare

Rehm, Page, Smith et al. 2021

Cell Genomics

121

Analysis of the Zebrafish Proteome during Embryonic Development

Lucitt, Price, Pizarro et al. 2008

Molecular & Cellular Proteomics

116

The model organism zebrafish (Danio rerio) is particularly amenable to studies deciphering regulatory genetic networks in vertebrate development, biology, and pharmacology. Unraveling the functional dynamics of such networks requires precise quantitation of protein expression during organismal growth, which is incrementally challenging with progressive complexity of the systems. In an approach toward such quantitative studies of dynamic network behavior, we applied mass spectrometric methodology and rigorous statistical analysis to create comprehensive, high quality profiles of proteins expressed at two stages of zebrafish development. Proteins of embryos 72 and 120 h postfertilization (hpf) were isolated and analyzed both by two-dimensional (2D) LC followed by ESI-MS/MS and by 2D PAGE followed by MALDI-TOF/TOF protein identification. We detected 1384 proteins from 327,906 peptide sequence identifications at 72 and 120 hpf with false identification rates of less than 1% using 2D LC-ESI-MS/MS. These included only ~30% of proteins that were identified by 2D PAGE-MALDI-TOF/TOF. Roughly 10% of all detected proteins were derived from hypothetical or predicted gene models or were entirely unannotated. Comparison of protein expression by 2D DIGE revealed that proteins involved in energy production and transcription/translation were relatively more abundant at 72 hpf, consistent with faster synthesis of cellular proteins during organismal growth at this time compared with 120 hpf. The data are accessible in a database that links protein identifications to existing resources including the Zebrafish Information Network database. This new resource should facilitate the selection of candidate proteins for targeted quantitation and refine systematic genetic network analysis in vertebrate development and biology. Molecular & Cellular Proteomics 7:981-994, 2008.

IVT-seq reveals extreme bias in RNA sequencing

et al. 2014

Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.
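To make the idea of a fold difference in within-transcript coverage concrete, here is a minimal sketch under stated assumptions. It takes the fold difference to be the ratio of the highest to the lowest per-base read depth in a transcript, with a pseudocount to avoid division by zero. The abstract does not specify the metric, so the paper's exact definition may differ. The function names and the toy depth values are hypothetical.

```python
from typing import Mapping, Sequence


def coverage_fold_difference(depths: Sequence[float], pseudocount: float = 1.0) -> float:
    """Max/min ratio of per-base depth within one transcript (assumed metric)."""
    if not depths:
        raise ValueError("depths must be non-empty")
    return (max(depths) + pseudocount) / (min(depths) + pseudocount)


def fraction_exceeding(
    transcripts: Mapping[str, Sequence[float]], threshold: float
) -> float:
    """Fraction of transcripts whose fold difference is strictly above threshold."""
    folds = [coverage_fold_difference(d) for d in transcripts.values()]
    return sum(f > threshold for f in folds) / len(folds)


# Toy per-base depths, invented for illustration only.
toy_depths = {
    "t1": [9, 9, 9],   # uniform coverage
    "t2": [9, 29],     # three-fold difference after the pseudocount
    "t3": [0, 109],    # an uncovered base next to a heavily covered one
    "t4": [4, 9],      # exactly two-fold, so not strictly above 2
}

print(fraction_exceeding(toy_depths, 2.0))
print(fraction_exceeding(toy_depths, 10.0))
```

With uniform coverage the ratio is 1, which corresponds to the expectation that each base of a full-length in vitro transcript is equally represented.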

mzML—a Community Standard for Mass Spectrometry Data
