Mass spectrometry is a fundamental tool for discovery and analysis in the life sciences. With the rapid advances in mass spectrometry technology and methods, it has become imperative to provide a standard output format for mass spectrometry data that will facilitate data sharing and analysis. Initially, the efforts to develop a standard format for mass spectrometry data resulted in multiple formats, each designed with a different underlying philosophy. To resolve the issues associated with having multiple formats, vendors, researchers, and software developers convened under the banner of the HUPO PSI to develop a single standard. The new data format incorporated many of the desirable technical attributes from the previous data formats, while adding a number of improvements, including features such as a controlled vocabulary with validation tools to ensure consistent usage of the format, improved support for selected reaction monitoring data, and immediately available implementations to facilitate rapid adoption by the community. The resulting standard data format, mzML, is a well tested open-source format for mass spectrometer output files that can be readily utilized by the community and easily adapted for incremental advances in mass spectrometry technology.
High-throughput omics data often contain systematic biases introduced during various steps of sample processing and data generation. As the source of these biases is usually unknown, it is difficult to select an optimal normalization method for a given data set. To facilitate this process, we introduce the open-source tool “Normalyzer”. It normalizes the data with 12 different normalization methods and generates a report with several quantitative and qualitative plots for comparative evaluation of different methods. The usefulness of Normalyzer is demonstrated with three different case studies from quantitative proteomics and transcriptomics. The results from these case studies show that the choice of normalization method strongly influences the outcome of downstream quantitative comparisons. Normalyzer is an R package and can be used locally or through the online implementation at .
Technical biases are introduced in omics data sets during data generation and interfere with the ability to study biological mechanisms. Several normalization approaches have been proposed to minimize the effects of such biases, but fluctuations in the electrospray current during liquid chromatography–mass spectrometry gradients cause local and sample-specific bias not considered by most approaches. Here we introduce a software named NormalyzerDE that includes a generic retention time (RT)-segmented approach compatible with a wide range of global normalization approaches to reduce the effects of time-resolved bias. The software offers straightforward access to multiple normalization methods, allows for data set evaluation and normalization quality assessment as well as subsequent or independent differential expression analysis using the empirical Bayes Limma approach. When evaluated on two spike-in data sets the RT-segmented approaches outperformed conventional approaches by detecting more peptides (8–36%) without loss of precision. Furthermore, differential expression analysis using the Limma approach consistently increased recall (2–35%) compared to analysis of variance. The combination of RT-normalization and Limma was in one case able to distinguish 108% (2597 vs 1249) more spike-in peptides compared to traditional approaches. NormalyzerDE provides widely usable tools for performing normalization and evaluating the outcome and makes calculation of subsequent differential expression statistics straightforward. The program is available as a web server at .
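The retention-time-segmented idea described above can be illustrated with a minimal sketch. This is not the NormalyzerDE implementation: the function names, the fixed window width, and the choice of median centering as the per-window method are assumptions made for illustration. It assumes a matrix of log-transformed intensities (features in rows, samples in columns) and one retention time per feature.

```python
import numpy as np


def median_normalize(log_intensities):
    """Shift each sample (column) so that all columns share the same median."""
    col_medians = np.nanmedian(log_intensities, axis=0)
    # Re-center on the mean of the medians so the overall scale is kept.
    return log_intensities - col_medians + np.nanmean(col_medians)


def rt_segmented_normalize(log_intensities, retention_times, window_minutes=1.0):
    """Apply median_normalize separately within consecutive RT windows.

    Hypothetical helper: window_minutes is an illustrative parameter,
    not a documented NormalyzerDE option.
    """
    log_intensities = np.asarray(log_intensities, dtype=float)
    retention_times = np.asarray(retention_times, dtype=float)
    normalized = np.full_like(log_intensities, np.nan)
    # Assign each feature (row) to a window based on its retention time.
    offsets = retention_times - retention_times.min()
    window_ids = np.floor(offsets / window_minutes).astype(int)
    for window_id in np.unique(window_ids):
        rows = window_ids == window_id
        normalized[rows] = median_normalize(log_intensities[rows])
    return normalized


# Toy data: sample 2 is biased by +1 early in the gradient and -1 late.
rt = np.array([0.0, 0.3, 0.6, 1.2, 1.5, 1.8])
base = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
bias = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
data = np.column_stack([base, base + bias])

global_result = median_normalize(data)
segmented_result = rt_segmented_normalize(data, rt, window_minutes=1.0)
```

In this toy case the two samples have the same global median, so a single global correction leaves the time-dependent bias in place, while the per-window correction removes it. Real data would also call for a minimum number of features per window, which this sketch omits.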
Background: In order to get global molecular understanding of one of the most important crop diseases worldwide, we investigated compatible and incompatible interactions between Phytophthora infestans and potato (Solanum tuberosum). We used the two most field-resistant potato clones under Swedish growing conditions, which have the greatest known local diversity of P. infestans populations, and a reference compatible cultivar. Results: Quantitative label-free proteomics of 51 apoplastic secretome samples (PXD000435) in combination with genome-wide transcript analysis by 42 microarrays (E-MTAB-1515) were used to capture changes in protein abundance and gene expression at 6, 24 and 72 hours after inoculation with P. infestans. To aid mass spectrometry analysis we generated cultivar-specific RNA-seq data (E-MTAB-1712), which increased peptide identifications by 17%. Components induced only during incompatible interactions, which are candidates for hypersensitive response initiation, include a Kunitz-like protease inhibitor, transcription factors and an RCR3-like protein. More secreted proteins had lower abundance in the compatible interaction compared to the incompatible interactions. Based on this observation and because the well-characterized effector-target C14 protease follows this pattern, we suggest 40 putative effector targets. Conclusions: In summary, over 17000 transcripts and 1000 secreted proteins changed in abundance in at least one time point, illustrating the dynamics of plant responses to a hemibiotroph. Half of the differentially abundant proteins showed a corresponding change at the transcript level. Many putative hypersensitive and effector-target proteins were single representatives of large gene families. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-497) contains supplementary material, which is available to authorized users.
Supplementary data are available at Bioinformatics online.
Proteomic technologies, such as yeast two-hybrid, mass spectrometry (MS), protein/peptide arrays and fluorescence microscopy, yield multi-dimensional data sets, which are often quite large and either not published or published as supplementary information that is not easily searchable. Without a system in place for standardizing and sharing data, it is not fruitful for the biomedical community to contribute these types of data to centralized repositories. Even more difficult is the annotation and display of pertinent information in the context of the corresponding proteins. Wikipedia, an online encyclopedia that anyone can edit, has already proven quite successful (1) and can be used as a model for sharing biological data. However, the need for experimental evidence, data standardization and ownership of data creates scientific obstacles. Here, we describe Human Proteinpedia (http://www.humanproteinpedia.org/) as a portal that overcomes many of these obstacles to provide an integrated view of the human proteome. Human Proteinpedia also allows users to contribute and edit proteomic data with two significant differences from Wikipedia: first, the contributor is expected to provide experimental evidence for the data annotated; and second, only the original contributor can edit their data. Human Proteinpedia's annotation system provides investigators with multiple options for contributing data including web forms and annotation servers. Although registration is required to contribute data, anyone can freely access the data in the repository. The web forms simplify submission through the use of pull-down menus for certain data fields and pop-up menus for standardized vocabulary terms. Distributed annotation servers using modified protein DAS (distributed annotation system) protocols developed by us (DAS protocols were originally developed for sharing mRNA and DNA data) permit contributing laboratories to maintain protein annotations locally.
All protein annotations are visualized in the context of corresponding proteins in the Human Protein Reference Database (HPRD) (3). Figure 1 shows tissue expression data for alpha-2-HS glycoprotein derived from three different types of experiments. Our unique effort differs significantly from existing repositories, such as PeptideAtlas and PRIDE (5), in several respects. First, most proteomic repositories are restricted to one or two experimental platforms, whereas Human Proteinpedia can accommodate data from diverse platforms, including yeast two-hybrid screens, MS, peptide/protein arrays, immunohistochemistry, western blots, coimmunoprecipitation and fluorescence microscopy-type experiments. Second, Human Proteinpedia allows contributing laboratories to annotate data pertaining to six features of proteins (posttranslational modifications, tissue expression, cell line expression, subcellular localization, enzyme substrates and protein-protein interactions). No existing repository currently permits annotation of all these features in proteins. Third, all data submitted to Human Proteinpedia...
It is possible that the low levels of production of exopolysaccharides (EPSs) by lactic acid bacteria could be improved by altering the levels of enzymes in the central metabolism that influence the production of precursor nucleotide sugars. To test this hypothesis, we identified and cloned the galU gene, which codes for UDP glucose pyrophosphorylase (GalU) in Streptococcus thermophilus LY03. Homologous overexpression of the gene led to a 10-fold increase in GalU activity but did not have any effect on the EPS yield when lactose was the carbon source. However, when galU was overexpressed in combination with pgmA, which encodes phosphoglucomutase (PGM), the EPS yield increased from 0.17 to 0.31 g/mol of carbon from lactose. A galactose-fermenting LY03 mutant (Gal+) with increased activities of the Leloir enzymes was also found to have a higher EPS yield (0.24 g/mol of carbon) than the parent strain. The EPS yield was further improved to 0.27 g/mol of carbon by overexpressing galU in this strain. However, the highest EPS yield, 0.36 g/mol of carbon, was obtained when pgmA was knocked out in the Gal+ strain. Measurements of the levels of intracellular metabolites in the cultures revealed that the Gal+ strains had considerably higher glucose 1-phosphate levels than the other strains, and the strain lacking PGM activity had threefold-higher levels of glucose 1-phosphate than the other Gal+ strains. These results show that it is possible to increase EPS production by altering the levels of enzymes in the central carbohydrate metabolism.
Exopolysaccharides (EPSs) produced by lactic acid bacteria are important for the texture of many fermented milk products. There is great diversity in the composition and structure of these EPSs, which results in different properties (for a review, see reference 12).
A variety of EPSs with different repeating units and chain lengths have been found in Streptococcus thermophilus, and diversity is also found in the increasing number of eps operons that are being sequenced (1, 7, 32). The operons consist of a more conserved region with genes that are thought to be involved in regulation, chain length determination, and export, which is followed by a variable region with genes coding for glycosyltransferases and enzymes involved in polymerization and export of the EPSs. The information being accumulated could form the basis for the construction of new tailor-made polysaccharides by genetic engineering (35). However, before the variety of EPSs from lactic acid bacteria can be fully exploited, ways of enhancing the low levels of production must be found. For example, the amount of EPS produced by S. thermophilus normally is less than 1% of the amount of the carbohydrate source (12). EPSs are synthesized from activated nucleotide sugars, and a potential bottleneck in EPS production could be the low levels of these precursors. In S. thermophilus, which possesses the Leloir pathway, UDP glucose (UDPglc) and UDP galactose (UDPgal) can be formed from either the glucose or the galactose moiety ...
In bottom-up mass spectrometry (MS)-based proteomics, peptide isotopic and chromatographic traces (features) are frequently used for label-free quantification in data-dependent acquisition MS but can also be used for the improved identification of chimeric spectra or sample complexity characterization. Feature detection is difficult because of the high complexity of MS proteomics data from biological samples, which frequently causes features to intermingle. In addition, existing feature detection algorithms commonly suffer from compatibility issues, long computation times, or poor performance on high-resolution data. Because of these limitations, we developed a new tool, Dinosaur, with increased speed and versatility. Dinosaur has the functionality to sample algorithm computations through quality-control plots, which we call a plot trail. From the evaluation of this plot trail, we introduce several algorithmic improvements to further improve the robustness and performance of Dinosaur. Dinosaur detected features for 98% of MS/MS identifications in a benchmark data set, whereas no other algorithm tested in this study exceeded 96% feature detection. We finally used Dinosaur to reimplement a published workflow for peptide identification in chimeric spectra, increasing chimeric identification from 26% to 32% over the standard workflow. Dinosaur is operating-system-independent and is freely available as open source on .