The MSstats R-Bioconductor family of packages
is widely used for statistical analyses of quantitative bottom-up
mass spectrometry-based proteomic experiments to detect differentially
abundant proteins. It is applicable to a variety of experimental designs
and data acquisition strategies and is compatible with many data processing
tools used to identify and quantify spectral features. In the face
of ever-increasing complexities of experiments and data processing
strategies, the core package of the family, also named MSstats, has undergone a series of substantial updates.
The new version, MSstats v4.0, improves the usability,
versatility, and accuracy of the statistical methodology, as well as the use
of computational resources. New converters integrate the output of
upstream processing tools directly with MSstats,
requiring less manual work by the user. The package’s statistical
models have been updated to a more robust workflow. Finally, MSstats’ code has been substantially refactored to
improve memory use and computation speed. Here we detail these updates,
highlighting methodological differences between the new and old versions.
An empirical comparison of MSstats v4.0 with its previous
implementations, as well as with the packages MSqRob and DEqMS, on controlled mixtures and biological
experiments demonstrated stronger performance and better usability
of MSstats v4.0 compared with existing methods.
Liquid chromatography coupled with bottom-up mass spectrometry (LC-MS/MS)-based proteomics is a versatile technology for identifying and quantifying proteins in complex biological mixtures. After identification, analysis of changes in protein abundances between conditions requires increasingly complex and specialized statistical methods. Many of these methods, in particular the family of open-source Bioconductor packages MSstats, are implemented in programming languages such as R. To make the methods in MSstats accessible to users with limited programming and statistical background, we have created MSstatsShiny, an R-Shiny graphical user interface (GUI) integrated with MSstats, MSstatsTMT, and MSstatsPTM. The GUI provides a point-and-click analysis pipeline applicable to a wide variety of proteomics experiment types, including label-free data-dependent acquisition (DDA) and data-independent acquisition (DIA), as well as tandem mass tag (TMT)-based DDA. It addresses questions such as relative changes in the abundance of peptides, proteins, or post-translational modifications (PTMs). To support reproducible research, the application saves the user's selections and builds an R script that programmatically recreates the analysis. MSstatsShiny can be installed locally via GitHub and Bioconductor, or used in the cloud at www.msstatsshiny.com. We illustrate the utility of the platform using two experimental data sets (MassIVE IDs MSV000086623 and MSV000085565).
In the version of this article initially published, the citation at the end of the Fig. 6b caption was incorrect. The correct reference (Murtagh, F., Legendre, P. Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? J. Classif. 31, 274-295, 2014) has been added and the reference list renumbered. In the second-to-last paragraph of the Introduction, the callout to Fig. 6e originally cited Fig. 7a and has been updated. The text changes have been made to the HTML and PDF versions of the article. Further, the numbered citations to Extended Data figures in the Supplementary Information were incorrect and have been replaced in a revised file online.
Liquid chromatography coupled with bottom-up mass spectrometry (LC-MS/MS)-based proteomics is increasingly used to detect changes in post-translational modifications (PTMs) in samples from different conditions. Analysis of data from such experiments faces numerous statistical challenges. These include the low abundance of modified proteoforms, the small number of observed peptides that span modification sites, and confounding between changes in PTM abundance and overall changes in protein abundance. Therefore, statistical approaches for detecting differential PTM abundance must integrate all the available information pertaining to a PTM site and consider all the relevant sources of confounding and variation. In this manuscript, we propose such a statistical framework, which is versatile, accurate, and leads to reproducible results. The framework requires an experimental design that quantifies, for each sample, both peptides with post-translational modifications and peptides from the same proteins with no modification sites. The proposed framework supports both label-free and tandem mass tag (TMT)-based LC-MS/MS acquisitions. The statistical methodology separately summarizes the abundances of peptides with and without the modification sites by fitting separate linear mixed-effects models appropriate for the experimental design. Next, model-based inferences regarding the PTM-level and protein-level abundances are combined to account for the confounding between these two sources. Evaluations on computer simulations, a spike-in experiment with known ground truth, and three biological experiments with different organisms, modification types, and data acquisition types demonstrate improved fold change estimation and detection of differential PTM abundance compared with currently used approaches. The proposed framework is implemented in the free and open-source R/Bioconductor package MSstatsPTM.
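The step that combines PTM-level and protein-level inferences can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the sketch subtracts the protein log2 fold change from the PTM log2 fold change, adds the two variances (treating the estimates as independent), and approximates the degrees of freedom with the Satterthwaite formula. The function name `adjust_ptm_for_protein` and its arguments are hypothetical and are not part of the MSstatsPTM API.

```python
import math


def adjust_ptm_for_protein(ptm_log2fc, ptm_se, ptm_df,
                           prot_log2fc, prot_se, prot_df):
    """Illustrative protein-level adjustment of a PTM-site comparison.

    Assumes the PTM and protein estimates are independent; this is a
    sketch of one plausible combination rule, not the package's code.
    """
    # Remove the change explained by overall protein abundance.
    adj_log2fc = ptm_log2fc - prot_log2fc

    # Variances of independent estimates add.
    ptm_var = ptm_se ** 2
    prot_var = prot_se ** 2
    adj_se = math.sqrt(ptm_var + prot_var)

    # Satterthwaite approximation for the combined degrees of freedom.
    adj_df = (ptm_var + prot_var) ** 2 / (
        ptm_var ** 2 / ptm_df + prot_var ** 2 / prot_df
    )

    # Test statistic for the adjusted fold change.
    t_stat = adj_log2fc / adj_se

    return {"log2fc": adj_log2fc, "se": adj_se, "df": adj_df, "t": t_stat}


# Hypothetical numbers, chosen only to show the arithmetic.
result = adjust_ptm_for_protein(
    ptm_log2fc=1.5, ptm_se=0.3, ptm_df=10,
    prot_log2fc=0.5, prot_se=0.4, prot_df=12,
)
print(result)
```

With these made-up inputs, a PTM site that appears to change by 1.5 log2 units is adjusted to 1.0 once the 0.5 log2 unit change of its parent protein is removed, and the standard error grows because it now carries the uncertainty of both estimates.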