2014
DOI: 10.1074/mcp.o114.037879

Numerical Compression Schemes for Proteomics Mass Spectrometry Data

Abstract: The open XML format mzML, used for representation of MS data, is pivotal for the development of platform-independent MS analysis software. Although conversion from vendor formats to mzML must take place on a platform on which the vendor libraries are available (i.e. Windows), once mzML files have been generated, they can be used on any platform. However, the mzML format has turned out to be less efficient than vendor formats. In many cases, the naïve mzML representation is fourfold or even up to 18-fold larger…
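
As a rough illustration of one source of size overhead (a sketch, not taken from the paper): mzML stores numeric arrays such as m/z values as base64-encoded text, optionally zlib-compressed. The helper names `encode_mzml_style` and `decode_mzml_style` and the synthetic m/z values below are assumptions made for this example only. The sketch shows that base64 alone turns every 3 bytes of binary data into 4 characters of text, and that the round trip through 64-bit floats is lossless.

```python
import base64
import struct
import zlib


def encode_mzml_style(values, compress=False):
    """Pack floats as little-endian 64-bit doubles, optionally
    zlib-compress them, then base64-encode the result as text."""
    raw = struct.pack("<%dd" % len(values), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_mzml_style(text, compress=False):
    """Inverse of encode_mzml_style."""
    raw = base64.b64decode(text)
    if compress:
        raw = zlib.decompress(raw)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


# Synthetic m/z axis (hypothetical data, for illustration only).
mz = [400.0 + 0.01 * i for i in range(3000)]

raw_bytes = 8 * len(mz)                       # size as plain binary doubles
plain = encode_mzml_style(mz)                 # base64 only
packed = encode_mzml_style(mz, compress=True) # zlib, then base64

print("raw bytes:     ", raw_bytes)
print("base64 chars:  ", len(plain))
print("zlib + base64: ", len(packed))
```

How much zlib recovers depends heavily on the data, which is why the sketch prints the compressed size rather than assuming it.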

Cited by 59 publications (70 citation statements)
References 23 publications

“…A custom data access library was developed for BatMass to fulfill the speed requirements (parsing speed is comparable to the C++ implementation from OpenMS) and to automate memory management. It provides a rich API for accessing scan meta-data and spectra, including support for MS-Numpress compression 23 in mzML files. As the API is separated from the implementation, it is possible to add support for other file formats as well.…”
Section: Methods (mentioning)
confidence: 99%
“…In order to maintain the positive gains of the adopted format (in the form of desirable features such as universality and human readability), the developers of mzML compliant tools have had to engineer increasingly complex adjunct software in order to keep the resulting systems performant. Examples of this include the storage of image data in adjunct files external to the main mzML file in imzML (Römpp et al, ), the generation of external binary files (“.cachedMzML” files) by OpenMS to support efficient access to SWATH data stored in mzML (Röst et al, ), and the incorporation of the MS‐Numpress (Teleman et al, ) compression schemes into ProteoWizard (Chambers et al, ) to mitigate the up to 18‐fold inflation in mzML file size as compared to the original vendor format (Teleman et al, ). These software engineering fixes are all symptomatic of the technical drift arising from the choice of a text‐based format for the storage of increasingly large volumes of mass spectrometry data.…”
Section: Scaling Beyond Human Readability (mentioning)
confidence: 99%
“…There are several examples of applications in which chemometric tools are used to discriminate between samples and to identify potential biomarkers in biomedical, environmental, or food fields. However, there are extreme cases in which the dimensionality of the data generated makes the direct application of these methods impractical owing to the high computational requirements. An example of this limitation lies in the metabolomic analysis of data obtained by chromatographic techniques coupled with a high‐resolution mass spectrometric detector.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, there are extreme cases in which the dimensionality of the data generated makes the direct application of these methods impractical owing to the high computational requirements. 9 An example of this limitation lies in the metabolomic analysis of data obtained by chromatographic techniques coupled with a high-resolution mass spectrometric detector. In this case, each sample generates a data matrix in which each retention time (each row) contains a complete mass spectrum formed by several thousands of m/z values (columns).…”
mentioning
confidence: 99%
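
The data matrix described in the last statement above (one row per retention time, one column per m/z value) can be sketched as follows. This is a minimal illustration, not the cited authors' method: the function name `build_data_matrix`, the fixed-width m/z binning, and the toy scans are all assumptions introduced here.

```python
def build_data_matrix(scans, mz_min, mz_max, bin_width):
    """Return one row per scan (retention time) and one column per m/z bin.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples.
    Peaks outside [mz_min, mz_max) are ignored; intensities that fall
    into the same bin are summed.
    """
    n_bins = int(round((mz_max - mz_min) / bin_width))
    rows = []
    for _rt, peaks in scans:
        row = [0.0] * n_bins
        for mz, intensity in peaks:
            if not (mz_min <= mz < mz_max):
                continue
            # Clamp guards against floating-point rounding at the upper edge.
            idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
            row[idx] += intensity
        rows.append(row)
    return rows


# Two toy scans (hypothetical values, for illustration only).
scans = [
    (0.5, [(100.2, 10.0), (100.7, 5.0), (102.1, 3.0)]),
    (1.0, [(101.5, 7.0)]),
]
matrix = build_data_matrix(scans, 100.0, 103.0, 1.0)
for (rt, _), row in zip(scans, matrix):
    print(rt, row)
```

With a high-resolution detector the bin width would be far smaller than 1.0, so the number of columns grows into the thousands per scan, which is the dimensionality problem the statement refers to.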