The Many Microbe Microarrays Database (M3D) is designed to facilitate the analysis and visualization of expression data in compendia compiled from multiple laboratories. M3D contains over a thousand Affymetrix microarrays for Escherichia coli, Saccharomyces cerevisiae and Shewanella oneidensis. The expression data is uniformly normalized to make the data generated by different laboratories and researchers more comparable. To facilitate computational analyses, M3D provides raw data (CEL files) and normalized data downloads of each compendium. In addition, web-based construction, visualization and download of custom datasets are provided to facilitate efficient interrogation of the compendium for more focused analyses. The experimental condition metadata in M3D is human-curated, with each chemical and growth attribute stored as a structured and computable set of experimental features with consistent naming conventions and units. All versions of the normalized compendia constructed for each species are maintained and accessible in perpetuity to facilitate the future interpretation and comparison of results published on M3D data. M3D is accessible at http://m3d.bu.edu/.
Protein biomarker discovery produces lengthy lists of candidates that must subsequently be verified in blood or other accessible biofluids. Use of targeted mass spectrometry (MS) to verify disease- or therapy-related changes in protein levels requires the selection of peptides that are quantifiable surrogates for proteins of interest. Peptides that produce the highest ion-current response (high-responding peptides) are likely to provide the best detection sensitivity. Identification of the most effective signature peptides, particularly in the absence of experimental data, remains a major resource constraint in developing targeted MS-based assays. Here we describe a computational method that uses protein physicochemical properties to select high-responding peptides and demonstrate its utility in identifying signature peptides in plasma, a complex proteome with a wide range of protein concentrations. Our method, which employs a Random Forest classifier, facilitates the development of targeted MS-based assays for biomarker verification or any application where protein levels need to be measured.
Proteomic discovery experiments in case-and-control comparisons of tissue or proximal fluids frequently generate lists comprising many tens to hundreds of candidate biomarkers [1]. Integrative genomic approaches incorporating microarray data and literature mining are also increasingly being used to guide identification of candidate protein biomarkers. To further credential biomarker candidates and move them toward possible clinical implementation, it is necessary to determine which of the proteins from lists of candidates differentially abundant in diseased versus healthy patients can be detected in body fluids, such as blood, that can be assayed with minimal invasiveness [1]. This process, termed verification, has historically been approached using antibodies.
High-quality, well-characterized collections of antibodies suitable for protein detection in tissue are now being developed [2]. Unfortunately, the immunoassay-grade antibody pairs necessary for sensitive and specific detection in blood exist for only a tiny percentage of the proteome. Thus, for the majority of proteins, suitable reagents for their detection and quantification in blood (or other biofluids) do not yet exist, and alternative technologies are needed to bridge the gap between discovery and clinical-assay development. This problem is an important aspect of the larger need in biology and medicine for quantitative methods to measure the presence and abundance of any protein of interest.
Targeted MS is emerging as an assay technology capable of selective and sensitive detection and quantification of potentially any protein of interest (or modification thereof) in the ...
Published in final edited form as: Nat Biotechnol. 2009 February; 27(2): 190-198. doi:10.1038/nbt.1524. Author manuscript; available in PMC 2009 September 28.
The Autism Diagnostic Observation Schedule-Generic (ADOS) is one of the most widely used instruments for behavioral evaluation of autism spectrum disorders. It is composed of four modules, each tailored for a specific group of individuals based on their language and developmental level. On average, a module takes between 30 and 60 min to deliver. We used a series of machine-learning algorithms to study the complete set of scores from Module 1 of the ADOS available at the Autism Genetic Resource Exchange (AGRE) for 612 individuals with a classification of autism and 15 non-spectrum individuals from both AGRE and the Boston Autism Consortium (AC). Our analysis indicated that 8 of the 29 items contained in Module 1 of the ADOS were sufficient to classify autism with 100% accuracy. We further validated the accuracy of this eight-item classifier against complete sets of scores from two independent sources, a collection of 110 individuals with autism from AC and a collection of 336 individuals with autism from the Simons Foundation. In both cases, our classifier performed with nearly 100% sensitivity, correctly classifying all but two of the individuals from these two resources with a diagnosis of autism, and with 94% specificity on a collection of observed and simulated non-spectrum controls. The classifier contained several elements found in the ADOS algorithm, demonstrating high test validity, and also resulted in a quantitative score that measures classification confidence and extremeness of the phenotype. With incidence rates rising, the ability to classify autism effectively and quickly requires careful design of assessment and diagnostic tools. 
Given the brevity, accuracy and quantitative nature of the classifier, results from this study may prove valuable in the development of mobile tools for preliminary evaluation and clinical prioritization—in particular those focused on assessment of short home videos of children—that speed the pace of initial evaluation and broaden the reach to a significantly larger percentage of the population at risk.
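The validation figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the case counts stated in the abstract (110 from AC, 336 from the Simons Foundation, two missed), and the function and variable names are hypothetical rather than taken from the study. Specificity is not recomputed because the abstract does not give the number of control individuals.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of individuals with the condition that the classifier detects."""
    return true_positives / (true_positives + false_negatives)


# Counts taken from the abstract; names are illustrative assumptions.
ac_cases = 110        # individuals with autism from the Boston Autism Consortium
simons_cases = 336    # individuals with autism from the Simons Foundation
missed = 2            # "all but two" were classified correctly

total_cases = ac_cases + simons_cases
sens = sensitivity(total_cases - missed, missed)

print(f"{total_cases - missed}/{total_cases} detected, sensitivity = {sens:.4f}")
```

With these counts, 444 of 446 individuals are classified correctly, which is consistent with the "nearly 100% sensitivity" reported.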
Serum proteomic pattern diagnostics is an emerging paradigm employing low-resolution mass spectrometry (MS) to generate a set of biomarker classifiers. In the present study, we utilized a well-controlled ovarian cancer serum study set to compare the sensitivity and specificity of serum proteomic diagnostic patterns acquired using a high-resolution versus a low-resolution MS platform. In blinded testing sets, the high-resolution mass spectral data contained multiple diagnostic signatures that were superior to the low-resolution spectra in terms of sensitivity and specificity (P < 0.00001) throughout the range of modeling conditions. Four mass spectral feature set patterns acquired from data obtained exclusively with the high-resolution mass spectrometer were 100% specific and sensitive in their diagnosis of serum samples as being acquired from either unaffected patients or those suffering from ovarian cancer. Important to the future of proteomic pattern diagnostics is the ability to recognize inferior spectra statistically, so that those resulting from a specific process error are recognized prior to their potentially incorrect (and damaging) diagnosis. To meet this need, we have developed a series of quality-assurance and in-process control procedures to (a) globally evaluate sources of sample variability, (b) identify outlying mass spectra, and (c) develop quality-control release specifications. From these quality-assurance and control (QA/QC) specifications, we identified 32 mass spectra out of the total 248 that showed statistically significant differences from the norm. Hence, 216 of the initial 248 high-resolution mass spectra were determined to be of high quality and were remodeled by pattern-recognition analysis. Again, we obtained four mass spectral feature set patterns that also exhibited 100% sensitivity and specificity in blinded validation tests (68/68 cancer: including 18/18 stage I, and 43/43 healthy).
We conclude that (a) the use of high-resolution MS yields superior classification patterns as compared with those obtained with lower resolution instrumentation; (b) although the process error that we discovered did not have a deleterious impact on the present results obtained from proteomic pattern analysis, the major source of spectral variability emanated from mass spectral acquisition, and not bias at the clinical collection site; (c) this variability can be reduced and monitored through the use of QA/QC statistical procedures; (d) multiple and distinct proteomic patterns, comprising low molecular weight biomarkers, detected by high-resolution MS achieve accuracies surpassing individual biomarkers, warranting validation in a large clinical study.
Mass spectroscopic analysis of the low molecular mass (LMM) range of the serum/plasma proteome is a rapidly emerging frontier for biomarker discovery. This study examined the proportion of LMM biomarkers that are bound to circulating carrier proteins. Mass spectroscopic analysis of human serum following molecular mass fractionation demonstrated that the majority of LMM biomarkers exist bound to carrier proteins. Moreover, the pattern of LMM biomarkers bound specifically to albumin is distinct from the pattern of those bound to non-albumin carriers. Prominent SELDI-TOF ionic species (m/z 6631.7043) found to correlate with the presence of ovarian cancer were amplified by albumin capture. Several insights emerged: (a) accumulation of LMM biomarkers on circulating carrier proteins greatly amplifies the total serum/plasma concentration of the measurable biomarker; (b) the total serum/plasma biomarker concentration is largely determined by the carrier protein clearance rate, not the unbound biomarker clearance rate itself; and (c) examination of the LMM species bound to a specific carrier protein may yield important diagnostic information. These findings shift the focus of biomarker detection to the carrier protein and its biomarker content.
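Insight (b) can be illustrated with a toy one-compartment model, which is an assumption made here for illustration and not the authors' analysis. If a biomarker enters the circulation at a constant rate and is cleared by first-order kinetics, its steady-state level is the production rate divided by the clearance rate constant. A biomarker that rides on a slowly cleared carrier therefore accumulates to a much higher level than the same biomarker cleared on its own. The half-lives below are hypothetical inputs: roughly 19 days is a commonly cited approximate figure for albumin, and 2 hours is an arbitrary stand-in for a free LMM species.

```python
import math


def steady_state_level(production_rate: float, half_life_hours: float) -> float:
    """Steady state of dC/dt = production_rate - k * C, with k = ln(2) / half-life."""
    k = math.log(2) / half_life_hours
    return production_rate / k


# All values are illustrative assumptions, not measurements from the study.
production_rate = 1.0              # arbitrary units per hour
free_half_life_h = 2.0             # hypothetical half-life of the unbound biomarker
carrier_half_life_h = 19 * 24.0    # approximate albumin half-life (~19 days)

free_level = steady_state_level(production_rate, free_half_life_h)
bound_level = steady_state_level(production_rate, carrier_half_life_h)
amplification = bound_level / free_level

print(f"carrier-bound level is about {amplification:.0f}x the free level")
```

In this simplified picture the amplification equals the ratio of the two half-lives, so the result depends entirely on the assumed clearance rates and ignores binding equilibria.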