The National Center for Advancing Translational Sciences (NCATS) has developed an online open science data portal for its COVID-19 drug repurposing campaign – named OpenData – with the goal of making data across a range of SARS-CoV-2 related assays available in real time. The assays developed cover a wide spectrum of the SARS-CoV-2 life cycle, including both viral and human (host) targets. In total, over 10,000 compounds are being tested in full concentration-response ranges from across multiple annotated small molecule libraries, including approved drugs, repurposing candidates and experimental therapeutics designed to modulate a wide range of cellular targets. The goal is to support research scientists, clinical investigators and public health officials through open data sharing and analysis tools to expedite the development of SARS-CoV-2 interventions, and to prioritize promising compounds and repurposed drugs for further development in treating COVID-19.
As researchers are increasingly able to collect data on a large scale from multiple clinical and omics modalities, multi-omics integration is becoming a critical component of metabolomics research. This introduces a need for increased understanding by the metabolomics researcher of computational and statistical analysis methods relevant to multi-omics studies. In this review, we discuss common types of analyses performed in multi-omics studies and the computational and statistical methods that can be used for each type of analysis. We pinpoint the caveats and considerations for analysis methods, including required parameters, sample size and data distribution requirements, sources of a priori knowledge, and techniques for the evaluation of model accuracy. Finally, for the types of analyses discussed, we provide examples of the applications of corresponding methods to clinical and basic research. We intend that our review may be used as a guide for metabolomics researchers to choose effective techniques for multi-omics analyses relevant to their field of study.
Background Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulations and mechanisms underlying altered phenotypes. Further, integration of data on proteome and transcriptome levels can validate gene signatures associated with a phenotype. However, proteomic data is not as abundant as genomic data, and it is thus beneficial to use genomic features to predict protein abundances when matching proteomic samples or measurements within samples are lacking. Results We evaluate and compare four data-driven models for prediction of proteomic data from mRNA measured in breast and ovarian cancers using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forests, LASSO, and fuzzy logic approaches can predict protein abundance levels with median ground truth-predicted correlation values between 0.2 and 0.5. However, the most accurately predicted proteins differ considerably between approaches. Conclusions In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
Motivation Functional interpretation of high-throughput metabolomic and transcriptomic results is a crucial step in generating insight from experimental data. However, pathway and functional information for genes and metabolites is distributed among many siloed resources, limiting the scope of analyses that rely on a single knowledge source. Results RaMP-DB 2.0 is a web interface, relational database, API, and R package designed for straightforward and comprehensive functional interpretation of metabolomic and multi-omic data. RaMP-DB 2.0 has been upgraded with an expanded breadth and depth of functional and chemical annotations (ClassyFire, LIPID MAPS, SMILES, InChIs, etc.), with new data types related to metabolites and lipids incorporated. To streamline entity resolution across multiple source databases, we have implemented a new semi-automated process, thereby lessening the burden of harmonization and supporting more frequent updates. The associated RaMP-DB 2.0 R package now supports queries on pathways, common reactions (e.g. metabolite-enzyme relationships), chemical functional ontologies, chemical classes, and chemical structures, as well as enrichment analyses on pathways (multi-omic) and chemical classes. Lastly, the RaMP-DB web interface has been completely redesigned using the Angular framework. Availability The code used to build all components of RaMP-DB 2.0 is freely available on GitHub at https://github.com/ncats/ramp-db, https://github.com/ncats/RaMP-Client/ and https://github.com/ncats/RaMP-Backend. The RaMP-DB front end can be accessed at https://rampdb.nih.gov/. Supplementary information Supplementary data are available at Bioinformatics online.
RaMP-DB 2.0 is a web interface, API, relational database and R package designed for straightforward and comprehensive functional interpretation of metabolomic and multi-omic data. Since its first release in 2018, RaMP-DB has been upgraded with an expanded breadth and depth of functional and chemical annotation. Content from the source databases (Reactome, HMDB, and Wikipathways) has been updated, and new data types related to metabolite annotations have been incorporated. Structural information incorporated in RaMP-DB 2.0 includes SMILES strings, InChIs, and InChIKeys. Chemical classes have been sourced from ClassyFire and LIPID MAPS. Accordingly, the RaMP-DB 2.0 R package has been updated and supports queries on pathways, common reactions, ontologies, chemical classes, and chemical structures. Additionally, RaMP-DB 2.0 now supports enrichment analyses on pathways and chemical classes. Our process for integrating annotations across resources has also been upgraded to lessen the burden of harmonization, thereby supporting more frequent updates. The code used to build all components of RaMP-DB 2.0 is freely available on GitHub at https://github.com/ncats/ramp-db and https://github.com/ncats/RaMP-Backend.
Background Assigning chromatin states genome-wide (e.g. promoters, enhancers, etc.) is commonly performed to improve functional interpretation of these states. However, computational methods to assign chromatin state suffer from the following drawbacks: they typically require data from multiple assays, which may not be practically feasible to obtain, and they depend on peak calling algorithms, which require careful parameterization and often exclude the majority of the genome. To address these drawbacks, we propose a novel learning technique built upon the Self-Organizing Map (SOM), Self-Organizing Map with Variable Neighborhoods (SOM-VN), to learn a set of representative shapes from a single, genome-wide, chromatin accessibility dataset to associate with a chromatin state assignment in which a particular regulatory element (RE) is prevalent. These shapes can then be used to assign chromatin state using our workflow. Results We validate the performance of the SOM-VN workflow on 14 different samples of varying quality, namely one assay each of A549 and GM12878 cell lines and two each of H1 and HeLa cell lines, primary B-cells, and brain, heart, and stomach tissue. We show that SOM-VN learns shapes that are (1) non-random, (2) associated with known chromatin states, (3) generalizable across sets of chromosomes, and (4) associated with magnitude and multimodality. We compare the accuracy of SOM-VN chromatin states against the Clustering Aggregation Tool (CAGT), an unsupervised method that learns chromatin accessibility signal shapes but does not associate these shapes with REs, and we show that overall precision and recall are increased when learning shapes using SOM-VN as compared to CAGT. We further compare enhancer state assignments from SOM-VN in signals above a set threshold to enhancer state assignments from Predicting Enhancers from ATAC-seq Data (PEAS), a deep learning method that assigns enhancer chromatin states to peaks. We show that the precision-recall area under the curve for the assignment of enhancer states is comparable to PEAS. Conclusions Our work shows that the SOM-VN workflow can learn relationships between REs and chromatin accessibility signal shape, which is an important step toward the goal of assigning and comparing enhancer state across multiple experiments and phenotypic states.