Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, fully understanding the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort, as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health. Here, we review the current knowledge on diet-health-gut microbiota, relevant data sources, bioinformatics tools, machine learning capabilities, as well as the intellectual property and legislative regulatory landscape. We provide guidance on employing machine learning and data analytics, identify gaps in current methods, and describe new scenarios to be unlocked in the next few years in the context of current knowledge.
Protein inference, the identification of the protein set that is the origin of a given peptide profile, is a fundamental challenge in proteomics. We present DeepPep, a deep-convolutional neural network framework that predicts the protein set from a proteomics mixture, given the sequence universe of possible proteins and a target peptide profile. At its core, DeepPep quantifies the change in the probabilistic score of peptide-spectrum matches in the presence or absence of a specific protein, thereby selecting as candidates the proteins with the largest impact on the peptide profile. Application of the method across datasets argues for its competitive predictive ability (AUC of 0.80±0.18, AUPR of 0.84±0.28) in inferring proteins without the need for peptide detectability, on which the most competitive methods rely. We find that the convolutional neural network architecture outperforms traditional artificial neural network architectures without convolution layers in protein inference. We expect that similar deep learning architectures that allow learning nonlinear patterns can be further extended to problems in metagenome profiling and cell type inference. The source code of DeepPep and the benchmark datasets used in this study are available at https://deeppep.github.io/DeepPep/.
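The presence/absence scoring idea described in the DeepPep abstract can be illustrated with a minimal Python sketch. This is not the DeepPep implementation: the trained convolutional network is replaced by a hypothetical stand-in, `toy_peptide_score`, and the peptide-to-protein map `PEPTIDE_TO_PROTEINS` and the function `rank_proteins_by_impact` are invented here purely for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy data: which candidate proteins could have produced each peptide.
PEPTIDE_TO_PROTEINS: Dict[str, Set[str]] = {
    "pepA": {"P1"},
    "pepB": {"P1", "P2"},
    "pepC": {"P3"},
}


def toy_peptide_score(present: FrozenSet[str], peptide: str) -> float:
    """Stand-in for a trained model (an assumption, not DeepPep's network):
    the score rises with the number of parent proteins that are present."""
    n_parents = len(PEPTIDE_TO_PROTEINS[peptide] & present)
    return 1.0 - 0.5 ** n_parents


def rank_proteins_by_impact(proteins: List[str]) -> List[Tuple[str, float]]:
    """Rank proteins by the total change in peptide scores when each one is removed."""
    full = frozenset(proteins)
    impacts = []
    for protein in proteins:
        without = full - {protein}
        # Sum the absolute score change over every peptide in the profile.
        impact = sum(
            abs(toy_peptide_score(full, pep) - toy_peptide_score(without, pep))
            for pep in PEPTIDE_TO_PROTEINS
        )
        impacts.append((protein, impact))
    # Largest impact first: these are the strongest candidates.
    return sorted(impacts, key=lambda item: item[1], reverse=True)


ranking = rank_proteins_by_impact(["P1", "P2", "P3"])
```

In this toy setting, a protein that uniquely explains a peptide receives a larger impact than one whose peptides are shared with other candidates.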
How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.
The objective of this study is to validate reduced graphene oxide (RGO)-based volatile organic compound (VOC) sensors, assembled by simple and low-cost manufacturing, for the detection of disease-related VOCs in human breath using machine learning (ML) algorithms. RGO films were functionalized by four different metalloporphyrins to assemble cross-sensitive chemiresistive sensors with different sensing properties. This work demonstrated how different ML algorithms affect the discrimination capabilities of RGO-based VOC sensors. In addition, an ML-based disease classifier was derived to discriminate healthy vs. unhealthy individuals based on breath sample data. The results show that our ML models could predict the presence of disease-related VOCs of interest with a minimum accuracy and F1-score of 91.7% and 83.3%, respectively, and discriminate chronic kidney disease breath with a high accuracy of 91.7%.
Motivation: Gene expression prediction is one of the grand challenges in computational biology. The availability of transcriptomics data, combined with recent advances in artificial neural networks, provides an unprecedented opportunity to create predictive models of gene expression with far-reaching applications. Results: We present the Genetic Neural Network (GNN), an artificial neural network for predicting genome-wide gene expression given gene knockouts and master regulator perturbations. At its core, the GNN maps existing gene regulatory information in its architecture and uses cell nodes that have been specifically designed to capture the dependencies and non-linear dynamics that exist in gene networks. These two key features make the GNN architecture capable of capturing complex relationships without the need for large training datasets. As a result, GNNs were 40% more accurate on average than competing architectures (MLP, RNN, BiRNN) when compared on hundreds of curated and inferred transcription modules. Our results argue that GNNs can become the architecture of choice when building predictors of gene expression from the exponentially growing corpus of genome-wide transcriptomics data. Availability and implementation: https://github.com/IBPA/GNN Supplementary information: Supplementary data are available at Bioinformatics online.
Systemic phaeohyphomycosis, aka ‘fluid belly’, is one of the most important emergent diseases in sturgeon Acipenser spp. aquaculture. The etiologic agent is the saprobic, dematiaceous fungus Veronaea botryosa. Effective vaccines and chemotherapeutic treatments are currently unavailable. Additionally, the fungus is a slow-growing organism, taking 10-15 d for colonies to be observed in agar media. To enable faster diagnosis, a specific quantitative PCR (qPCR) targeting the V. botryosa β-tubulin gene was developed and validated. The specificity of the assay to V. botryosa was initially confirmed in silico and in vivo against common fungal fish pathogens, including closely related members of the order Chaetothyriales (Exophiala spp.) and other black pigmented fungi (Alternaria spp. and Cladosporium spp.), as well as tissues from uninfected sturgeon. The assay possessed high clinical specificity (100%) and clinical sensitivity (74%) in detecting V. botryosa DNA in splenic tissues from laboratory-infected sturgeon. Using V. botryosa genomic DNA as a template, the limit of detection was equivalent to 10 conidia, and the method was found suitable for the detection of fungal DNA in fresh and formalin-fixed tissues. In addition, the presence of non-target DNA from white sturgeon did not influence assay sensitivity. The developed qPCR assay is a sensitive, specific, and rapid diagnostic method for the detection and quantification of V. botryosa DNA from white sturgeon tissues.
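For readers unfamiliar with the two diagnostic metrics quoted above, the following Python sketch shows how clinical sensitivity and specificity are conventionally computed from assay results. The counts are illustrative placeholders and are not taken from the study; the function name `diagnostic_metrics` is likewise an assumption made for this example.

```python
from typing import Tuple


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of infected samples the assay detects.
    specificity = TN / (TN + FP): fraction of uninfected samples the assay clears.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one infected and one uninfected sample")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative counts only (NOT data from the qPCR study):
# 10 infected samples, 8 detected; 10 uninfected samples, 9 correctly negative.
sensitivity, specificity = diagnostic_metrics(tp=8, fn=2, tn=9, fp=1)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

A specificity of 100% therefore means no uninfected sample tested positive, while a sensitivity below 100% means some infected samples were missed.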