Daly, R., Shen, Q., Aitken, S. (2011). Learning Bayesian networks: approaches and issues. Knowledge Engineering Review, 26 (2), 99-157.
Bayesian networks have become a widely used method in the modelling of uncertain knowledge. Owing to the difficulty domain experts have in specifying them, techniques that learn Bayesian networks from data have become indispensable. Recently, however, there have been many important new developments in this field. This work takes a broad look at the literature on learning Bayesian networks, in particular their structure, from data. Specific topics are not focused on in detail, but it is hoped that all the major fields in the area are covered. This article is not intended to be a tutorial; for this, there are many books on the topic, which will be presented. However, an effort has been made to locate all the relevant publications, so that this paper can be used as a ready reference to find the works on particular sub-topics. Peer reviewed.
Motivation: We recently published MS2LDA, a method for the decomposition of sets of molecular fragment data derived from large metabolomics experiments. To make the method more widely available to the community, here we present ms2lda.org, a web application that allows users to upload their data, run MS2LDA analyses and explore the results through interactive visualizations. Results: Ms2lda.org takes tandem mass spectrometry data in many standard formats and allows the user to infer the sets of fragment and neutral loss features that co-occur together (Mass2Motifs). As an alternative workflow, the user can also decompose a data set onto predefined Mass2Motifs. This is accomplished through the web interface or programmatically from our web service. Availability and implementation: The website can be found at http://ms2lda.org, while the source code is available at https://github.com/sdrogers/ms2ldaviz under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.
Motivation: The use of liquid chromatography coupled to mass spectrometry has enabled the high-throughput profiling of the metabolite composition of biological samples. However, the large amount of data obtained can be difficult to analyse and often requires computational processing to understand which metabolites are present in a sample. This article looks at the dual problem of annotating peaks in a sample with a metabolite, together with putatively annotating whether a metabolite is present in the sample. The starting point of the approach is a Bayesian clustering of peaks into groups, each corresponding to putative adducts and isotopes of a single metabolite. Results: The Bayesian modelling introduced here combines information from the mass-to-charge ratio, retention time and intensity of each peak, together with a model of the inter-peak dependency structure, to increase the accuracy of peak annotation. The results inherently contain a quantitative estimate of confidence in the peak annotations and allow an accurate trade-off between precision and recall. Extensive validation experiments using authentic chemical standards show that this system is able to produce more accurate putative identifications than other state-of-the-art systems, while at the same time giving a probabilistic measure of confidence in the annotations. Availability and implementation: The software has been implemented as part of the mzMatch metabolomics analysis pipeline, which is available for download at http://mzmatch.sourceforge.net/. Contact: Ronan.Daly@glasgow.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Summary: The Polyomics integrated Metabolomics Pipeline (PiMP) fulfils an unmet need in metabolomics data analysis. PiMP offers automated and user-friendly analysis from mass spectrometry data acquisition to biological interpretation. Our key innovations are the Summary Page, which provides a simple overview of the experiment in the format of a scientific paper, containing the key findings of the experiment along with associated metadata; and the Metabolite Page, which provides a list of each metabolite accompanied by ‘evidence cards’, which provide a variety of criteria behind metabolite annotation including peak shapes, intensities in different sample groups and database information. Availability and implementation: PiMP is available at http://polyomics.mvls.gla.ac.uk, and access is freely available on request. 50 GB of space is allocated for data storage, with unrestricted number of samples and analyses per user. Source code is available at https://github.com/RonanDaly/pimp and licensed under the GPL. Supplementary information: Supplementary data are available at Bioinformatics online.
Tandem mass spectrometry (LC-MS/MS) is widely used to identify unknown ions in untargeted metabolomics. Data-dependent acquisition (DDA) chooses which ions to fragment based upon intensities observed in MS1 survey scans and typically only fragments a small subset of the ions present. Despite this inefficiency, relatively little work has addressed the development of new DDA methods, partly due to the high overhead associated with running the many extracts necessary to optimize approaches in busy MS facilities. In this work, we first provide theoretical results that show how much improvement is possible over current DDA strategies. We then describe an in silico framework for fast and cost-efficient development of new DDA strategies using a previously developed virtual metabolomics mass spectrometer (ViMMS). Additional functionality is added to ViMMS to allow methods to be used both in simulation and on real samples via an Instrument Application Programming Interface (IAPI). We demonstrate this framework through the development and optimization of two new DDA methods that introduce new advanced ion prioritization strategies. Upon application of these developed methods to two complex metabolite mixtures, our results show that they are able to fragment more unique ions than standard DDA strategies.
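To illustrate the intensity-based selection that the abstract above attributes to standard DDA, the following is a minimal sketch of a generic "Top-N" precursor picker. It is not the ViMMS or IAPI code and not either of the two new methods described; the function name, the parameters (n, min_intensity, excluded_mz, mz_tol) and the toy survey scan are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); a hypothetical minimal peak type


def top_n_precursors(scan: List[Peak], n: int, min_intensity: float,
                     excluded_mz: List[float], mz_tol: float) -> List[Peak]:
    """Return up to n of the most intense MS1 peaks that pass the filters.

    Peaks below min_intensity, or within mz_tol of an m/z that is on the
    exclusion list (for example, because it was fragmented recently),
    are skipped.
    """
    def is_excluded(mz: float) -> bool:
        return any(abs(mz - ex) <= mz_tol for ex in excluded_mz)

    candidates = [p for p in scan
                  if p[1] >= min_intensity and not is_excluded(p[0])]
    # Most intense first, then keep only the first n.
    candidates.sort(key=lambda p: p[1], reverse=True)
    return candidates[:n]


# Toy survey scan (made-up values, not data from the study).
survey_scan = [(100.0, 5e4), (150.5, 2e6), (200.2, 8e5),
               (300.1, 1e3), (410.7, 3e5)]
selected = top_n_precursors(survey_scan, n=2, min_intensity=1e4,
                            excluded_mz=[150.5], mz_tol=0.01)
print(selected)
```

In this toy scan only two of the three eligible ions are selected, which mirrors the point made in the abstract that DDA typically fragments a small subset of the ions present.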
Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.
Treatment for human African trypanosomiasis is dependent on the species of trypanosome causing the disease and the stage of the disease (stage 1 defined by parasites being present in blood and lymphatics whilst for stage 2, parasites are found beyond the blood-brain barrier in the cerebrospinal fluid (CSF)). Currently, staging relies upon detecting the very low number of parasites or elevated white blood cell numbers in CSF. Improved staging is desirable, as is the elimination of the need for lumbar puncture. Here we use metabolomics to probe samples of CSF, plasma and urine from 40 Angolan patients infected with Trypanosoma brucei gambiense, at different disease stages. Urine samples provided no robust markers indicative of infection or stage of infection due to inherent variability in urine concentrations. Biomarkers in CSF were able to distinguish patients at stage 1 or advanced stage 2 with absolute specificity. Eleven metabolites clearly distinguished the stage in most patients and two of these (neopterin and 5-hydroxytryptophan) showed 100% specificity and sensitivity between our stage 1 and advanced stage 2 samples. Neopterin is an inflammatory biomarker previously shown in CSF of stage 2 but not stage 1 patients. 5-hydroxytryptophan is an important metabolite in the serotonin synthetic pathway, the key pathway in determining somnolence, thus offering a possible link to the eponymous symptoms of “sleeping sickness”. Plasma also yielded several biomarkers clearly indicative of the presence (87% sensitivity and 95% specificity) and stage of disease (92% sensitivity and 81% specificity). A logistic regression model including these metabolites showed clear separation of patients being either at stage 1 or advanced stage 2 or indeed diseased (both stages) versus control.
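For readers less familiar with the sensitivity and specificity figures quoted in the abstract above, the sketch below shows how the two quantities are computed from binary labels. The labels are made-up toy values, not patient data from the study, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for binary labels (1 = positive).

    sensitivity = TP / (TP + FN): fraction of true positives detected.
    specificity = TN / (TN + FP): fraction of true negatives detected.
    Assumes both classes occur at least once in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 positive and 5 negative cases (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

A marker with 100% sensitivity and specificity, as reported for neopterin and 5-hydroxytryptophan between stage 1 and advanced stage 2 samples, corresponds to both values being exactly 1.0 on the samples in question.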
Bayesian networks are a useful tool in the representation of uncertain knowledge. This paper proposes a new algorithm called ACO-E, to learn the structure of a Bayesian network. It does this by conducting a search through the space of equivalence classes of Bayesian networks using Ant Colony Optimization (ACO). To this end, two novel extensions of traditional ACO techniques are proposed and implemented. Firstly, multiple types of moves are allowed. Secondly, moves can be given in terms of indices that are not based on construction graph nodes. The results of testing show that ACO-E performs better than a greedy search and other state-of-the-art and metaheuristic algorithms whilst searching in the space of equivalence classes.