A necessary step in the development of artificial intelligence is to enable a machine to represent how the world works, building an internal structure from data. This structure should hold a good trade-off between expressive power and querying efficiency. Bayesian networks have proven to be an effective and versatile tool for the task at hand. They have been applied to modeling knowledge in a variety of fields, ranging from bioinformatics to law, from image processing to economic risk analysis. A crucial aspect is learning the dependency graph of a Bayesian network from data. This task, called structure learning, is NP-hard and is the subject of intense, cutting-edge research. In short, it can be thought of as choosing one graph over the many candidates, grounding our reasoning over a collection of samples of the distribution generating the data. The number of possible graphs increases very quickly at the increase in the number of variables. Searching in this space, and selecting a graph over the others, becomes quickly burdensome. In this survey, we review the most relevant structure learning algorithms that have been proposed in the literature. We classify them according to the approach they follow for solving the problem and we also show alternatives for handling missing data and continuous variable. An extensive review of existing software tools is also given.
BackgroundMolecular dynamics (MD) simulations are powerful tools to investigate the conformational dynamics of proteins that is often a critical element of their function. Identification of functionally relevant conformations is generally done clustering the large ensemble of structures that are generated. Recently, Self-Organising Maps (SOMs) were reported performing more accurately and providing more consistent results than traditional clustering algorithms in various data mining problems. We present a novel strategy to analyse and compare conformational ensembles of protein domains using a two-level approach that combines SOMs and hierarchical clustering.ResultsThe conformational dynamics of the α-spectrin SH3 protein domain and six single mutants were analysed by MD simulations. The Cα's Cartesian coordinates of conformations sampled in the essential space were used as input data vectors for SOM training, then complete linkage clustering was performed on the SOM prototype vectors. A specific protocol to optimize a SOM for structural ensembles was proposed: the optimal SOM was selected by means of a Taguchi experimental design plan applied to different data sets, and the optimal sampling rate of the MD trajectory was selected. The proposed two-level approach was applied to single trajectories of the SH3 domain independently as well as to groups of them at the same time. The results demonstrated the potential of this approach in the analysis of large ensembles of molecular structures: the possibility of producing a topological mapping of the conformational space in a simple 2D visualisation, as well as of effectively highlighting differences in the conformational dynamics directly related to biological functions.ConclusionsThe use of a two-level approach combining SOMs and hierarchical clustering for conformational analysis of structural ensembles of proteins was proposed. It can easily be extended to other study cases and to conformational ensembles from other sources.
BackgroundDynamic aspects of gene regulatory networks are typically investigated by measuring system variables at multiple time points. Current state-of-the-art computational approaches for reconstructing gene networks directly build on such data, making a strong assumption that the system evolves in a synchronous fashion at fixed points in time. However, nowadays omics data are being generated with increasing time course granularity. Thus, modellers now have the possibility to represent the system as evolving in continuous time and to improve the models’ expressiveness.ResultsContinuous time Bayesian networks are proposed as a new approach for gene network reconstruction from time course expression data. Their performance was compared to two state-of-the-art methods: dynamic Bayesian networks and Granger causality analysis. On simulated data, the methods comparison was carried out for networks of increasing size, for measurements taken at different time granularity densities and for measurements unevenly spaced over time. Continuous time Bayesian networks outperformed the other methods in terms of the accuracy of regulatory interactions learnt from data for all network sizes. Furthermore, their performance degraded smoothly as the size of the network increased. Continuous time Bayesian networks were significantly better than dynamic Bayesian networks for all time granularities tested and better than Granger causality for dense time series. Both continuous time Bayesian networks and Granger causality performed robustly for unevenly spaced time series, with no significant loss of performance compared to the evenly spaced case, while the same did not hold true for dynamic Bayesian networks. The comparison included the IRMA experimental datasets which confirmed the effectiveness of the proposed method. Continuous time Bayesian networks were then applied to elucidate the regulatory mechanisms controlling murine T helper 17 (Th17) cell differentiation and were found to be effective in discovering well-known regulatory mechanisms, as well as new plausible biological insights.ConclusionsContinuous time Bayesian networks were effective on networks of both small and large size and were particularly feasible when the measurements were not evenly distributed over time. Reconstruction of the murine Th17 cell differentiation network using continuous time Bayesian networks revealed several autocrine loops, suggesting that Th17 cells may be auto regulating their own differentiation process.
Recommendation of scientific papers is a task aimed to support researchers in accessing relevant articles from a large pool of unseen articles. When writing a paper, a researcher focuses on the topics related to her/his scientific domain, by using a technical language. The core idea of this paper is to exploit the topics related to the researchers scientific production (authored articles) to formally define her/his profile; in particular we propose to employ topic modeling to formally represent the user profile, and language modeling to formally represent each unseen paper. The recommendation technique we propose relies on the assessment of the closeness of the language used in the researchers papers and the one employed in the unseen papers. The proposed approach exploits a reliable knowledge source for building the user profile, and it alleviates the cold-start problem, typical of collaborative filtering techniques. We also present a preliminary evaluation of our approach on the DBLP.
The class of continuous time Bayesian network classifiers is defined; it solves the problem of supervised classification on multivariate trajectories evolving in continuous time. The trajectory consists of the values of discrete attributes that are measured in continuous time, while the predicted class is expected to occur in the future. Two instances from this class, namely the continuous time naive Bayes classifier and the continuous time tree augmented naive Bayes classifier, are introduced and analyzed. They implement a trade-off between computational complexity and classification accuracy. Learning and inference for the class of continuous time Bayesian network classifiers are addressed, in the case where complete data are available. A learning algorithm for the continuous time naive Bayes classifier and an exact inference algorithm for the class of continuous time Bayesian network classifiers are described. The performance of the continuous time naive Bayes classifier is assessed in the case where real-time feedback to neurological patients undergoing motor rehabilitation must be provided.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.