Summary
Here we present sbml2hyb, an easy-to-use standalone Python tool that facilitates the conversion of existing mechanistic models of biological systems in Systems Biology Markup Language (SBML) into hybrid semiparametric models that combine mechanistic functions with machine learning (ML). The so-formed hybrid models can be trained and stored back in databases in SBML format. The tool supports a user-friendly export interface with an internal format validator. Two case studies illustrate the use of the sbml2hyb tool. Additionally, we describe HMOD, a new model format designed to support and facilitate hybrid models building. It aggregates the mechanistic model information with the ML information and follows as close as possible the SBML rules. We expect the sbml2hyb tool and HMOD to greatly facilitate the widespread usage of hybrid modeling techniques for biological systems analysis.
Availability and implementation
The Python interface, source code and the example models used for the case studies are accessible at: https://github.com/r-costa/sbml2hyb.
Supplementary information
Supplementary data are available at Bioinformatics online.
Background
A considerable number of data mining approaches for biomedical data analysis, including state-of-the-art associative models, require a form of data discretization. Although diverse discretization approaches have been proposed, they generally work under a strict set of statistical assumptions which are arguably insufficient to handle the diversity and heterogeneity of clinical and molecular variables within a given dataset. In addition, although an increasing number of symbolic approaches in bioinformatics are able to assign multiple items to values occurring near discretization boundaries for superior robustness, there are no reference principles on how to perform multi-item discretizations.
Results
In this study, an unsupervised discretization method, DI2, for variables with arbitrarily skewed distributions is proposed. Statistical tests applied to assess differences in performance confirm that DI2 generally outperforms well-established discretizations methods with statistical significance. Within classification tasks, DI2 displays either competitive or superior levels of predictive accuracy, particularly delineate for classifiers able to accommodate border values.
Conclusions
This work proposes a new unsupervised method for data discretization, DI2, that takes into account the underlying data regularities, the presence of outlier values disrupting expected regularities, as well as the relevance of border values. DI2 is available at https://github.com/JupitersMight/DI2
Pattern discovery and subspace clustering play a central role in the biological domain, supporting for instance putative regulatory module discovery from omics data for both descriptive and predictive ends. In the presence of target variables (e.g. phenotypes), regulatory patterns should further satisfy delineate discriminative power properties, well-established in the presence of categorical outcomes, yet largely disregarded for numerical outcomes, such as risk profiles and quantitative phenotypes. DISA (Discriminative and Informative Subspace Assessment), a Python software package, is proposed to evaluate patterns in the presence of numerical outcomes using well-established measures together with a novel principle able to statistically assess the correlation gain of the subspace against the overall space. Results confirm the possibility to soundly extend discriminative criteria towards numerical outcomes without the drawbacks well-associated with discretization procedures. Results from four case studies confirm the validity and relevance of the proposed methods, further unveiling critical directions for research on biotechnology and biomedicine. Availability: DISA is freely available at https://github.com/JupitersMight/DISA under the MIT license.
The aim of the present study was to estimate ground reaction force (GRF) by means of a linear regression equation with input data from footprints. It can be used to provide further information on locomotion of extinct mammals and/or early humans, thus providing important knowledge about human bipedal locomotion evolution. Fossilized footprints contain information about gait dynamics, but their interpretation is difficult, as they are a combined result of foot anatomy, gait dynamics, and substrate properties. Several approaches are used for modeling and estimating data in biomechanics, simple modeling is useful when trying to understand complex events. Force measurements were performed using a force platform; at the same time a footprint was registered on a clay surface. From the measurements of length, width and depth of the footprint it was possible to estimate body height (BH), body mass (BM) and vertical GRF peaks during human walking. The main findings of the present study were two linear regression equations for estimation of GRF peaks from footprint depths (R 2 =0.81, p<0.001; R 2 =0.56, p<0.001). This study accomplishes a first step to a fully understanding of how to estimate GRF from footprint data, and have further application to locomotion evaluations from fossilized footprints.
Motivation: Pattern discovery and subspace clustering play a central role in the biological domain, supporting for instance putative regulatory module discovery from omic data for both descriptive and predictive ends. In the presence of target variables (e.g. phenotypes), regulatory patterns should further satisfy delineate discriminative power properties, well-established in the presence of categorical outcomes, yet largely disregarded for numerical outcomes, such as risk profiles and quantitative phenotypes. Results: DISA (Discriminative and Informative Subspace Assessment), a Python software package, is proposed to assess patterns in the presence of numerical outcomes using well-established measures together with a novel principle able to statistically assess the correlation gain of the subspace against the overall space. Results confirm the possibility to soundly extend discriminative criteria towards numerical outcomes without the drawbacks well-associated with discretization procedures. A case study is provided to show the properties of the proposed method. Availability: DISA is freely available at https://github.com/JupitersMight/DISA under the MIT license.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.