Availability: This is the author's manuscript

Abstract

Recent advances in molecular biology and bioinformatics techniques have led to an explosion of information about the spatial organisation of DNA in the cell nucleus. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organisation of chromosomes at unprecedented scales, making it possible to identify physical interactions between genetic elements located throughout a genome. However, exploiting this important information is hampered by the lack of biologist-friendly analysis and visualisation software: these disciplines are literally caught in a flood of data and now face many of the scale-out issues that High-Performance Computing (HPC) has been addressing for years. Data must be managed, analysed and integrated, with substantial requirements in speed (in terms of execution time), application scalability and data representation. In this work we present NuChart-II, an efficient and highly optimised tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information and proposes an ex-post normalisation technique for Hi-C data. While designing NuChart-II, we addressed several common issues in the parallelisation of memory-bound algorithms for shared-memory systems.
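To make the idea of a gene-centric, graph-based representation more concrete, the following Python sketch builds an undirected graph whose nodes are genes and whose edges are Hi-C contacts joining two genes, then explores it breadth-first from a seed gene. The input formats, the function names (`gene_at`, `build_gene_graph`, `neighbourhood`) and the depth-limited BFS exploration are assumptions for illustration; this is not NuChart-II's actual data model, and it does not cover its normalisation technique. For genome-scale data, the linear gene lookup would need to be replaced by an interval index.

```python
from collections import defaultdict, deque


def gene_at(locus, genes):
    """Return the name of the first gene whose interval contains the locus.

    locus is a hypothetical (chromosome, position) pair; genes is a list of
    (name, chromosome, start, end) tuples.
    """
    chrom, pos = locus
    for name, g_chrom, start, end in genes:
        if g_chrom == chrom and start <= pos <= end:
            return name
    return None


def build_gene_graph(contacts, genes):
    """Adjacency sets: an edge joins two distinct genes linked by a contact."""
    graph = defaultdict(set)
    for a, b in contacts:
        ga, gb = gene_at(a, genes), gene_at(b, genes)
        if ga is not None and gb is not None and ga != gb:
            graph[ga].add(gb)
            graph[gb].add(ga)
    return graph


def neighbourhood(graph, seed, max_depth):
    """Breadth-first search from a seed gene; returns {gene: depth}."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if depth[gene] == max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in sorted(graph.get(gene, ())):
            if nxt not in depth:
                depth[nxt] = depth[gene] + 1
                queue.append(nxt)
    return depth


# Toy data: four genes on two chromosomes and three contacts (A-B, B-C, C-D).
genes = [("A", "chr1", 0, 100), ("B", "chr1", 200, 300),
         ("C", "chr2", 0, 50), ("D", "chr2", 100, 150)]
contacts = [(("chr1", 50), ("chr1", 250)),
            (("chr1", 260), ("chr2", 10)),
            (("chr2", 20), ("chr2", 120))]
graph = build_gene_graph(contacts, genes)
result = neighbourhood(graph, "A", max_depth=2)
```

Here `result` contains A at depth 0, B at depth 1 and C at depth 2. D is three contacts away, so it is excluded at this depth.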
Protein-protein interactions are closely related to surface shape because, besides the large number of structural amino acids composing the core, there are a few surface amino acids that define the protein's functionality. This study concerns the development of a tool that, starting from the 3D atomic coordinates of a protein as retrieved from the Protein Data Bank (PDB), models the macromolecular surface implicitly. This approach is more suitable for this kind of analysis than a parametric one. The Marching Cubes algorithm is used to process the volumetric description of the protein, yielding a precise representation of the corresponding surface. Because studying whole protein families involves a large amount of data, the algorithm is implemented in parallel on a computer cluster to improve its performance. The parallel version of Marching Cubes is developed in ASSIST, a high-level structured parallel programming system, and achieves near-optimal performance for the computational activities and acceptable performance when I/O is included.
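The classification step at the heart of Marching Cubes can be sketched in Python as follows. The protein is described implicitly as a scalar field that is negative inside the union of atomic spheres, and this field is sampled on a regular grid. Each grid cube then receives an 8-bit index recording which of its corners lie inside. Cubes with an index other than 0 or 255 are the ones the surface crosses. In a full implementation, that index selects triangles from the standard 256-entry lookup table, which is omitted here. The sphere-union field, the `(x, y, z, radius)` atom format and the function names are assumptions, not the tool's actual surface definition. Processing cubes slab by slab gives a simplified view of why the work parallelises well: each slab depends only on two layers of the grid.

```python
import math

# Corner offsets of a cube, in the usual Marching Cubes vertex order.
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]


def field(p, atoms):
    """Implicit function: negative inside the union of atomic spheres.

    atoms is a hypothetical list of (x, y, z, radius) tuples, e.g. taken
    from the atomic coordinates of a PDB entry.
    """
    return min(math.dist(p, (x, y, z)) - r for x, y, z, r in atoms)


def sample_grid(atoms, origin, spacing, n):
    """Sample the field on an n x n x n lattice, indexed as grid[k][j][i]."""
    ox, oy, oz = origin
    return [[[field((ox + i * spacing, oy + j * spacing, oz + k * spacing), atoms)
              for i in range(n)] for j in range(n)] for k in range(n)]


def classify_slab(grid, k):
    """Cube indices for the cubes between grid layers k and k + 1.

    Only cubes crossed by the surface (index not 0 and not 255) are kept.
    """
    n = len(grid)
    crossing = {}
    for j in range(n - 1):
        for i in range(n - 1):
            index = 0
            for bit, (dx, dy, dz) in enumerate(CORNERS):
                if grid[k + dz][j + dy][i + dx] < 0.0:
                    index |= 1 << bit  # corner lies inside the molecule
            if index not in (0, 255):
                crossing[(i, j, k)] = index
    return crossing


# One atom of radius 1 at the origin, sampled on a 9 x 9 x 9 grid over [-2, 2].
atoms = [(0.0, 0.0, 0.0, 1.0)]
grid = sample_grid(atoms, (-2.0, -2.0, -2.0), 0.5, 9)
surface_cubes = {}
for k in range(len(grid) - 1):  # slabs are independent and could go to separate workers
    surface_cubes.update(classify_slab(grid, k))
```

Cubes entirely outside the sphere (such as the far corner) and cubes entirely inside it (next to the centre) do not appear in `surface_cubes`. Only the shell of cubes straddling the surface is passed on to triangulation.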
In the last decade, different computing paradigms and modelling frameworks based on stochastic modelling have been proposed for the description and simulation of biochemical systems. From a computational point of view, many simulations of a model are necessary to identify the behaviour of the system. Executing thousands of simulations can require a huge amount of time, so the parallelization of these algorithms is highly desirable. In this work we discuss the different strategies that can be implemented to parallelize a space-aware τ-DPP variant. We provide a C-MPI implementation of the system and discuss its performance on the simulation of particle diffusion in a crowded environment.

I. INTRODUCTION

Membrane systems, also called P systems, are computing devices inspired by the structure and operation of living cells, as well as by the way cells are organized in tissues and higher-order structures. The properties of this class of systems also make them suitable for modelling biological systems [1], where the different sets of objects represent molecular species and the rewriting rules represent chemical reactions that describe the evolution of the system over time. However, some features of P systems, such as non-determinism and maximal parallelism, have to be mitigated. Other properties, such as a physically based procedure to describe the time evolution, have to be considered more carefully to ensure the accuracy of the results. Moreover, stochastic methods have gained considerable attention because many biological processes are controlled by noisy mechanisms. This is particularly true when the molecular quantities involved are small, as in this case.

A membrane system variant that relies on these considerations is τ-DPP [2], in which Dynamical Probabilistic P systems are coupled with a modified version of the τ-leaping stochastic simulation method [3] in order to describe the evolution of the system quantitatively over time.
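The τ-leaping idea, and the reason thousands of runs parallelise naturally, can be illustrated with a minimal Python sketch. In each leap of length τ, every reaction fires a Poisson-distributed number of times, with a mean equal to its mass-action propensity times τ. A leap that would drive a count negative is retried with half the step. This is plain fixed-step τ-leaping, not the modified τ selection used in τ-DPP, and it ignores membranes and space entirely. The reaction encoding and the function names are hypothetical. Because each replica owns its own seeded random stream, replicas are independent and could, for example, be distributed across MPI processes. The paper's C-MPI implementation is not reproduced here.

```python
import math
import random


def poisson(lam, rng):
    """Poisson sample (Knuth's method), drawn in chunks so exp(-lam) cannot underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit, k, p = math.exp(-chunk), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total


def propensity(rate, reactants, state):
    """Stochastic mass-action propensity: rate * prod C(x_s, stoichiometry)."""
    a = rate
    for species, s in reactants.items():
        a *= math.comb(state[species], s)
    return a


def tau_leap(state, reactions, tau, t_end, rng):
    """Fixed-step tau-leaping; leaps producing negative counts are retried with tau/2."""
    state, t = dict(state), 0.0
    while t_end - t > 1e-12:
        step = min(tau, t_end - t)
        while True:
            new = dict(state)
            for rate, reactants, change in reactions:
                fired = poisson(propensity(rate, reactants, state) * step, rng)
                for species, delta in change.items():
                    new[species] += fired * delta
            if all(count >= 0 for count in new.values()):
                break
            step /= 2
        state, t = new, t + step
    return state


def run_replicas(n, seed, state, reactions, tau, t_end):
    """Independent replicas, each with its own seeded random stream."""
    return [tau_leap(state, reactions, tau, t_end, random.Random(seed + i))
            for i in range(n)]


# Example: first-order decay A -> 0 with stochastic rate constant 0.1.
decay = [(0.1, {"A": 1}, {"A": -1})]
runs = run_replicas(20, seed=1, state={"A": 1000}, reactions=decay, tau=0.1, t_end=10.0)
```

For this decay, the average of `runs` should lie close to 1000·e⁻¹ ≈ 368. Rerunning with the same seed reproduces the replicas exactly, which is useful when replicas are spread over many processes.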
A novel variant of τ-DPP, called Sτ-DPP [4], [5], has been introduced to take into account the size of the volumes and objects involved in a system. This makes it possible to better describe systems in which space plays an important role in the dynamics, such as crowded systems. The algorithm can be used in the modelling and simulation of reaction-diffusion (RD) systems in crowded environ-
Molecular dynamics is very important for biomedical research because it makes it possible to simulate the behavior of a biological macromolecule in silico. However, molecular dynamics is computationally expensive. Simulating a few nanoseconds of dynamics for a large macromolecule such as a protein takes a very long time, because of the large number of operations needed to solve Newton's equations of motion for a system of thousands of atoms. To obtain biologically significant data, it is desirable to use high-performance computing resources for these simulations. Recently, a distributed computing approach that replaces a single long simulation with many independent short trajectories has been introduced, and in many cases it provides valuable results. This study concerns the development of an infrastructure to run molecular dynamics simulations on a grid platform in a distributed way. The implemented software allows the parallel submission of different simulations that are individually short but together provide important biological information. Moreover, each simulation is divided into a chain of jobs, to avoid data loss in case of system failure and to limit the size of each data transfer from the grid. The results confirm that the distributed approach on grid computing is particularly suitable for molecular dynamics simulations thanks to its high scalability.
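Both integrating Newton's equations and splitting a run into a chain of jobs can be illustrated with a small Python sketch. It advances a toy system with the velocity Verlet integrator. It then runs the same trajectory as a chain of shorter jobs, each restarting from a JSON checkpoint written by the previous one. The harmonic force field, the checkpoint format and the function names are assumptions chosen to keep the example self-contained. Real molecular dynamics packages use far richer force fields and their own restart files. Python's `json` module writes floats in a representation that round-trips exactly, so the chained run here reproduces the single run bit for bit. This is the property a restartable job chain needs.

```python
import json


def forces(x, k):
    """Toy force field (hypothetical): independent harmonic springs, F = -k x."""
    return [-k * xi for xi in x]


def velocity_verlet(state, n_steps, dt, k, m):
    """Advance positions x and velocities v by n_steps of velocity Verlet."""
    x, v = list(state["x"]), list(state["v"])
    f = forces(x, k)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]            # drift
        f = forces(x, k)
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]  # half kick
    return {"x": x, "v": v, "step": state["step"] + n_steps}


def run_chain(state, total_steps, steps_per_job, dt, k, m):
    """Run one trajectory as a chain of jobs linked by JSON checkpoints."""
    checkpoint = json.dumps(state)  # stands in for a file staged between grid jobs
    while True:
        current = json.loads(checkpoint)
        remaining = total_steps - current["step"]
        if remaining <= 0:
            return current
        job_steps = min(steps_per_job, remaining)
        checkpoint = json.dumps(velocity_verlet(current, job_steps, dt, k, m))


start = {"x": [1.0, 0.0], "v": [0.0, 1.0], "step": 0}
single = velocity_verlet(start, 1000, dt=0.01, k=1.0, m=1.0)
chained = run_chain(start, 1000, 300, dt=0.01, k=1.0, m=1.0)
```

The chain here consists of four jobs (300 + 300 + 300 + 100 steps). If one job fails, only that job needs to be resubmitted from the last checkpoint.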
A common ongoing task in Functional Genomics is to compare the full genomes of organisms with those of related species, to search huge databases for functional annotation of novel sequences, and to identify specific patterns in them, such as ESTs, genes and microRNAs. Predicting these patterns has a significant computational cost. Public genome archives exceed one billion sequence traces from over 1,000 organisms, and this number is increasing rapidly as costs decline, so powerful solutions are needed to perform efficient searches. This means that Functional Genomics applications require significant computational infrastructures in which reusable tools and resources can be accessed. In particular, grid computing seems to fulfill both the computational and data management requirements, even though porting applications to this infrastructure can be difficult. The implementation of a suitable environment for managing distributed computations can provide a real advantage, narrowing the gap between the requirements of the functional genomics domain and the potential of this technology.
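One recurring ingredient of such an environment is splitting a large search workload into independent grid jobs of similar cost. The following Python sketch parses a multi-FASTA string and assigns sequences to a fixed number of batches with the greedy longest-first heuristic, using total sequence length as a stand-in for cost. The cost model, the heuristic and the function names are assumptions for illustration. They do not describe any specific grid middleware or the environment discussed here.

```python
import heapq


def parse_fasta(text):
    """Return (name, sequence) pairs from multi-FASTA text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def make_batches(records, n_jobs):
    """Greedy longest-first: each sequence goes to the currently lightest batch."""
    heap = [(0, i) for i in range(n_jobs)]  # (total length, batch id)
    batches = [[] for _ in range(n_jobs)]
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, i = heapq.heappop(heap)
        batches[i].append(name)
        heapq.heappush(heap, (load + len(seq), i))
    return batches


fasta = """>s1
ACGTACGT
>s2
ACGTAC
>s3
ACGT
>s4
AC
"""
batches = make_batches(parse_fasta(fasta), n_jobs=2)
```

With sequence lengths 8, 6, 4 and 2, the two batches each end up with a total length of 10. Each batch could then be submitted as a separate job.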