BackgroundGraphs can represent biological networks at the molecular, protein, or species level. An important query is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete) and the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing algorithms is to eliminate unsuccessful mappings as early as and as inexpensively as possible.ResultsWe propose a new subgraph isomorphism algorithm which applies a search strategy to significantly reduce the search space without using any complex pruning rules or domain reduction procedures. We compare our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++ implementation of FocusSearch which was originally distributed in Modula2) on synthetic, molecules, and interaction networks data. We show a significant reduction in the running time of our approach compared with these other excellent methods and show that our algorithm scales well as memory demands increase.ConclusionsSubgraph isomorphism algorithms are intensively used by biochemical tools. Our analysis gives a comprehensive comparison of different software approaches to subgraph isomorphism highlighting their weaknesses and strengths. This will help researchers make a rational choice among methods depending on their application. We also distribute an open-source package including our system and our own C++ implementation of FocusSearch together with all the used datasets (http://ferrolab.dmi.unict.it/ri.html). In future work, our findings may be extended to approximate subgraph isomorphism algorithms.
Biological applications, from genomics to ecology, deal with graphs that represents the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such software implementations (i) do not fully exploit available parallel computing power and (ii) they do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms, and introduces new strategies which naturally lead to a faster parallel searching system especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interactions networks.
AbstractmiRandola (http://mirandola.iit.cnr.it/) is a database of extracellular non-coding RNAs (ncRNAs) that was initially published in 2012, foreseeing the relevance of ncRNAs as non-invasive biomarkers. An increasing amount of experimental evidence shows that ncRNAs are frequently dysregulated in diseases. Further, ncRNAs have been discovered in different extracellular forms, such as exosomes, which circulate in human body fluids. Thus, miRandola 2017 is an effort to update and collect the accumulating information on extracellular ncRNAs that is spread across scientific publications and different databases. Data are manually curated from 314 articles that describe miRNAs, long non-coding RNAs and circular RNAs. Fourteen organisms are now included in the database, and associations of ncRNAs with 25 drugs, 47 sample types and 197 diseases. miRandola also classifies extracellular RNAs based on their extracellular form: Argonaute2 protein, exosome, microvesicle, microparticle, membrane vesicle, high density lipoprotein and circulating. We also implemented a new web interface to improve the user experience.
In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, to which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.
Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, scientists require systems that search for all occurrences of a query graph. To deal efficiently with graph searching, advanced methods for indexing, representation and matching of graphs have been proposed. This paper presents GraphGrepSX. The system implements efficient graph searching algorithms together with an advanced filtering technique. GraphGrepSX is compared with SING, GraphFind, CTree and GCoding. Experiments show that GraphGrepSX outperforms the compared systems on a very large collection of molecular data. In particular, it reduces the size and the time for the construction of large database index and outperforms the most popular systems.
BackgroundMicroRNAs (miRNAs) are small noncoding RNAs that play an important role in the regulation of various biological processes through their interaction with cellular mRNAs. A significant amount of miRNAs has been found in extracellular human body fluids (e.g. plasma and serum) and some circulating miRNAs in the blood have been successfully revealed as biomarkers for diseases including cardiovascular diseases and cancer. Released miRNAs do not necessarily reflect the abundance of miRNAs in the cell of origin. It is claimed that release of miRNAs from cells into blood and ductal fluids is selective and that the selection of released miRNAs may correlate with malignancy. Moreover, miRNAs play a significant role in pharmacogenomics by down-regulating genes that are important for drug function. In particular, the use of drugs should be taken into consideration while analyzing plasma miRNA levels as drug treatment. This may impair their employment as biomarkers.DescriptionWe enriched our manually curated extracellular/circulating microRNAs database, miRandola, by providing (i) a systematic comparison of expression profiles of cellular and extracellular miRNAs, (ii) a miRNA targets enrichment analysis procedure, (iii) information on drugs and their effect on miRNA expression, obtained by applying a natural language processing algorithm to abstracts obtained from PubMed.ConclusionsThis allows users to improve the knowledge about the function, diagnostic potential, and the drug effects on cellular and circulating miRNAs.
Graphs are mathematical structures to model several biological data. Applications to analyze them require to apply solutions for the subgraph isomorphism problem, which is NP-complete. Here, we investigate the existing strategies to reduce the subgraph isomorphism algorithm running time with emphasis on the importance of the order with which the graph vertices are taken into account during the search, called variable ordering, and its incidence on the total running time of the algorithms. We focus on two recent solutions, which are based on an effective variable ordering strategy. We discuss their comparison both with the variable ordering strategies reviewed in the paper and the other algorithms present in the ICPR2014 contest on graph matching algorithms for pattern search in biological databases.
BackgroundThe identification of drug-target interactions (DTI) is a costly and time-consuming step in drug discovery and design. Computational methods capable of predicting reliable DTI play an important role in the field. Algorithms may aim to design new therapies based on a single approved drug or a combination of them. Recently, recommendation methods relying on network-based inference in connection with knowledge coming from the specific domain have been proposed.DescriptionHere we propose a web-based interface to the DT-Hybrid algorithm, which applies a recommendation technique based on bipartite network projection implementing resources transfer within the network. This technique combined with domain-specific knowledge expressing drugs and targets similarity is used to compute recommendations for each drug. Our web interface allows the users: (i) to browse all the predictions inferred by the algorithm; (ii) to upload their custom data on which they wish to obtain a prediction through a DT-Hybrid based pipeline; (iii) to help in the early stages of drug combinations, repositioning, substitution, or resistance studies by finding drugs that can act simultaneously on multiple targets in a multi-pathway environment. Our system is periodically synchronized with DrugBank and updated accordingly. The website is free, open to all users, and available at http://alpha.dmi.unict.it/dtweb/.ConclusionsOur web interface allows users to search and visualize information on drugs and targets eventually providing their own data to compute a list of predictions. The user can visualize information about the characteristics of each drug, a list of predicted and validated targets, associated enzymes and transporters. A table containing key information and GO classification allows the users to perform their own analysis on our data. A special interface for data submission allows the execution of a pipeline, based on DT-Hybrid, predicting new targets with the corresponding p-values expressing the reliability of each group of predictions. Finally, It is also possible to specify a list of genes tracking down all the drugs that may have an indirect influence on them based on a multi-drug, multi-target, multi-pathway analysis, which aims to discover drugs for future follow-up studies.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.