Background: With the increasing popularity of scientific workflow management systems (SWfMS), more and more workflow specifications are becoming available. Such specifications contain valuable knowledge that can be reused to produce new workflows. Provenance data can help in reusing third-party code. However, finding the dependencies among programs without tool support is not a trivial activity and, in many cases, becomes a barrier to building more sophisticated models and analyses. Due to the huge number of task versions available and their configuration parameters, this activity is highly error-prone and counterproductive. Methods: In this work, we propose workflow recommender (WR), a recommendation service that suggests frequent combinations of workflow tasks for reuse. It works similarly to an e-commerce application that applies data mining techniques to help users find items they would like to purchase, predicting a list based on other users' choices. Results: Our experiments show that our approach is effective in terms of both performance and precision of the results. Conclusions: The approach is general in the sense that it can be coupled to any SWfMS.
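The abstract above does not say which mining technique WR uses, so the following is only a minimal sketch of the general idea of suggesting frequent task combinations: it counts how often pairs of tasks co-occur in past workflows and ranks candidates by that count. The function names, the `WORKFLOWS` data, and the task labels are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def build_pair_counts(workflows):
    """Count how often each unordered pair of tasks appears in the same workflow."""
    counts = Counter()
    for tasks in workflows:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(tasks)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(task, pair_counts, top_n=3):
    """Return up to top_n tasks that most often co-occur with `task`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == task:
            scores[b] += n
        elif b == task:
            scores[a] += n
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical workflow specifications (illustrative task labels only).
WORKFLOWS = [
    ["blast", "mafft", "hmmbuild"],
    ["blast", "mafft", "raxml"],
    ["blast", "hmmbuild", "hmmsearch"],
    ["mafft", "raxml"],
]

PAIR_COUNTS = build_pair_counts(WORKFLOWS)
print(recommend("blast", PAIR_COUNTS))
```

A real service would likely also account for task versions and configuration parameters, which the abstract names as the main source of difficulty; this sketch ignores both.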
In recent years, a considerable amount of attention has been devoted to research on complex networks and their properties. Collaborative environments, social networks and recommender systems are popular examples of complex networks that have emerged recently and are objects of interest in academia and industry. Many studies model complex networks as graphs and tackle the link prediction problem, a major open question in network evolution. It consists of predicting the likelihood that an association will appear between two nodes that are not yet connected in a graph. One approach to this problem is based on supervised learning for binary classification. Although the curse of dimensionality is a long-standing obstacle in machine learning, little effort has been made to deal with it in the link prediction scenario. This paper therefore evaluates the effects of dimensionality reduction as a preprocessing stage before the construction of the binary classifier in link prediction applications. Two dimensionality reduction strategies are evaluated: Principal Component Analysis (PCA) and Forward Feature Selection (FFS). The results of experiments with three different datasets and four traditional machine learning algorithms show that dimensionality reduction with PCA and FFS can improve model precision in this kind of problem.
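To make the supervised formulation concrete, here is a minimal sketch of how a candidate node pair might be turned into a feature vector for a binary classifier. The abstract does not list the features used, so the three below (common neighbors, Jaccard coefficient, preferential attachment) are assumed stand-ins that are common in link prediction work; `pair_features` and `GRAPH` are hypothetical names. Dimensionality reduction such as PCA or FFS would then be applied to the matrix built from such vectors.

```python
def pair_features(adj, u, v):
    """Compute simple topological features for the node pair (u, v).

    `adj` maps each node to the set of its neighbors (undirected graph).
    """
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        # Jaccard coefficient: shared neighbors over all neighbors of either node.
        "jaccard": len(common) / len(union) if union else 0.0,
        # Preferential attachment: product of the two node degrees.
        "preferential_attachment": len(nu) * len(nv),
    }


# Hypothetical undirected graph; "a" and "d" are not directly connected.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

print(pair_features(GRAPH, "a", "d"))
```

Each unconnected pair yields one row; the label is whether the link appears in a later snapshot of the network.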
Bioinformatics experiments are typically composed of programs arranged in pipelines that manipulate an enormous quantity of data. An interesting approach to managing those experiments is through workflow management systems (WfMS). In this work we discuss WfMS features that support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We show a case study detecting distant homologies in trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over scripting languages and point out challenges for WfMS in distributed environments.
A key focus in 21st-century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluating similarities and differences among species, and thus allowing the transfer of annotation from known/curated proteins to new/non-annotated ones. Here we used a sensitive profile-based methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was applied to five protozoan proteomes, showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The core protozoan proteome inferred comprised 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) using COG, 31.58% (132/418) belong to category J (translation, ribosomal structure, and biogenesis) and 9.81% (41/418) to category O (post-translational modification, protein turnover, chaperones); (ii) using KOG, 21.45% (151/704) belong to category J and 13.92% (98/704) to category O. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself to applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.