We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks (GRNs) from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models, and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the AUPRC and early precision of the algorithms are moderate. The methods are better at recovering interactions in synthetic networks than in Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of GRN inference algorithms.

Single-cell RNA-sequencing technology has made it possible to trace cellular lineages during differentiation and to identify new cell types 1,2. A central question that arises now is whether we can discover the gene regulatory networks (GRNs) that control cellular differentiation and drive transitions from one cell type to another. In such a GRN, each edge connects a transcription factor (TF) to a gene it regulates. Ideally, the edge is directed from the TF to the target gene, represents direct rather than indirect regulation, and corresponds to activation or inhibition.

Single-cell expression data are especially promising for computing GRNs because, unlike bulk transcriptomic data, they do not obscure biological signals by averaging over all the cells in a sample.
However, these data have features that pose significant difficulties, e.g.,
We present a comprehensive evaluation of state-of-the-art algorithms for inferring gene regulatory networks (GRNs) from single-cell gene expression data. Our contributions include a comprehensive evaluation pipeline based on simulated data from "toy", artificial networks with predictable cellular trajectories and on simulated data from carefully curated Boolean models. We develop a strategy to simulate these two types of data that avoids the pitfalls of existing strategies that have been used to mimic bulk transcriptional data. We found that the accuracy of the algorithms, measured in terms of AUROC and AUPRC, was moderate by and large, although the methods were better at recovering interactions in the artificial networks than in the Boolean models. Techniques that did not require pseudotime-ordered cells were more accurate, in general. The predicted networks contained more feedforward loops than the Boolean models. The endpoints of many false positive edges were connected by paths of length two in the Boolean models, which suggested that indirect effects may be predominant in the outputs of these algorithms. The outputs of the methods were quite inconsistent with each other, indicating that combining these approaches using ensembles is likely to be challenging. We present recommendations on how to create simulated gene expression datasets for testing GRN inference algorithms. New ideas for avoiding the prediction of indirect interactions appear to be necessary to improve the accuracy of GRN inference algorithms for single-cell gene expression data.

[Figure: overview of the evaluation pipeline. Panel labels: simulated data from synthetic networks; simulated data from curated models; GRN inference methods (LEAP, PIDC, SCODE); predicted networks; run algorithms; parameter search; software run time; evaluate network motifs; ROC; early precision; stability of inferred networks.]
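The observation about false positive edges and paths of length two can be illustrated with a small check. The following is a minimal sketch, not the authors' implementation: the function name `indirect_false_positives`, the edge-list representation as `(regulator, target)` tuples, and the toy network are all assumptions made for illustration. It splits the false positive edges of a predicted network into those whose endpoints are joined by a two-edge path in the ground-truth network (candidate indirect effects) and all others.

```python
from collections import defaultdict


def indirect_false_positives(true_edges, predicted_edges):
    """Split false positive edges by whether the ground-truth network
    contains a directed path of length two between their endpoints.

    Both arguments are iterables of (regulator, target) tuples.
    Returns (indirect, other), each a list of false positive edges
    in the order they appear in predicted_edges.
    """
    true_set = set(true_edges)

    # Adjacency map of the ground-truth network: regulator -> targets.
    successors = defaultdict(set)
    for src, dst in true_set:
        successors[src].add(dst)

    indirect, other = [], []
    for src, dst in predicted_edges:
        if (src, dst) in true_set:
            continue  # true positive, not of interest here
        # Look for an intermediate gene mid with src -> mid -> dst.
        has_two_step_path = any(
            dst in successors.get(mid, ()) for mid in successors.get(src, ())
        )
        (indirect if has_two_step_path else other).append((src, dst))
    return indirect, other


# Hypothetical toy ground truth: a linear cascade A -> B -> C -> D.
true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
# Hypothetical predictions: one true edge and three false positives.
predicted_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "A")]

indirect, other = indirect_false_positives(true_edges, predicted_edges)
print("indirect:", indirect)
print("other:", other)
```

In this toy case only the predicted edge from A to C is flagged, because A regulates C through B in the ground truth. A high proportion of flagged edges among the false positives would be consistent with the indirect-effect interpretation given above.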
Analysis of signaling pathways and their crosstalk is a cornerstone of systems biology. Thousands of papers have been published on these topics. Surprisingly, there is no database that carefully and explicitly documents crosstalk between specific pairs of signaling pathways. We have developed XTalkDB (http://www.xtalkdb.org) to fill this very important gap. XTalkDB contains curated information for 650 pairs of pathways from over 1600 publications. In addition, the database reports the molecular components (e.g. proteins, hormones, microRNAs) that mediate crosstalk between a pair of pathways and the species and tissue in which the crosstalk was observed. The XTalkDB website provides an easy-to-use interface for scientists to browse crosstalk information by querying one or more pathways or molecules of interest.
Supplementary data are available at Bioinformatics online.