Single-cell RNA-sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed ‘dropout’, which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov Affinity-based Graph Imputation of Cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures, and uncovers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.
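The data-diffusion idea described above can be illustrated with a minimal sketch. It is a simplified illustration under stated assumptions, not the MAGIC implementation: the fixed-bandwidth Gaussian kernel, the parameter names `sigma` and `t`, and the synthetic count matrix are all hypothetical choices made here, and the published method may include further steps that are not reproduced.

```python
import numpy as np

def diffuse(data, sigma=1.0, t=3):
    """Smooth a cells-by-genes matrix by diffusing values between similar cells.

    Assumption: a fixed-bandwidth Gaussian kernel on Euclidean distances
    stands in for whatever affinity the real method uses.
    """
    # Pairwise squared Euclidean distances between cells (rows).
    sq = np.sum(data ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * data @ data.T, 0.0)
    # Cell-cell affinities, then row-normalize into a Markov transition matrix.
    affinity = np.exp(-d2 / (2.0 * sigma ** 2))
    markov = affinity / affinity.sum(axis=1, keepdims=True)
    # Raising the matrix to the power t lets information flow along t-step paths.
    powered = np.linalg.matrix_power(markov, t)
    # Each cell's imputed profile is a weighted average of other cells' profiles.
    return powered @ data, markov

# Hypothetical toy data: Poisson counts with roughly 30% of entries zeroed out.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(30, 5)).astype(float)
counts[rng.random(counts.shape) < 0.3] = 0.0

imputed, markov = diffuse(counts, sigma=2.0, t=3)
```

Because every row of the transition matrix sums to one, each imputed value is a convex combination of observed values for the same gene, which is why zeros introduced by under-sampling get replaced by values borrowed from similar cells.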
The high-dimensional data created by high-throughput technologies require visualization tools that reveal data structure and patterns in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure using an information-geometric distance between datapoints. We compared PHATE to other tools on a variety of artificial and biological
Biomedical researchers are generating high-throughput, high-dimensional single-cell data at a staggering rate. As costs of data generation decrease, experimental design is moving towards measurement of many different single-cell samples in the same dataset. These samples can correspond to different patients, conditions, or treatments. While scalability of methods to datasets of these sizes is a challenge on its own, dealing with large-scale experimental design presents a whole new set of problems, including batch effects and sample
Neuropil is a fundamental form of tissue organization within brains [1]. In neuropils, densely packed neurons synaptically interconnect into precise circuit architecture [2, 3], yet the structural and developmental principles governing this nanoscale precision remain largely unknown [4, 5]. Here, we use diffusion condensation, an iterative data coarse-graining algorithm [6], to identify nested circuit structures within the C. elegans neuropil (called the nerve ring). We show that the nerve ring neuropil is largely organized into four strata composed of related behavioral circuits. The stratified architecture of the neuropil is a geometrical representation of the functional segregation of sensory information and motor outputs, with specific sensory organs and muscle quadrants mapping onto particular neuropil strata. We identify groups of neurons with unique morphologies that integrate information across strata and that create neural structures that cage the strata within the nerve ring. We use high-resolution light-sheet microscopy [7, 8], coupled with lineage-tracing and cell-tracking algorithms [9, 10], to resolve the developmental sequence and reveal principles of cell position, migration and outgrowth that guide stratified neuropil organization. Our results uncover conserved structural design principles underlying nerve ring neuropil architecture and function, and a pioneer-neuron-based, temporal progression of outgrowth that guides the hierarchical development of the layered neuropil. Our findings provide a systematic blueprint for using structural and developmental approaches to understand neuropil organization within brains.
Handling the vast amounts of single-cell RNA-sequencing and CyTOF data, which are now being generated in patient cohorts, presents a computational challenge due to the noise, complexity, sparsity and batch effects present. Here, we propose a unified deep neural network-based approach to automatically process and extract structure from these massive datasets. Our unsupervised architecture, called SAUCIE (Sparse Autoencoder for Unsupervised Clustering, Imputation, and Embedding), simultaneously performs several key tasks for single-cell data analysis including 1) clustering, 2) batch correction, 3) visualization, and 4) denoising/imputation. SAUCIE is trained to recreate its own input after reducing its dimensionality in a 2-D embedding layer which can be used to visualize the data. Additionally, it uses two novel regularizations: (1) an information dimension regularization to penalize entropy as computed on normalized activation values of the layer, and thereby encourage binary-like encodings that are amenable to clustering and (2) a Maximal Mean Discrepancy penalty to correct batch effects. Thus SAUCIE has a single architecture that denoises, batch-corrects, visualizes and clusters data using a unified
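To make the batch-correction penalty mentioned above more concrete, here is a minimal sketch of a mean-discrepancy statistic between two batches, computed with a Gaussian kernel. It is an illustration under assumptions, not SAUCIE's code: the text does not say which kernel or bandwidth is used or at which layer the penalty is applied, so `gaussian_kernel`, `mmd2`, `bandwidth`, and the synthetic batches are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a and rows of b."""
    sq_a = np.sum(a ** 2, axis=1)
    sq_b = np.sum(b ** 2, axis=1)
    d2 = np.maximum(sq_a[:, None] + sq_b[None, :] - 2.0 * a @ b.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared mean discrepancy between two samples.

    The value is near zero when x and y look alike under the kernel and
    grows as the two samples drift apart.
    """
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

# Hypothetical 2-D embeddings of two batches.
rng = np.random.default_rng(0)
batch_a = rng.normal(0.0, 1.0, size=(200, 2))
batch_b_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution
batch_b_shift = rng.normal(2.0, 1.0, size=(200, 2))  # shifted "batch effect"

mmd_same = mmd2(batch_a, batch_b_same)
mmd_shift = mmd2(batch_a, batch_b_shift)
```

Used as a training penalty, a statistic of this kind would push the embedded distributions of different batches toward each other; in this toy setup the shifted batch yields a clearly larger value than the matched one.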
f-divergence estimation is an important problem in the fields of information theory, machine learning, and statistics. While several divergence estimators exist, relatively few of their convergence rates are known. We derive the MSE convergence rate for a density plug-in estimator of f-divergence. Then by applying the theory of optimally weighted ensemble estimation, we derive a divergence estimator with a convergence rate of O(1/T) that is simple to implement and performs well in high dimensions. We validate our theoretical results with experiments.

I. INTRODUCTION

f-divergence is a measure of the difference between distributions and is important to the fields of information theory, machine learning, and statistics [1]. Many different kinds of f-divergences have been defined including the Kullback-Leibler (KL) divergence [2]. A special case of the KL divergence is mutual information which gives the capacities in data compression and channel coding [4]. Mutual information estimation has also been used in applications such as feature selection [5], fMRI data processing [6], and clustering [7]. Entropy is also a special case of divergence where one of the distributions is the uniform distribution. Entropy estimation is useful for intrinsic dimension estimation [8], texture classification and image registration [9], and many other applications. Additionally, divergence estimation is useful for empirically estimating the decay rates of error probabilities of hypothesis testing [4] and extending machine learning algorithms to distributional features [10], [11]. For other applications of divergence estimation, see [12].

We consider the problem of estimating the f-divergence when only two finite populations of independent and identically distributed (i.i.d.) samples are available from some unknown, nonparametric, smooth, d-dimensional distributions. While several estimators of divergence have been previously defined, the convergence rates are known for only a few of them.
Our first contribution is to derive convergence rates for kernel density plug-in f-divergence estimators with an adaptive k-nearest neighbor (k-nn) kernel. Our second contribution is to extend the theory of optimally weighted ensemble entropy estimation developed in [13] to obtain a divergence estimator with a convergence rate of O
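As a rough illustration of divergence estimation from two i.i.d. samples using k-nearest-neighbor distances, the sketch below implements a classical k-nn estimator of the KL divergence. It is not the weighted ensemble estimator described in this excerpt, and it does not attain the rates derived there; the function name `knn_kl_divergence`, the choice `k=5`, and the Gaussian test data are assumptions made for this example.

```python
import numpy as np

def knn_kl_divergence(x, y, k=5):
    """Estimate KL(P || Q) from samples x ~ P and y ~ Q using k-nn distances.

    Uses the ratio of k-th nearest-neighbor distances as a stand-in for the
    ratio of the two unknown densities at each point of x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = x.shape
    m = y.shape[0]
    # Distances from each x_i to the other points of x (self excluded).
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dxx, np.inf)
    # Distances from each x_i to every point of y.
    dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # k-th smallest distance in each row.
    rho = np.partition(dxx, k - 1, axis=1)[:, k - 1]
    nu = np.partition(dxy, k - 1, axis=1)[:, k - 1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
p_samples = rng.normal(0.0, 1.0, size=(1000, 1))
q_same = rng.normal(0.0, 1.0, size=(1000, 1))   # true KL is 0
q_shift = rng.normal(1.0, 1.0, size=(1000, 1))  # true KL is 0.5

kl_same = knn_kl_divergence(p_samples, q_same)
kl_shift = knn_kl_divergence(p_samples, q_shift)
```

For two unit-variance Gaussians whose means differ by 1, the true KL divergence is 0.5, so the second estimate should land near that value while the first should stay close to zero.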