Comprehensive and accurate comparisons of transcriptomic distributions of cells from samples taken from two different biological states, such as healthy versus diseased individuals, are an emerging challenge in single-cell RNA sequencing (scRNA-seq) analysis. Current methods for detecting differentially abundant (DA) subpopulations between samples rely heavily on initial clustering of all cells in both samples. Often, this clustering step is inadequate since the DA subpopulations may not align with a clear cluster structure, and important differences between the two biological states can be missed. Here, we introduce DA-seq, a targeted approach for identifying DA subpopulations not restricted to clusters. DA-seq is a multiscale method that quantifies a local DA measure for each cell, which is computed from its k nearest neighboring cells across a range of k values. Based on this measure, DA-seq delineates contiguous significant DA subpopulations in the transcriptomic space. We apply DA-seq to several scRNA-seq datasets and highlight its improved ability to detect differences between distinct phenotypes in severely versus mildly ill COVID-19 patients, melanomas subjected to immune checkpoint therapy comparing responders to nonresponders, embryonic development at two time points, and young versus aging brain tissue. Importantly, we find that DA-seq not only recovers the DA cell types as discovered in the original studies but also reveals additional DA subpopulations that were not described before. Analysis of these subpopulations yields biological insights that would otherwise go undetected using conventional computational approaches.
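The multiscale idea in this abstract, a per-cell score built from its k nearest neighbors over several values of k, can be illustrated with a minimal sketch. This is not the DA-seq implementation: the function name `multiscale_da_scores`, the brute-force neighbor search, and the choice of "neighborhood fraction minus overall fraction" as the per-scale score are all assumptions made for illustration.

```python
import math


def multiscale_da_scores(points, labels, k_values):
    """Return, for every cell, one score per k in k_values.

    points   : list of coordinate tuples (one per cell)
    labels   : list of 0/1 condition labels (hypothetical encoding)
    k_values : neighborhood sizes to evaluate, each smaller than len(points)

    The score is the fraction of condition-1 cells among the k nearest
    neighbors (the cell itself excluded) minus the overall fraction of
    condition-1 cells. This simple difference is an assumption of the
    sketch, not the measure used by DA-seq.
    """
    n = len(points)
    overall = sum(labels) / n
    scores = []
    for i in range(n):
        # Brute-force ordering of all other cells by Euclidean distance.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        row = []
        for k in k_values:
            local = sum(labels[j] for j in order[:k]) / k
            row.append(local - overall)
        scores.append(row)
    return scores
```

Each cell ends up with a vector of scores, one per scale. A score near zero at every k suggests the two conditions are mixed around that cell, while a consistently positive or negative vector suggests local enrichment of one condition.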
Following antigenic challenge, activated B cells rapidly expand and undergo somatic hypermutation, yielding groups of clonally related B cells with diversified immunoglobulin receptors. Inference of clonal relationships based on the receptor sequence is an essential step in many adaptive immune receptor repertoire sequencing studies. These relationships are typically identified by a multi-step process that involves (i) grouping sequences based on shared V and J gene assignments and junction lengths, and (ii) clustering these sequences using a junction-based distance. However, this approach is sensitive to the initial gene assignments, which are error-prone, and fails to identify clonal relatives whose junction length has changed through accumulation of indels. By defining a translation-invariant feature space in which we cluster the sequences, we develop an alignment-free clonal identification method that does not require gene assignments and is not restricted to a fixed junction length. This alignment-free approach has higher sensitivity than a typical junction-based distance method without loss of specificity or positive predictive value (PPV). While the alignment-free procedure identifies clones that are broadly consistent with the junction-based distance method, it also identifies clones with characteristics (multiple V or J gene assignments or junction lengths) that are not detectable with the junction-based distance method.
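The conventional two-step procedure that this abstract contrasts itself with, steps (i) and (ii), can be sketched as follows. This is a minimal illustration of the baseline, not of the alignment-free method: the function names, the normalized Hamming distance, the single-linkage merging, and the 0.2 threshold are all assumptions.

```python
from collections import defaultdict


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def junction_based_clones(seqs, threshold=0.2):
    """Group sequences into clones with the two-step baseline.

    seqs      : list of (v_gene, j_gene, junction) tuples
    threshold : hypothetical cutoff on the normalized Hamming distance
    Returns a sorted list of clones, each a sorted list of input indices.
    """
    # Step (i): partition by V gene, J gene and junction length.
    groups = defaultdict(list)
    for idx, (v_gene, j_gene, junction) in enumerate(seqs):
        groups[(v_gene, j_gene, len(junction))].append(idx)

    clones = []
    for members in groups.values():
        # Step (ii): single-linkage clustering inside each partition,
        # implemented with a small union-find structure.
        parent = {i: i for i in members}

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                if hamming_fraction(seqs[a][2], seqs[b][2]) <= threshold:
                    parent[find(a)] = find(b)

        clusters = defaultdict(list)
        for i in members:
            clusters[find(i)].append(i)
        clones.extend(sorted(c) for c in clusters.values())
    return sorted(clones)
```

Because step (i) partitions on junction length before any distance is computed, two sequences that differ by an indel can never be joined in step (ii), which is the limitation the abstract points out.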
In this paper, a reduced dimensionality representation is learned from multiple views of the processed data. These multiple views can be obtained, for example, when the same underlying process is observed using several different modalities, or measured with different instrumentation. The goal is to effectively utilize the availability of such multiple views for various purposes such as non-linear embedding, manifold learning, spectral clustering, anomaly detection and non-linear system identification. The proposed method, which is called multi-view, exploits the intrinsic relation within each view as well as the mutual relations between views. This is achieved by defining a cross-view model in which an implied random walk process is restrained to hop between objects in the different views. This multi-view method is robust to scaling and it is insensitive to small structural changes in the data. Within this framework, new diffusion distances are defined to analyze the spectra of the implied kernels. The applicability of the multi-view approach is demonstrated for clustering, classification and manifold learning using both artificial and real data. The problem of learning from two views has been studied in the field of spectral clustering. Most of these studies have focused on classification and clustering that are based on spectral characteristics of the data while using two or more sampled views. Approaches that address this problem include the Bilinear Model [9], Partial Least Squares [10] and Canonical Correlation Analysis [11]. These methods are powerful for learning the relation between different views but do not provide separate or combined insights into the low-dimensional geometry or structure of each view. Recently, a few kernel-based methods (e.g., [12]) propose a model of co-regularizing kernels in both views in a way that resembles joint diagonalization.
This is done by searching for an orthogonal transformation that maximizes the diagonal terms of the kernel matrices obtained from all views. A penalty term, which incorporates the disagreement between clusters from the views, was added. Their algorithm is based on an alternating maximization procedure. A mixture of Markov chains is proposed in [13] to model multiple views in order to apply spectral clustering. It deals with two cases in graph theory, directed and undirected graphs, the second of which is related to our work. This approach converts the undirected graph problem into an average of Markov chains, each constructed separately within a view. A way to incorporate multiple given metrics for the same data using a cross diffusion process is described in [14]. They define a new diffusion distance which is useful for classification, clustering or retrieval tasks. However, the proposed process is not symmetric and thus does not allow an embedding to be computed. An iterative algorithm for spectral clustering is proposed in [15]. The idea is to iteratively modify each view using the representation of the other view. The problem of two manifolds, ...
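The cross-view random walk described in this excerpt, one that is restrained to hop between objects in different views, can be illustrated with a small sketch. The excerpt does not give the kernel construction, so everything here is an assumption: the Gaussian kernels, the bandwidths `sigma_x` and `sigma_y`, the use of the kernel products as cross-view affinities, and all function names.

```python
import math


def gaussian_kernel(points, sigma):
    """Dense Gaussian affinity matrix for one view (list of lists)."""
    return [
        [math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
        for p in points
    ]


def matmul(a, b):
    """Plain matrix product for small list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cross_view_transition(view_x, view_y, sigma_x=1.0, sigma_y=1.0):
    """Row-stochastic matrix of a walk that must switch views at every step.

    view_x, view_y : two lists of coordinate tuples describing the same
                     n objects in two different views.
    The 2n x 2n matrix has zero diagonal blocks, so a walker sitting in
    one view can only move to an object in the other view.
    """
    if len(view_x) != len(view_y):
        raise ValueError("both views must describe the same objects")
    n = len(view_x)
    kx = gaussian_kernel(view_x, sigma_x)
    ky = gaussian_kernel(view_y, sigma_y)
    # Assumed cross-view affinities: products of the per-view kernels.
    top = [[0.0] * n + row for row in matmul(kx, ky)]
    bottom = [row + [0.0] * n for row in matmul(ky, kx)]
    kernel = top + bottom
    # Normalize each row so that it sums to one.
    return [[v / sum(row) for v in row] for row in kernel]
```

The zero diagonal blocks are what encode the "hop between views" constraint: every transition probability from an object in one view to an object in the same view is exactly zero.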
Comparing the transcriptomic landscapes of two biological conditions has become a key challenge in single-cell RNA sequencing (scRNA-seq) analysis. Often, we observe a strong deviation in the number of cells within certain cell subpopulations between the two conditions. Analysis of such differentially abundant (DA) subpopulations may uncover cellular processes that differentiate the biological conditions. Typical methods for identifying DA subpopulations rely strongly on known cell types or unsupervised clustering methods. Here, we develop DA-seq, a multiscale algorithm that detects DA subpopulations not restricted to well-separated clusters or known cell types. We applied DA-seq to four scRNA-seq datasets as well as two simulated ones. For the former, we compare our results to previously published findings. We find that in some cases, DA-seq is able to reveal subpopulations undetected in the original studies.
People with HIV (PWH) on antiretroviral therapy (ART) experience elevated rates of neurological impairment, despite controlling for demographic factors and comorbidities, suggesting viral or neuroimmune etiologies for these deficits. Here, we apply multimodal and cross-compartmental single-cell analyses of paired cerebrospinal fluid (CSF) and peripheral blood in PWH and uninfected controls. We demonstrate that a subset of central memory CD4+ T cells in the CSF produced HIV-1 RNA, despite apparent systemic viral suppression, and that HIV-1–infected cells were more frequently found in the CSF than in the blood. Using cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), we show that the cell surface marker CD204 is a reliable marker for rare microglia-like cells in the CSF, which have been implicated in HIV neuropathogenesis, but which we did not find to contain HIV transcripts. Through a feature selection method for supervised deep learning of single-cell transcriptomes, we find that abnormal CD8+ T cell activation, rather than CD4+ T cell abnormalities, predominated in the CSF of PWH compared with controls. Overall, these findings suggest ongoing CNS viral persistence and compartmentalized CNS neuroimmune effects of HIV infection during ART and demonstrate the power of single-cell studies of CSF to better understand the CNS reservoir during HIV infection.