Allele-specific expression (ASE) refers to unbalanced expression from the two parental alleles in a tissue of a diploid organism. AlleleDB is a high-quality resource that reports about 30,000 ASE variants (ASE-V) from hundreds of human samples. In this study, we present the genomic characteristics and phenotypic implications of ASE. We identified tens of segments with an extreme density of ASE-V, many of which are located at the major histocompatibility complex (MHC) locus. Notably, at a resolution of 100 nucleotides, the likelihood of ASE-V increases with the density of polymorphic sites. Another dominant trend of ASE is a strong bias of expression toward the major allele. This observation relies on the known allele frequencies in the healthy human population. The overlap of ASE-V and GWAS associations was calculated for 48 phenotypes from the UK Biobank. ASE-V were significantly associated with a risk for inflammation (e.g., asthma), autoimmunity (e.g., rheumatoid arthritis, multiple sclerosis, and type 1 diabetes) and several blood cell traits (e.g., red cell distribution width). At the level of the ASE-genes, we sought associations with all traits and conditions reported in the GWAS catalog. The statistically significant overlap between ASE-genes and the GWAS catalog reveals associations with susceptibility to virus infection, autoimmunity, inflammation, allergies, blood cancer and more. We postulate that ASE determines phenotype diversity between individuals and the risk for a variety of immune-related conditions.
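As an illustration of what "unbalanced expression from the two parental alleles" means at a single heterozygous site, the sketch below applies an exact two-sided binomial test to reference and alternative read counts. This is a simplified stand-in, not the AlleleDB pipeline; the function names, the null ratio of 0.5 and the 0.05 cutoff are assumptions chosen for the example.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so that ties with the observed probability are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def is_ase(ref_reads, alt_reads, alpha=0.05):
    """Flag a heterozygous site as allelically imbalanced (hypothetical rule)."""
    n = ref_reads + alt_reads
    if n == 0:
        return False  # no reads, no evidence either way
    return binom_two_sided_p(ref_reads, n) < alpha


if __name__ == "__main__":
    print(is_ase(18, 2))   # strongly skewed toward one allele
    print(is_ase(10, 10))  # balanced expression
```

A real analysis would also need to handle overdispersion in read counts, reference-mapping bias and multiple testing, none of which is modelled here.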
Sex chromosomes pose an inherent genetic imbalance between the sexes. In mammals, one of the female's X-chromosomes undergoes inactivation (Xi). Indirect measurements estimate that about 20% of Xi genes completely or partially escape inactivation. The identity of these escapee genes and their propensity to escape inactivation remain unresolved. A direct method for identifying escapees was applied by quantifying differential allelic expression from single cells. RNA-Seq fragments were assigned to informative SNPs, which were labeled by the appropriate parental haplotype. This method was applied for measuring allele-specific expression from Chromosome-X (ChrX) and an autosomal chromosome as a control. We applied the protocol for measuring biallelic expression from ChrX to 104 primary fibroblasts. Out of 215 genes that were considered, only 13 genes (6%) were associated with biallelic expression. The sensitivity of escapee identification was increased by combining SNP mapping for parental diploid genomes with RNA-Seq from clonal single cells (25 lymphoblasts). Using complementary protocols, referred to as strict and relaxed, we confidently identified 25 and 31 escapee genes, respectively. When pooled versions of 30 and 100 cells were used, <50% of these genes were revealed. We assessed the generality of our protocols in view of an escapee catalog compiled from indirect methods. The overlap between the escapee catalog and the gene list from this study is statistically significant (P-value of E-07). We conclude that single-cell expression data are instrumental for studying X-inactivation with improved sensitivity. Finally, our results support the emerging notion of the non-deterministic nature of genes that escape X-chromosome inactivation.
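The idea of calling escapees from allelic read counts in clonal single cells can be sketched as follows. This is a minimal illustration, not the strict or relaxed protocol of the study: the function names and all thresholds (`min_reads`, `min_minor_fraction`, `min_cells`) are hypothetical. Each cell contributes a pair of read counts for one gene, assigned to the active and the inactive X haplotype.

```python
def biallelic_cells(cells, min_reads=7, min_minor_fraction=0.1):
    """Count cells in which a gene shows expression from both haplotypes.

    cells: list of (active_x_reads, inactive_x_reads) tuples, one per cell.
    Cells with fewer than min_reads informative reads are skipped.
    """
    count = 0
    for active, inactive in cells:
        total = active + inactive
        if total < min_reads:
            continue  # too few reads to judge allelic status
        if min(active, inactive) / total >= min_minor_fraction:
            count += 1
    return count


def is_escapee_candidate(cells, min_cells=2, **thresholds):
    """Call a gene an escapee candidate if enough cells are biallelic."""
    return biallelic_cells(cells, **thresholds) >= min_cells


if __name__ == "__main__":
    silenced_gene = [(20, 0), (15, 0), (3, 0)]
    escaping_gene = [(12, 5), (9, 4), (2, 1)]
    print(is_escapee_candidate(silenced_gene))  # no biallelic cells
    print(is_escapee_candidate(escaping_gene))  # two informative biallelic cells
```

Requiring support from more than one cell is one way to guard against a single sequencing or mapping error producing a false escapee call.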
Summary: Current technologies for single-cell transcriptomics allow thousands of cells to be analyzed in a single experiment. The increased scale of these methods raises the risk of cell-doublet contamination. Available tools and algorithms for identifying doublets and estimating their occurrence in single-cell experimental data focus on doublets of different species, cell types or individuals. In this study, we analyze transcriptomic data from single cells having an identical genetic background. We claim that the ratio of monoallelic to biallelic expression provides discriminating power for doublet identification. We present a pipeline called BIallelic Ratio for Doublets (BIRD) that relies on heterologous genetic variations from single-cell RNA sequencing. For each dataset, doublets were artificially created from the actual data and used to train a predictive model. BIRD was applied to Smart-seq data from 163 primary fibroblast single cells. The model achieved 100% accuracy in annotating the randomly simulated doublets. Bona fide doublets were verified based on a biallelic expression signal from the X-chromosome of female fibroblasts. Data from 10X Genomics microfluidics of human peripheral blood cells achieved on average 83% (±3.7%) accuracy, and an area under the curve of 0.88 (±0.04) for a collection of ∼13 300 single cells. BIRD addresses instances of doublets that were formed from cell mixtures of identical genetic background and cell identity. Maximal performance is achieved for high-coverage data from Smart-seq. Success in identifying doublets is data specific and varies according to the experimental methodology, genomic diversity between haplotypes, sequence coverage and depth. Supplementary information: Supplementary data are available at Bioinformatics online.
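The intuition behind using allelic ratios for doublet detection can be sketched in a few lines: merging the reads of two cells tends to turn monoallelic sites into biallelic ones. The code below is not the BIRD implementation; the data layout (a dict mapping a SNP identifier to reference and alternative read counts), the function names and the toy counts are assumptions, and the training of a predictive model on the simulated doublets is omitted.

```python
def biallelic_ratio(cell):
    """Fraction of expressed heterozygous sites showing reads from both alleles.

    cell: dict mapping a SNP id to a (ref_reads, alt_reads) tuple.
    """
    expressed = [(r, a) for r, a in cell.values() if r + a > 0]
    if not expressed:
        return 0.0
    biallelic = sum(1 for r, a in expressed if r > 0 and a > 0)
    return biallelic / len(expressed)


def simulate_doublet(cell_a, cell_b):
    """Create an artificial doublet by summing allelic counts of two cells."""
    merged = {}
    for snp in set(cell_a) | set(cell_b):
        ref_a, alt_a = cell_a.get(snp, (0, 0))
        ref_b, alt_b = cell_b.get(snp, (0, 0))
        merged[snp] = (ref_a + ref_b, alt_a + alt_b)
    return merged


if __name__ == "__main__":
    # Toy counts: each single cell is monoallelic at both sites.
    cell_a = {"snp1": (5, 0), "snp2": (0, 4)}
    cell_b = {"snp1": (0, 3), "snp2": (0, 6)}
    doublet = simulate_doublet(cell_a, cell_b)
    print(biallelic_ratio(cell_a), biallelic_ratio(cell_b))
    print(biallelic_ratio(doublet))
```

In this toy case the simulated doublet has a higher biallelic ratio than either parent cell, which is the kind of signal a classifier trained on simulated doublets could exploit.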