Gene expression in mammals is regulated by noncoding elements that can impact physiology and disease, yet the functions and target genes of most noncoding elements remain unknown. We present a high-throughput approach that uses CRISPR interference (CRISPRi) to discover regulatory elements and identify their target genes. We assess >1 megabase (Mb) of sequence in the vicinity of 2 essential transcription factors, MYC and GATA1, and identify 9 distal enhancers that control gene expression and cellular proliferation. Quantitative features of chromatin state and chromosome conformation distinguish the 7 enhancers that regulate MYC from other elements that do not, suggesting a strategy for predicting enhancer-promoter connectivity. This CRISPRi-based approach can be applied to dissect transcriptional networks and interpret the contributions of noncoding genetic variation to human disease.
Highlights
• Optimal transport analysis recovers trajectories from 315,000 scRNA-seq profiles
• Induced pluripotent stem cell reprogramming produces diverse developmental programs
• Regulatory analysis identifies a series of TFs predictive of specific cell fates
• Transcription factor Obox6 and cytokine GDF9 increase reprogramming efficiency
Because microbial plankton in the ocean comprise diverse bacteria, algae, and protists that are subject to environmental forcing on multiple spatial and temporal scales, a fundamental open question is to what extent these organisms form ecologically cohesive communities. Here we show that although all taxa undergo large, near daily fluctuations in abundance, microbial plankton are organized into clearly defined communities whose turnover is rapid and sharp. We analyze a time series of 93 consecutive days of coastal plankton using a technique that allows inference of communities as modular units of interacting taxa by determining positive and negative correlations at different temporal frequencies. This approach shows both coordinated population expansions that demarcate community boundaries and high frequency of positive and negative associations among populations within communities. Our analysis thus highlights that the environmental variability of the coastal ocean is mirrored in sharp transitions of defined but ephemeral communities of organisms.
Analyses of metagenomic datasets that are sequenced to a depth of billions or trillions of bases can uncover hundreds of microbial genomes, but naive assembly of these data is computationally intensive, requiring hundreds of gigabytes to terabytes of RAM. This is a bottleneck in many studies, especially when very deep sequencing is needed to detect low-abundance species and separate strains of the same species. We present Latent Strain Analysis (LSA), a scalable, de novo pre-assembly method that separates reads into biologically informed partitions and thereby enables assembly of individual genomes. LSA is implemented with a streaming calculation of unobserved variables that we call eigengenomes. Eigengenomes reflect covariance in the abundance of short, fixed length sequences, or “k-mers”. Since the abundance of each genome in a sample is reflected in the abundance of each k-mer in that genome, eigengenome analysis can be used to partition reads from different genomes. This partitioning can be done in fixed memory using tens of gigabytes of RAM, which makes assembly and downstream analyses of terabytes of data feasible on commodity hardware. Using LSA, we assemble partial and near-complete genomes of bacterial taxa present at relative abundances as low as 0.00001%. We also show that LSA is sensitive enough to separate reads from several strains of the same species.
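The idea that k-mers from the same genome covary in abundance across samples can be illustrated with a toy sketch. This is not the LSA implementation: the source describes a streaming calculation of eigengenomes, whereas the code below swaps in a simple greedy grouping by cosine similarity of k-mer abundance profiles. The genomes, sample mixtures, k-mer length, threshold, and all function names are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

K = 12  # k-mer length (arbitrary choice for this toy example)


def kmers(seq, k=K):
    """Return every overlapping k-mer in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_abundance(samples, k=K):
    """Map each k-mer to its count in every sample (one column per sample)."""
    table = {}
    for idx, reads in enumerate(samples):
        counts = Counter(km for read in reads for km in kmers(read, k))
        for km, count in counts.items():
            table.setdefault(km, [0] * len(samples))[idx] = count
    return table


def cosine(u, v):
    """Cosine similarity between two abundance profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_covariation(table, threshold=0.9):
    """Greedily group k-mers whose abundance profiles point the same way."""
    groups = []  # list of (seed_profile, member_kmers)
    for km, profile in table.items():
        for seed, members in groups:
            if cosine(seed, profile) >= threshold:
                members.append(km)
                break
        else:
            groups.append((profile, [km]))
    return [members for _, members in groups]


# Hypothetical genomes built from disjoint alphabets so they share no k-mers.
rng = random.Random(0)
genome_a = "".join(rng.choice("AC") for _ in range(200))
genome_b = "".join(rng.choice("GT") for _ in range(200))

# Three samples with different relative abundances; each "read" is a full
# genome copy, which is a deliberate simplification.
samples = [
    [genome_a] * 10 + [genome_b] * 1,
    [genome_a] * 1 + [genome_b] * 10,
    [genome_a] * 5 + [genome_b] * 5,
]

table = kmer_abundance(samples)
groups = group_by_covariation(table)
```

In this toy setting every k-mer from the first genome has an abundance profile proportional to (10, 1, 5) and every k-mer from the second to (1, 10, 5), so grouping by profile direction separates the two genomes without any assembly. Reads could then be assigned to the group that holds most of their k-mers.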
NOTE: This protocol has not been validated with clinical samples. To facilitate collaborations with interested parties to jointly advance the fight against the current coronavirus pandemic, we have set up a public forum on www.LAMP-Seq.org.
Summary
RNA profiles are an informative phenotype of cellular and tissue states, but can be costly to generate at massive scale. Here, we describe how gene expression levels can be efficiently acquired with random composite measurements – in which abundances are combined in a random weighted sum. We show that the similarity between pairs of expression profiles can be approximated with very few composite measurements; that by leveraging sparse, modular representations of gene expression we can use random composite measurements to recover high-dimensional gene expression levels (with 100 times fewer measurements than genes); and that it is possible to blindly recover gene expression from composite measurements, even without access to training data. Our results suggest new compressive modalities as a foundation for massive scaling in high-throughput measurements, and new insights into the interpretation of high-dimensional data.
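The claim that similarity between expression profiles survives random composite measurement can be illustrated with a small numerical sketch. It assumes Gaussian random weights, synthetic exponentially distributed abundances, and 100 measurements for 1,000 genes. None of these choices comes from the source, and the function names are hypothetical.

```python
import math
import random


def composite_measurements(profile, weights):
    """Each measurement is a random weighted sum of all gene abundances.

    The 1/sqrt(m) scaling keeps distances comparable to the original space.
    """
    scale = 1.0 / math.sqrt(len(weights))
    return [scale * sum(w * x for w, x in zip(row, profile)) for row in weights]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
n_genes, n_measurements = 1000, 100  # assumed sizes for the toy example

# Synthetic expression profiles (not real data).
profile_a = [rng.expovariate(1.0) for _ in range(n_genes)]
profile_b = [rng.expovariate(1.0) for _ in range(n_genes)]

# One row of random Gaussian weights per composite measurement.
weights = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)]
           for _ in range(n_measurements)]

full_distance = euclidean(profile_a, profile_b)
composite_distance = euclidean(
    composite_measurements(profile_a, weights),
    composite_measurements(profile_b, weights),
)

# A ratio close to 1 means the low-dimensional distance tracks the full one.
ratio = composite_distance / full_distance
print(f"distance ratio (composite / full): {ratio:.3f}")
```

The ratio fluctuates around 1, and the spread shrinks as the number of measurements grows. Recovering the full high-dimensional profile, as opposed to pairwise similarity, additionally relies on the sparse, modular structure described above and is not shown here.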
Resistance to immune checkpoint inhibitors (ICI) that activate T cell mediated anti-tumor immunity is a key challenge in cancer therapy, yet the underlying mechanisms remain poorly understood. To elucidate them, we developed a new approach, Perturb-CITE-seq, for pooled CRISPR perturbation screens with multi-modal RNA and protein single-cell profiling readout and applied it to screen patient-derived autologous melanoma and tumor infiltrating lymphocyte (TIL) co-cultures. We profiled RNA and 20 surface proteins in over 218,000 cells under ~750 perturbations, chosen by their membership in an immune evasion program that is associated with immunotherapy resistance in patients. Our screen recovered clinically-relevant resistance mechanisms concordantly reflected in RNA, protein and perturbation effects on susceptibility to T cell mediated killing. These were organized in eight co-functional modules whose perturbation distinctly affects four co-regulated programs associated with immune evasion. Among these were defects in the IFNγ-JAK/STAT pathway and in antigen presentation, and several novel mechanisms, including loss or downregulation of CD58, a surface protein without known mouse homolog. Leveraging the rich profiles in our screen, we found that loss of CD58 did not compromise MHC protein expression and that CD58 was not transcriptionally induced by the IFNγ pathway, allowing us to distinguish it as a novel mechanism of immune resistance. We further show that loss of CD58 on cancer cells conferred immune evasion across multiple T cell and Natural Killer cell patient co-culture models. Notably, CD58 is downregulated in tumors with resistance to immunotherapy in melanoma patients. Our work identifies novel mechanisms at the nexus of immune evasion and drug resistance and provides a general framework for deciphering complex mechanisms by large-scale perturbation screens with multi-modal single-cell profiles, including in systems consisting of multiple cell types.