Generative pre-trained models have achieved remarkable success in various domains such as natural language processing and computer vision. Specifically, the combination of large-scale diverse datasets and pre-trained transformers has emerged as a promising approach for developing foundation models. While texts are made up of words, cells can be characterized by genes. This analogy inspires us to explore the potential of foundation models for cell and gene biology. By leveraging the exponentially growing single-cell sequencing data, we present the first attempt to construct a single-cell foundation model through generative pre-training on over 10 million cells. We demonstrate that the generative pre-trained transformer, scGPT, effectively captures meaningful biological insights into genes and cells. Furthermore, the model can be readily fine-tuned to achieve state-of-the-art performance across a variety of downstream tasks, including multi-batch integration, multi-omic integration, cell-type annotation, genetic perturbation prediction, and gene network inference. The scGPT codebase is publicly available at https://github.com/bowang-lab/scGPT.
Single-cell sequencing has emerged as a promising technique to decode cellular heterogeneity and analyze gene functions. With the high throughput of modern techniques and resulting large-scale sequencing data, deep learning has been used extensively to learn representations of individual cells for downstream tasks. However, most existing methods rely on fully connected networks and are unable to model complex relationships between both cell and gene representations. We hereby propose scFormer, a novel transformer-based deep learning framework to jointly optimize cell and gene embeddings for single-cell biology in an unsupervised manner. By drawing parallels between natural language processing and genomics, scFormer applies self-attention to learn salient gene and cell embeddings through masked gene modelling. scFormer provides a unified framework to readily address a variety of downstream tasks such as data integration, analysis of gene function, and perturbation response prediction. Extensive experiments using scFormer show state-of-the-art performance on seven datasets across the relevant tasks. The scFormer model implementation is available at https://github.com/bowang-lab/scFormer.
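The masked gene modelling objective mentioned above can be illustrated with a minimal sketch. It assumes a dense cells-by-genes expression matrix in NumPy, uses 0.0 as a stand-in for a learned [MASK] token, and scores reconstruction with mean squared error on the masked entries only. The helper names `mask_genes` and `masked_mse` are hypothetical, and the transformer that would produce predictions is deliberately left out; this is not scFormer's implementation.

```python
import numpy as np

def mask_genes(expr: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Hide a random subset of gene values in each cell.

    Returns the masked copy and a boolean mask marking the hidden entries.
    0.0 is an assumed placeholder for a [MASK] token.
    """
    mask = rng.random(expr.shape) < mask_ratio
    masked = expr.copy()
    masked[mask] = 0.0
    return masked, mask

def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Reconstruction loss computed only on the masked positions."""
    if not mask.any():
        return 0.0
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.random((4, 10))          # toy matrix: 4 cells x 10 genes
    masked, mask = mask_genes(expr, 0.15, rng)
    # A real model would predict the hidden values from `masked`;
    # here a perfect prediction trivially gives zero loss.
    print(masked_mse(expr, expr, mask))
```

Restricting the loss to masked entries is what forces a model to infer hidden genes from the visible ones rather than copy its input.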
The introduction of RNA velocity in single-cell studies has opened new ways of examining cell differentiation and tissue development. Existing RNA velocity estimation methods rely on strong assumptions: either that cells are completely observed in steady states, or that the dynamics follow a predefined pattern parameterized by constant coefficients. These assumptions are violated in complex and heterogeneous single-cell sequencing datasets and thus limit the application of these techniques. Here we present DeepVelo, a novel method that predicts the cell-specific dynamics of splicing kinetics using graph convolutional networks (GCNs). DeepVelo generalizes RNA velocity to cell populations containing time-dependent kinetics and multiple lineages, which are common in developmental and pathological systems. We applied DeepVelo to disentangle multifaceted kinetics in dentate gyrus neurogenesis, pancreatic endocrinogenesis, and hindbrain development. DeepVelo infers time-varying cellular rates of transcription, splicing and degradation, recovers each cell's stage in the underlying differentiation process, and detects putative driver genes regulating these processes. By relaxing the constraints of previous techniques, DeepVelo facilitates the study of more complex differentiation and lineage decision events in heterogeneous single-cell RNA sequencing data.
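The distinction between constant and cell-specific kinetic rates can be made concrete with the standard splicing-kinetics model, which is assumed here because the abstract does not spell out DeepVelo's equations: unspliced RNA u and spliced RNA s evolve as du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s, and the RNA velocity is ds/dt. In the sketch below, the hypothetical helpers accept either scalar rates (one constant per gene, as in earlier methods) or arrays with one rate per cell, which is the kind of output a model like DeepVelo aims to predict.

```python
import numpy as np

def unspliced_rate(u, alpha, beta):
    """du/dt = alpha - beta * u (transcription minus splicing)."""
    return alpha - beta * u

def rna_velocity(u, s, beta, gamma):
    """Spliced-RNA velocity ds/dt = beta * u - gamma * s.

    beta and gamma may be scalars (constant rates) or arrays broadcastable
    to u.shape (cell-specific rates, e.g. shape (n_cells, 1) or (n_cells, n_genes)).
    """
    return beta * u - gamma * s

def extrapolate(s, velocity, dt):
    """Forward-Euler estimate of spliced abundance a short time dt ahead."""
    return s + velocity * dt

if __name__ == "__main__":
    u = np.array([[1.0], [1.0]])   # 2 cells x 1 gene, unspliced counts
    s = np.array([[1.0], [1.0]])   # matching spliced counts
    # Same abundances, different per-cell splicing rates -> different velocities.
    beta_per_cell = np.array([[1.0], [2.0]])
    v = rna_velocity(u, s, beta_per_cell, 1.0)
    print(v.ravel())               # first cell at steady state, second inducing
    print(extrapolate(s, v, 0.1).ravel())
```

With constant rates, two cells with identical (u, s) must share a velocity; allowing per-cell rates is what lets the same abundances map to different trajectories in different lineages.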
The COVID-19 pandemic has highlighted the urgent need for the identification of new antiviral drug therapies for a variety of diseases. COVID-19 is caused by infection with the human coronavirus SARS-CoV-2, while other related human coronaviruses cause diseases ranging from severe respiratory infections to the common cold. We developed a computational approach to identify new antiviral drug targets and repurpose clinically relevant drug compounds for the treatment of a range of human coronavirus diseases. Our approach is based on graph convolutional networks (GCN) and involves multiscale host-virus interactome analysis coupled to off-target drug predictions. Cell-based experimental assessment reveals several clinically relevant drug repurposing candidates predicted by the in silico analyses to have antiviral activity against human coronavirus infection. In particular, we identify the MET inhibitor capmatinib as having potent and broad antiviral activity against several coronaviruses in a MET-independent manner, as well as novel roles for host cell proteins such as IRAK1/4 in supporting human coronavirus infection, which can inform further drug discovery studies.
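The abstract states only that the approach is based on graph convolutional networks. To show what one such layer computes, the sketch below assumes the widely used symmetric-normalized propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix of an interaction graph, H holds node features and W is a learned weight matrix. The function names and the toy two-node graph are hypothetical stand-ins for a host-virus interactome; this is not the authors' model.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Add self-loops and apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: each node aggregates its neighbours' transformed features."""
    return np.maximum(adj_norm @ features @ weight, 0.0)  # ReLU activation

if __name__ == "__main__":
    # Toy graph: two interacting proteins joined by one edge.
    adj = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
    features = np.array([[1.0], [3.0]])  # one scalar feature per node
    weight = np.array([[1.0]])
    print(gcn_layer(normalize_adjacency(adj), features, weight))
```

Stacking several such layers lets information propagate over multi-hop neighbourhoods, which is one way a graph model can relate a drug's protein targets to virus-interacting host proteins further away in the network.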
Genome sequences of marine streptomycetes are valuable for discovering useful enzymes and bioactive compounds through genome mining. However, publicly available complete genome sequences of marine streptomycetes are still limited. Here, we present the complete genome sequence of the marine streptomycete Streptomyces sp. S063 CGMCC 14582. Species delineation based on pairwise digital DNA-DNA hybridization and average nucleotide identity (ANI) values showed that Streptomyces sp. S063 CGMCC 14582 possesses a unique genome that is clearly different from all other available genomes. Bioactivity tests showed that Streptomyces sp. S063 CGMCC 14582 produces metabolites with anti-complement activities, which are useful for the treatment of numerous diseases that arise from inappropriate activation of the human complement system. Genome analysis detected no biosynthetic gene cluster (BGC) showing even low similarity to those of known anti-complement agents, indicating that Streptomyces sp. S063 CGMCC 14582 may produce novel anti-complement agents of microbial origin. Four BGCs potentially involved in the biosynthesis of non-ribosomal peptides were disrupted, but no decrease in anti-complement activity was observed, suggesting that these four BGCs are not involved in biosynthesis of the anti-complement agents. In addition, LC-MS/MS analysis and subsequent alignment through the Global Natural Products Social Molecular Networking (GNPS) platform led to the detection of novel peptides produced by the strain. Streptomyces sp. S063 CGMCC 14582 grows rapidly and is salt tolerant, which favors efficient secondary metabolite production via seawater-based fermentation. Our results indicate that Streptomyces sp. S063 has great potential to produce novel bioactive compounds and is also a good host for heterologous production of useful secondary metabolites for drug discovery.
Ionizable lipid nanoparticles (LNPs) have seen widespread use in mRNA delivery for clinical applications, notably in SARS-CoV-2 mRNA vaccines. Despite their successful use, expansion of mRNA therapies beyond COVID-19 is impeded by the absence of LNPs tailored to different target cell types. The traditional process of LNP development remains labor-intensive and cost-inefficient, relying heavily on trial and error. In this study, we present the AI-Guided Ionizable Lipid Engineering (AGILE) platform, a synergistic combination of deep learning and combinatorial chemistry. AGILE streamlines the iterative development of ionizable lipids, crucial components for LNP-mediated mRNA delivery. This approach brings forth three significant features: efficient design and synthesis of combinatorial lipid libraries, comprehensive in silico lipid screening employing deep neural networks, and adaptability to diverse cell lines. Using AGILE, we were able to rapidly design, synthesize, and evaluate new ionizable lipids for mRNA delivery in muscle and immune cells, selecting from a library of over 10,000 candidates. Importantly, AGILE has revealed cell-specific preferences for ionizable lipids, indicating the need for different tail lengths and head groups for optimal delivery to varying cell types. These results underscore the potential of AGILE in expediting the development of customized LNPs. This could significantly contribute to addressing the complex needs of mRNA delivery in clinical practice, thereby broadening the scope and efficacy of mRNA therapies.
The COVID-19 pandemic has led to an urgent need for the identification of new antiviral drug therapies that can be rapidly deployed to treat patients with this disease. COVID-19 is caused by infection with the human coronavirus SARS-CoV-2. We developed a computational approach to identify new antiviral drug targets and repurpose clinically relevant drug compounds for the treatment of COVID-19. Our approach is based on graph convolutional networks (GCN) and involves multiscale host-virus interactome analysis coupled to off-target drug predictions. Cell-based experimental assessment reveals several clinically relevant drug repurposing candidates predicted by the in silico analyses to have antiviral activity against human coronavirus infection. In particular, we identify the MET inhibitor capmatinib as having potent and broad antiviral activity against several coronaviruses in a MET-independent manner, as well as novel roles for host cell proteins such as IRAK1/4 in supporting human coronavirus infection, which can inform further drug discovery studies.