Most human genetic variation is classified as variants of uncertain significance. While advances in genome editing have allowed innovation in pooled screening platforms, many screens rely on relatively simple readouts (viability, fluorescence) and cannot identify the complex cellular phenotypes that underlie most human diseases. In this paper, we present a generalizable functional genomics platform that combines high-content imaging, machine learning, and microraft isolation in a method termed “Raft-Seq”. We highlight the efficacy of our platform by showing its ability to distinguish pathogenic point mutations of the mitochondrial regulator Mitofusin 2, even when the cellular phenotype is subtle. We also show that our platform achieves its efficacy using multiple cellular features, which can be configured on the fly. Raft-Seq enables pooled screening on sets of mutations in biologically relevant cells, with the ability to physically capture any cell with a perturbed phenotype and expand it clonally, directly from the primary screen.
Protein domain-based approaches to analyzing sequence data are valuable tools for examining and exploring genomic architecture across genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. The data provided contain information such as sequence position and neighboring domains for 30,947 pHMM-identified domains from each reference viral genome. Domains were identified from viral whole-genome sequence using automated profile Hidden Markov Models (pHMM). This study also describes the framework for constructing "domain neighborhoods", as well as the dataset representing it. These data can be used to examine shared and differing domain architectures across viral genomes, to elucidate potential functional properties of genes, and potentially to classify viruses.
Background and Summary
Advancements in sequencing technology and the construction of large, publicly available genomic databases have greatly expanded the potential for comparative genomics and discovery. But in viruses and bacteria, even protein-coding genomic regions are difficult to characterize functionally. Take E. coli, the best-studied bacterium, where one third of the proteome consists of proteins of unknown function. Here, we ask if (1) genomes can be decomposed into a series of functional building blocks that (2) do not rely on annotated genes and that (3) can be used to classify new species or genes, and if (4) protein domains can serve as these building blocks. Automatically defined protein domains provide just such building blocks and help resolve some of this ambiguity across genomes. This approach is based on the identification of viral domains using profile Hidden Markov models (pHMM) with HMMER3 (http://hmmer.org/, v3.2.1) 1. Unlike sequence alignment, pHMMs are able to link two extremely divergent sequences that belong to the same type of protein domain.
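The "domain neighborhoods" idea mentioned above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the source: a neighborhood is taken to be the up-to-k domains flanking each hit in coordinate order within the same genome, and the `DomainHit` record, the `domain_neighborhoods` function, and the genome and domain names are all hypothetical. The definition used for the dataset itself may differ.

```python
from collections import defaultdict
from typing import NamedTuple


class DomainHit(NamedTuple):
    genome: str   # genome identifier (hypothetical values below)
    domain: str   # name of the matched profile/domain
    start: int    # start coordinate of the hit on the genome
    end: int      # end coordinate of the hit on the genome


def domain_neighborhoods(hits, k=1):
    """Map each hit to the names of up to k domains on either side of it.

    Assumed definition: neighbors are the flanking hits in the same genome
    when hits are ordered by start coordinate.
    """
    by_genome = defaultdict(list)
    for hit in hits:
        by_genome[hit.genome].append(hit)

    neighborhoods = {}
    for genome_hits in by_genome.values():
        ordered = sorted(genome_hits, key=lambda h: (h.start, h.end))
        for i, hit in enumerate(ordered):
            left = ordered[max(0, i - k):i]      # up to k hits upstream
            right = ordered[i + 1:i + 1 + k]     # up to k hits downstream
            neighborhoods[hit] = [h.domain for h in left + right]
    return neighborhoods


# Toy input with made-up genomes, domains, and coordinates.
hits = [
    DomainHit("genome_A", "capsid", 100, 400),
    DomainHit("genome_A", "polymerase", 900, 1500),
    DomainHit("genome_A", "protease", 450, 800),
    DomainHit("genome_B", "capsid", 50, 350),
]
result = domain_neighborhoods(hits, k=1)
for hit, neighbors in result.items():
    print(hit.genome, hit.domain, neighbors)
```

Keying neighborhoods by domain order rather than by gene annotation mirrors the stated aim of building blocks that do not rely on annotated genes; a distance-based window in base pairs would be an equally plausible alternative.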
We referenced the profile databases PFAM 2, vFAM 3, and pVOG 4. Although vFAM and pVOG have not been updated as recently as PFAM, they include many viral-associated domains not found in PFAM. The contents of these three profile-HMM databases form the "PFAM database" referred to throughout this manuscript. Here, we describe the construction of a reference-virus-complete, genome-wide, domain-based database. Domains are identified from the genome sequence, and domain-based "neighborhoods" are constructed. We describe this new dataset, comprising 9,051 viruses, and show some examples of novel queries to answer new biological questions that can be applied to any genome or set of genomes. Domain-based approaches have been previously used in functional studies of mammalian genes, characterization and identification of pathogenic viruses, and phylogenetic analysis in bacteria 5-8. Dissecting the domains of novel proteins has led both to a better evolutionary understanding of the driving forces of the genes 5, insights into taxonomic characterization and evolution 9,10 and to the discove...