We present GuideScan, software for the design of CRISPR guide RNA (gRNA) libraries that can be used to edit coding and noncoding genomic regions. GuideScan produces high-density sets of gRNAs for single- and paired-gRNA genome-wide screens. We also show that, by using a trie data structure, GuideScan designs gRNAs that are more specific than those designed by existing tools.
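To illustrate why a trie suits specificity checks, the following is a minimal sketch, not GuideScan's implementation. It stores equal-length candidate sequences in a trie and enumerates every stored sequence within a given number of mismatches of a query. The class and method names (`SequenceTrie`, `neighbors`) and the toy 4-base sequences are hypothetical.

```python
class TrieNode:
    """One node per sequence prefix; `count` marks complete sequences."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # how many times a sequence ending here was inserted


class SequenceTrie:
    """Hypothetical trie over equal-length DNA sequences (illustration only)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, seq):
        node = self.root
        for base in seq:
            node = node.children.setdefault(base, TrieNode())
        node.count += 1

    def neighbors(self, query, max_mismatches):
        """Return (sequence, mismatches, occurrences) for every stored
        sequence within `max_mismatches` substitutions of `query`."""
        hits = []

        def walk(node, depth, mismatches, prefix):
            if depth == len(query):
                if node.count:
                    hits.append(("".join(prefix), mismatches, node.count))
                return
            for base, child in node.children.items():
                cost = mismatches + (base != query[depth])
                if cost <= max_mismatches:  # prune branches over budget
                    prefix.append(base)
                    walk(child, depth + 1, cost, prefix)
                    prefix.pop()

        walk(self.root, 0, 0, [])
        return hits


trie = SequenceTrie()
for site in ["ACGT", "ACGA", "TCGA", "ACGT"]:
    trie.insert(site)

# A candidate with many near-identical sites elsewhere is less specific.
print(sorted(trie.neighbors("ACGT", max_mismatches=1)))
```

Shared prefixes are traversed once, and a branch is abandoned as soon as its mismatch count exceeds the budget. This is the property that makes mismatch-neighborhood enumeration tractable in this sketch.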
Here we present HiC-DC, a principled method to estimate the statistical significance (P values) of chromatin interactions from Hi-C experiments. HiC-DC uses hurdle negative binomial regression to account for systematic sources of variation in Hi-C read counts (for example, distance-dependent random polymer ligation, GC content, and mappability bias) and to model zero inflation and overdispersion. Applied to high-resolution Hi-C data in a lymphoblastoid cell line, HiC-DC detects significant interactions at the sub-topologically associating domain level, identifying potential structural and regulatory interactions supported by CTCF binding sites, DNase accessibility, and/or active histone marks. CTCF-associated interactions are most strongly enriched in the middle genomic distance range (∼700 kb–1.5 Mb), while interactions involving actively marked DNase accessible elements are enriched both at short (<500 kb) and longer (>1.5 Mb) genomic distances. There is a striking enrichment of longer-range interactions connecting replication-dependent histone genes on chromosome 6, potentially representing the chromatin architecture at the histone locus body.
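The hurdle negative binomial distribution named above can be written out as a small sketch. This is not the HiC-DC code. It assumes the common parameterization with zero probability `pi_zero`, mean `mu`, and dispersion `theta` for the zero-truncated count part, and it assumes a P value is the upper-tail probability of the observed count. In a regression setting `mu` and `pi_zero` would be predicted from covariates such as genomic distance, GC content, and mappability. Here they are plain numbers, and all function and parameter names are hypothetical.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion (size) theta > 0."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(k, pi_zero, mu, theta):
    """Hurdle model: zeros occur with probability pi_zero; positive counts
    follow a zero-truncated negative binomial."""
    if k == 0:
        return pi_zero
    return (1.0 - pi_zero) * nb_pmf(k, mu, theta) / (1.0 - nb_pmf(0, mu, theta))


def hurdle_nb_pvalue(observed, pi_zero, mu, theta):
    """Upper-tail probability P(Y >= observed) under the hurdle model.
    The direct summation is fine for a sketch but loses precision for
    very large counts."""
    if observed <= 0:
        return 1.0
    nb_below = sum(nb_pmf(k, mu, theta) for k in range(observed))
    nb_tail = max(0.0, 1.0 - nb_below)  # P_NB(K >= observed)
    return (1.0 - pi_zero) * nb_tail / (1.0 - nb_pmf(0, mu, theta))


# Hypothetical bin pair: expected count 5, dispersion 2, 30% chance of zero.
print(hurdle_nb_pvalue(25, pi_zero=0.3, mu=5.0, theta=2.0))
```

Separating the zero probability from the positive counts is what lets the model absorb zero inflation, while `theta` absorbs overdispersion relative to a Poisson model.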
Decoding transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF class/family labels into the same space. By training on binding data for hundreds of TFs and embedding over 1M DNA sequences, BindSpace achieves state-of-the-art multiclass binding prediction performance, in vitro and in vivo, and can distinguish signals of closely related TFs.
Summary: A significant challenge of functional genomics is to develop methods for genome-scale acquisition and analysis of cell biological data. Here, we present an integrated method that combines genome-wide genetic perturbation of Saccharomyces cerevisiae with high-content screening to facilitate the genetic description of sub-cellular structures and compartment morphology. As proof-of-principle, we used a Rad52-GFP marker to examine DNA damage foci in ~20 million single cells from ~5000 different mutant backgrounds in the context of selected genetic or chemical perturbations. Phenotypes were classified using a machine learning-based automated image analysis pipeline. We identified 345 mutants with elevated numbers of DNA damage foci, almost half of which were identified only in sensitized backgrounds. Subsequent analysis of Vid22, a protein implicated in the DNA damage response, revealed that it acts together with the Sgs1 helicase at sites of DNA damage, and preferentially binds G-quadruplex regions of the genome. This approach is extensible to numerous other cell biological markers and experimental systems.
Explainability for machine learning models has gained considerable attention within our research community given the importance of deploying more reliable machine learning systems. In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction, providing details about the model's decision-making. Current counterfactual methods make ambiguous interpretations as they combine multiple biases of the model and the data in a single counterfactual interpretation of the model's decision. Moreover, these methods tend to generate trivial counterfactuals about the model's decision, as they often suggest exaggerating or removing the presence of the attribute being classified. For the machine learning practitioner, these types of counterfactuals offer little value, since they provide no new information about undesired model or data biases. In this work, we propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss to uncover multiple valuable explanations about the model's prediction. Further, we introduce a mechanism to prevent the model from producing trivial explanations. Experiments on CelebA and Synbols demonstrate that our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods. We will publish the code.
Motivation: Fluorescence imaging has become commonplace for quantitatively measuring mRNA or protein expression in cells and tissues. However, such expression data are usually relative: absolute concentrations or molecular copy numbers are typically not known. While this is satisfactory for many applications, for certain kinds of quantitative network modeling and analysis of expression noise, absolute measures of expression are necessary. Results: We propose two methods for estimating molecular copy numbers from single uncalibrated expression images of tissues. These methods rely on expression variability between cells, due either to steady-state fluctuations or unequal distribution of molecules during cell division, to make their estimates. We apply these methods to 152 protein fluorescence expression images of Drosophila melanogaster embryos during early development, generating copy number estimates for 14 genes in the segmentation network. We also analyze the effects of noise on our estimators and compare with empirical findings. Finally, we confirm an observation of Bar-Even et al., made in the very different setting of Saccharomyces cerevisiae, that steady-state expression variance tends to scale with mean expression. Availability: The data are all drawn from FlyEx (explained within) and are available at http://flyex.ams.sunysb.edu/FlyEx/. Data and MATLAB codes for all algorithms described in this article are available at http://www.perkinslab.ca/pubs/ZP2009.html. Contact: tperkins@ohri.ca
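The idea of recovering absolute copy numbers from cell-to-cell variability can be illustrated with a deliberately simplified sketch. It is not either of the estimators from the article. It assumes that molecule counts across comparable cells are Poisson distributed (variance equal to mean), that measured fluorescence is an unknown constant times the count, and that measurement noise is negligible. Under those assumptions the variance-to-mean ratio of the intensities equals the unknown scale factor. The function name `estimate_copy_number` and the example intensities are hypothetical.

```python
from statistics import fmean, pvariance


def estimate_copy_number(intensities):
    """Estimate (scale, mean_copies) from uncalibrated intensities.

    Model (an assumption of this sketch): F = scale * N with N ~ Poisson(m).
    Then mean(F) = scale * m and var(F) = scale**2 * m, so
    scale = var(F) / mean(F) and m = mean(F) / scale.
    """
    mean_f = fmean(intensities)
    var_f = pvariance(intensities)
    if mean_f <= 0 or var_f <= 0:
        raise ValueError("need positive mean and nonzero variance")
    scale = var_f / mean_f          # fluorescence units per molecule
    mean_copies = mean_f / scale    # equivalently mean_f**2 / var_f
    return scale, mean_copies


# Hypothetical intensities from five comparable cells (arbitrary units).
scale, copies = estimate_copy_number([45, 15, 30, 30, 30])
print(scale, copies)
```

In real images, measurement noise and non-Poisson fluctuations would bias such an estimate, which is why an analysis of noise effects on the estimators matters.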
About one-fifth of the genes in the budding yeast are essential for haploid viability and cannot be functionally assessed using standard genetic approaches such as gene deletion. To facilitate genetic analysis of essential genes, we and others have assembled collections of yeast strains expressing temperature-sensitive (ts) alleles of essential genes. To characterize the phenotypes caused by essential gene mutations, we used a panel of genetically engineered fluorescent markers and high-throughput microscopy to examine the morphology of cells in the ts strain collection. Here, we describe the design and implementation of an online database, PhenoM (Phenomics of yeast Mutants), for storing, retrieving, visualizing and data mining the quantitative single-cell measurements extracted from micrographs of the ts mutant cells. PhenoM allows users to rapidly search and retrieve raw images and their quantified morphological data for genes of interest. The database also provides several data-mining tools, including a PhenoBlast module for phenotypic comparison between mutant strains and a Gene Ontology module for functional enrichment analysis of gene sets showing similar morphological alterations. The current PhenoM version 1.0 contains 78 194 morphological images and 1 909 914 cells covering six subcellular compartments or structures for 775 ts alleles spanning 491 essential genes. PhenoM is freely available at http://phenom.ccbr.utoronto.ca/.