Background: The single-cell RNA sequencing (scRNA-seq) technique began a new era by allowing gene expression to be observed at the single-cell level. However, scRNA-seq data also contain a large amount of technical and biological noise. Because of the low number of RNA transcripts and the stochastic nature of gene expression, there is a high chance that nonzero entries are missed and recorded as zeros; these are called dropout events.
Results: We develop DrImpute to impute dropout events in scRNA-seq data. We show that DrImpute separates dropout zeros from true zeros significantly better than existing imputation algorithms. We also demonstrate that DrImpute can significantly improve the performance of existing tools for clustering, visualization, and lineage reconstruction on nine published scRNA-seq datasets.
Conclusions: DrImpute can serve as a very useful addition to the currently existing statistical tools for single-cell RNA-seq analysis. DrImpute is implemented in R and is available at https://github.com/gongx030/DrImpute.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2226-y) contains supplementary material, which is available to authorized users.
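The abstract does not describe how DrImpute works internally. As a hedged illustration of the general idea of filling dropout zeros from similar cells, the sketch below replaces each zero with that gene's mean over the cells in the same cluster and averages the result over several clusterings. The function name and the use of precomputed cluster labels are assumptions for illustration; this is not the DrImpute R implementation.

```python
import numpy as np


def impute_zeros_by_cluster(expr, clusterings):
    """Impute zero entries of a genes x cells matrix from cluster means.

    Hypothetical helper for illustration only.
    expr: 2-D array, rows = genes, columns = cells (e.g. log counts).
    clusterings: list of 1-D label arrays, one cluster label per cell.
    Each clustering yields one imputed matrix; the results are averaged.
    Nonzero entries are never changed.
    """
    expr = np.asarray(expr, dtype=float)
    zero_mask = expr == 0
    estimates = []
    for labels in clusterings:
        labels = np.asarray(labels)
        est = expr.copy()
        for c in np.unique(labels):
            cells = labels == c
            # Mean of each gene over the cells of this cluster (zeros included),
            # so a gene that is zero in every cell of the cluster stays zero.
            cluster_mean = expr[:, cells].mean(axis=1, keepdims=True)
            est[:, cells] = np.where(zero_mask[:, cells], cluster_mean, expr[:, cells])
        estimates.append(est)
    # Average the imputed matrices obtained from the different clusterings.
    return np.mean(estimates, axis=0)


# Usage with hypothetical values: 2 genes x 4 cells, one clustering.
expr = np.array([[0, 4, 2, 0],
                 [0, 0, 3, 3]])
labels = np.array([0, 0, 1, 1])
print(impute_zeros_by_cluster(expr, [labels]))
```

In this toy case the first gene's zeros are filled (it is expressed elsewhere in both clusters), while the second gene's zeros in cluster 0 stay zero because no cell in that cluster expresses it. That is one simple way a cluster-based scheme can leave likely true zeros untouched.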
Automated image acquisition, a custom analysis algorithm, and a distributed computing resource were used to add time as a third dimension to a quantitative trait locus (QTL) map for plant root gravitropism, a model growth response to an environmental cue. Digital images of Arabidopsis thaliana seedling roots from two independently reared sets of 162 recombinant inbred lines (RILs) and one set of 92 near-isogenic lines (NILs) derived from a Cape Verde Islands (Cvi) × Landsberg erecta (Ler) cross were collected automatically every 2 min for 8 hr following induction of gravitropism by 90° reorientation of the sample. High-throughput computing (HTC) was used to measure root tip angle in each of the 1.1 million images acquired and to perform statistical regression of tip angle against the genotype at each of the 234 RIL or 102 NIL DNA markers independently at each time point using a standard stepwise procedure. Time-dependent QTL were detected on chromosomes 1, 3, and 4 by this mapping method and by an approach developed to treat the phenotype time course as a function-valued trait. The QTL on chromosome 4 was the earliest, appearing at 0.5 hr and remaining significant for 5 hr, while the QTL on chromosome 1 appeared at 3 hr and remained significant thereafter. The Cvi allele generally had a negative effect of 2.6–4.0%. Heritability due to the QTL approached 25%. This study shows how computer vision and statistical genetic analysis by HTC can characterize the developmental timing of genetic architectures.
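The per-time-point mapping described above can be sketched as a single-marker regression scan repeated at every image time. This minimal sketch assumes two genotype classes per marker coded 0/1 (as for homozygous RILs) and no missing genotypes, and computes LOD = (n/2)·log10(RSS0/RSS1) from the residual sums of squares of the intercept-only and genotype models. The function name is hypothetical, and the stepwise multiple-QTL procedure used in the study is not reproduced.

```python
import numpy as np


def lod_scan_per_time(pheno, geno):
    """Single-marker regression LOD score for every marker at every time point.

    Hypothetical helper for illustration only.
    pheno: (n_lines, n_times) array, e.g. root tip angle per line per image time.
    geno:  (n_lines, n_markers) array of genotype codes (assumed complete).
    Returns an (n_markers, n_times) array of LOD scores.
    """
    pheno = np.asarray(pheno, dtype=float)
    geno = np.asarray(geno)
    n = pheno.shape[0]
    # Null model: one overall mean, fitted separately at each time point.
    rss0 = ((pheno - pheno.mean(axis=0)) ** 2).sum(axis=0)
    lod = np.empty((geno.shape[1], pheno.shape[1]))
    for j in range(geno.shape[1]):
        rss1 = np.zeros(pheno.shape[1])
        # Alternative model: one mean per genotype class at marker j.
        for g in np.unique(geno[:, j]):
            grp = pheno[geno[:, j] == g]
            rss1 += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
        lod[j] = (n / 2) * np.log10(rss0 / rss1)
    return lod
```

Because each time point is fitted independently, the output is a LOD surface over markers and time, which is what allows a QTL to "appear" at one time and fade at another.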
Most statistical methods for quantitative trait loci (QTL) mapping focus on a single phenotype. However, multiple phenotypes are commonly measured, and recent technological advances have greatly simplified the automated acquisition of numerous phenotypes, including function-valued phenotypes, such as growth measured over time. While methods exist for QTL mapping with function-valued phenotypes, they are generally computationally intensive and focus on single-QTL models. We propose two simple, fast methods that maintain high power and precision and are amenable to extensions with multiple-QTL models using a penalized likelihood approach. After identifying multiple QTL by these approaches, we can view the function-valued QTL effects to provide a deeper understanding of the underlying processes. Our methods have been implemented as a package for R, funqtl.
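Given a matrix of per-time-point LOD scores, a simple regression-based approach can collapse the time dimension into a single genome scan. A minimal sketch, assuming the two summaries are the average LOD across time points (often called SLOD) and the maximum (MLOD); permutation thresholds and the penalized-likelihood multiple-QTL search are not shown, and the function name is hypothetical rather than part of funqtl.

```python
import numpy as np


def summarize_lod(lod):
    """Collapse an (n_positions, n_times) LOD matrix into two genome scans.

    Hypothetical helper for illustration only.
    Returns (slod, mlod): the average and the maximum LOD across time
    points at each position.
    """
    lod = np.asarray(lod, dtype=float)
    return lod.mean(axis=1), lod.max(axis=1)


# Toy example: position 0 has a persistent effect, position 1 a transient one.
lod = np.array([[2.0, 2.0, 2.0, 2.0],
                [0.0, 0.0, 0.0, 6.0]])
slod, mlod = summarize_lod(lod)
print("SLOD peak at position", int(np.argmax(slod)))
print("MLOD peak at position", int(np.argmax(mlod)))
```

The toy matrix shows the trade-off: a locus with a modest but persistent effect scores higher on the average, while a locus that is strong at a single time point scores higher on the maximum.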
In spite of the success of genome-wide association studies (GWASs), only a small proportion of heritability for each complex trait has been explained by identified genetic variants, mainly SNPs. Likely reasons include genetic heterogeneity (i.e., multiple causal genetic variants) and the small effect sizes of causal variants; to address both, pathway analysis has been proposed as a promising alternative to standard single-SNP-based analysis. A pathway contains a set of functionally related genes, each of which includes multiple SNPs. Here we propose a pathway-based test that is adaptive at both the gene and SNP levels, thus maintaining high power across a wide range of situations with varying numbers of genes and SNPs associated with a trait. The proposed method is applicable to both common variants and rare variants and can incorporate biological knowledge on SNPs and genes to boost statistical power. We use extensive simulations and a WTCCC GWAS dataset to compare our proposal with several existing pathway-based and SNP-set-based tests, demonstrating its promising performance and its potential use in practice.
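The abstract does not give the test statistic. As a hedged illustration of what "adaptive" can mean, the sketch below computes a family of sum-of-powered-score (SPU) statistics with different powers γ for one SNP set and takes the minimum permutation p-value across γ, recalibrated with the same permutations. It adapts only at the SNP level, uses a quantitative trait without covariates, and the function names are hypothetical; it is not the authors' pathway test.

```python
import numpy as np


def spu_stats(U, gammas):
    """Sum-of-powered-score statistics SPU(gamma) for one score vector U.

    gamma = np.inf is taken to mean max_j |U_j|. Absolute values make the
    odd-power statistics two-sided; even powers are already nonnegative.
    """
    stats = []
    for g in gammas:
        if np.isinf(g):
            stats.append(np.max(np.abs(U)))
        else:
            stats.append(abs(np.sum(U ** g)))
    return np.array(stats)


def adaptive_spu_test(y, X, gammas=(1, 2, 4, 8, np.inf), n_perm=1000, seed=0):
    """Permutation-based adaptive SPU test of a quantitative trait on a SNP set.

    y: (n,) trait values; X: (n, p) genotype codes (e.g. 0/1/2 allele counts).
    Returns (dict of per-gamma p-values, adaptive p-value).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)

    def score(yv):
        # Score vector of the marginal linear model, one entry per SNP.
        return Xc.T @ (yv - yv.mean())

    obs = spu_stats(score(y), gammas)
    # Permuting y breaks any genotype-trait association (null distribution).
    perm = np.array([spu_stats(score(rng.permutation(y)), gammas)
                     for _ in range(n_perm)])

    # Per-gamma p-values for the observed data.
    p_obs = (1 + (perm >= obs).sum(axis=0)) / (n_perm + 1)
    # The same p-values for each permutation replicate b, computed against all
    # replicates (b itself included), giving the null distribution of min p.
    p_perm = (perm[None, :, :] >= perm[:, None, :]).sum(axis=1) / n_perm

    # Adaptive step: smallest p-value across gammas, recalibrated by permutation.
    min_p_obs = p_obs.min()
    p_adaptive = (1 + (p_perm.min(axis=1) <= min_p_obs).sum()) / (n_perm + 1)
    return dict(zip(gammas, p_obs)), p_adaptive
```

Small powers accumulate many weak signals (γ = 1 assumes a common effect direction, γ = 2 does not), while large powers approach the largest single-SNP score; taking the minimum p-value lets the data choose. Gene-level adaptivity, SNP or gene weights, and rare-variant handling are not modeled here.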
Supplementary data are available at Bioinformatics online.
We previously proposed a simple regression-based method to map quantitative trait loci underlying function-valued phenotypes. In order to better handle the case of noisy phenotype measurements and accommodate the correlation structure among time points, we propose an alternative approach that maintains much of the simplicity and speed of the regression-based method. We overcome noisy measurements by replacing the observed data with a smooth approximation. We then apply functional principal component analysis, replacing the smoothed phenotype data with a small number of principal components. Quantitative trait locus mapping is applied to these dimension-reduced data, either with a multi-trait method or by considering the traits individually and then taking the average or maximum LOD score across traits. We apply these approaches to root gravitropism data from Arabidopsis recombinant inbred lines and further investigate their performance in computer simulations. Our methods have been implemented in the R package funqtl.
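The smoothing and dimension-reduction steps can be sketched for curves sampled on a common, equally spaced time grid, where ordinary PCA of the smoothed (lines × time points) matrix is a discretized approximation of functional PCA. The moving-average smoother is a simple stand-in chosen for brevity, not necessarily the smoother used in funqtl, and the function names, window size, and example data are hypothetical.

```python
import numpy as np


def smooth_curves(Y, window=5):
    """Moving-average smoother applied to each row (one curve per line).

    Stand-in for a spline smoother; window is a hypothetical tuning value.
    """
    Y = np.asarray(Y, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Repeat the end values so the smoothed curves keep the original length.
    padded = np.pad(Y, ((0, 0), (half, half)), mode="edge")
    kernel = np.ones(window) / window
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])


def fpca_scores(Y, n_components=2):
    """Discretized functional PCA of curves on a common, equally spaced grid.

    Returns (scores, components, explained): one score column per principal
    component, the component curves, and their fractions of total variance.
    """
    Y = np.asarray(Y, dtype=float)
    centered = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, Vt[:n_components], explained[:n_components]


# Usage with hypothetical data: 10 lines observed every 2 min for 8 hr.
t = np.linspace(0, 8, 241)
rng = np.random.default_rng(0)
amplitude = rng.uniform(60, 90, size=10)
curves = amplitude[:, None] * (1 - np.exp(-t)) + rng.normal(scale=3, size=(10, t.size))
scores, components, explained = fpca_scores(smooth_curves(curves), n_components=2)
print(scores.shape, explained)
```

Each column of `scores` can then be treated as a trait: scanned jointly with a multi-trait method, or scanned one at a time with the per-trait LOD curves averaged or maximized.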
Benchmarked approaches for reconstruction of in vitro cell lineages and in silico models of C. elegans and M. musculus developmental trees
Highlights
• We organized a DREAM challenge to benchmark methods of cell lineage reconstruction
• We used experimental and in silico datasets as ground-truth trees of 10², 10³, and 10⁴ cells
• Smaller trees allowed the training of a machine-learning decision tree approach
• These results delineate a potential way forward for solving larger cell lineage trees
The success of high-resolution genetic mapping of disease predisposition and quantitative trait loci in humans and experimental animals depends on the positions of key crossover events around the gene of interest. In mammals, the majority of recombination occurs at highly delimited 1–2 kb sites known as recombination hotspots, whose locations and activities are distributed unevenly along the chromosomes and are tightly regulated in a sex-specific manner. The factors determining the location of hotspots started to emerge with the finding of PRDM9 as a major hotspot regulator in mammals; however, additional factors modulating hotspot activity and sex specificity are yet to be defined. To address this limitation, we have collected and mapped the locations of 4829 crossover events occurring on mouse chromosome 11 in 5858 meioses of male and female reciprocal F1 hybrids of C57BL/6J and CAST/EiJ mice. This chromosome was chosen for its medium size and high gene density, and it provided a comparison with our previous analysis of recombination on the longest mouse chromosome, chromosome 1. Crossovers were mapped to an average resolution of 127 kb, and thirteen hotspots were mapped to a resolution of <8 kb. Most crossovers occurred in a small number of the most active hotspots. Females had a higher recombination rate than males as a consequence of differences in crossover interference and regional variation of sex-specific rates along the chromosome. Comparison with chromosome 1 showed that recombination events tend to be positioned in a similar fashion along the centromere–telomere axis but independently of the local gene density. It appears that mammalian recombination is regulated on at least three levels (chromosome-wide, regional, and at individual hotspots), and these regulation levels are influenced by sex and genetic background but not by gene content.
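The crossover count can be turned into a rough genetic map length for chromosome 11. This assumes that each of the 5858 meioses is observed through a single transmitted chromatid, so that an average of one crossover per meiosis corresponds to 100 cM. Under that assumption, the pooled (male plus female) estimate is:

```latex
\text{crossovers per meiosis} = \frac{4829}{5858} \approx 0.824,
\qquad
\text{map length} \approx 100 \times 0.824~\text{cM} \approx 82.4~\text{cM}
```

This is only a back-of-the-envelope figure: it pools both sexes even though female and male rates differ, and it ignores any double crossovers too close together to be resolved.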