Lingxue Zhu scite author profile

An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here, we present a comparable framework to evaluate rare and de novo noncoding single nucleotide variants, insertions/deletions (indels), and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any category after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite the exclusion of previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism the contribution of de novo noncoding variation is probably modest compared to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple testing burden.
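The gap between 51,801 categories and 4,123 effective tests arises because overlapping annotation categories produce correlated test statistics. The abstract does not say how the effective number was computed, so the sketch below assumes one common eigenvalue-based estimator (Nyholt, 2004), M_eff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the M × M correlation matrix of the tests, followed by a Bonferroni-style threshold α / M_eff. The function names are hypothetical.

```python
import numpy as np


def nyholt_meff(corr: np.ndarray) -> float:
    """Effective number of independent tests from an M x M correlation matrix.

    Assumed estimator (Nyholt 2004): 1 + (M - 1) * (1 - Var(eigenvalues) / M).
    Uncorrelated tests give M; perfectly correlated tests give 1.
    """
    eigvals = np.linalg.eigvalsh(corr)          # symmetric matrix -> real eigenvalues
    m = corr.shape[0]
    var_lambda = np.var(eigvals, ddof=1)        # sample variance of the eigenvalues
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def bonferroni_threshold(alpha: float, m_eff: float) -> float:
    """Per-test significance threshold after correcting for m_eff effective tests."""
    return alpha / m_eff
```

With α = 0.05 and 4,123 effective tests, the per-category threshold is 0.05 / 4123 ≈ 1.2 × 10⁻⁵, compared with ≈ 9.7 × 10⁻⁷ if all 51,801 categories were treated as independent.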


Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder

Lin, Zhu, et al. 2018

Science

265

286


Whole-genome sequencing (WGS) has facilitated the first genome-wide evaluations of the contribution of de novo noncoding mutations to complex disorders. Using WGS, we assess genetic variation from 7,608 samples in 1,902 autism spectrum disorder (ASD) families, identifying 255,106 de novo mutations. In contrast to coding mutations, no noncoding functional annotation category, analyzed in isolation, is significantly associated with ASD. However, when noncoding variation is cast as a de novo risk score across multiple annotation categories, it does show an association with mutations localized to promoter regions. The strongest contributor to this promoter signal is evolutionarily conserved transcription factor binding sites distal to the transcription start site. These data suggest that de novo mutations in promoter regions, characterized by evolutionary and functional signatures, contribute to ASD.
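A risk score pools weak per-category signals into one statistic per sample. The abstract does not specify how categories are weighted or how affected and unaffected samples are compared, so the sketch below is hypothetical throughout: it scores each sample as a weighted sum of its de novo counts per category, then applies a two-sided exact sign test to paired proband-minus-sibling differences. Pairing assumes each family contributes one proband and one unaffected sibling; 7,608 samples / 1,902 families = 4 per family is consistent with parent-child quartets, but that is an assumption here.

```python
from math import comb
from typing import Dict, Sequence


def de_novo_risk_score(counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of one sample's de novo mutation counts across annotation categories.

    `weights` is hypothetical (e.g. learned on a training subset of families);
    categories without a weight contribute nothing.
    """
    return sum(weights.get(category, 0.0) * n for category, n in counts.items())


def sign_test(differences: Sequence[float]) -> float:
    """Two-sided exact sign test on paired (proband - sibling) score differences.

    Ties (difference == 0) are dropped, as is standard for the sign test.
    """
    pos = sum(1 for d in differences if d > 0)
    neg = sum(1 for d in differences if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For example, with hypothetical weights {"promoter": 0.5, "coding": 1.0}, a sample with 2 promoter, 1 coding, and 5 intronic de novo mutations scores 0.5 × 2 + 1.0 × 1 = 2.0.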


Deep and Confident Prediction for Time Series at Uber

2017


Reliable uncertainty estimation for time series prediction is critical in many fields, including physics, biology, and manufacturing. At Uber, probabilistic time series forecasting is used for robust prediction of the number of trips during special events, driver incentive allocation, and real-time anomaly detection across millions of metrics. Classical time series models are often used in conjunction with a probabilistic formulation for uncertainty estimation. However, such models are hard to tune and scale, and it is difficult to add exogenous variables to them. Motivated by the recent resurgence of Long Short-Term Memory (LSTM) networks, we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation. We provide detailed experiments of the proposed solution on completed-trips data and successfully apply it to large-scale time series anomaly detection at Uber.

Comment: To appear in DSBDA-2017 @ ICDM'1
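The abstract does not say how the Bayesian deep model produces its uncertainty. One widely used way to get Bayesian-style predictive intervals from a neural network is Monte Carlo (MC) dropout: keep dropout switched on at prediction time, run many stochastic forward passes, and read uncertainty off their spread. The sketch below assumes that technique and swaps the LSTM for a tiny untrained one-hidden-layer network so it stays short; each input row stands for a window of recent observations, and `noise_var` is a hypothetical placeholder for inherent observation noise. All names are illustrative.

```python
import numpy as np


def init_params(n_in: int, n_hidden: int, rng: np.random.Generator) -> dict:
    """Random weights for a one-hidden-layer network (untrained; illustration only)."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params: dict, x: np.ndarray, p_drop: float, rng: np.random.Generator) -> np.ndarray:
    """One stochastic pass: dropout stays ON at prediction time."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    keep = rng.random(h.shape) >= p_drop        # keep each hidden unit with prob 1 - p_drop
    h = h * keep / (1.0 - p_drop)               # inverted-dropout rescaling
    return (h @ params["W2"] + params["b2"]).ravel()


def mc_dropout_predict(params, x, p_drop=0.2, n_samples=200, noise_var=0.0, z=1.96, rng=None):
    """Predictive mean and approximate 95% interval from repeated dropout passes.

    The variance across passes reflects model uncertainty; noise_var (hypothetical)
    adds an assumed inherent-noise term, e.g. estimated from held-out residuals.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if rng is None:
        rng = np.random.default_rng()
    draws = np.stack([forward(params, x, p_drop, rng) for _ in range(n_samples)])
    mean = draws.mean(axis=0)
    std = np.sqrt(draws.var(axis=0) + noise_var)
    return mean, mean - z * std, mean + z * std
```

For anomaly detection, an observed value falling outside its interval would be flagged. Setting `p_drop=0.0` collapses the interval to a point, which makes visible that the width comes entirely from the dropout sampling (plus any `noise_var`).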


Preasymptotic Error Analysis of CIP-FEM and FEM for Helmholtz Equation with High Wave Number. Part II: $hp$ Version

Zhu 2013

SIAM J. Numer. Anal.

114


In this paper, which is the second in a series of two, the preasymptotic error analysis of the continuous interior penalty finite element method (CIP-FEM) and the FEM for the Helmholtz equation in two and three dimensions is continued. While Part I contained results on the linear CIP-FEM and FEM, the present part deals with approximation spaces of order $p \ge 1$. By using a modified duality argument, preasymptotic error estimates are derived for both methods under the condition of $kh$ …, where $k$ is the wave number, $h$ is the mesh size, and $C_0$ is a constant independent of $k$, $h$, $p$, and the penalty parameters. It is shown that the pollution errors of both methods in … if the exact solution $u \in H^2(\Omega)$, which coincide with existing dispersion analyses for the FEM on Cartesian grids. Here $\sigma$ is a constant independent of $k$, $h$, $p$, and the penalty parameters. Moreover, it is proved that the CIP-FEM is stable for any $k, h, p > 0$ and penalty parameters with positive imaginary parts. Besides the advantage of the absolute stability of the CIP-FEM compared to the FEM, the penalty parameters may be tuned to reduce the pollution effects.

Key words. Helmholtz equation, large wave number, preasymptotic error estimates, continuous interior penalty finite element methods, finite element methods
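The abstract ties the pollution estimates to dispersion analyses of the FEM. As a hedged illustration of what such an analysis computes, and not the paper's $hp$ result, consider the simplest case: linear ($p = 1$) Galerkin FEM for $u'' + k^2 u = 0$ on a uniform 1D mesh. Substituting a plane wave into the stiffness-plus-mass stencil gives the classical relation $\cos(k_h h) = (1 - (kh)^2/3)/(1 + (kh)^2/6)$ for the discrete wave number $k_h$, and a Taylor expansion gives $k - k_h \approx k^3 h^2 / 24$. The function names below are hypothetical.

```python
import math


def discrete_wavenumber_p1(k: float, h: float) -> float:
    """Discrete wave number k_h of linear (p = 1) Galerkin FEM for u'' + k^2 u = 0
    on a uniform 1D mesh of size h, from plane-wave substitution into the stencil."""
    t = k * h
    c = (1.0 - t * t / 3.0) / (1.0 + t * t / 6.0)   # cos(k_h * h)
    if not -1.0 <= c <= 1.0:
        raise ValueError("kh too large: the discrete scheme has no propagating wave")
    return math.acos(c) / h


def phase_error(k: float, h: float) -> float:
    """k - k_h; for small kh this is approximately k^3 h^2 / 24 (a phase lag)."""
    return k - discrete_wavenumber_p1(k, h)
```

Holding $kh = 0.5$ fixed (about 12.6 mesh points per wavelength) while raising $k$ from 10 to 100 multiplies the phase error per unit length by 10. Keeping $kh$ constant is therefore not enough for large $k$, which is the pollution effect that the preasymptotic estimates quantify.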


A unified statistical framework for single cell and bulk RNA sequencing data

Zhu, Lei, Devlin, et al. 2018

Ann. Appl. Stat.


Recent advances in technology have enabled the measurement of RNA levels in individual cells. Compared to traditional tissue-level bulk RNA-seq data, single-cell sequencing yields valuable insights into gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single-cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows strength from both data sources and carefully models the dropouts in single-cell data, leading to more accurate estimation of cell-type-specific gene expression profiles. In addition, URSM naturally provides inference on the dropout entries in single-cell data that need to be imputed for downstream analyses, as well as on the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single-cell data and in deconvolving bulk samples. We also demonstrate an application to gene expression data from fetal brains, where our model successfully imputes the dropout genes and reveals cell-type-specific expression patterns.
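URSM itself is a hierarchical model that couples single-cell and bulk data and uses Gibbs sampling for approximate inference. The sketch below isolates only the dropout idea for a single gene, using a zero-inflated Poisson (ZIP) as an assumed and much simpler stand-in: each cell's count is a "false" zero with probability π, and otherwise Poisson with mean λ. EM then alternates between computing the posterior probability that each observed zero is a dropout (E-step) and closed-form updates of π and λ (M-step). The model choice and all names are hypothetical, not URSM's specification.

```python
import numpy as np


def dropout_posterior(pi: float, lam: float) -> float:
    """P(dropout | observed count == 0) under a zero-inflated Poisson."""
    return pi / (pi + (1.0 - pi) * np.exp(-lam))


def zip_em(x, pi0: float = 0.5, n_iter: int = 500, tol: float = 1e-10):
    """EM estimates of the dropout rate pi and Poisson mean lam for one gene.

    Returns (pi, lam, z), where z[i] is the posterior dropout probability of
    cell i from the final E-step (always 0 for nonzero counts).
    """
    x = np.asarray(x, dtype=float)
    is_zero = x == 0
    pi, lam = pi0, x.mean()
    z = np.zeros_like(x)
    for _ in range(n_iter):
        # E-step: only observed zeros can be dropouts
        z = np.where(is_zero, dropout_posterior(pi, lam), 0.0)
        # M-step: pi is the mean dropout responsibility; lam is the mean count
        # over cells weighted by (1 - z), which reduces to sum(x) / (n - sum(z))
        new_pi = z.mean()
        new_lam = x.sum() / (x.size - z.sum())
        done = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if done:
            break
    return pi, lam, z
```

An observed zero is ambiguous: with π = 0.3 and λ = 2, it is a dropout with posterior probability 0.3 / (0.3 + 0.7e⁻²) ≈ 0.76, and a genuine zero otherwise. Entries with high `z` are the natural candidates for imputation.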



Lingxue Zhu

An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder

Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder

Deep and Confident Prediction for Time Series at Uber

Preasymptotic Error Analysis of CIP-FEM and FEM for Helmholtz Equation with High Wave Number. Part II: $hp$ Version

A unified statistical framework for single cell and bulk RNA sequencing data
