Qihuang Zhang scite author profile

Cell-type composition of intact bulk tissues can vary across samples. Deciphering cell-type composition and its changes during disease progression is an important step toward understanding disease pathogenesis. To infer cell-type composition, existing cell-type deconvolution methods for bulk RNA sequencing (RNA-seq) data often require matched single-cell RNA-seq (scRNA-seq) data, generated from samples with similar clinical conditions, as a reference. However, because scRNA-seq data are difficult to obtain from diseased samples, only limited scRNA-seq data in matched disease conditions are available. Using an scRNA-seq reference to deconvolve bulk RNA-seq data from samples with different disease conditions may lead to biased estimates of cell-type proportions. To overcome this limitation, we propose an iterative estimation procedure, MuSiC2, which is an extension of MuSiC, to perform deconvolution analysis of bulk RNA-seq data generated from samples with multiple clinical conditions where at least one condition is different from that of the scRNA-seq reference. Extensive benchmark evaluations indicated that MuSiC2 improved the accuracy of cell-type proportion estimates of bulk RNA-seq samples under different conditions as compared with the traditional MuSiC deconvolution. MuSiC2 was applied to two bulk RNA-seq datasets for deconvolution analysis, one from human pancreatic islets and the other from human retina. We show that MuSiC2 improves on current deconvolution methods and provides more accurate cell-type proportion estimates when the bulk and single-cell reference differ in clinical conditions. We believe the condition-specific cell-type composition estimates from MuSiC2 will facilitate downstream analysis and help identify cellular targets of human diseases.

Model-based forecasting for Canadian COVID-19 data

Chen, Zhang, et al. 2021. PLoS ONE

Background: Since March 11, 2020, when the World Health Organization (WHO) declared the COVID-19 pandemic, the number of infected cases, the number of deaths, and the number of affected countries have climbed rapidly. To understand the impact of COVID-19 on public health, many studies have been conducted for various countries. To complement the available work, in this article we examine Canadian COVID-19 data for the period of March 18, 2020 to August 16, 2020 with the aim of forecasting the dynamic trend over the short term.

Method: We focus on Canadian data and analyze the four provinces with the most severe situations in Canada: Ontario, Alberta, British Columbia, and Quebec. To build predictive models, we employ three types of models, smooth transition autoregressive (STAR) models, neural network (NN) models, and susceptible-infected-removed (SIR) models, to fit time series data of confirmed cases in the four provinces separately. For comparison, we also analyze daily infection data from two U.S. states, Texas and New York, for the period of March 18, 2020 to August 16, 2020. We emphasize that different models make different assumptions, which are difficult to validate. Yet invoking different models allows us to examine the data from different angles, thus helping reveal the underlying trajectory of the development of COVID-19 in Canada.

Finding: Examination of the data from March 18, 2020 to August 11, 2020 shows that the STAR, NN, and SIR models may produce different results, though the differences are small in some cases. As expected, prediction over a short-term period incurs smaller prediction variability than prediction over a long-term period. The NN method tends to outperform the other two methods. All the methods forecast an upward trend in all four Canadian provinces for the period of August 12, 2020 to August 23, 2020, though the degree varies from method to method. This research offers model-based insights into the evolution of the pandemic in Canada.

Multiclass analysis and prediction with network structured covariates

Chen, Zhang, et al. 2019. J Stat Distrib App

Technological advances in data acquisition are leading to the production of complex structured data sets. Recent developments in classification with multiclass responses make it possible to incorporate the dependence structure of predictors. The available methods, however, are hindered by restrictive requirements. Those methods assume a common network structure for the predictors of all subjects without taking into account the heterogeneity that exists across classes. Furthermore, those methods mainly focus on the case where the distribution of predictors is normal. In this paper, we propose classification methods which address these limitations. Our methods are flexible in handling possibly class-dependent network structures of variables and allow the predictors to follow a distribution in the exponential family, which includes normal distributions as a special case. Our methods are computationally easy to implement. Numerical studies are conducted to demonstrate the satisfactory performance of the proposed methods.

Genetic association studies with bivariate mixed responses subject to measurement error and misclassification

Zhang 2020. Statistics in Medicine

In genetic association studies, mixed effects models have been widely used to detect pleiotropic effects, which occur when one gene affects multiple phenotypic traits. In particular, bivariate mixed effects models are useful for describing the association of a gene with a continuous trait and a binary trait. However, such models are inadequate for data with response mismeasurement, a characteristic that is often overlooked. It is well established that, in univariate settings, ignoring mismeasurement in variables usually results in biased estimation. In this paper, we consider the setting with a bivariate outcome vector which contains a continuous component and a binary component, both subject to mismeasurement. We propose an induced likelihood approach and an EM algorithm method to handle measurement error in the continuous response and misclassification in the binary response simultaneously. Simulation studies confirm that the proposed methods successfully remove the bias induced by the response mismeasurement.

Estimating the Effects of Non-Pharmaceutical Interventions and Population Mobility on Daily COVID-19 Cases: Evidence from Ontario

Stevens, Sen, Kiwon, et al. 2022. Canadian Public Policy

This study employs COVID-19 case counts and Google mobility data for twelve of Ontario’s largest Public Health Units from Spring 2020 until the end of January 2021 to evaluate the effects of Non-Pharmaceutical Interventions (NPIs: policy restrictions on business operations and social gatherings) and population mobility on daily cases. Instrumental Variables (IV) estimation is used to account for potential simultaneity bias, as both daily COVID-19 cases and NPIs are dependent on lagged case numbers. IV estimates, based on differences in lag lengths to infer causal effects, imply that the implementation of stricter NPIs and indoor mask mandates is associated with COVID-19 case reductions. Further, estimates based on Google mobility data suggest that increases in workplace attendance are correlated with higher case counts. Finally, from October 2020 to January 2021, daily Ontario forecasts from Box-Jenkins time-series models are more accurate than official forecasts and forecasts from a Susceptible-Infected-Removed (SIR) epidemiology model.

Awareness of the Harms of Continued Smoking Among Cancer Survivors

Eng, Alton, Song, et al. 2019. Support Care Cancer

Leveraging spatial transcriptomics data to recover cell locations in single-cell RNA-seq with CeLEry

et al. 2023

Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity in health and disease. However, the lack of physical relationships among dissociated cells has limited its applications. To address this issue, we present CeLEry (Cell Location recovEry), a supervised deep learning algorithm that leverages gene expression and spatial location relationships learned from spatial transcriptomics to recover the spatial origins of cells in scRNA-seq. CeLEry has an optional data augmentation procedure via a variational autoencoder, which improves the method’s robustness and allows it to overcome noise in scRNA-seq data. We show that CeLEry can infer the spatial origins of cells in scRNA-seq at multiple levels, including 2D location and spatial domain of a cell, while also providing uncertainty estimates for the recovered locations. Our comprehensive benchmarking evaluations on multiple datasets generated from brain and cancer tissues using Visium, MERSCOPE, MERFISH, and Xenium demonstrate that CeLEry can reliably recover the spatial location information for cells using scRNA-seq data.




Body mass index and prognosis in patients with head and neck cancer

MuSiC2: cell-type deconvolution for multi-condition bulk RNA-seq data
