Lung cancer is the world's leading cause of cancer death with strong ancestry disparities. By sequencing and assembling the largest genomic and transcriptomic dataset of lung adenocarcinoma (LUAD) in individuals of East Asian ancestry (EAS; n = 305) to date, we found that East Asian LUADs had more stable genomes characterized by fewer mutations and less copy number alteration than LUADs from individuals of European ancestry (EUR). This difference is much stronger in smokers than in non-smokers. Transcriptomic clustering identified a novel EAS-specific LUAD subgroup with a less complex genomic profile and up-regulated immune-related genes, raising the possibility of immunotherapy-based approaches. Integrative analysis across clinical and molecular features showed the importance of molecular phenotypes in patient prognostic stratification. EAS LUADs had better prediction accuracy than those of European ancestry, potentially due to the less complex genomic architecture. This study elucidated a comprehensive genomic landscape of EAS LUADs and highlighted important ancestry differences between the two cohorts.
Integrating results from genome-wide association studies (GWASs) and gene expression studies through transcriptome-wide association study (TWAS) has the potential to shed light on the causal molecular mechanisms underlying disease etiology. Here, we present a probabilistic Mendelian randomization (MR) method, PMR-Egger, for TWAS applications. PMR-Egger relies on an MR likelihood framework that unifies many existing TWAS and MR methods, accommodates multiple correlated instruments, tests the causal effect of gene on trait in the presence of horizontal pleiotropy, and is scalable to hundreds of thousands of individuals. In simulations, PMR-Egger provides calibrated type I error control for causal effect testing in the presence of horizontal pleiotropic effects, is reasonably robust under various types of model misspecifications, is more powerful than existing TWAS/MR approaches, and can directly test for horizontal pleiotropy. We illustrate the benefits of PMR-Egger in applications to 39 diseases and complex traits obtained from three GWASs including the UK Biobank.
High-throughput cancer studies have been extensively conducted, searching for genetic markers associated with outcomes beyond clinical and environmental risk factors. Gene–environment interactions can have important implications beyond main effects. The commonly adopted single-marker analysis cannot accommodate the joint effects of a large number of markers. The existing joint-effects methods also have limitations. Specifically, they may suffer from high computational cost, do not respect the “main effect, interaction” hierarchical structure, or use ineffective techniques. We develop a penalization method for the identification of important G × E interactions and main effects. It has an intuitive formulation, respects the hierarchical structure, accommodates the joint effects of multiple markers, and is computationally affordable. In the numerical study, we analyze prognosis data under the AFT (accelerated failure time) model. Simulation shows satisfactory performance of the proposed method. Analysis of an NHL (non-Hodgkin lymphoma) study with SNP measurements shows that the proposed method identifies markers with important implications and has satisfactory prediction performance.
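To make the “main effect, interaction” hierarchy in the abstract above concrete, one common way to write a joint G × E model under the AFT framework is sketched below. The notation (T_i for survival time, E_{ik} for environmental factors, G_{ij} for genetic markers, and the coefficients α, β, γ) is assumed here for illustration and is not taken from the paper.

```latex
% Illustrative AFT model with main effects and G x E interactions
% (notation assumed, not the authors' own)
\log T_i = \alpha_0 + \sum_{k=1}^{q} \alpha_k E_{ik}
         + \sum_{j=1}^{p} \Big( \beta_j G_{ij}
         + \sum_{k=1}^{q} \gamma_{jk}\, G_{ij} E_{ik} \Big) + \varepsilon_i

% Hierarchical constraint: an interaction may be selected
% only if the corresponding main genetic effect is selected
\gamma_{jk} \neq 0 \;\Longrightarrow\; \beta_j \neq 0
```

Under this reading, a penalty that respects the hierarchy never keeps an interaction term for marker j while dropping that marker's main effect.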
The rubber tree, Hevea brasiliensis, produces natural rubber that serves as an essential industrial raw material. Here, we present a high-quality reference genome for a rubber tree cultivar GT1 using single-molecule real-time sequencing (SMRT) and Hi-C technologies to anchor the ~1.47-Gb genome assembly into 18 pseudochromosomes. The chromosome-based genome analysis enabled us to establish a model of spurge chromosome evolution, since the common paleopolyploid event occurred before the split of Hevea and Manihot. We show recent and rapid bursts of the three Hevea-specific LTR-retrotransposon families during the last 10 million years, leading to the massive expansion by ~65.88% (~970 Mbp) of the whole rubber tree genome since the divergence from Manihot. We identify large-scale expansion of genes associated with whole rubber biosynthesis processes, such as basal metabolic processes, ethylene biosynthesis, and the activation of polysaccharide and glycoprotein lectin, which are important properties for latex production. A map of genomic variation between the cultivated and wild rubber trees was obtained, which contains ~15.7 million high-quality single-nucleotide polymorphisms. We identified hundreds of candidate domestication genes with drastically lowered genomic diversity in the cultivated but not wild rubber trees despite a relatively short domestication history of rubber tree, some of which are involved in rubber biosynthesis. This genome assembly represents key resources for future rubber tree research and breeding, providing novel targets for improving plant biotic and abiotic tolerance and rubber production.
The role of programmed cell death protein-1 (PD-1)/programmed cell death ligand 1 (PD-L1) in triple negative breast cancer (TNBC) remains to be fully understood. In this study, we investigated the role of PD-1 as a prognostic marker for TNBC in an Asian cohort (n = 269). Samples from patients with TNBC were labeled with antibodies against PD-L1 and PD-1, and subjected to NanoString assays to measure the expression of immune-related genes. Associations between disease-free survival (DFS), overall survival (OS) and biomarker expression were investigated. Multivariate analysis showed that tumors with high PD-1+ immune infiltrates were associated with significantly increased DFS, and this increase was significant even after controlling for clinicopathological parameters (HR 0.95; P = 0.030). In addition, the density of cells expressing both CD8 and PD-1, but not the density of CD8−PD-1+ immune infiltrates, was associated with improved DFS. Notably, this prognostic significance was independent of clinicopathological parameters and the density of total CD8+ cells (HR 0.43; P = 0.011). At the transcriptional level, high expression of PDCD1 within the tumor was significantly associated with improved DFS (HR 0.38; P = 0.027). In line with these findings, high expression of IFNG (HR 0.38; P = 0.001) and IFN signaling genes (HR 0.46; P = 0.027) was also associated with favorable DFS. Inclusion of PD-1 immune infiltrates and PDCD1 gene expression added significant prognostic value for DFS (ΔLRχ² = 6.35; P = 0.041) and OS (ΔLRχ² = 9.53; P = 0.008), beyond that provided by classical clinicopathological variables. Thus, PD-1 mRNA and protein expression status represent a promising, independent indicator of prognosis in TNBC. Electronic supplementary material: The online version of this article (10.1186/s40425-019-0499-y) contains supplementary material, which is available to authorized users.
Summary: In cancer diagnosis studies, high-throughput gene profiling has been extensively conducted, searching for genes whose expressions may serve as markers. Data generated from such studies have the “large d, small n” feature, with the number of genes profiled much larger than the sample size. Penalization has been extensively adopted for simultaneous estimation and marker selection. Because of small sample sizes, markers identified from the analysis of single datasets can be unsatisfactory. A cost-effective remedy is to conduct integrative analysis of multiple heterogeneous datasets. In this article, we investigate composite penalization methods for estimation and marker selection in integrative analysis. The proposed methods use the minimax concave penalty (MCP) as the outer penalty. Under the homogeneity model, the ridge penalty is adopted as the inner penalty. Under the heterogeneity model, the Lasso penalty and MCP are adopted as the inner penalty. Effective computational algorithms based on coordinate descent are developed. Numerical studies, including simulation and analysis of practical cancer datasets, show satisfactory performance of the proposed methods.
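As a rough illustration of the building block named in the abstract above, the sketch below implements the standard scalar MCP and the closed-form single-coordinate update that coordinate descent uses for a plain (non-composite) MCP fit. It is not the authors' composite algorithm: the function names are hypothetical, and the update assumes a standardized covariate and a concavity parameter gamma greater than 1.

```python
def mcp_penalty(t, lam, gamma):
    """Value of the minimax concave penalty at scalar t (assumes gamma > 1)."""
    a = abs(t)
    if a <= gamma * lam:
        # Lasso-like near zero, with the penalization rate relaxing as |t| grows
        return lam * a - a * a / (2.0 * gamma)
    # Constant beyond gamma * lam, so large coefficients are not shrunk further
    return 0.5 * gamma * lam * lam


def soft_threshold(z, lam):
    """Lasso soft-thresholding operator."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_update(z, lam, gamma):
    """Minimizer over b of 0.5 * (z - b)**2 + mcp_penalty(b, lam, gamma).

    Assumption: z is the univariate least-squares solution for a
    standardized covariate, and gamma > 1.
    """
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # large signals are left unshrunk
```

In a full coordinate descent loop, `mcp_update` would be applied to one coefficient at a time while the others are held fixed. A composite penalty of the kind described above would wrap an inner penalty inside the outer MCP, which changes the per-coordinate update and is not shown here.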
Type 1 diabetes (T1D) is a highly heritable disease with much lower incidence but more adult-onset cases in the Chinese population. Although genome-wide association studies (GWAS) have identified >60 T1D loci in Caucasians, less is known in Asians. RESEARCH DESIGN AND METHODS We performed the first two-stage GWAS of T1D using 2,596 autoantibody-positive T1D case subjects and 5,082 control subjects in a Chinese Han population and evaluated the associations between the identified T1D risk loci and age and fasting C-peptide levels at T1D diagnosis. RESULTS We observed a high genetic correlation between children/adolescents and adult T1D case subjects (r_g = 0.87), as well as subgroups of autoantibody status (r_g ≥ 0.90). We identified four T1D risk loci reaching genome-wide significance in the Chinese Han population, including two novel loci, rs4320356 near BTN3A1 (odds ratio [OR] 1.26,
In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model mis-specification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computational feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications.
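The idea of smoothing a rank-based objective can be illustrated with a small sketch. It assumes a maximum-rank-correlation style objective in which the indicator on the linear predictor is replaced by a sigmoid with bandwidth `sigma`. The function names, the bandwidth, and the simulated data are all hypothetical, and the sketch ignores censoring and penalization, so it should not be read as the authors' estimator.

```python
import math
import random


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def smoothed_rank_objective(X, y, beta, sigma=0.1):
    """Smoothed rank correlation between y and the linear predictor X @ beta.

    For each ordered pair with y[i] > y[j], the indicator
    1{(X[i] - X[j]) . beta > 0} is replaced by a sigmoid so the
    objective becomes differentiable in beta (assumed smoothing choice).
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and y[i] > y[j]:
                diff = sum((a - b) * c for a, b, c in zip(X[i], X[j], beta))
                total += sigmoid(diff / sigma)
    return total / (n * (n - 1))


def simulate(n=60, seed=0):
    """Toy uncensored data from a linear model (illustration only)."""
    rng = random.Random(seed)
    beta_true = [1.0, 0.5]
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta_true)) + rng.gauss(0, 0.5)
         for row in X]
    return X, y, beta_true


if __name__ == "__main__":
    X, y, beta_true = simulate()
    print(smoothed_rank_objective(X, y, beta_true))
    print(smoothed_rank_objective(X, y, [-b for b in beta_true]))
```

Because the objective depends on the responses only through their ordering, it is unchanged by any strictly increasing transformation of `y`, which is one way to see why rank-based estimation is less sensitive to model specification. A version for censored prognosis data would need to restrict comparisons to pairs that can actually be ordered, which is omitted here.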