Circulating cell-free DNA (cfDNA) in the bloodstream originates from dying cells and is a promising noninvasive biomarker for cell death. Here, we propose an algorithm, CelFiE, to accurately estimate the relative abundances of cell types and tissues contributing to cfDNA from epigenetic cfDNA sequencing. In contrast to previous work, CelFiE accommodates low coverage data, does not require CpG site curation, and estimates contributions from multiple unknown cell types that are not available in external reference data. In simulations, CelFiE accurately estimates known and unknown cell type proportions from low coverage and noisy cfDNA mixtures, including from cell types composing less than 1% of the total mixture. When used in two clinically relevant situations, CelFiE correctly estimates a large placenta component in pregnant women and an elevated skeletal muscle component in amyotrophic lateral sclerosis (ALS) patients, consistent with the muscle wasting typical in these patients. Together, these results show how CelFiE could be a useful tool for biomarker discovery and for monitoring the progression of degenerative disease.
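The abstract does not spell out how CelFiE fits its estimates. As a minimal sketch of the general idea of methylation-based deconvolution, the hypothetical `deconvolve_em` function below fits mixture proportions by expectation-maximization under a simplified read model: each cfDNA read at CpG site j comes from cell type t with probability alpha_t and is methylated with probability beta_{t,j}, taken from a fixed reference. Unlike CelFiE as described, this sketch treats the reference methylation levels as known and does not estimate unknown cell types.

```python
def deconvolve_em(methylated, total, ref_beta, n_iter=1000, tol=1e-10):
    """Hypothetical EM estimator of cell type proportions (simplified model).

    methylated[j]: methylated reads observed at CpG site j
    total[j]:      all reads observed at CpG site j
    ref_beta[t][j]: assumed-known methylation level of cell type t at site j
    Returns a list of estimated proportions, one per cell type.
    """
    n_types = len(ref_beta)
    n_sites = len(total)
    alpha = [1.0 / n_types] * n_types  # start from uniform proportions
    for _ in range(n_iter):
        expected_reads = [0.0] * n_types
        for j in range(n_sites):
            m = methylated[j]
            u = total[j] - m
            # E-step: unnormalized posterior that a read came from type t,
            # computed separately for methylated and unmethylated reads
            w_meth = [alpha[t] * ref_beta[t][j] for t in range(n_types)]
            w_unmeth = [alpha[t] * (1.0 - ref_beta[t][j]) for t in range(n_types)]
            s_meth = sum(w_meth)
            s_unmeth = sum(w_unmeth)
            for t in range(n_types):
                if s_meth > 0:
                    expected_reads[t] += m * w_meth[t] / s_meth
                if s_unmeth > 0:
                    expected_reads[t] += u * w_unmeth[t] / s_unmeth
        # M-step: each proportion is that type's expected share of all reads
        total_reads = sum(expected_reads)
        new_alpha = [r / total_reads for r in expected_reads]
        converged = max(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


# Two cell types, three CpG sites, 1000 reads per site; the methylated counts
# are the exact expectations under a 70%/30% mixture of the two references.
estimate = deconvolve_em(
    methylated=[660, 340, 620],
    total=[1000, 1000, 1000],
    ref_beta=[[0.9, 0.1, 0.8], [0.1, 0.9, 0.2]],
)
print(estimate)  # approximately [0.7, 0.3]
```

Because the counts were constructed as exact expectations, the maximum-likelihood solution coincides with the true 70%/30% mixture, which makes this a convenient sanity check. Real cfDNA data are far sparser, and that sparsity is the setting the abstract says CelFiE is designed to handle.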
Background: Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N = 36,736).
Methods: We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture analyses. We assess the relationship between genetically inferred ancestry (GIA) and more than 1,500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHRs for performing ancestry-specific and multi-ancestry genome-wide and phenome-wide scans across a broad set of disease phenotypes.
Results: We identify 5 continental-scale GIA clusters, comprising European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA), and East Asian American (EAA) individuals, and 7 subcontinental GIA clusters within the EAA GIA cluster corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease at the 22q13.31 locus in both the HL and EAA GIA groups (HL p = 2.32 × 10⁻¹⁶; EAA p = 6.73 × 10⁻¹¹). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group.
Conclusions: Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.
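The abstract does not state which test was used to associate continental GIA with phecodes "after accounting for individuals' SIRE"; logistic regression with SIRE covariates is a common choice. As a swapped-in, simpler illustration of the same idea, the sketch below stratifies individuals by SIRE and applies a Cochran-Mantel-Haenszel (CMH) test to one GIA cluster and one phecode. The function name `cmh_test` and the table layout are assumptions for illustration only.

```python
import math


def cmh_test(strata):
    """Cochran-Mantel-Haenszel test (no continuity correction) over 2x2 tables.

    Each stratum (e.g. one SIRE group) is a tuple (a, b, c, d):
        a = in GIA cluster, phecode case     b = in GIA cluster, control
        c = outside cluster, phecode case    d = outside cluster, control
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        expected_a = (a + b) * (a + c) / n  # E[a] under no association
        diff += a - expected_a
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    stat = diff * diff / var
    # P(chi-square_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for one GIA cluster and one phecode in two SIRE strata.
stat, p = cmh_test([(40, 10, 10, 40), (15, 35, 5, 45)])
print(stat, p)
```

Stratifying means that only within-SIRE differences between GIA groups contribute to the statistic. When scanning more than 1,500 phecodes, each p-value would also need a multiple-testing correction (for example, a Bonferroni threshold of 0.05 divided by the number of phecodes tested).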
Large medical centers located in urban areas such as Los Angeles care for a diverse patient population and offer the potential to study the interplay between genomic ancestry and social determinants of health within a single medical system. Here, we introduce the UCLA ATLAS Community Health Initiative, a biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients. We leverage the unique genomic diversity of the ATLAS patient population to explore the interplay between self-reported race/ethnicity and genetic ancestry within a disease context, using phenotypes extracted from the EHR. First, we identify extensive continental and subcontinental genomic diversity within the ATLAS data that is consistent with the global diversity of Los Angeles; this includes clusters of ATLAS individuals corresponding to Korean, Japanese, Filipino, and Middle Eastern genomic ancestries. Most importantly, we find that common diseases and traits stratify across genomic ancestry clusters, suggesting that these clusters are useful for understanding disease biology across diverse individuals. Next, we showcase the power of genetic data linked with EHRs to perform ancestry-specific genome-wide and phenome-wide scans that identify genetic factors for a variety of EHR-derived phenotypes (phecodes). For example, we find ancestry-specific associations for liver disease and link the associated genetic variants to neurological and neoplastic phenotypes, primarily within individuals of admixed ancestries. Overall, our results underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping efforts linked with EHR-based phenotyping.
The methylation pattern of cfDNA isolated from liquid biopsies is gaining substantial interest for the diagnosis and monitoring of diseases. We evaluated the impact of blood collection tube type and of the time delay between blood draw and plasma preparation on bisulphite-based cfDNA methylation profiling. Fifteen tubes of blood were drawn from three healthy volunteer subjects (BD Vacutainer K2E EDTA spray tubes, Streck Cell-Free DNA BCT tubes, PAXgene Blood ccfDNA tubes, Roche Cell-Free DNA Collection tubes and Biomatrica LBgard blood tubes, in triplicate). Samples were either processed immediately or stored at room temperature for 24 or 72 hours before plasma preparation. DNA fragment size was evaluated by capillary electrophoresis. Reduced representation bisulphite sequencing was performed on the cell-free DNA isolated from these plasma samples. We evaluated the impact of blood tube and time delay on several quality control metrics. All preservation tubes performed similarly on the quality metrics evaluated. In contrast, a considerable increase in cfDNA concentration, and in the fraction of cfDNA derived from NK cells, was observed in EDTA tubes after a 72-hour delay. The methylation pattern of cfDNA is robust and reproducible across the different preservation tubes. EDTA tubes, processed as soon as possible and preferably within 24 hours, are the most cost-effective option. If immediate processing is not possible, preservation tubes are valid alternatives.
Circulating cell-free DNA (cfDNA) in the bloodstream originates from dying cells and is a promising non-invasive biomarker for cell death. Here, we develop a method to accurately estimate the relative abundances of cell types contributing to cfDNA. We leverage the distinct DNA methylation profile of each cell type throughout the body. Decomposing the cfDNA mixture is difficult, as fragments from relevant cell types may be present only in small amounts. We propose an algorithm, CelFiE, that estimates cell type proportions from whole-genome cfDNA input and reference data. CelFiE accommodates low coverage data, does not rely on CpG site curation, and estimates contributions from multiple unknown cell types that are not available in reference data. In simulations, we show that CelFiE can accurately estimate the known and unknown cell types of origin of cfDNA mixtures from low coverage and noisy data. Simulations also demonstrate that we can effectively estimate cfDNA originating from rare cell types composing less than 0.01% of the total cfDNA. To validate CelFiE, we use a positive control: cfDNA extracted from pregnant and non-pregnant women. CelFiE estimates a large placenta component specifically in pregnant women (p = 9.1 × 10⁻⁵). Finally, we use CelFiE to decompose cfDNA from amyotrophic lateral sclerosis (ALS) patients and age-matched controls. We find increased cfDNA concentrations in ALS patients (p = 3.0 × 10⁻³). Specifically, CelFiE estimates an increased skeletal muscle component in the cfDNA of ALS patients (p = 2.6 × 10⁻³), which is consistent with the muscle impairment that characterizes ALS. Quantifying skeletal muscle death in ALS is novel, and, overall, these results suggest that CelFiE may be a useful tool for biomarker discovery and for monitoring disease progression.
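To make the low-coverage difficulty concrete, the sketch below simulates methylation reads from a known cfDNA mixture, as one might when benchmarking a deconvolution method. The function `simulate_cfdna_reads`, the fixed per-site depth, and the per-read model (pick a cell type of origin, then draw a methylation state from that type's reference level) are illustrative assumptions, not the authors' simulation protocol.

```python
import random


def simulate_cfdna_reads(proportions, ref_beta, depth, seed=0):
    """Hypothetical simulator of methylated/total read counts per CpG site.

    proportions[t]: true fraction of cfDNA contributed by cell type t
    ref_beta[t][j]: methylation level of cell type t at CpG site j
    depth:          reads per site (small values mimic low coverage)
    Returns (methylated, total), two lists with one entry per site.
    """
    rng = random.Random(seed)  # seeded so simulations are reproducible
    types = list(range(len(proportions)))
    n_sites = len(ref_beta[0])
    methylated, total = [], []
    for j in range(n_sites):
        m = 0
        for _ in range(depth):
            # cell type of origin for this read, drawn from the mixture
            t = rng.choices(types, weights=proportions)[0]
            # methylation state of the read, drawn from that type's level
            if rng.random() < ref_beta[t][j]:
                m += 1
        methylated.append(m)
        total.append(depth)
    return methylated, total


# A rare cell type making up 1% of the mixture, at 5 reads per site.
meth, tot = simulate_cfdna_reads([0.99, 0.01], [[0.9, 0.1], [0.1, 0.9]], depth=5, seed=1)
print(meth, tot)
```

At 5 reads per site, a 1% component contributes on average only 0.05 reads per site (5 × 0.01), so a method has to pool weak evidence across many CpG sites to detect it.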