Purpose: Spinal muscular atrophy (SMA), caused by loss of the SMN1 gene, is a leading cause of early childhood death. Due to the near-identical sequences of SMN1 and SMN2, analysis of this region is challenging. Population-wide SMA screening to quantify the SMN1 copy number (CN) is recommended by the American College of Medical Genetics and Genomics. Methods: We developed a method that accurately identifies the CN of SMN1 and SMN2 using genome sequencing (GS) data by analyzing read depth and eight informative reference genome differences between SMN1/2. Results: We characterized SMN1/2 in 12,747 genomes, identified 1568 samples with SMN1 gains or losses and 6615 samples with SMN2 gains or losses, and calculated a pan-ethnic carrier frequency of 2%, consistent with previous studies. Additionally, 99.8% of our SMN1 and 99.7% of SMN2 CN calls agreed with orthogonal methods, with a recall of 100% for SMA and 97.8% for carriers, and a precision of 100% for both SMA and carriers. Conclusion: This SMN copy-number caller can be used to identify both carrier and affected status of SMA, enabling SMA testing to be offered as a comprehensive test in neonatal care and an accurate carrier screening tool in GS projects. Genetics in Medicine (2020) 22:945-953; https://doi.
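The read-depth idea in the Methods sentence above can be illustrated with a minimal sketch. This is not the published caller: the function name, the normalization against a diploid baseline depth, and the plain rounding step are assumptions made for illustration, and the numbers in the usage lines are invented.

```python
def estimate_smn_copy_numbers(region_depth, baseline_diploid_depth,
                              smn1_read_counts, smn2_read_counts):
    """Hypothetical sketch: split a total SMN copy number into SMN1 and SMN2.

    region_depth: mean read depth over the pooled SMN1+SMN2 region.
    baseline_diploid_depth: mean depth over regions assumed to have 2 copies.
    smn1_read_counts / smn2_read_counts: per-site counts of reads supporting
        the SMN1 or SMN2 base at sites where the two genes differ.
    """
    if baseline_diploid_depth <= 0:
        raise ValueError("baseline depth must be positive")

    # Total copies of SMN1 + SMN2, scaled so the baseline corresponds to 2.
    total_cn = round(2 * region_depth / baseline_diploid_depth)

    smn1_reads = sum(smn1_read_counts)
    smn2_reads = sum(smn2_read_counts)
    informative_reads = smn1_reads + smn2_reads
    if informative_reads == 0:
        raise ValueError("no reads cover the differentiating sites")

    # Share of informative reads carrying the SMN1 base (assumed proxy
    # for the share of copies that are SMN1).
    smn1_fraction = smn1_reads / informative_reads
    smn1_cn = round(total_cn * smn1_fraction)
    return total_cn, smn1_cn, total_cn - smn1_cn


# Invented depths: eight differentiating sites, equal support for each gene.
print(estimate_smn_copy_numbers(60.0, 30.0, [15] * 8, [15] * 8))
# Invented depths: one third of informative reads support SMN1.
print(estimate_smn_copy_numbers(45.0, 30.0, [10] * 8, [20] * 8))
```

Under these assumptions, a sample whose estimated SMN1 count is 1 would be flagged as a possible carrier and a count of 0 as possibly affected; a real caller would need a statistical model for noisy depth rather than simple rounding.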
Responsible for the metabolism of ~21% of clinically used drugs, CYP2D6 is a critical component of personalized medicine initiatives. Genotyping CYP2D6 is challenging due to sequence similarity with its pseudogene paralog CYP2D7 and a high number and variety of common structural variants (SVs). Here we describe a novel bioinformatics method, Cyrius, that accurately genotypes CYP2D6 using whole-genome sequencing (WGS) data. We show that Cyrius has superior performance (96.5% concordance with truth genotypes) compared to existing methods (84–86.8%). After implementing the improvements identified from the comparison against the truth data, Cyrius’s accuracy has since been improved to 99.3%. Using Cyrius, we built a haplotype frequency database from 2504 ethnically diverse samples and estimate that SV-containing star alleles are more frequent than previously reported. Cyrius will be an important tool to incorporate pharmacogenomics in WGS-based precision medicine initiatives.
Responsible for the metabolism of 25% of all drugs, CYP2D6 is a critical component of personalized medicine initiatives. Genotyping CYP2D6 is challenging due to sequence similarity with its pseudogene paralog CYP2D7 and a high number and variety of common structural variants (SVs). Here we describe a novel bioinformatics method, Cyrius, that accurately genotypes CYP2D6 using whole-genome sequencing (WGS) data. Using a validation data set consisting of reference samples with diverse genotypes as well as PacBio long read data, we show that Cyrius has superior performance (96.5% concordance with truth genotypes) compared to existing methods (83.8-86.6%). After implementing the improvements identified from the comparison against the truth data, Cyrius's accuracy has since been improved to 99.3%. Using Cyrius, we built a haplotype frequency database from 2504 ethnically diverse samples and estimate that SV-containing star alleles are more frequent than previously reported. Cyrius will be a useful tool for pharmacogenomics applications with WGS and help bring the promise of precision medicine one step closer to reality. Running Aldy and Stargazer: Aldy v2.2.5 was run using the command "aldy genotype -p illumina -g CYP2D6". Stargazer v1.0.7 was run to genotype CYP2D6 using VDR as the control gene, with GDF and VCF files as input. The 1kGP GeT-RM samples were originally aligned against hg38. As Aldy and Stargazer only support GRCh37, for comparison between methods, these samples were realigned against GRCh37 using Isaac (ref. 26).
Objective: The role of the survival of motor neuron (SMN) gene in amyotrophic lateral sclerosis (ALS) is unclear, with several conflicting reports. A decisive result on this topic is needed, given that treatment options are available now for SMN deficiency. Methods: In this largest multicenter case-control study to evaluate the effect of SMN1 and SMN2 copy numbers in ALS, we used whole genome sequencing data from Project MinE data freeze 2. SMN copy numbers of 6,375 patients with ALS and 2,412 controls were called from whole genome sequencing data, and the reliability of the calls was tested with multiplex ligation‐dependent probe amplification data. Results: The copy number distribution of SMN1 and SMN2 between cases and controls did not show any statistical differences (binomial multivariate logistic regression SMN1 p = 0.54 and SMN2 p = 0.49). In addition, the copy number of SMN did not associate with patient survival (Royston‐Parmar; SMN1 p = 0.78 and SMN2 p = 0.23) or age at onset (Royston‐Parmar; SMN1 p = 0.75 and SMN2 p = 0.63). Interpretation: In our well‐powered study, there was no association of SMN1 or SMN2 copy numbers with the risk of ALS or ALS disease severity. This suggests that changing SMN protein levels in the physiological range may not modify ALS disease course. This is an important finding in the light of emerging therapies targeted at SMN deficiencies. ANN NEUROL 2021;89:686–697
Ciliates are microbial eukaryotes that undergo extensive programmed genome rearrangement, a natural genome editing process that converts long germline chromosomes into smaller gene-rich somatic chromosomes. Three well-studied ciliates are Oxytricha trifallax, Tetrahymena thermophila and Paramecium tetraurelia, but only the Oxytricha lineage has a massively scrambled genome, whose assembly during development requires hundreds of thousands of precise programmed DNA joining events, representing the most complex genome dynamics of any known organism. Here we study the emergence of such complex genomes by examining the origin and evolution of discontinuous and scrambled genes in the Oxytricha lineage. This study compares six genomes from three species: the germline and somatic genomes of Euplotes woodruffi, Tetmemena sp., and the model ciliate Oxytricha trifallax. To complement existing data, we sequenced, assembled and annotated the germline and somatic genomes of Euplotes woodruffi, which provides an outgroup, and the germline genome of Tetmemena sp. We find that the germline genome of Tetmemena is as massively scrambled and interrupted as Oxytricha's: 13.6% of its gene loci require programmed translocations and/or inversions, with some genes requiring hundreds of precise gene editing events during development. This study revealed that the earlier-diverged spirotrich, E. woodruffi, also has a scrambled genome, but only roughly half as many loci (7.3%) are scrambled. Furthermore, its scrambled genes are less complex, together supporting the position of Euplotes as a possible evolutionary intermediate in this lineage, in the process of accumulating complex evolutionary genome rearrangements, all of which require extensive repair to assemble functional coding regions.
Comparative analysis also reveals that scrambled loci are often associated with local duplications, supporting a gradual model for the origin of complex, scrambled genomes via many small events of DNA duplication and decay.
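The abstract above defines a scrambled locus as one whose assembly requires programmed translocations and/or inversions. A minimal sketch of that definition follows; the `Segment` record, the function names, and the coordinates are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One germline segment retained in the somatic gene (hypothetical record)."""
    germline_start: int  # position of the segment in the germline locus
    strand: str          # "+" or "-" relative to the somatic gene


def classify_locus(segments_in_somatic_order):
    """Label a locus 'scrambled' if joining its segments in somatic order
    would need an inversion or a translocation, else 'nonscrambled'."""
    if not segments_in_somatic_order:
        raise ValueError("a locus needs at least one segment")

    strands = {seg.strand for seg in segments_in_somatic_order}
    if len(strands) > 1:
        # Mixed orientations: at least one segment must be inverted.
        return "scrambled"

    starts = [seg.germline_start for seg in segments_in_somatic_order]
    # A locus lying entirely on the minus strand is simply read in reverse,
    # so its germline positions are expected to descend.
    expected = sorted(starts, reverse=(strands == {"-"}))
    return "nonscrambled" if starts == expected else "scrambled"


def scrambled_fraction(loci):
    """Fraction of loci labeled 'scrambled' (loci: list of segment lists)."""
    labels = [classify_locus(locus) for locus in loci]
    return labels.count("scrambled") / len(labels)


# Invented coordinates for illustration only.
in_order = [Segment(100, "+"), Segment(300, "+"), Segment(500, "+")]
translocated = [Segment(300, "+"), Segment(100, "+"), Segment(500, "+")]
inverted = [Segment(100, "+"), Segment(300, "-"), Segment(500, "+")]
print(classify_locus(in_order), classify_locus(translocated), classify_locus(inverted))
```

Applied to every annotated locus, `scrambled_fraction` would yield a percentage comparable in kind to the 13.6% and 7.3% figures quoted above, assuming the segment order and orientation have already been derived from germline-to-somatic alignments.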
The abundance of Lp(a) protein holds significant implications for the risk of cardiovascular disease (CVD) and is directly impacted by the copy number (CN) of KIV-2, a 5.5 kbp sub-region. KIV-2 is highly polymorphic in the population and accurate analysis is challenging. In this study, we present the DRAGEN KIV-2 CN caller, which utilizes short reads. Data across 166 WGS samples show that the caller has high accuracy compared to optical mapping and can further phase ~50% of the samples. We compared KIV-2 CNs to 24 previously postulated KIV-2 relevant SNVs, revealing that many are ineffective predictors of KIV-2 copy number. Population studies, including USA-based cohorts, showed distinct KIV-2 CN distributions for European-, African-, and Hispanic-American populations and further underscored the limitations of SNV predictors. We demonstrate that the CN estimates correlate significantly with the available Lp(a) protein levels and that phasing is highly important.
Spinal muscular atrophy, a leading cause of early infant death, is caused by biallelic mutations of the SMN1 gene. Sequence analysis of SMN1 is challenging due to high sequence similarity with its paralog SMN2. Both genes have variable copy numbers across populations. Furthermore, without pedigree information, it is impossible to identify silent carriers (2+0) with two copies of SMN1 on one chromosome and zero copies on the other. We developed Paraphase, an informatics method that identifies full-length SMN1 and SMN2 haplotypes, determines the gene copy numbers and calls phased variants using long-read PacBio HiFi data. The SMN1 and SMN2 copy number calls by Paraphase are highly concordant with orthogonal methods (99.2% for SMN1 and 100% for SMN2). We applied Paraphase to 438 samples across five ethnic populations to conduct a population-wide haplotype analysis of these highly homologous genes. We identified major SMN1 and SMN2 haplogroups and characterized their co-segregation through pedigree-based analyses. We identified two SMN1 haplotypes that form a common two-copy SMN1 allele in African populations. Testing positive for these two haplotypes in an individual with two copies of SMN1 gives a silent carrier risk of 88.5%, which is significantly higher than the currently used marker (1.7-3.0%). Extending beyond simple copy number testing, Paraphase can detect pathogenic variants and enable potential haplotype-based screening of silent carriers through statistical phasing of haplotypes into alleles. Future analysis of larger population data will allow identification of more diverse haplotypes and genetic markers for silent carriers.