Advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole exome (WES) and whole genome (WGS) sequencing, are often debated. WES dominated largescale resequencing projects because of lower cost and easier data storage and processing. Rapid development of 3 rd generation sequencing methods and novel exome sequencing kits predicate the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study we developed a set of statistical tools to systematically assess coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons which did not account for mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution from the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. We also identified the ~ 500 kb region of human exome that could not be effectively characterized using short read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost. Next-generation sequencing (NGS) is rapidly becoming an invaluable tool in human genetics research and clinical diagnostics 1-3. Practical use of NGS methods has dramatically increased with the development of targeted sequencing approaches, such as whole-exome sequencing (WES) or targeted sequencing of gene panels. WES emerged as an efficient alternative to whole-genome sequencing (WGS) due to both lower sequencing cost and simplification of variant analysis and data storage 4. More than 80% of all variants reported in ClinVar, and more than 89% of variants reported to be pathogenic, come from the protein-coding part of the genome; this number increases to 99% when immediate CDS vicinity is included. Even allowing for the sampling bias, there is an overall agreement that most heritable diseases appear to be caused by alterations in the protein-coding regions of the
BackgroundAllele frequency data from large exome and genome aggregation projects such as the Genome Aggregation Database (gnomAD) are of ultimate importance to the interpretation of medical resequencing data. However, allele frequencies might significantly differ in poorly studied populations that are underrepresented in large‐scale projects, such as the Russian population.MethodsIn this work, we leveraged our access to a large dataset of 694 exome samples to analyze genetic variation in the Northwest Russia. We compared the spectrum of genetic variants to the dbSNP build 151, and made estimates of ClinVar‐based autosomal recessive (AR) disease allele prevalence as compared to gnomAD r. 2.1.ResultsAn estimated 9.3% of discovered variants were not present in dbSNP. We report statistically significant overrepresentation of pathogenic variants for several Mendelian disorders, including phenylketonuria (PAH, rs5030858), Wilson's disease (ATP7B, rs76151636), factor VII deficiency (F7, rs36209567), kyphoscoliosis type of Ehlers‐Danlos syndrome (FKBP14, rs542489955), and several other recessive pathologies. We also make primary estimates of monogenic disease incidence in the population, with retinal dystrophy, cystic fibrosis, and phenylketonuria being the most frequent AR pathologies.ConclusionOur observations demonstrate the utility of population‐specific allele frequency data to the diagnosis of monogenic disorders using high‐throughput technologies.
Objectives In March 2020, the World Health Organization declared that an infectious respiratory disease caused by a new severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2, causing coronavirus disease 2019 (COVID-19)] became a pandemic. In our study, we have analyzed a large publicly available dataset, the Genome Aggregation Database (gnomAD), as well as a cohort of 37 Russian patients with COVID-19 to assess the influence of different classes of genetic variants in the angiotensin-converting enzyme-2 ( ACE2 ) gene on the susceptibility to COVID-19 and the severity of disease outcome. Results We demonstrate that the European populations slightly differ in alternative allele frequencies at the 2,754 variant sites in ACE2 identified in the gnomAD database. We find that the Southern European population has a lower frequency of missense variants and slightly higher frequency of regulatory variants. However, we found no statistical support for the significance of these differences. We also show that the Russian population is similar to other European populations when comparing the frequencies of the ACE2 variants. Evaluation of the effect of various classes of ACE2 variants on COVID-19 outcome in a cohort of Russian patients showed that common missense and regulatory variants do not explain the differences in disease severity. At the same time, we find several rare ACE2 variants (including rs146598386, rs73195521, rs755766792, and others) that are likely to affect the outcome of COVID-19. Our results demonstrate that the spectrum of genetic variants in ACE2 may partially explain the differences in severity of the COVID-19 outcome.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.