Oleg S. Glotov scite author profile

Advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole exome (WES) and whole genome (WGS) sequencing, are often debated. WES dominated large-scale resequencing projects because of lower cost and easier data storage and processing. Rapid development of 3rd-generation sequencing methods and novel exome sequencing kits predicates the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study we developed a set of statistical tools to systematically assess coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons, which did not account for mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution from the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. We also identified a ~500 kb region of the human exome that could not be effectively characterized using short-read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified the main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost.

Next-generation sequencing (NGS) is rapidly becoming an invaluable tool in human genetics research and clinical diagnostics [1-3]. Practical use of NGS methods has dramatically increased with the development of targeted sequencing approaches, such as whole-exome sequencing (WES) or targeted sequencing of gene panels. WES emerged as an efficient alternative to whole-genome sequencing (WGS) due to both lower sequencing cost and simplification of variant analysis and data storage [4]. More than 80% of all variants reported in ClinVar, and more than 89% of variants reported to be pathogenic, come from the protein-coding part of the genome; this number increases to 99% when immediate CDS vicinity is included. Even allowing for the sampling bias, there is an overall agreement that most heritable diseases appear to be caused by alterations in the protein-coding regions of the

Whole‐exome sequencing provides insights into monogenic disease prevalence in Northwest Russia

Barbitoff, Skitchenko, Poleshchuk, et al. 2019. Molec Gen & Gen Med

Background: Allele frequency data from large exome and genome aggregation projects such as the Genome Aggregation Database (gnomAD) are of ultimate importance to the interpretation of medical resequencing data. However, allele frequencies might significantly differ in poorly studied populations that are underrepresented in large‐scale projects, such as the Russian population.

Methods: In this work, we leveraged our access to a large dataset of 694 exome samples to analyze genetic variation in Northwest Russia. We compared the spectrum of genetic variants to dbSNP build 151 and made estimates of ClinVar‐based autosomal recessive (AR) disease allele prevalence as compared to gnomAD r2.1.

Results: An estimated 9.3% of discovered variants were not present in dbSNP. We report statistically significant overrepresentation of pathogenic variants for several Mendelian disorders, including phenylketonuria (PAH, rs5030858), Wilson's disease (ATP7B, rs76151636), factor VII deficiency (F7, rs36209567), kyphoscoliosis type of Ehlers‐Danlos syndrome (FKBP14, rs542489955), and several other recessive pathologies. We also make primary estimates of monogenic disease incidence in the population, with retinal dystrophy, cystic fibrosis, and phenylketonuria being the most frequent AR pathologies.

Conclusion: Our observations demonstrate the utility of population‐specific allele frequency data to the diagnosis of monogenic disorders using high‐throughput technologies.

Analysis of the Spectrum of ACE2 Variation Suggests a Possible Influence of Rare and Common Variants on Susceptibility to COVID-19 and Severity of Outcome

et al. 2020

Objectives: In March 2020, the World Health Organization declared that an infectious respiratory disease caused by a new severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2, causing coronavirus disease 2019 (COVID-19)] had become a pandemic. In our study, we have analyzed a large publicly available dataset, the Genome Aggregation Database (gnomAD), as well as a cohort of 37 Russian patients with COVID-19 to assess the influence of different classes of genetic variants in the angiotensin-converting enzyme-2 (ACE2) gene on the susceptibility to COVID-19 and the severity of disease outcome.

Results: We demonstrate that the European populations slightly differ in alternative allele frequencies at the 2,754 variant sites in ACE2 identified in the gnomAD database. We find that the Southern European population has a lower frequency of missense variants and a slightly higher frequency of regulatory variants. However, we found no statistical support for the significance of these differences. We also show that the Russian population is similar to other European populations when comparing the frequencies of the ACE2 variants. Evaluation of the effect of various classes of ACE2 variants on COVID-19 outcome in a cohort of Russian patients showed that common missense and regulatory variants do not explain the differences in disease severity. At the same time, we find several rare ACE2 variants (including rs146598386, rs73195521, rs755766792, and others) that are likely to affect the outcome of COVID-19. Our results demonstrate that the spectrum of genetic variants in ACE2 may partially explain the differences in severity of the COVID-19 outcome.

Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage
