Genome-wide sequence analyses of ethnic populations across Russia

Zhernakova, Daria V.; Brukhin, Vladimir; Malov, Sergey V.; Oleksyk, Tarás K.; Koepfli, Klaus Peter; Zhuk, Anna S.; Dobrynin, Pavel; Kliver, Sergei; Cherkasov, Nikolay; Tamazian, Gaik; Rotkevich, Mikhail; Krasheninnikova, Ksenia; Evsyukov, Igor; Sidorov, Sviatoslav; Gorbunova, A. V.; Chernyaeva, Ekaterina; Shevchenko, Andrey; Kolchanova, Sofia M.; Komissarov, Aleksei S.; Simonov, Serguei; Antonik, Alexey; Logachev, Anton; Polev, Dmitrii E.; Pavlova, Olga A.; Glotov, Andrey S.; Ulantsev, Vladimir; Noskova, Ekaterina; Davydova, Tatyana K.; Sivtseva, Tatyana M.; Limborska, Svetlana A.; Balanovsky, Oleg; Osakovsky, Vladimir L.; Novozhilov, Alexey; Puzyrev, V. P.; O’Brien, Stephen J.

doi:10.1016/j.ygeno.2019.03.007

Cited by 25 publications

(37 citation statements)

References 92 publications


“…with alternative allele frequencies and annotations are presented in Tables) S4. [12]). Several examples of these variants are presented in Table 4.…”

Section: Collection Of Medically Related Variants (mentioning)

confidence: 99%

“…By offering public annotations of functional mutations in a population sampled across the territory of Ukraine, our database contributes a number of candidates to direct future research in medical genomics. We chose only the markers with the highest non-reference allele frequency (NAF) differences compared to the neighboring populations: the combined population from Europe (EUR; [13]) and Russians from Russia (RUS; [12]), evaluated by the Fisher Exact Test (FET) and listed them Table 5. (Figure 2).…”

Section: Collection Of Medically Related Variants (mentioning)

confidence: 99%
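
The citation statement above describes comparing non-reference allele frequencies between populations with the Fisher Exact Test. A minimal sketch of that kind of comparison follows, assuming each marker is summarized as a 2x2 table of non-reference versus reference allele counts in two populations. The function name and the allele counts are hypothetical and purely illustrative; they are not taken from the cited study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are populations; columns are non-reference and reference allele counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Hypothetical counts: 30 non-reference alleles out of 194 chromosomes in one
# population versus 10 out of 200 in the other.
p_value = fisher_exact_two_sided(30, 164, 10, 190)
print(f"p = {p_value:.3g}")
```

When many markers are screened this way, the resulting p-values would normally need a multiple-testing correction before any marker is singled out.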

“…This analysis already shows the potential of the current database in helping to resolve population structure in Eastern Europe, but additional genome wide data from neighboring populations would be very helpful to refine the picture in this geographical region. [13,38]) and Russians from Russia (RUS)(Novgorod and Pskov; [12] as well as the relevant high-coverage human genomes from the Estonian Biocentre Human Genome Diversity Panel (EGDP (EST; [44], and Simmons Genome Diversity project [45]. For identification of the optimal K parameter, we evaluated a range from 2 to 8, with K=3 resulting in the lowest error.…”

Section: Collection Of Medically Related Variants (mentioning)

confidence: 99%

“…Principal [13,38]) and Russians from Russia (RUS)(Novgorod and Pskov; [12] as well as the relevant high-coverage human genomes from the Estonian Biocentre Human Genome Diversity Panel (EGDP (EST; [44], and the Simmons Genome Diversity project [45]. The analysis was performed with Eigensoft [46].…”

Section: G) Population Analysis (mentioning)

confidence: 99%

“…For identification of the optimal K parameter, we used the 10-fold cross-validation function of ADMIXTURE in range from 2 to 8, with K=3 resulting in the lowest error, deeming it optimal. The results were visualized using Python programming language, with pandas, matplotlib and seaborn libraries [56,57] to construct a population structure plot using samples from the 1000Genomes Project (Utah Residents (CEU) with Northern and Western European Ancestry, Toscani in Italy (TSI), Finnish in Finland (FIN), British in England and Scotland (GBR), and Iberian Population in Spain (IBS)); [13,38]) and Russians from Russia (RUS)(Novgorod and Pskov; [12] as well as the relevant high-coverage human genomes from the Estonian Biocentre Human Genome Diversity Panel (EGDP (EST; [44], and Simmons Genome Diversity project [45]. The resulting plot with K=3 is presented in Figure 3, and plots with K=4 to K=8 are in the Figure S3.…”

Section: Model-based Population Structure Analysis (mentioning)

confidence: 99%
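
The citation statement above selects the number of ancestral clusters K by taking the value with the lowest cross-validation error over the range 2 to 8. A minimal sketch of that selection step follows, assuming the cross-validation errors are read from log lines of the form `CV error (K=3): 0.520`, which is the format ADMIXTURE prints when cross-validation is enabled. The helper names and the error values in the example log are hypothetical placeholders, not results from the cited study.

```python
import re

# Matches lines such as "CV error (K=3): 0.520" in the collected logs.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Return a {K: cross-validation error} mapping found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


# Made-up placeholder values, one line per run with a different K.
example_log = """\
CV error (K=2): 0.530
CV error (K=3): 0.520
CV error (K=4): 0.540
"""

errors = parse_cv_errors(example_log)
print("lowest CV error at K =", best_k(errors))
```

In practice the log text would be the concatenated output of one run per value of K, and the parsed errors can be plotted against K to check how sharp the minimum is.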


Genome Diversity in Ukraine

Oleksyk, Wolfsberger, Weber, et al. 2020

Preprint

Self Cite


The main goal of this collaborative effort is to provide genome wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from consented individuals representing major regions of Ukraine that were consented for the public data release. DNBSEQ-G50 sequences, and genotypes by an Illumina GWAS chip were cross-validated on multiple samples, and additionally referenced to a sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. The genome data has been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, CNVs, SNPs and microsatellites. This study is providing the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for historic and medical research in a large understudied population. While most of the common variation is shared with other European populations, this survey of population variation contributes a number of novel SNPs and structural variants that have not been reported in the gnomAD/1KG databases representing global distribution of genomic variation. These endemic variants will become a valuable resource for designing future population and clinical studies, help address questions about ancestry and admixture, and will fill a missing place in the puzzle characterizing human population diversity in Eastern Europe. Our results indicate that genetic diversity of the Ukrainian population is uniquely shaped by the evolutionary and demographic forces, and cannot be ignored in the future genetic and biomedical studies. This data will contribute a wealth of new information bringing forth different risk and/or protective alleles. 
The newly discovered low frequency and local variants can be added to the current genotyping arrays for genome wide association studies, clinical trials, and in genome assessment of proliferating cancer cells.


Whole‐exome sequencing provides insights into monogenic disease prevalence in Northwest Russia

Barbitoff, Skitchenko, Poleshchuk, et al. 2019

Molec Gen & Gen Med

Self Cite


Background: Allele frequency data from large exome and genome aggregation projects such as the Genome Aggregation Database (gnomAD) are of ultimate importance to the interpretation of medical resequencing data. However, allele frequencies might significantly differ in poorly studied populations that are underrepresented in large‐scale projects, such as the Russian population. Methods: In this work, we leveraged our access to a large dataset of 694 exome samples to analyze genetic variation in the Northwest Russia. We compared the spectrum of genetic variants to the dbSNP build 151, and made estimates of ClinVar‐based autosomal recessive (AR) disease allele prevalence as compared to gnomAD r. 2.1. Results: An estimated 9.3% of discovered variants were not present in dbSNP. We report statistically significant overrepresentation of pathogenic variants for several Mendelian disorders, including phenylketonuria (PAH, rs5030858), Wilson's disease (ATP7B, rs76151636), factor VII deficiency (F7, rs36209567), kyphoscoliosis type of Ehlers‐Danlos syndrome (FKBP14, rs542489955), and several other recessive pathologies. We also make primary estimates of monogenic disease incidence in the population, with retinal dystrophy, cystic fibrosis, and phenylketonuria being the most frequent AR pathologies. Conclusion: Our observations demonstrate the utility of population‐specific allele frequency data to the diagnosis of monogenic disorders using high‐throughput technologies.


Complex trait susceptibilities and population diversity in a sample of 4,145 Russians

Usoltsev, Kolosov, Rotar, et al. 2024

Nat Commun


The population of Russia consists of more than 150 local ethnicities. The ethnic diversity and geographic origins, which extend from eastern Europe to Asia, make the population uniquely positioned to investigate the shared properties of inherited disease risks between European and Asian ancestries. We present the analysis of genetic and phenotypic data from a cohort of 4,145 individuals collected in three metro areas in western Russia. We show the presence of multiple admixed genetic ancestry clusters spanning from primarily European to Asian and high identity-by-descent sharing with the Finnish population. As a result, there was notable enrichment of Finnish-specific variants in Russia. We illustrate the utility of Russian-descent cohorts for discovery of novel population-specific genetic associations, as well as replication of previously identified associations that were thought to be population-specific in other cohorts. Finally, we provide access to a database of allele frequencies and GWAS results for 464 phenotypes.
