2017
DOI: 10.1101/187096
Preprint

KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses

Abstract: High-coverage whole-genome sequencing data of a single ethnicity can provide a useful catalogue of population-specific genetic variations. Herein, we report a comprehensive analysis of the Korean population, and present the Korean National Standard Reference Variome (KoVariome). As a part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database using 5.5 terabases of whole genome sequence data from 50 healthy Korean individuals with an average coverage depth of 31×. In total, KoVario…

Cited by 13 publications (20 citation statements)
References 67 publications
“…To illustrate the robustness of NARD as an imputation reference panel, we built a pseudo-GWAS dataset using an independent cohort of 97 unrelated KOR individuals 14,18,19 . It was generated from WGS data by masking the genotypes that were not included in the sites of Illumina Omni 2.5M array.…”
Section: Evaluation of NARD Imputation Panel
mentioning
confidence: 99%
“…There are only a few population-scale WGS studies covering Northeast Asians from China, Japan, and Mongolia 6,8,12,13 , and these studies have the several issues for the improved reference panel in Northeast Asia such as public unavailability, inadequate sequencing coverage, and small sample size. Furthermore, although Koreans (KOR) are one of the major population groups in Northeast Asia, previous datasets for KOR [14][15][16] does not have enough number of WGS samples to accurately impute the genome-wide variants of KOR population. Therefore, constructing a large-scale whole-genome reference panel for the diverse population groups in Northeast Asia with deep sequencing coverage is still necessary to allow dense and accurate genotype imputation for the genetic researches in these populations. In this study, we constructed the Northeast Asian Reference Database (NARD), consisting of 1,781 individuals from Korea, Japan, Mongolia, China, and Hong Kong.…”
mentioning
confidence: 99%
“…Sequence and data analysis were performed using Torrent Suite software (5.8.0). Sequencing coverage analysis was performed using coverage Analysis (5.8.0.1) plugins and VCF files were generated using the variantCaller To filter out potential sequencing background noise, we excluded Common Korean SNVs which are included in KoVariome whole genome sequence (WGS) database from 50 healthy unrelated Korean individuals 22,23 were also excluded. We identified possible impact of variants using SIFT, PolyPhen-2 and used OncoKDM to predict the effect of genetic variants on protein function [24][25][26][27] .…”
Section: Sequencing Data Analysis
mentioning
confidence: 99%
“…Annotation of the variants was obtained using the Ion Reporter (5.10.2.0) software. To filter out potential sequencing background noise, we excluded control variants detected in cfDNA or CTCs samples from 30 healthy individuals. Common Korean SNVs which are included in KoVariome whole genome sequence (WGS) database from 50 healthy unrelated Korean individuals 22,23 were also excluded. We identified variants of uncertain significance (VUS) using SIFT, PolyPhen-2 and used…”
mentioning
confidence: 99%
“…The variant coordinates were based on the human genome assembly GRCh37. Because the East Asian data in the 1000 Genomes Project did not include data from Korean populations, we compared the data from five continents and East Asian countries in the 1000 Genome Projects with data extracted from KRGDB, which included whole-genome sequencing data for 1722 Korean individuals [17]. Data on the population frequencies of the SNPs were downloaded from the web-based database (http://152.99.75.168:9090/KRGDB/menuPages/download.jsp/, last accessed: January 15, 2020).…”
Section: Introduction
mentioning
confidence: 99%