2019
DOI: 10.1186/s13073-019-0677-z

NARD: whole-genome reference panel of 1779 Northeast Asians improves imputation accuracy of rare and low-frequency variants

Abstract: Here, we present the Northeast Asian Reference Database (NARD), including whole-genome sequencing data of 1779 individuals from Korea, Mongolia, Japan, China, and Hong Kong. NARD provides the genetic diversity of Korean (n = 850) and Mongolian (n = 384) ancestries that were not present in the 1000 Genomes Project Phase 3 (1KGP3). We combined and re-phased the genotypes from NARD and 1KGP3 to construct a union set of haplotypes. This approach established a robust imputation reference panel for Northeast Asians,…

Cited by 41 publications (43 citation statements)
References 55 publications (85 reference statements)
“…There are two strategies for developing reference panels, the first is to use a population with closely matched ancestry to that of the group under study, and the second is to use as many samples as possible. While not evaluated in the context of low-pass sequencing imputation, analysis of DNA arrays shows that reference panels matched to the population of interest outperform diverse reference panels of similar sizes (Mitt et al 2017; Zhou et al 2017; Bai et al 2019; Yoo et al 2019). This would suggest that larger reference panels are preferable as long as they contain sufficient representation of the study population.…”
Section: Discussion (mentioning)
confidence: 99%
“…These observed effects were for initial population-matched reference panels of ~100 samples, with additional diverse samples increasing the reference panels to over 860 samples (Bai et al 2019). Other analyses compared reference panels of > 1500 samples to the Haplotype Reference Consortium (HRC) reference panel (http://www.haplotype-reference-consortium.org/) (McCarthy et al 2016; Mitt et al 2017; Zhou et al 2017; Yoo et al 2019), which consists of 32,611 samples, indicating the potential for increased resolution in human studies compared to canine studies, which used a panel of just 676 samples from 91 breeds (Piras et al 2020). Altogether, at MAFs > 0.05, human imputation studies conducted using DNA array genotypes show non-reference concordance rates > 97.5% and mean r² values > 0.95 (Mitt et al 2017; Zhou et al 2017; Yoo et al 2019).…”
Section: Discussion (mentioning)
confidence: 99%
“…Although the imputation methods based on the Li and Stephens model require a haplotype reference panel as in an explicit form, the accessibility of haplotype data is often limited, due to the requirement of agreements from the donors for public use. For example, the Northeast Asian Reference Database (NARD) has a haplotype reference panel comprised of 1,779 northeast Asian individuals, but the haplotype panel is not publicly available, and thus can be used only in the NARD imputation server [8]. Thus, in order to use publicly unavailable haplotypes for more accurate imputation, we must send the input genotype data to other research institutes having their own closed haplotype data.…”
Section: Introduction (mentioning)
confidence: 99%
“…In coordination with our packages VCFTools.jl (handling VCF files) and SnpArrays.jl (handling PLINK files), OpenMendel powers a streamlined pipeline for end-to-end data analysis. In an era where the cost of genotyping arrays continues to drop faster than Moore's law and telomere-to-telomere reference panels are within reach [23,28], MendelImpute offers a compelling mix of excellent speed, small memory footprint, and simplicity of use.…”
Section: Introduction (mentioning)
confidence: 99%