2021
DOI: 10.1101/2021.02.06.430068
Preprint

High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

Abstract: The 1000 Genomes Project (1kGP), launched in 2008, is the largest fully open resource of whole genome sequencing (WGS) data consented for public distribution of raw sequence data without access or use restrictions. The final (phase 3) 2015 release of 1kGP included 2,504 unrelated samples from 26 populations, representing five continental regions of the world and was based on a combination of technologies including low coverage WGS (mean depth 7.4X), high coverage whole exome sequencing (mean depth 65.7X), and …

Cited by 168 publications (287 citation statements)
References 75 publications
“…We next inferred the histories of human populations from large publicly available resequencing data. We computed a k-SFS for each of the 26 human populations from five continental ancestries sequenced in the 1KG (27) using an unphased variant call set (mapped to human genome assembly GRCh38 [hg38]) from the recent high-coverage (30×) resequencing data of 1KG samples from the New York Genome Center (28). Our bioinformatic pipeline for computing the k-SFS for each 1KG population is detailed in Materials and Methods.…”
Section: Results
mentioning
confidence: 99%
“…The recovered MuSH is a rich object that illuminates dimensions of population history and addresses biological questions about the evolution of the mutation process. After validating with data simulated under known histories, we use mushi to independently infer histories for each of the 26 populations (from 5 superpopulations defined by continental ancestry) from the 1000 Genomes Project (1KG) Consortium (27) using recent high-coverage sequencing data (28). We demonstrate that mushi is a powerful tool for demographic inference that has several advantages over existing demographic inference methods and then go on to describe the illuminated features of human MuSH.…”
mentioning
confidence: 99%
“…A total of 602 trios from the 1000 Genomes Project were sequenced at the New York Genome Center as described previously (2). The aligned data files (crams) are located at http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/1000G_2504_high_coverage.sequence.index and http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/1000G_698_related_high_coverage.sequence.index .…”
Section: Trio Dataset
mentioning
confidence: 99%
“…The 1000 Genomes Project is a data resource for the study of genetic variation that includes individuals from diverse genetic ancestries (1,2). In this study, we present the first ever assessment of new mutations, termed de novo variants (DNVs) that are only found in children and not their parents, represented in this collection.…”
Section: Introduction
mentioning
confidence: 99%
“…We call variants with our pipeline from publicly available long-read data for 31 samples, and generate a panel of long-read SV calls which can be used for screening further samples. Finally, we genotype this SV panel in 444 high coverage short-read samples from the 1000 Genomes Project (Byrska-Bishop et al 2021) and discover thousands of novel SV associations with gene expression. Many of these SVs have CAVIAR posterior probabilities of causality that exceed those of previously reported SNPs, indicating likely functional relevance.…”
mentioning
confidence: 99%