2016
DOI: 10.1101/gr.210500.116

A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree

Abstract: Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 mi…

Cited by 362 publications (392 citation statements)
References 36 publications
“…The 50× paired-end and PCR-free Illumina sequencing data for the trio composed of the individuals NA12891 (father), NA12892 (mother), and daughter (NA12878) were obtained from the European Nucleotide Archive (accession number ERP001960). The data were generated as part of the Illumina Platinum Genomes project [46], but the data for individual NA12878 were also used to generate the 1000 Genomes Project structural variant calls [7]. The sample input to BayesTyper was obtained by counting the occurrences of all 55-mers in the sequencing data from the three individuals using KMC2 with counting of singletons enabled.…”
Section: Methods
mentioning
confidence: 99%
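
The k-mer counting step described in the statement above can be illustrated with a toy sketch. This is not KMC2 and makes no claim about how KMC2 or BayesTyper work internally; the function name `count_kmers` and its parameters are hypothetical, and the sketch ignores reverse complements and memory efficiency. It only shows what "counting all k-mers with singletons retained" means: a `min_count` of 1 keeps k-mers seen once, while a higher threshold would drop them.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_kmers(reads, k=55, min_count=1):
    """Count every k-mer across all reads (toy illustration, not KMC2).

    min_count=1 keeps singletons (k-mers observed exactly once);
    raising it discards low-count k-mers.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of length k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Small k for readability; the cited study used k = 55.
with_singletons = count_kmers(["ACGTAC", "CGTA"], k=3)
without_singletons = count_kmers(["ACGTAC", "CGTA"], k=3, min_count=2)
```

With singletons retained, the result includes `ACG` and `TAC`, which each occur once; with `min_count=2` only `CGT` and `GTA` remain.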
“…In all our analyses, the parameters p and s were set to 0.97 and 5. The values were chosen to maximize Mendelian consistency of genotype calls in Platinum Genome pedigree samples (Eberle et al 2017) on an unrelated set of repeats.…”
mentioning
confidence: 99%
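
As a rough illustration of the "Mendelian consistency" criterion mentioned in the statement above, the sketch below checks whether a child's genotype can be explained by one allele from each parent and reports the fraction of consistent trios. It assumes diploid autosomal sites and no de novo mutations; the function names are hypothetical and this is not the citing authors' actual procedure. Alleles can be any comparable values, for example repeat counts.

```python
def is_mendelian_consistent(father, mother, child):
    """Return True if the child could inherit one allele from each parent.

    Each genotype is a pair of alleles, e.g. (0, 1) or (12, 15).
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def consistency_rate(trios):
    """Fraction of (father, mother, child) genotype triples that are consistent."""
    trios = list(trios)
    if not trios:
        raise ValueError("no trios supplied")
    consistent = sum(is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return consistent / len(trios)


example_trios = [
    ((0, 0), (0, 1), (0, 1)),  # consistent: 0 from father, 1 from mother
    ((0, 0), (0, 1), (1, 1)),  # inconsistent: father carries no 1 allele
]
rate = consistency_rate(example_trios)
```

A parameter sweep of the kind the statement describes would, under these assumptions, recompute genotypes for each candidate setting and keep the setting with the highest rate.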
“…Overall, the significant lack of mosaic aneuploidy events in samples from blood (p = 3.08 × 10⁻³⁴, Binomial Test) and the lack of meiotic events, together with the enrichment of trisomy 12, which is the most common cytogenetic abnormality in chronic B lymphocytic leukemia [28], combine to suggest that these mosaic aneuploidies arose during cell culture and were either neutral in effect or promoted positive selection for these transformed B lymphocytes. One of the samples in which segmental LOH for distal chromosome 11 was identified was GM12889, for which whole genome sequencing (WGS) to a mean ~50X coverage has been performed to define high-confidence, "platinum" variants [29]. We downloaded those WGS data and re-ran MADSEQ, again predicting the LOH of a 19.3 Mb region, estimated to be present in over 50% of the cells (Figure 2).…”
Section: Application Of Madseq To Sequencing Data
mentioning
confidence: 99%
“…Picard (v 1.119) was used to mark duplicates and GATK (v 3.4-46) [25] was used for indel realignment and base recalibration following best practices [27][28][29].…”
Section: Design Of Multi-ethnic Targeted Loci
mentioning
confidence: 99%