2017
DOI: 10.1101/193144
Preprint

Multi-platform discovery of haplotype-resolved structural variation in human genomes

Abstract: The incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long- and short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent-child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,181 indel variants (<50 bp) and 31,599 structural variants (≥5…

Cited by 253 publications (520 citation statements)
References 56 publications
“…In this experiment we built a variation graph of the human chromosome 22 and compared GraphAligner and vg [16] on it. To build the graph, we took the GRCh38 reference and variants [33] called by the Human Genome Structural Variation Consortium [33], and used vg to build the graph from the reference and the variants. We first randomly subsampled the reads [2] to 15x coverage.…”
Section: Variation Graph
mentioning
confidence: 99%
“…In addition, we use whole human genome PacBio Sequel [3] and Illumina [4] data from HG00733, randomly subsampled to 15x coverage for PacBio and 30x for Illumina. We use the diploid assembly from [33] as the ground truth to evaluate against for HG00733. We did not include LoRDEC in the fruit fly or HG00733 experiments as the results in [34] show that FMLRC outperforms it in both speed and accuracy.…”
Section: Error Correction
mentioning
confidence: 99%
“…This is chiefly because, compared to short reads, long (10-50kbp) reads can be more reliably mapped to such regions and are more likely to span entire SVs [8][9][10]. These technologies combined with data generated by population studies using multiple sequencing platforms, are leading to a rapid and ongoing expansion of the reference SV databases in a variety of species [11][12][13].…”
mentioning
confidence: 99%
“…This technique is particularly powerful when combined with other data types like linked reads or long reads to create dense long-range haplotypes 18. Previously, we used this approach for partitioning reads prior to local assembly to improve structural variation sensitivity 19 but read partitioning required mapping to a reference genome as an intermediate step, which can entail biases towards reference alleles and alignment artifacts. Here, we show how this limitation can be removed by exploiting Strand-seq's additional ability to assign contigs to chromosomes in order to phase them and how this linking technology can be coupled with recent advances in highly accurate long-read sequencing.…”
mentioning
confidence: 99%
“…We initially assembled HiFi reads for HG00733 using Canu 26, into a haplotype-unaware ("collapsed") assembly with contig N50 values of 14.9 Mbp. To scaffold the genome, we aligned 115 single-cell Strand-seq libraries generated for HG00733 in the context of the Human Genome Structural Variation Consortium (HGSVC) 19 to the collapsed assembly. The cumulative depth of Strand-seq reads was 2.87-fold and covered 73% of genomic positions in the assembly.…”
mentioning
confidence: 99%