2019
DOI: 10.1101/635011
Preprint

Paragraph: A graph-based structural variant genotyper for short-read sequence data

Abstract: Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, an accurate genotyper that models SVs using sequence graphs and SV annotations. We demonstrate the accuracy of Paragraph on whole-genome sequence data from three samples using long read SV calls as the truth set, and then apply Paragraph at scale to a cohort of 100 short-read sequenced samples of diverse ance…

Citations: cited by 48 publications (69 citation statements)
References: 54 publications
“…To assess the population frequency of these variants, we genotyped identified SVs affecting COSMIC genes from the three analyzed cancer samples with Paragraph [54] in the dataset of 2,504 short-read WGS samples from the recent re-sequencing of the 1000 genomes project (1KGP) samples [46]. Paragraph genotypes SVs by constructing localized sequence graphs containing the reference allele and the candidate SV allele and performs a localized realignment of paired-end short reads to the graph.…”
Section: Results (mentioning)
confidence: 99%
“…[52]) is not available in cattle. Recent studies indicated that large structural variants can be identified accurately from genome graphs [32, 53-55]. Eventually, a bovine genome graph that unifies multiple breed-specific haplotype-resolved genome assemblies and their sites of variation might provide access to sources of variation that are currently neglected when short sequencing reads are aligned to a linear reference sequence [56, 57].…”
Section: Discussion (mentioning)
confidence: 99%
“…The hypervariability of VNTR sequences prevents a single assembly from serving as an optimal reference. Instead, to improve both alignment and genotyping, multiple assemblies may be combined into a pangenome graph (PGG) (Hickey et al 2020; Eggertsson et al 2019; Garrison et al 2018; Chen et al 2019) composed of sequence-labeled vertices connected by edges such that haplotypes correspond to paths in the graph. Sequences shared by multiple haplotypes are stored in the same vertex, and genetic variation is represented by the branching structure of the graph.…”
Section: Introduction (mentioning)
confidence: 99%
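
The graph structure described in the statements above (sequence-labeled vertices, edges, and haplotypes as paths, with a reference allele and a candidate SV allele sharing flanking sequence) can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: the class `SequenceGraph`, its methods, the vertex names, and the sequences are all hypothetical and chosen for illustration. It is not Paragraph's implementation or data model, and it does not perform read alignment or genotyping.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph (hypothetical): vertices carry sequence, edges link vertex names."""

    sequences: dict[str, str] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_vertex(self, name: str, sequence: str) -> None:
        self.sequences[name] = sequence

    def add_edge(self, source: str, target: str) -> None:
        self.edges.add((source, target))

    def spell(self, path: list[str]) -> str:
        """Return the haplotype sequence spelled by a path of vertex names."""
        for source, target in zip(path, path[1:]):
            if (source, target) not in self.edges:
                raise ValueError(f"no edge {source} -> {target}")
        return "".join(self.sequences[name] for name in path)


# Hypothetical deletion site: the shared flanks are stored once,
# and the segment that may be deleted is its own vertex.
graph = SequenceGraph()
graph.add_vertex("left_flank", "ACGT")
graph.add_vertex("deleted_segment", "GGGCCC")
graph.add_vertex("right_flank", "TTAA")
graph.add_edge("left_flank", "deleted_segment")
graph.add_edge("deleted_segment", "right_flank")
graph.add_edge("left_flank", "right_flank")  # branch that skips the segment

# Each allele is a path through the same graph.
reference_haplotype = graph.spell(["left_flank", "deleted_segment", "right_flank"])
deletion_haplotype = graph.spell(["left_flank", "right_flank"])
```

Here the two alleles differ only in which branch they take between the shared flanks, which is the sense in which variation is "represented by the branching structure of the graph".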