2017
DOI: 10.1093/nar/gkx1175

Jointly aligning a group of DNA reads improves accuracy of identifying large deletions

Abstract: Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. The current practice is to independently align each DNA read to a reference genome. We show that the propensity of genomic rearrangements to accumulate in repeat-rich regions imposes severe ambiguities in these alignments, and consequently on the variant calls—with current read lengths, this affects more than one third of known larg…

Cited by 4 publications (17 citation statements)
References 24 publications
“…HuRef data also allowed us to reproduce a realistic dataset that would capture the challenges of small indel and large indel calling in human genome. If the indels were to be inserted randomly across the genome it would underestimate the proportion of indels located in ambiguous regions, where the indels may be represented in different positions [39] (S1 and S2 Figs). Here, we reconstructed chromosome 1 of the HuRef genome, based on human reference genome hg19, by similar methods to [39,40] and inserted indels of a real human individual into the corresponding position in reference genome (S1 File).…”
Section: The Semi-simulated WGS Dataset Covering a Wide Size Range of Indels With Varying Coverages and Read Lengths
Citation type: mentioning
confidence: 99%
“…If the indels were to be inserted randomly across the genome it would underestimate the proportion of indels located in ambiguous regions, where the indels may be represented in different positions [39] (S1 and S2 Figs). Here, we reconstructed chromosome 1 of the HuRef genome, based on human reference genome hg19, by similar methods to [39,40] and inserted indels of a real human individual into the corresponding position in reference genome (S1 File). In addition, we reconstructed chromosome 1 with two different haplotypes by randomly selecting variants from different size ranges and only inserting them into one of the haplotypes as heterozygous variants or into both haplotypes as homozygous variants.…”
Section: The Semi-simulated WGS Dataset Covering a Wide Size Range of Indels With Varying Coverages and Read Lengths
Citation type: mentioning
confidence: 99%