2021
DOI: 10.1093/bioinformatics/btab749

Population-scale detection of non-reference sequence variants using colored de Bruijn graphs

Abstract: Motivation: With the increasing throughput of sequencing technologies, structural variant (SV) detection has become possible across tens of thousands of genomes. Non-reference sequence (NRS) variants have drawn less attention compared to other types of SVs due to the computational complexity of detecting them. When using short-read data, the detection of NRS variants inevitably involves a de novo assembly which requires high-quality sequence data at high coverage. Previous studies have demonst…

Citations: cited by 8 publications (9 citation statements)
References: 61 publications (85 reference statements)
“…The PopIns2 workflow (Krannich et al 2022) was applied to detect the nonreference sequence of the data set from 898 animals. First, the assemble submodule was used to identify reads without high-quality alignment to the reference genome using default parameters.…”
Section: Methods (mentioning)
confidence: 99%
“…Synthetic long reads were simulated using LRSIM (38) with 65× sequence coverage. We evaluated Novel-X, PopIns2 (30), NUI (20) and Pamir (31) on these data. Because Pamir is not able to work directly with synthetic long-read data, we simulated matching standard short-read data of the same coverage from the same genome using ART (39).…”
Section: Results (mentioning)
confidence: 99%
“…This algorithm was applied to the high coverage samples in the 1000 Genomes pilot phase, and a total of 128 NovelSeq calls were reported and validated (26, 27). Subsequent short-read methods for this problem such as MindTheGap (28), ANISE and BASIL (29), PopIns2 (30) or Pamir (31) appended population-based techniques (e.g. pooling multiple low-coverage samples from the same population) or used additional whole-genome signals such as split reads for better breakpoint resolution.…”
Section: Introduction (mentioning)
confidence: 99%
“…Thus, this wealth of data is, in practice, inaccessible to most researchers, and standard tools need to be established for indexing these data. Sequence graphs are an increasingly prominent model for representing and indexing large collections of sequencing data [43,18], enabling improvements in both the scale and accuracy of many biological analysis tasks (e.g., genotyping [57,25], variant calling [36,13], sequence search [32,57]). We construct a separate De Bruijn graph for each input sample, with nodes weighted by k-mer abundance.…”
Section: Introduction (mentioning)
confidence: 99%
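
The last statement above describes building a separate De Bruijn graph per input sample, with nodes weighted by k-mer abundance. As a rough illustration only, the sketch below shows one way such a structure could be represented in Python. It is not the implementation used by the citing paper or by PopIns2; the function name `build_weighted_dbg`, the node-centric representation (nodes are k-mers, edges link consecutive k-mers in a read), and the toy reads are all assumptions made for this example.

```python
from collections import Counter


def build_weighted_dbg(reads, k):
    """Build a toy node-centric De Bruijn graph for one sample.

    Hypothetical helper for illustration: nodes are k-mers weighted by
    how often they occur in the reads, and an edge joins two k-mers
    that appear consecutively (overlapping by k-1 bases) in a read.
    """
    node_weights = Counter()  # k-mer -> abundance in this sample
    edges = set()             # (k-mer, next k-mer) pairs
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        node_weights.update(kmers)
        edges.update(zip(kmers, kmers[1:]))
    return node_weights, edges


# Toy input (made up): one independent graph per sample.
samples = {
    "sample_a": ["ACGTAC", "CGTACG"],
    "sample_b": ["ACGTTT"],
}
graphs = {name: build_weighted_dbg(reads, k=3) for name, reads in samples.items()}
```

Real tools typically also handle reverse complements, sequencing errors, and compact graph encodings, none of which this sketch attempts.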