2022
DOI: 10.1371/journal.pone.0278424

Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans

Abstract: The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of popular tools remains unclear due to the limitations of existing benchmarks. Moreover, the performance of these tools for predicting variants in non-human genomes is less certain, as most tools were developed and benchmark…

Cited by 4 publications (6 citation statements)
References 51 publications
“…1, and supplementary tables S14 and S15, Supplementary Material online). The SV results from long-read sequencing are more reliable than those from short-read sequencing, as shown by the universally high F1 scores of most types of SVs (table 1 and supplementary table S17, Supplementary Material online), which is consistent with previous studies (Merker et al 2018; Lesack et al 2022). We also find that the number of SVs of certain types in the genome can affect the performance of the software to some extent.…”
Section: Results (supporting)
confidence: 89%
“…While differences in library preparation (Guan & Sung, 2016) or sequencing platform can affect the predicted SVs, considerable disparities between the call sets generated by different sequencing centers has been observed when using the same protocols (Khayat et al, 2021). On the computational side, caller choice, parameter settings, and alignment method are known to affect SV calling (Lesack et al, 2022; Liu et al, 2022a). For short-read data, how software handles ambiguous read-to-genome mappings is a surprising and significant source of variation in SV identification; changing the order of the reads in the FASTQ file led to changes in predicted SVs (Firtina & Alkan, 2016).…”
Section: Introduction (mentioning)
confidence: 99%
“…SNPs and indels, ranging in size from 1 bp to 50bp, can be identified with high confidence using short sequencing reads that are 100-150bp (Muzzey, Evans, and Lieber 2015). In contrast, SVs are challenging to find using short-read sequencing because the sequencing reads are often smaller than the size of an SV (Sudmant et al 2015; Mahmoud et al 2019; Lesack et al 2022). With the advent of higher quality long-read sequencing technologies which generate ~10kb-30kb reads with lower genomic coverage, the accurate annotation of large regions of genomic variation such as SVs has become easier (Sakamoto et al 2021).…”
Section: Introduction (mentioning)
confidence: 99%
“…SNPs and indels, ranging in size from 1 bp to 50bp, can be identified with high confidence using short sequencing reads that are 100-150bp (Muzzey, Evans, and Lieber 2015). In contrast, SVs are challenging to annotate using short-read sequencing because the sequencing reads are often smaller than the size of an SV (Sudmant et al 2015; Mahmoud et al 2019; Lesack et al 2022). Similarly, the highly repetitive sequences of TEs present significant challenges to mapping and annotation with traditional short read sequencing methods.…”
Section: Introduction (mentioning)
confidence: 99%