2022
DOI: 10.1101/2022.11.05.515275
Preprint

WarpSTR: Determining tandem repeat lengths using raw nanopore signals

Abstract: Motivation: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses, but is limited by technology mainly due to STRs surpassing the used read length. Nanopore sequencing, as one of long read sequencing technologies, produces very long reads, thus offering more possibilities to study and analyze STRs. Basecalling of nanopore reads is however particularly unreliable in repeating regions…

Cited by 3 publications (5 citation statements)
References 40 publications
“…Future studies desiring to characterize tandem content could test recent software development for comparison. These include nucleotide-based detection softwares, such as TideHunter [ 11 ], NCRF [ 12 ], NanoSTR [ 13 ], mTR [ 14 ], and signal-based softwares, such as DeepRepeat [ 15 ] and WarpSTR [ 16 ], all of which are potentially computationally much faster than TRF [ 9 ]. Their use to develop trimming tools represent a potential avenue of research to remove/mask artifactual tandems from raw reads prior to assembly.…”
Section: Discussion (mentioning)
confidence: 99%
“…Classified reads were sorted via mapping to the curated genomes to determine reads that were correctly assigned (the read barcode is in agreement with the genome it mapped to), those representing barcode leakage (the read barcode is not in agreement with the genome it mapped to) or those unmapped (reads with very low quality or artifactual). See Additional file 4 detection softwares, such as TideHunter [11], NCRF [12], NanoSTR [13], mTR [14], and signal-based softwares, such as DeepRepeat [15] and WarpSTR [16], all of which are potentially computationally much faster than TRF [9]. Their use to develop trimming tools represent a potential avenue of research to remove/mask artifactual tandems from raw reads prior to assembly.…”
Section: Table 6 Classified Reads Content (mentioning)
confidence: 99%
“…However, comparatively high error rates in repetitive regions, which can impede accurate tandem repeat genotyping, necessitate accounting for errors in ONT reads (9). Thus, nanopore-specific, signal-based methods have emerged as a promising approach for directly calling tandem repeat copy numbers from nanopore signal data to minimize errors introduced by basecalling (9, 13, 14). These methods have demonstrated success with targeted sequencing (14, 15, 21).…”
Section: Introduction (mentioning)
confidence: 99%
“…In 2022, this technique was used to successfully genotype disease-associated tandem repeats using the ReadFish API. The second targeted sequencing approach, termed nanopore Cas9 Targeted sequencing (nCATS), uses CRISPR-Cas9 to selectively sequence regions of interest using RNA guides and was found to outperform computational enrichment for tandem repeat genotyping (9, 13, 14, 15, 16). However, the application of nCATS to comprehensively genotyping disease-associated tandem repeats has yet to be extended beyond common ataxias found in European populations (15).…”
Section: Introduction (mentioning)
confidence: 99%
“…The combined utilization of these approaches facilitates comprehensive identification and characterization of TRs, aiding in the understanding of their functional and evolutionary roles in genomes. PopAffiliator [25], Tally-2.0 [26], RExPRT [27], and WarpSTR [28] are TR identification methods developed based on machine learning. 6) Deep learning-based methods, which based on deep learning typically involve training models to recognize patterns found in TRs.…”
Section: Introduction (mentioning)
confidence: 99%