2021
DOI: 10.1101/2021.08.23.457338
Preprint

proovframe: frameshift-correction for long-read (meta)genomics

Abstract: Long-read sequencing technologies hold big promises for the genomic analysis of complex samples such as microbial communities. Yet, despite improving accuracy, basic gene prediction on long-read data is still often impaired by frameshifts resulting from small indels. Consensus polishing using either complementary short reads or to a lesser extent the long reads themselves can mitigate this effect but requires universally high sequencing depth, which is difficult to achieve in complex samples where the majority…

Cited by 19 publications (18 citation statements)
References 51 publications
“…A commonly adopted solution has been to include short-read data for post-assembly error correction 15,22, although it increases the cost and complexity overhead. Another solution has been to apply reference-based polishing to correct frameshift errors [23][24][25] but, although this provides a practical solution that enables gene calling, it does not provide true near-finished genomes. Finished microbial genomes, as defined by Bowers et al 2017 in the MIMAG (minimum information about a metagenome-assembled genome) standard 26, are genomes that have "...a single, validated, contiguous sequence per replicon, without gaps or ambiguities" and "a consensus error rate equivalent to Q50 or better".…”
mentioning
confidence: 99%
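For reference, the "Q50" threshold quoted above is a Phred-scaled quality value. Assuming the standard Phred definition Q = -10·log10(p), Q50 corresponds to an expected per-base error probability of 10^-5, or roughly one error per 100,000 bases. A minimal Python sketch of that conversion follows; the function names are illustrative and do not come from the cited papers.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Expected per-base error probability for a Phred-scaled quality q."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Phred-scaled quality for a per-base error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Q50 is the consensus error rate named in the quoted MIMAG definition.
    q50_rate = phred_to_error_rate(50)
    print(f"Q50 error rate: {q50_rate:.0e}")
    print(f"Bases per expected error: {1 / q50_rate:,.0f}")
```

By the same formula, a Q40 consensus would carry ten times as many expected errors per base as Q50.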
“…A commonly adopted solution has been to include short-read data for post-assembly error correction 12,19, although it increases the cost and complexity overhead. Another solution has been to apply reference-based polishing to correct frameshift errors [20][21][22], but while it provides a practical solution, which allows gene calling, it does not provide true near-perfect genomes.…”
mentioning
confidence: 99%
“…Full details are available in the original PointFinder 61 methods. We base our modifications to PointFinder on the previously demonstrated observation that frameshifts and stop codons in third-generation assemblies are more likely to reflect sequencing and assembly errors than true sequence variation 62,63 . We modify PointFinder to not halt its search for variants along a resistance loci if it encounters a stop codon.…”
Section: Methods
mentioning
confidence: 99%
“…We modify PointFinder to not halt its search for variants along a resistance loci if it encounters a stop codon. We additionally modify PointFinder to shift alignments around indels, maintaining the reading frame, in an approach similar to more general frameshift correction tools 62,63 . Our modified PointFinder has been incorporated into ResFinder version 4.2 and can be activated with the ‘-ii’ (Ignore Indels) and ‘-ic’ (Ignore stop Codons) flags.…”
Section: Methods
mentioning
confidence: 99%