2021
DOI: 10.1109/tcbb.2021.3109557

GapPredict – A Language Model for Resolving Gaps in Draft Genome Assemblies

Abstract: Short-read DNA sequencing instruments can yield over 10^12 bases per run, typically composed of reads 150 bases long. Despite this high throughput, de novo assembly algorithms have difficulty reconstructing contiguous genome sequences using short reads due to both repetitive and difficult-to-sequence regions in these genomes. Some of the short-read assembly challenges are mitigated by scaffolding assembled sequences using paired-end reads. However, unresolved sequences in these scaffolds appear as "gaps". Here…

Cited by 4 publications (1 citation statement)
References: 37 publications
“…Subsequently, Geneious performed autoannotation using the NC_034239.1 reference sequence as a gene transfer guide. To refine the draft genome, GapPredict’s machine learning algorithm filled gaps and resolved ambiguous bases, which were then confirmed manually via raw read alignment and genomic context (5). Protein-coded gene calls were confirmed to have appropriate open reading frames by using Expasy Translate with the invertebrate mitochondrial code and then queried in the non-redundant GenBank database.…”
Section: Announcement | Citation type: mentioning
confidence: 99%