2018
DOI: 10.1101/gr.233460.117

Comparative Annotation Toolkit (CAT)—simultaneous clade and personal genome annotation

Abstract: The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire cl…

Cited by 104 publications (102 citation statements)
References 60 publications
“…Comparative Annotation Toolkit (CAT) is a software pipeline that leverages whole-genome alignments, existing annotations, and comparative gene prediction tools to simultaneously annotate multiple genomes, defining orthologous relationships and discovering gene family expansion and contraction [27]. CAT also diagnoses assembly quality by investigating the rate of gene model-breaking indels seen in transcript projections from a reference, as well as looking at the rate of transcript projections that map in a disjointed fashion.…”
Section: Results (mentioning)
confidence: 99%
“…In every place where the alignment indicated a difference in order and orientation of scaffolds between the two assemblies, we used every available data type to resolve the discrepancy and determine which was correct. Our strategies included aligning BAC-end pairs from a half-brother of Twilight 2 to the assemblies using bwa mem with default parameters [34], assessing concordance with the physical map, looking for split genes predicted by the CAT [27], aligning coding sequences of any genes in the region to the assemblies using gmap with default parameters [35], and examining heatmaps of long-range read pairs mapping to the assembly generated by the HiRise and longranger pipelines [24].…”
Section: Methods (mentioning)
confidence: 99%
“…(Table 1, Supplementary Note 4). Comparative annotation of our whole-genome assembly also shows a higher agreement of mapped transcripts than previous assemblies and only a slightly increased rate of potential frameshifts compared to GRCh38 [23]. Of the 19,618 protein-coding genes annotated in the CHM13 de novo assembly, just 170 (0.86%) contain a predicted frameshift, or, if measured by transcripts, only 334 of 83,332 transcripts (0.40%) contain a predicted frameshift (Supplementary Table 1).…”
Section: Highly Continuous Whole-genome Assembly (mentioning)
confidence: 65%
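
The frameshift rates quoted in the statement above can be rechecked from the counts it gives. The following is a minimal sketch, not code from CAT or from the citing paper; the helper name `percent` is hypothetical, and the only inputs are the four counts quoted above.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts taken from the quoted statement (CHM13 de novo assembly).
gene_rate = percent(170, 19_618)        # protein-coding genes with a predicted frameshift
transcript_rate = percent(334, 83_332)  # transcripts with a predicted frameshift

print(f"genes with a predicted frameshift:       {gene_rate:.3f}%")
print(f"transcripts with a predicted frameshift: {transcript_rate:.3f}%")
```

Under these counts the transcript-level figure agrees with the quoted 0.40%. The gene-level figure comes out at about 0.87% when rounded to two decimals, so the quoted 0.86% was presumably truncated rather than rounded.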
“…We ran the Comparative Annotation Toolkit [52] to annotate the polished assemblies in order to analyze how well Shasta assembles transcripts and genes. Each assembly was individually aligned to the GRCh38 reference assembly using Cactus [53] to create the input alignment to CAT.…”
Section: Analysis Methods (mentioning)
confidence: 99%