2021
DOI: 10.1093/bib/bbab095

ToxCodAn: a new toxin annotator and guide to venom gland transcriptomics

Abstract: Motivation: Next-generation sequencing has become exceedingly common and has transformed our ability to explore nonmodel systems. In particular, transcriptomics has facilitated the study of venom and evolution of toxins in venomous lineages; however, many challenges remain. Primarily, annotation of toxins in the transcriptome is a laborious and time-consuming task. Current annotation software often fails to predict the correct coding sequence and overestimates the number of toxins present in t…

Cited by 13 publications (8 citation statements)
References: 51 publications
“…Toxins are encoded by relatively short coding sequences (CDSs), and while animal genomes typically contain no more than 40–50 000 ORFs, transcriptomes predict far more ORFs, making it challenging to distinguish real ORFs from bioinformatics artifacts. Existing tools like Conodictor 2 [14], Venomix [23] or ToxCodAn [22] are unsuitable for handling large data sets, face issues with maintenance and software compatibility, and lack integration across all steps, from assembly to toxin identification, requiring manual execution of multiple programs. For instance, Venomix necessitates three input files, including alignment against a toxin database, an assembled transcriptome and a gene expression quantification file, complicating the process.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…De novo assemblies for each B. arietans venom gland transcriptome were merged, and redundancy and contigs less than 150 bp were removed with CD-HIT (v4.8.1) (42,43). Toxins were then annotated using ToxCodAn (44) and Diamond (v2.0.11) (45) BLASTx (E-value cut-off of 10⁻⁵) searches against the National Center for Biotechnology Information (NCBI) non-redundant protein database (accessed February 2023). Toxin transcripts were manually checked to determine that all translations were non-redundant, full-length proteins (methionine start codon to stop codon), had a maximum of three ambiguous amino acid residues, shared sequence identity with currently known toxins, and contained a conserved signal peptide sequence within each venom protein family.…”
Section: Methods (citation type: mentioning)
confidence: 99%
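
The checks listed in the statement above were done manually by the citing authors. As a rough illustration only, three of them (non-redundant, methionine start to stop codon, at most three ambiguous residues) could be sketched in Python as below. The function names, the use of `*` to mark a translated stop codon, and the set of residue codes treated as ambiguous (`X`, `B`, `Z`, `J`) are assumptions made for this sketch and are not taken from the cited paper or from ToxCodAn. The sequence-identity and signal-peptide checks are not covered.

```python
# Hypothetical helper names; not part of ToxCodAn or the cited workflow.
# Assumption: these one-letter codes count as "ambiguous" residues.
AMBIGUOUS = set("XBZJ")


def passes_manual_checks(protein: str, max_ambiguous: int = 3) -> bool:
    """Return True if a translated sequence looks like a full-length protein.

    Assumes `protein` is an uppercase translation in which a stop codon
    is written as '*'.
    """
    # Must start with methionine and end with a stop codon.
    if not protein.startswith("M") or not protein.endswith("*"):
        return False
    body = protein[:-1]
    # An internal stop means the translation is not a single full-length ORF.
    if "*" in body:
        return False
    n_ambiguous = sum(1 for aa in body if aa in AMBIGUOUS)
    return n_ambiguous <= max_ambiguous


def filter_translations(records: dict[str, str]) -> dict[str, str]:
    """Keep the first occurrence of each distinct passing translation."""
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for name, protein in records.items():
        seq = protein.upper()
        if seq in seen:
            continue  # exact duplicate of a sequence already kept
        if passes_manual_checks(seq):
            seen.add(seq)
            kept[name] = seq
    return kept
```

A filter like this would only pre-screen candidates; the statement describes a manual review, which this sketch does not replace.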
“…We ran all the assemblies in a standardized way using five different assemblers with different k-mer values and assembly parameters (Trinity: k-mer 31; rnaSPADES: k-mer 31, 75, and 127; Extender: default, overlap 150, and seed size 2000; SeqMan Ngen: k-mer 21; and Bridger: k-mer 30) (Grabherr et al 2011; Rokyta et al 2012; Chang et al 2015; Holding et al 2018; Bushmanova et al 2019). Then, we performed toxin annotation using ToxCodAn (Nachtigall, Rautsaw, et al 2021) against a curated data set of toxin sequences. Annotated toxin transcripts were manually reviewed and used to purge toxic-like contigs from the Trinity assembly of each individual.…”
Section: Methods (citation type: mentioning)
confidence: 99%