The advent of next-generation sequencing has enabled transcriptome-based approaches for investigating functionally significant biological components in a variety of non-model organisms. This has given rise to "venomics": a rapidly growing field that uses combined transcriptomic and proteomic datasets to characterize toxin diversity in a variety of venomous taxa. The transcriptomic portion of these analyses typically follows a very similar path after transcriptome assembly: candidate toxin identification using BLAST, expression level screening, protein sequence alignment, gene tree reconstruction, and characterization of potential toxin function. Here we describe the Python package Venomix, which streamlines these steps using commonly used bioinformatic tools along with a public, annotated database of characterized venom proteins. In this study, we use the Venomix pipeline to characterize candidate venom diversity in four phylogenetically distinct organisms: a cone snail (Conidae; Conus sponsalis), a snake (Viperidae; Echis coloratus), an ant (Formicidae; Tetramorium bicarinatum), and a scorpion (Scorpionidae; Urodacus yaschenkoi). Data on these organisms were sampled from public databases, and thus different approaches to transcriptome assembly, toxin identification, or gene expression quantification were used for each. Venomix recovered numerically more candidate toxin transcripts than the original analyses for three of the four transcriptomes. In all four organisms we identified new toxin candidates that were not reported in the original analysis. In summary, we show that the Venomix package is a useful tool to identify and characterize the diversity of toxin-like transcripts.
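To make the first two steps of this workflow concrete (candidate identification from BLAST hits, then expression level screening), the following is a minimal sketch and not the Venomix implementation. It assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`) and a per-transcript expression table in TPM; the function names, the example data, and both thresholds (`max_evalue`, `min_tpm`) are hypothetical choices made for illustration.

```python
import csv
import io

# Column names of BLAST's default tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(blast_tsv, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query with evalue <= max_evalue.

    The e-value cutoff is an illustrative assumption, not a Venomix default.
    """
    best = {}
    reader = csv.DictReader(
        io.StringIO(blast_tsv), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    for row in reader:
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(row["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[row["qseqid"]] = {
                "subject": row["sseqid"],
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


def screen_by_expression(hits, tpm, min_tpm=1.0):
    """Drop candidates whose expression is below min_tpm (assumed cutoff)."""
    return {q: h for q, h in hits.items() if tpm.get(q, 0.0) >= min_tpm}


# Hypothetical example data: three transcripts searched against a toxin database.
example_blast = "\n".join([
    "contig_1\ttoxin_A\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-30\t200",
    "contig_1\ttoxin_B\t60.0\t80\t32\t0\t1\t80\t1\t80\t1e-10\t90",
    "contig_2\ttoxin_C\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30",
    "contig_3\ttoxin_A\t90.0\t120\t12\t0\t1\t120\t1\t120\t1e-40\t250",
])
example_tpm = {"contig_1": 12.5, "contig_3": 0.2}

hits = best_hits(example_blast)
candidates = screen_by_expression(hits, example_tpm)
print(sorted(candidates))
```

In this toy input, `contig_2` is dropped at the e-value filter and `contig_3` at the expression screen, so only `contig_1` would move on to alignment and gene tree reconstruction.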