Members of the teleost family Syngnathidae (seahorses, pipefishes and seadragons) (Extended Data Fig. 1), comprising approximately 300 species, display a complex array of morphological innovations and reproductive behaviours. These include specialized morphological phenotypes such as an elongated snout with a small terminal mouth, fused jaws, absent pelvic and caudal fins, and an extended body covered with an armour of bony plates instead of scales 1 (Fig. 1a). Syngnathids are also unique among vertebrates in exhibiting 'male pregnancy', whereby males nourish developing embryos in a brood pouch until hatching and parturition occur 2,3. In addition, members of the subfamily Hippocampinae (seahorses) exhibit other derived features such as the lack of a caudal fin, a characteristic prehensile tail, and a vertical body axis 4 (Fig. 1a). To understand the genetic basis of the specialized morphology and reproductive system of seahorses, we sequenced the genome of the tiger tail seahorse, Hippocampus comes, and carried out comparative genomic analyses with the genome sequences of other ray-finned fishes (Actinopterygii).
Genome assembly and annotation
The genome of a male H. comes individual was sequenced using the Illumina HiSeq 2000 platform. After filtering low-quality and duplicate reads, 132.13 Gb of reads (approximately 190-fold coverage of the estimated 695 Mb genome) from libraries with insert sizes ranging from 170 bp to 20 kb were retained for assembly. The filtered reads were assembled using SOAPdenovo (version 2.04) to yield a 501.6 Mb assembly with a contig N50 of 34.7 kb and a scaffold N50 of 1.8 Mb. Total RNA from combined soft tissues of H. comes was sequenced using RNA-sequencing (RNA-seq) and assembled de novo. The H. comes genome assembly is of high quality: >99% of the de novo assembled transcripts (76,757 out of 77,040) could be mapped to the assembly, and 243 of the 248 genes in the Core Eukaryotic Genes Mapping Approach (CEGMA) set are complete in the assembly.
We predicted 23,458 genes in the genome of H. comes based on homology and by mapping the RNA-seq data of H. comes and a closely related species, the lined seahorse, Hippocampus erectus, to the genome assembly (see Methods and Supplementary Information). More than 97% of the predicted genes (22,941 genes) either have homologues in public databases (Swiss-Prot, TrEMBL and the Kyoto Encyclopedia of Genes and Genomes (KEGG)) or are supported by assembled RNA-seq transcripts. Analysis of gene family evolution using a maximum likelihood framework identified an expansion of 25 gene families (261 genes; 1.11%) and a contraction of 54 families (96 genes; 0.41%) in the H. comes lineage (Extended Data Fig. 2 and Supplementary Tables 4.1, 4.2). Transposable elements comprise around 24.8% (124.5 Mb) of the H. comes genome, with class II DNA transposons being the most abundant class (9%; 45 Mb). Only one wave of transposable element expansion was identified, with no evidence for a recent transposable element burst (Kimura divergence ≤ 5) (Extended D...