Background Bee genomics has advanced considerably in recent years; however, the most diverse group of honey producers on the planet, the stingless bees (SLBs), remains largely neglected. Only ten of the ~600 described SLB species have been sequenced, and only one with a long-read (LR) sequencing technology. Here, we sequenced the complete genome of the most common, widespread and broadly reared SLB species in Brazil, Tetragonisca angustula (popularly known as jataí).
Results A total of 48.01 Gb of DNA data were generated, including 2.31 Gb of Pacific Biosciences LRs and 45.70 Gb of Illumina short reads (SRs). Our preferred assembly comprised 705 contigs spanning 283.99 Mb, of which 65.94 Mb (23.22%) corresponded to 462,261 repetitive elements. The contig N50 was 1.01 Mb, the L50 was 94 contigs, and 97% of BUSCOs were complete. We predicted that the genome of T. angustula comprises 16,958 protein-coding genes and 1,866 non-coding RNAs. The mitogenome consisted of 17,410 bp, and all 37 of its genes were found on the positive strand, an unusual feature among bees. A phylogenomic analysis of 26 hymenopteran species revealed that six of the 64 orthogroups experiencing rapid evolution in T. angustula included odorant receptor genes, all but one of which underwent considerable contractions.
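For readers unfamiliar with the contiguity statistics reported above, the following is a minimal sketch of how N50 and L50 are conventionally computed from a list of contig lengths. It is an illustration only, not the pipeline used in this study; the function name `n50_l50` and the toy lengths are hypothetical.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50: length of the contig at which the running total, taken over
         contigs sorted from longest to shortest, first reaches half
         of the total assembly size.
    L50: number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must be non-empty")


# Toy contig lengths (hypothetical, not taken from the T. angustula assembly).
toy_lengths = [50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

Under this definition, an N50 of 1.01 Mb with an L50 of 94 means that the 94 longest contigs, each at least 1.01 Mb long, together cover at least half of the assembled length.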
Conclusions Here, we provide the first genome assembly for the ecologically and economically important T. angustula, the second SLB species to be sequenced with LR technology. We demonstrate that even relatively small amounts of LR data, combined with sufficient SR data, can yield high-quality genome assemblies for bees.