The advent of rapid and inexpensive DNA sequencing has led to an explosion of data that must be transformed into knowledge about genome organization and function. Gene prediction is customarily the starting point for genome analysis. This paper presents a bioinformatics study of the oil palm genome, including a comparative genomics analysis, database and tools development, and mining of biological data for genes of interest. We annotated 26,087 oil palm genes integrated from two gene-prediction pipelines, Fgenesh++ and Seqping. As case studies, we conducted comprehensive investigations of intronless, resistance and fatty acid biosynthesis genes, and demonstrated that the current gene prediction set is of high quality. We identified 3,672 intronless genes in the oil palm genome, an important resource for evolutionary study. Further scrutiny of the oil palm genes revealed 210 candidate resistance genes involved in pathogen defense. Fatty acids have diverse applications ranging from food to industrial feedstock, and we identified 42 key genes involved in fatty acid biosynthesis in oil palm mesocarp and kernel. These results provide an important resource for studies on plant genomes and a theoretical foundation for marker-assisted breeding of oil palm and related crops.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. doi: http://dx.doi.org/10.1101/111120. bioRxiv preprint first posted online 3
Introduction

Oil palm belongs to the genus Elaeis of the family Arecaceae. The genus has two species: E. guineensis (African oil palm) and E. oleifera (American oil palm). E. guineensis has three fruit forms that differ mainly in the thickness of their seed (or kernel) shell: dura (thick-shell), tenera (thin-shell) and pisifera (no shell). The African oil palm is by far the most productive oil crop 1 in the world, with estimated production in 2015/2016 of 61.68 million tonnes, of which the Malaysian share was 19.50 million tonnes 2 . Palm oil constitutes ~34.35% of the world production of edible oils and fats. Globally, palm oil is mainly produced from E. guineensis, in the tenera form. E. oleifera is rarely planted because of its low yield (only 10-20% of that of E. guineensis). However, it is more disease-resistant and is planted in areas where cultivating E. guineensis is nearly impossible, e.g., Central and South America. Even then, it is mainly planted as a backcross to E. guineensis (interspecific hybrid) to raise its yield. Nevertheless, it has economically valuable traits that plant breeders are keen to introgress into E. guineensis, such as a more liquid oil with higher carotenoid and vitamin E contents, disease resistance and slow height increment.
The importance of oil palm has resulted in considerable interest in sequencing its transcriptomes and genome. Initial work used expressed sequence tags (ESTs) 3 , a technique very useful for tagging expressed genes bu...