Background
Pistachio (Pistacia vera), one of the most important commercial nut crops worldwide, is highly adaptable to abiotic stresses and is tolerant to drought and salt stresses.
Results
Here, we provide a draft de novo genome of pistachio as well as large-scale genome resequencing. Comparative genomic analyses reveal stress adaptation of pistachio is likely attributable to the expanded cytochrome P450 and chitinase gene families. Particularly, a comparative transcriptomic analysis shows that the jasmonic acid (JA) biosynthetic pathway plays an important role in salt tolerance in pistachio. Moreover, we resequence 93 cultivars and 14 wild P. vera genomes and 35 closely related wild Pistacia genomes, to provide insights into population structure, genetic diversity, and domestication. We find that frequent genetic admixture occurred among the different wild Pistacia species. Comparative population genomic analyses reveal that pistachio was domesticated about 8000 years ago and suggest that key genes for domestication related to tree and seed size experienced artificial selection.
Conclusions
Our study provides insight into genetic underpinning of local adaptation and domestication of pistachio. The Pistacia genome sequences should facilitate future studies to understand the genetic basis of agronomically and environmentally related traits of desert crops.
Electronic supplementary material
The online version of this article (10.1186/s13059-019-1686-3) contains supplementary material, which is available to authorized users.
Objectives
The genus Pistacia belongs to the cashew family of flowering plants and contains at least 11 species. This note describes whole-genome resequencing data from different species of the genus Pistacia. The data reported here will be useful for better understanding the adaptive evolution, demographic history, genetic diversity, population structure, and domestication of pistachio.
Data description
Genomic DNA was isolated from fresh leaves and used to construct sequencing libraries with an insert size of 350 bp. The libraries were sequenced on the Illumina HiSeq 4000 platform to produce 150 bp paired-end reads. A total of 4,851,118,730 reads (ranging from 33,305,900 to 34,990,618 reads per sample) were generated across all samples. This amounts to 727.67 Gbp of data, which have been deposited in the Genome Sequence Archive (GSA) database under accession CRA000978. All of the data are also available in Sequence Read Archive (SRA) format at the National Center for Biotechnology Information (NCBI) under identifier SRP189222, mirroring the data deposited in GSA.
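The reported totals can be cross-checked with simple arithmetic. The sketch below is only an illustrative consistency check, not part of the authors' pipeline; the function names are hypothetical, and it assumes every read is exactly 150 bp (no trimming) and that 1 Gbp means 10^9 bp.

```python
import math

# Figures as stated in the data description.
TOTAL_READS = 4_851_118_730
READ_LENGTH_BP = 150            # assumption: all reads are full length
MIN_READS_PER_SAMPLE = 33_305_900
MAX_READS_PER_SAMPLE = 34_990_618


def total_yield_gbp(reads: int, read_length_bp: int) -> float:
    """Total sequence yield in Gbp, taking 1 Gbp = 1e9 bp."""
    return reads * read_length_bp / 1e9


def implied_sample_bounds(total_reads: int, min_reads: int, max_reads: int) -> tuple[int, int]:
    """Smallest and largest sample counts compatible with the per-sample read range."""
    fewest = math.ceil(total_reads / max_reads)   # every sample at the maximum
    most = math.floor(total_reads / min_reads)    # every sample at the minimum
    return fewest, most


if __name__ == "__main__":
    print(f"{total_yield_gbp(TOTAL_READS, READ_LENGTH_BP):.2f} Gbp")
    print(implied_sample_bounds(TOTAL_READS, MIN_READS_PER_SAMPLE, MAX_READS_PER_SAMPLE))
```

Under these assumptions the read count multiplied by the read length reproduces the stated 727.67 Gbp, and the per-sample range implies between 139 and 145 samples.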