We report a high-quality draft sequence of the genome of the horse (Equus caballus). The genome is relatively repetitive but has little segmental duplication. Chromosomes appear to have undergone few historical rearrangements: 53% of equine chromosomes show conserved synteny to a single human chromosome. Equine chromosome 11 is shown to have an evolutionarily new centromere devoid of centromeric satellite DNA, suggesting that centromeric function may arise before satellite repeats accumulate. Linkage disequilibrium, which shows the influence of early domestication of large herds of female horses, is intermediate in length between dog and human, and there is long-range haplotype sharing among breeds.
Summary
The horse, like the majority of animal species, has a limited amount of species-specific expressed sequence data available in public databases. As a result, structural models for the majority of genes defined in the equine genome are predictions based on ab initio sequence analysis or the projection of gene structures from other mammalian species. The current study used Illumina-based sequencing of messenger RNA (RNA-seq) to help refine structural annotation of equine protein-coding genes and for a preliminary assessment of gene expression patterns. Sequencing of mRNA from eight equine tissues generated 293 758 105 sequence tags of 35 bases each, equalling 10.28 Gbp of total sequence data. The tag alignments represent approximately 207× coverage of the equine mRNA transcriptome and confirmed transcriptional activity for roughly 90% of the protein-coding gene structures predicted by Ensembl and NCBI. Tag coverage was sufficient to refine the structural annotation for 11 356 of these predicted genes, while also identifying an additional 456 transcripts with exon/intron features that are not listed by either Ensembl or NCBI. Genomic locus data and intervals for the protein-coding genes predicted by the Ensembl and NCBI annotation pipelines were combined with 75 116 RNA-seq-derived transcriptional units to generate a consensus equine protein-coding gene set of 20 302 defined loci. Gene ontology annotation was used to compare the functional and structural categories of genes expressed in either a tissue-restricted pattern or broadly across all tissue samples.
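The sequencing totals quoted in the summary can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the implied transcriptome size assumes that every sequenced tag aligned, which the summary does not state, so that figure is at best a rough upper bound.

```python
# Figures taken from the summary above.
TAG_COUNT = 293_758_105      # sequence tags generated
TAG_LENGTH = 35              # bases per tag
REPORTED_COVERAGE = 207      # approximate fold coverage of the mRNA transcriptome


def total_bases(tag_count: int, tag_length: int) -> int:
    """Total sequenced bases: number of tags times tag length."""
    return tag_count * tag_length


def implied_target_size(bases: int, coverage: float) -> float:
    """Target size implied by a fold-coverage figure.

    Assumption (not from the source): all sequenced bases contribute
    to the coverage, i.e. every tag aligned to the transcriptome.
    """
    return bases / coverage


total = total_bases(TAG_COUNT, TAG_LENGTH)
implied = implied_target_size(total, REPORTED_COVERAGE)

print(f"Total sequence: {total / 1e9:.2f} Gbp")
print(f"Implied transcriptome size (upper bound): {implied / 1e6:.1f} Mbp")
```

The first figure reproduces the 10.28 Gbp stated in the summary. The second is only as good as the all-tags-aligned assumption, since unaligned tags would lower the true size.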