Minireview
The cattle genome reveals its secrets

Cattle belong to an ancient group of mammals, the Cetartiodactyla, that first appeared around 60 million years ago. Domesticated cattle (Bos taurus and Bos taurus indicus) diverged from a common ancestor 250,000 years ago and have had a long and rich association with human civilization since Neolithic times, 8,000-10,000 years ago. All modern cattle breeds originate from large populations of the ancestral aurochs (Bos taurus primigenius; Figure 1) through thousands of years of domestication. During this time more than 800 cattle breeds have been established, representing an important resource for understanding the genetics of complex traits in ruminants. More than a billion cattle are raised annually worldwide for beef and dairy products, as well as for hides. Cattle therefore represent a significant scientific opportunity as well as an important economic resource.

Sequencing of the cattle genome began in December 2003, led by Richard Gibbs and George Weinstock at the Baylor College of Medicine's genome sequencing center in Houston, Texas, USA. The first draft sequence of the bovine genome was based on DNA taken from L1 Dominette 01449 (Figure 2), a Hereford dam; the Hereford is a breed used in beef production. In parallel, a large number of single-nucleotide polymorphisms (SNPs) have also been identified from the partial sequence of six breeds (Holstein, Angus, Jersey, Limousin, Norwegian Red and Brahman).
Taken together with the sequence of L1 Dominette 01449 (the reference bovine genome [1]), these SNPs represent a valuable resource for marker-assisted selection of genetic traits in commercial breeding programs.

The Bovine Genome Project is a complex collaboration among multiple groups, with funding from the United States, Canada, France, the United Kingdom, New Zealand and Australia.

Undoubtedly the current bovine genome sequence will be improved in both its sequence coverage and its annotation, but this draft sequence will form the basis for cattle genetics and genomics for the next 20 years or more.
So what have we learned?

The genome assembly problem: still not solved?

The technology for generating raw sequence data has advanced rapidly over the past 35 years, starting with Sanger sequencing in the 1970s, automated fluorescent Sanger sequencing in the 1980s and, more recently, ultra-high-throughput methods based on the parallel sequencing platforms produced by 454, Illumina and ABI. However, these advances in scale have not been matched by new algorithms and tools for sequence assembly, particularly for large genomes. The common problems in assembling large genomes are repetitive sequences (generally around 50% of a vertebrate genome), gene families and genetic polymorphisms, all of which can cause errors in assembly. Genome assembly is still a problem, requiring a combination of parallel computing...
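To make concrete why repeats cause assembly errors, here is a minimal sketch of a toy de Bruijn graph, one common way short-read assemblers represent overlaps between reads. It is a generic illustration under simplifying assumptions (error-free reads, a tiny hypothetical sequence, and hypothetical helper names `kmers`, `de_bruijn` and `branching_nodes`), not a description of how the bovine draft was actually assembled. Each k-mer links its first k-1 bases to its last k-1 bases; when a repeat occurs twice in different contexts, its copies collapse onto the same path, and the graph branches where the copies diverge.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def de_bruijn(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers, and every k-mer seen in
    any read adds an edge from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return the (k-1)-mers that have more than one distinct successor.
    At each of these, the k-mers alone do not say which way the genome
    continues, so an assembler must guess or end the contig there."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Hypothetical 20-bp "genome" AAGATTACACCGATTACATT, in which GATTACA occurs
# twice, sampled as three overlapping error-free reads.
repeat_reads = ["AAGATTACAC", "ACACCGATTA", "GATTACATT"]
print(branching_nodes(de_bruijn(repeat_reads, k=4)))  # ['ACA']

# A sequence with no repeated 3-mer produces a single unbranched path.
print(branching_nodes(de_bruijn(["AAGCTTGCA"], k=4)))  # []
```

With k = 4 both copies of GATTACA map onto the same path, and the only branch appears at ACA, where one copy continues into C and the other into T. Choosing k longer than the repeat would resolve this toy case, but repeats in a vertebrate genome can be longer than the reads themselves, which is why they remain a central difficulty.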