The genome of the japonica subspecies of rice, an important cereal and model monocot, was sequenced and assembled by whole-genome shotgun sequencing. The assembled sequence covers 93% of the 420-megabase genome. Gene predictions on the assembled sequence suggest that the genome contains 32,000 to 50,000 genes. Homologs of 98% of the known maize, wheat, and barley proteins are found in rice. Synteny and gene homology between rice and the other cereal genomes are extensive, whereas synteny with Arabidopsis is limited. Assignment of candidate rice orthologs to Arabidopsis genes is possible in many cases. The rice genome sequence provides a foundation for the improvement of cereals, our most important crops.
We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.
Forward genetic mutational studies, adaptive evolution, and phenotypic screening are powerful tools for creating new variant organisms with desirable traits. However, mutations generated in the process cannot be easily identified with traditional genetic tools. We show that new high-throughput, massively parallel sequencing technologies can completely and accurately characterize a mutant genome relative to a previously sequenced parental (reference) strain. We studied a mutant strain of Pichia stipitis, a yeast capable of converting xylose to ethanol. This unusually efficient mutant strain was developed through repeated rounds of chemical mutagenesis, strain selection, transformation, and genetic manipulation over a period of seven years. We resequenced this strain on three different sequencing platforms. Surprisingly, we found fewer than a dozen mutations in open reading frames. All three sequencing technologies were able to identify each single nucleotide mutation given at least 10-15-fold nominal sequence coverage. Our results show that detecting mutations in evolved and engineered organisms is rapid and cost-effective at the whole-genome level using new sequencing technologies. Identification of specific mutations in strains with altered phenotypes will add insight into specific gene functions and guide further metabolic engineering efforts.
[Supplemental material is available online at www.genome.org. Complete data sets are available at the NCBI Short Read Archive under accession no. SRA 001158 (ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead).]
Pichia stipitis (Pignal) is a haploid yeast related to endosymbionts of beetles that degrade rotting wood (Suh et al. 2003). It is an important organism for bioenergy production from lignocellulosic materials because of its high capacity to ferment xylose and cellobiose to ethanol (Parekh et al. 1988).
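The 10-15-fold coverage threshold reported in the abstract can be made concrete with a small calculation. The sketch below assumes the usual definition of nominal coverage (total sequenced bases divided by genome size), which the text does not spell out. The function names, and the 36-base read length used in the example, are hypothetical and chosen only for illustration. The 15.4 Mb genome size is the figure reported for the reference strain CBS-6054.

```python
import math


def nominal_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Nominal fold coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target nominal coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    genome_size = 15_400_000  # 15.4 Mb, as reported for the reference strain
    read_length = 36          # hypothetical short-read length, for illustration only
    for target in (10, 15):
        n = reads_needed(target, read_length, genome_size)
        print(f"{target}x coverage needs about {n:,} reads of {read_length} bases")
```

Nominal coverage ignores unmappable reads and uneven sampling, so the effective depth at any given position may well be lower than the value computed here.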
We previously sequenced the reference strain, Pichia stipitis CBS-6054, resulting in a completely characterized genome of eight chromosomes totaling 15.4 Mb of sequence (Jeffries et al. 2007). This strain has been subjected to chemical mutagenesis, phenotypic selection, genetic engineering, and adaptive evolution in order to develop strains improved for ethanol production. Chemical mutagenesis and selection resulted in small improvements in ethanol production attributable in part to carbon catabolite derepression (Supplemental Fig. 1; Methods). Disruption of CYC1 (cytochrome c, isoform 1) to create strain Shi21 increased the specific ethanol production rate by 50% and the ethanol yield by 10%; however, the nature of additional mutational events leading to this phenotype was uncharacterized.
Traditional methods for identifying mutations are labor- and time-intensive, so we tested the ability of next-generation sequencing technologies to determine the differences in this improved strain's entire genome relative to the reference strain. We generated high-coverage, whole-genome data sets using single fragment end reads from three next-generation sequencing platforms: 454 Life Sc...