As part of an international effort to sequence the rice genome, the Clemson University Genomics Institute (CUGI) is developing a sequence-tagged-connector (STC) framework. This framework includes the generation of deep-coverage BAC libraries from Oryza sativa ssp. japonica cv. Nipponbare and the sequencing of both ends of the genomic DNA insert of each BAC clone. Here, we report a survey of the transposable elements (TEs) in >73,000 STCs. A total of 6848 STCs were found to be homologous to regions of known TE sequences (E < 10⁻⁵) in a FASTX search of the STCs against a set of 1358 TE protein sequences obtained from GenBank. Of these TE-containing STCs (TE-STCs), 88% (6027) are related to retroelements and the remainder are transposase homologs. Nearly all DNA transposons previously known in plants were present in the STCs, including maize Ac/Ds, En/Spm, Mutator, and mariner-like elements. In addition, 2746 STCs were found to contain regions homologous to known miniature inverted-repeat transposable elements (MITEs). The distribution of these MITEs in regions near genes was confirmed by comparing ESTs with MITE-containing STCs, and our results showed that the association of MITEs with known EST transcripts varies by MITE type. In contrast to the biased distribution of retroelements in maize, we found no evidence for the presence of gene islands when we correlated TE-STCs with a physical map of the CUGI BAC library. These analyses of TEs in nearly 50 Mb of rice genomic DNA provide an interesting and informative preview of the rice genome.
Summary: A pattern enumeration algorithm named GBSSR has been developed to search co-expressed gene groups, identified through gene chip expression profiling, for putative cis-regulatory elements, an important step toward understanding transcription factors, quantitative trait loci and gene regulatory networks. Without making any statistical assumptions, the algorithm establishes the frequency distribution of all eligible 6-15 bp strings by extensive bootstrap sampling from an entire genome's worth of promoters, enabling strings that are over-represented in a co-expressed gene group to be identified. Using a well-studied plant cold-responsive gene system as a positive control, several known cold-responsive elements were identified as top-ranking candidates, along with some potentially novel ones. A typical analysis of 40 co-expressed genes takes about 7 days on a relatively inexpensive Linux cluster with 32 × 1.4 GHz Intel CPUs.
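The bootstrap idea described in this summary can be illustrated with a minimal Python sketch. It is not the GBSSR implementation, whose details are not given here. The function names, the counting rule (number of promoters containing a string at least once), the `min_support` filter and the add-one empirical p-value are all assumptions made for illustration.

```python
import random
from collections import Counter


def promoters_with_kmer(promoters, k_min=6, k_max=15):
    """For every k-mer, count how many promoters contain it at least once."""
    counts = Counter()
    for seq in promoters:
        seen = set()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)
    return counts


def bootstrap_overrepresented(group, genome, n_boot=1000, alpha=0.01,
                              min_support=3, k_min=6, k_max=15, seed=0):
    """Return (k-mer, empirical p-value) pairs over-represented in `group`.

    `group` is the list of promoters of the co-expressed genes and `genome`
    is the list of all promoters. Every name and default here is hypothetical.
    """
    rng = random.Random(seed)
    observed = promoters_with_kmer(group, k_min, k_max)
    # Assumption: ignore strings seen in fewer than `min_support` group promoters.
    candidates = {m: c for m, c in observed.items() if c >= min_support}
    exceed = Counter()
    for _ in range(n_boot):
        # Draw a random promoter set of the same size, with replacement.
        sample = rng.choices(genome, k=len(group))
        null = promoters_with_kmer(sample, k_min, k_max)
        for motif, obs in candidates.items():
            if null[motif] >= obs:
                exceed[motif] += 1
    # Empirical p-value with an add-one correction (an assumption of this sketch).
    pvals = {m: (exceed[m] + 1) / (n_boot + 1) for m in candidates}
    hits = [(m, p) for m, p in pvals.items() if p <= alpha]
    return sorted(hits, key=lambda mp: (mp[1], mp[0]))
```

Each bootstrap round re-enumerates all 6-15 bp strings in a random promoter set, so the cost grows with the number of rounds and the promoter length. That is consistent with the multi-day cluster runtime reported above, although the actual algorithm may differ substantially from this sketch.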