An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism.
The naturally transformable, Gram-negative bacterium Haemophilus influenzae Rd preferentially takes up DNA of its own species by recognizing a 9-base pair sequence, 5'-AAGTGCGGT, carried in multiple copies in its chromosome. With the availability of the complete genome sequence, 1465 copies of the 9-base pair uptake site have been identified. Alignment of these sites unexpectedly reveals an extended consensus region of 29 base pairs containing the core 9-base pair region and two downstream 6-base pair A/T-rich regions, spaced about one helix turn apart. Seventeen percent of the sites are in inverted repeat pairs, many of which are located downstream of gene termini and are capable of forming stem-loop structures in messenger RNA that might function as signals for transcription termination.
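The site count described above can be illustrated with a minimal sketch that scans a sequence for the 9-base pair core, 5'-AAGTGCGGT, on both strands. This is not the authors' method: the function names (`reverse_complement`, `find_uptake_sites`) and the toy sequence are hypothetical, and the code assumes an unambiguous A/C/G/T alphabet and exact matching only.

```python
# Core 9-bp uptake site quoted in the abstract.
USS_CORE = "AAGTGCGGT"

# Translation table for complementing unambiguous DNA bases (assumption:
# the input contains only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_uptake_sites(genome: str, motif: str = USS_CORE):
    """Return (0-based start, strand) for each exact motif match.

    A "+" hit matches the motif as written; a "-" hit matches its
    reverse complement, i.e. the motif lies on the opposite strand.
    """
    genome = genome.upper()
    motif_rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == motif_rc:
            hits.append((i, "-"))
    return hits


# Hypothetical toy sequence: one forward-strand and one reverse-strand site.
toy = "TT" + USS_CORE + "GGG" + reverse_complement(USS_CORE) + "A"
sites = find_uptake_sites(toy)
```

On a real genome, the same scan would be run over the full sequence record. Pairing nearby "+" and "-" hits would be one possible starting point for locating the inverted repeat pairs mentioned in the abstract.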