The complete sequence of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3, has been determined by assembling the sequences of the physical map-based contigs of fosmid clones and of long polymerase chain reaction (PCR) products which were used for gap-filling. The entire length of the genome was 1,738,505 bp. The authenticity of the entire genome sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2061 open reading frames (ORFs) were assigned, and by similarity search against public databases, 406 (19.7%) were related to genes with putative function and 453 (22.0%) to the sequences registered but with unknown function. The remaining 1202 ORFs (58.3%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs in the genome provided evidence that a considerable number of ORFs were generated by sequence duplication. By similarity search, 11 ORFs were assumed to contain the intein elements. The RNA genes identified were a single 16S-23S rRNA operon, two 5S rRNA genes and 46 tRNA genes including two with the intron structure. All the assigned ORFs and RNA coding regions occupied 91.25% of the whole genome. The data presented in this paper are available on the internet at http://www.nite.go.jp.
A cDNA that expresses a receptor for very low density lipoprotein (VLDL) was isolated from a rabbit heart cDNA library and characterized. The deduced amino acid sequence of the cDNA revealed that the cDNA encodes a protein with striking homology to the low density lipoprotein (LDL) receptor. (12) with poly(A)+ RNA isolated from normal rabbit heart. To exclude the rabbit LDL receptor, the entire pooled cDNA library was digested with Sal I and recircularized with T4 DNA ligase. The presence of a unique Sal I site in the rabbit LDL receptor cDNA (7) and the vector results in loss of any LDL receptor cDNAs after recircularization and retransformation. The resulting LDL-receptor-subtracted cDNA library was screened with the 1.9-kilobase Sma I-Sal I fragment from the rabbit LDL receptor cDNA (7)
We established a protocol for the prediction of the coding sequences of unidentified human genes based on the double selection and sequence analysis of cDNA clones with inserts carrying unreported 5'-terminal sequences and with insert sizes corresponding to nearly full-length transcripts. By applying the protocol, cDNA clones with inserts longer than 2 kb were isolated from a cDNA library of human immature myeloid cell line KG-1, and the coding sequences of 40 new genes were predicted. A computer search of the sequences indicated that 20 genes contained sequences similar to known genes in the GenBank/EMBL databases. The sequences of the remaining 20 genes were entirely new, and characteristic protein motifs or domains were identified in 32 genes. Other sequence features noted were that the coding sequences of 23 genes were followed by relatively long stretches of 3'-untranslated sequences and that 5 genes contained repetitive sequences in their 3'-untranslated regions. The chromosomal location of these genes has been determined. By increasing the scale of the above analysis, the coding sequences of many unidentified genes can be predicted.
The complete sequence of the genome of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1, which optimally grows at 95 degrees C, has been determined by the whole genome shotgun method with some modifications. The entire length of the genome was 1,669,695 bp. The authenticity of the entire sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2,694 open reading frames (ORFs) were assigned. By similarity search against public databases, 633 (23.5%) of the ORFs were related to genes with putative function and 523 (19.4%) to the sequences registered but with unknown function. All the genes in the TCA cycle except for that of alpha-ketoglutarate dehydrogenase were included, and instead of the alpha-ketoglutarate dehydrogenase gene, the genes coding for the two subunits of 2-oxoacid:ferredoxin oxidoreductase were identified. The remaining 1,538 ORFs (57.1%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs suggested that a considerable number of ORFs were generated by sequence duplication. The RNA genes identified were a single 16S-23S rRNA operon, two 5S rRNA genes and 47 tRNA genes including 14 genes with intron structures. All the assigned ORFs and RNA coding regions occupied 89.12% of the whole genome. The data presented in this paper are available on the internet homepage (http://www.mild.nite.go.jp).
The complete genomic sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7, which optimally grows at 80 degrees C, at low pH, and under aerobic conditions, has been determined by the whole genome shotgun method with slight modifications. The genome was 2,694,756 bp long and the G + C content was 32.8%. The following RNA-coding genes were identified: a single 16S-23S rRNA cluster, one 5S rRNA gene and 46 tRNA genes (including 24 intron-containing tRNA genes). The repetitive sequences identified were SR-type repetitive sequences, long dispersed-type repetitive sequences and Tn-like repetitive elements. The genome contained 2826 potential protein-coding regions (open reading frames, ORFs). By similarity search against public databases, 911 (32.2%) ORFs were related to functionally assigned genes, 921 (32.6%) were related to conserved ORFs of unknown function, 145 (5.1%) contained some motifs, and the remaining 849 (30.0%) did not show any significant similarity to the registered sequences. The ORFs with functional assignments included the candidate genes involved in sulfide metabolism, the TCA cycle and the respiratory chain. Sequence comparison provided evidence suggesting the integration of a plasmid, rearrangement of genomic structure, and duplication of genomic regions that may be responsible for the larger genomic size of the S. tokodaii strain7 genome. The genome contained eukaryote-type genes which were not identified in other archaea and lacked the CCA sequence in the tRNA genes. The result suggests that this strain is closer to eukaryotes among the archaea strains so far sequenced. The data presented in this paper are also available on the internet homepage (http://www.bio.nite.go.jp/E-home/genome_list-e.html/).
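The ORF breakdown reported for S. tokodaii strain7 can be checked with a few lines of arithmetic. The following is only a sketch for the reader: the function name `category_percentages` and the short category labels are illustrative choices, not terms from the paper, and the counts are the ones quoted in the abstract.

```python
def category_percentages(counts):
    """Return the total and each category's share of it, in percent (1 decimal)."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


# ORF counts as quoted in the abstract; the labels are shorthand (assumption).
orf_counts = {
    "functionally assigned": 911,
    "conserved, unknown function": 921,
    "motif only": 145,
    "no significant similarity": 849,
}

total, pct = category_percentages(orf_counts)
for name, share in pct.items():
    print(f"{name}: {orf_counts[name]} of {total} ORFs ({share}%)")
```

The four categories add up to the 2826 ORFs stated in the abstract, and the computed shares agree with the quoted percentages.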
In this series of projects of sequencing human cDNA clones which correspond to relatively long and nearly full-length transcripts, we newly determined the sequences of 80 clones, and predicted the coding sequences of the corresponding genes, named KIAA0201 to KIAA0280. Among the sequenced clones, 68 were obtained from human immature myeloid cell line KG-1 and 12 from human brain. The average size of the clones was 5.3 kb, and that of distinct ORFs in clones was 2.8 kb, corresponding to a protein of approximately 100 kDa. Computer search against the public databases indicated that the sequences of 22 genes were unrelated to any reported genes, while the remaining 58 genes carried sequences which show some similarities to known genes. Protein motifs that matched those in the PROSITE motif database were found in 25 genes and significant transmembrane domains were identified in 30 genes. Among the known genes to which significant similarity was shown, the genes that play key roles in regulation of developmental stages, apoptosis and cell-to-cell interaction were included. Taking into account both the search data on sequence similarity and protein motifs, at least seven genes were considered to be related to transcriptional regulation and six genes to signal transduction. When the expression profiles of the cDNA clones were examined with different human tissues, about half of the clones from brain (5 of 11) showed significant tissue-specificity, while approximately 80% of the genes from KG-1 were expressed ubiquitously.
Corynebacterium efficiens is the closest relative of Corynebacterium glutamicum, a species widely used for the industrial production of amino acids. C. efficiens but not C. glutamicum can grow above 40°C. We sequenced the complete C. efficiens genome to investigate the basis of its thermostability by comparing its genome with that of C. glutamicum. The difference in GC content between the species was reflected in codon usage and nucleotide substitutions. Our comparative genomic study clearly showed that there was tremendous bias in amino acid substitutions in all orthologous ORFs. Analysis of the direction of the amino acid substitutions suggested that three substitutions are important for the stability of the C. efficiens proteins: from lysine to arginine, serine to alanine, and serine to threonine. Our results strongly suggest that the accumulation of these three types of amino acid substitutions correlates with the acquisition of thermostability and is responsible for the greater GC content of C. efficiens.
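The idea of analysing the direction of amino acid substitutions can be illustrated with a small tally over aligned ortholog pairs. This is a minimal sketch, not the authors' pipeline: it assumes the orthologous proteins have already been pairwise aligned (gaps written as `-`), the function names are hypothetical, and the two toy sequences are invented purely for illustration.

```python
from collections import Counter


def substitution_counts(aligned_a, aligned_b):
    """Count directional substitutions (residue in a -> residue in b).

    Columns with a gap in either sequence, or with identical residues,
    are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-" or x == y:
            continue
        counts[(x, y)] += 1
    return counts


def direction_bias(counts, x, y):
    """Return (forward, reverse) counts for x -> y and y -> x."""
    return counts[(x, y)], counts[(y, x)]


# Invented toy alignment (assumption): first sequence plays the role of
# C. glutamicum, second the role of C. efficiens.
glutamicum = "MKSKS-LKT"
efficiens = "MRARTALRS"

counts = substitution_counts(glutamicum, efficiens)
for pair in (("K", "R"), ("S", "A"), ("S", "T")):
    forward, reverse = direction_bias(counts, *pair)
    print(f"{pair[0]}->{pair[1]}: {forward} forward, {reverse} reverse")
```

In a real analysis the counters from all ortholog pairs would be summed before comparing forward and reverse counts, since a single pair carries too few substitutions to show a bias.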
We report the complete genome sequence of the deep-sea γ-proteobacterium, Idiomarina loihiensis, isolated recently from a hydrothermal vent at 1,300-m depth on the Loʻihi submarine volcano, Hawaii. The I. loihiensis genome comprises a single chromosome of 2,839,318 base pairs, encoding 2,640 proteins, four rRNA operons, and 56 tRNA genes. A comparison of I. loihiensis to the genomes of other γ-proteobacteria reveals abundance of amino acid transport and degradation enzymes, but a loss of sugar transport systems and certain enzymes of sugar metabolism. This finding suggests that I. loihiensis relies primarily on amino acid catabolism, rather than on sugar fermentation, for carbon and energy. Enzymes for biosynthesis of purines, pyrimidines, the majority of amino acids, and coenzymes are encoded in the genome, but biosynthetic pathways for Leu, Ile, Val, Thr, and Met are incomplete. Auxotrophy for Val and Thr was confirmed by in vivo experiments. The I. loihiensis genome contains a cluster of 32 genes encoding enzymes for exopolysaccharide and capsular polysaccharide synthesis. It also encodes diverse peptidases, a variety of peptide and amino acid uptake systems, and versatile signal transduction machinery. We propose that the source of amino acids for I. loihiensis growth is the proteinaceous particles present in the deep sea hydrothermal vent waters. I. loihiensis would colonize these particles by using the secreted exopolysaccharide, digest these proteins, and metabolize the resulting peptides and amino acids. In summary, the I. loihiensis genome reveals an integrated mechanism of metabolic adaptation to the constantly changing deep-sea hydrothermal ecosystem.