As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ∼58% compared with a peak at ∼42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at ∼42%, relatively low compared with that of protein-coding cDNAs.
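The GC-content comparison above rests on a simple per-sequence calculation: the fraction of G and C among the unambiguous bases, binned into a histogram whose peak can then be read off. The following is a minimal sketch of that calculation, not the authors' pipeline; the function names `gc_content` and `gc_histogram` and the default 2% bin width are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction (0..1) of a nucleotide sequence.

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def gc_histogram(seqs, bin_width=2):
    """Count sequences per GC-percentage bin.

    Keys are the lower bound of each bin in percent; a sequence with
    100% GC is placed in the last bin.
    """
    counts = {}
    for s in seqs:
        pct = 100 * gc_content(s)
        lower = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))
```

Applied separately to two sets of cDNA sequences (for example, predicted and nonpredicted), the bin with the highest count in each histogram gives the kind of peak position quoted in the abstract.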
A whole-genome radiation hybrid (RH) panel was used to construct a high-resolution map of the rat genome based on microsatellite and gene markers. These include 3,019 new microsatellite markers described here for the first time and 1,714 microsatellite markers with known genetic locations, allowing comparison and integration of maps from different sources. A robust RH framework map containing 1,030 positions ordered with odds of at least 1,000:1 has been defined as a tool for mapping these markers, and for future RH mapping in the rat. More than 500 genes which have been mapped in mouse and/or human were localized with respect to the rat RH framework, allowing the construction of detailed rat-mouse and rat-human comparative maps and illustrating the power of the RH approach for comparative mapping.
Macular corneal dystrophy (MCD; MIM 217800) is an autosomal recessive hereditary disease in which progressive punctate opacities in the cornea result in bilateral loss of vision, eventually necessitating corneal transplantation. MCD is classified into two subtypes, type I and type II, defined by the respective absence and presence of sulphated keratan sulphate in the patient serum, although both types have clinically indistinguishable phenotypes. The gene responsible for MCD type I has been mapped to chromosome 16q22, and that responsible for MCD type II may involve the same locus. Here we identify a new carbohydrate sulphotransferase gene (CHST6), encoding an enzyme designated corneal N-acetylglucosamine-6-sulphotransferase (C-GlcNAc6ST), within the critical region of MCD type I. In MCD type I, we identified several mutations that may lead to inactivation of C-GlcNAc6ST within the coding region of CHST6. In MCD type II, we found large deletions and/or replacements caused by homologous recombination in the upstream region of CHST6. In situ hybridization analysis did not detect CHST6 transcripts in corneal epithelium in an MCD type II patient, suggesting that the mutations found in type II lead to loss of cornea-specific expression of CHST6.
Functional prediction of open reading frames coded in the genome is one of the most important tasks in yeast genomics. Among a number of large-scale experiments for assigning certain functional classes to proteins, experiments determining protein-protein interaction are especially important because interacting proteins usually have the same function. Thus, it seems possible to predict the function of a protein when the function of its interacting partner is known. However, in vitro experiments often suffer from artifacts and a protein can often have multiple binding partners with different functions. We developed an objective prediction method that can systematically include the information of indirect interaction. Our method can predict the subcellular localization, the cellular role and the biochemical function of yeast proteins with accuracies of 72.7%, 63.6% and 52.7%, respectively. The prediction accuracy rises for proteins with more than three binding partners and thus we present the open prediction results for 16 such proteins.
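The abstract does not give the scoring scheme, so the following is only a sketch of one simple way that direct and indirect interaction information could be combined: a weighted vote in which annotated direct partners count fully and annotated partners-of-partners count at a reduced weight. The function name `predict_function`, the `indirect_weight` parameter, and its default of 0.5 are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def predict_function(protein, interactions, annotations, indirect_weight=0.5):
    """Predict a label for `protein` by weighted vote over its neighbours.

    interactions: iterable of (protein_a, protein_b) pairs (undirected).
    annotations:  dict mapping protein -> known label (e.g. localization).
    indirect_weight: hypothetical weight given to partners-of-partners.
    Returns the highest-scoring label, or None if no neighbour is annotated.
    """
    # Build an undirected adjacency map, skipping self-interactions.
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    direct = set(neighbors[protein])
    indirect = set()
    for partner in direct:
        indirect |= neighbors[partner]
    indirect -= direct | {protein}

    scores = defaultdict(float)
    for partner in direct:
        if partner in annotations:
            scores[annotations[partner]] += 1.0
    for partner in indirect:
        if partner in annotations:
            scores[annotations[partner]] += indirect_weight

    if not scores:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(scores), key=scores.get)
```

In this sketch a protein with many annotated neighbours accumulates more votes, which is consistent with the abstract's observation that accuracy rises for proteins with more binding partners, although the actual method may weight evidence quite differently.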