As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ∼58% compared with a peak at ∼42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at ∼42%, relatively low compared with that of protein-coding cDNAs.
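The GC-content peaks quoted above can be illustrated with a minimal sketch. It assumes each cDNA is available as a plain nucleotide string; the function names `gc_content` and `peak_bin` and the bin width are hypothetical choices for illustration, not the procedure used by the authors.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the GC percentage of a sequence, counting only A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    # Sequences with no unambiguous bases get 0.0 rather than raising.
    return 100.0 * gc / total if total else 0.0


def peak_bin(seqs, bin_width=2):
    """Return the lower edge of the most populated GC-content bin.

    bin_width is an assumed histogram resolution in percentage points.
    """
    bins = Counter(int(gc_content(s) // bin_width) * bin_width for s in seqs)
    return bins.most_common(1)[0][0]


if __name__ == "__main__":
    toy = ["ATGC", "AATT", "GGCC", "ATGC"]  # toy data, not from the FLJ set
    print([gc_content(s) for s in toy])
    print(peak_bin(toy, bin_width=10))
```

Comparing `peak_bin` for predicted and nonpredicted cDNA sets would show a shift of the kind described, if such a shift is present in the data.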
Despite their high degree of genomic similarity, reminiscent of their relatively recent separation from each other (≈6 million years ago), the molecular basis of traits unique to humans vs. their closest relative, the chimpanzee, is largely unknown. This report describes a large-scale single-contig comparison between human and chimpanzee genomes via the sequence analysis of almost one-half of the immunologically critical MHC. This 1,750,601-bp stretch of DNA, which encompasses the entire class I along with the telomeric part of the MHC class III regions, corresponds to an orthologous 1,870,955 bp of the human HLA region. Sequence analysis confirms the existence of a high degree of sequence similarity between the two species. However, and importantly, this 98.6% sequence identity drops to only 86.7% taking into account the multiple insertions/deletions (indels) dispersed throughout the region. This is functionally exemplified by a large deletion of 95 kb between the virtual locations of human MICA and MICB genes, which results in a single hybrid chimpanzee MIC gene, in a segment of the MHC genetically linked to species-specific handling of several viral infections (HIV/SIV, hepatitis B and C) as well as susceptibility to various autoimmune diseases. Finally, if generalized, these data suggest that evolution may have used the mechanistically more drastic indels instead of the more subtle single-nucleotide substitutions for shaping the recently emerged primate species.
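The gap between the two identity figures comes down to whether gapped alignment columns are counted. The sketch below shows one way to compute both values from a pairwise alignment. It assumes that every gapped base counts as one mismatching column; the abstract does not state the exact counting scheme, so this is an illustration rather than the authors' method, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, count_indels: bool) -> float:
    """Percent identity between two aligned sequences ('-' marks a gap).

    count_indels=False: only columns where both sequences have a base.
    count_indels=True:  gapped columns are included and count as mismatches
                        (an assumed per-base treatment of indels).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" and b == "-":
            continue  # column carries no information
        gapped = a == "-" or b == "-"
        if gapped and not count_indels:
            continue  # substitution-only identity skips indel columns
        compared += 1
        if not gapped and a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment: one substitution and one 2-bp deletion.
    human = "ACGTACGTAC"
    chimp = "ACGAA--TAC"
    print(percent_identity(human, chimp, count_indels=False))  # 87.5
    print(percent_identity(human, chimp, count_indels=True))   # 70.0
```

In the toy alignment a single 2-bp deletion lowers identity from 87.5% to 70.0%, which mirrors how a few large indels, such as the 95-kb deletion mentioned above, can account for most of the drop.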