The human genome sequence defines our inherent biological potential; realizing the biology encoded therein requires knowledge of the function of each gene. Our knowledge in this area remains limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, because the transcripts of a single gene can vary greatly. We therefore performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analyzing high-quality full-length cDNA clones, curated using unified criteria. This led to the identification of 5,155 new gene candidates. It also demonstrated the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame; of these, 296 loci are strong candidates for non-protein-coding RNA genes.
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
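The distinction drawn above between nonsynonymous and nonsense SNPs in coding regions can be illustrated with a minimal sketch. It is not the H-InvDB pipeline, whose implementation the abstract does not describe; the function name `classify_coding_snp` and its inputs (an in-frame coding sequence, a 0-based position, and an alternative base) are assumptions made for illustration. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution in an in-frame coding sequence.

    Hypothetical helper: `cds` starts at the first base of a codon,
    `pos` is 0-based, and `alt` is the substituted base.
    """
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"                    # encoded residue unchanged
    if alt_aa == "*":
        return "nonsense"                      # premature stop codon introduced
    return "nonsynonymous"                     # residue changed (includes stop loss)


cds = "ATGTGGAAATAA"                           # Met-Trp-Lys-stop
print(classify_coding_snp(cds, 8, "G"))        # AAA -> AAG
print(classify_coding_snp(cds, 6, "G"))        # AAA -> GAA
print(classify_coding_snp(cds, 5, "A"))        # TGG -> TGA
```

Indels are deliberately left out of this sketch: an insertion or deletion whose length is not a multiple of three shifts the reading frame, so it cannot be classified codon by codon in this way.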
To elucidate the origins of the MHC-B-MHC-C pair and the MHC class I chain-related molecule (MIC)A-MICB pair, we sequenced an MHC class I genomic region of humans, chimpanzees, and rhesus monkeys and analyzed the regions from an evolutionary standpoint, focusing first on long interspersed nuclear element (LINE) sequences that are paralogous within each of the first two species and orthologous between them. Because all the LINE sequences were fragmented and nonfunctional, they were suitable for phylogenetic study and, in particular, for estimating evolutionary time. Our study revealed that MHC-B and MHC-C duplicated 22.3 million years (Myr) ago, and that the ape MICA and MICB duplicated 14.1 Myr ago. We then estimated the divergence time of the rhesus monkey by using other orthologous LINE sequences in the class I regions of the three primate species. The result indicates that rhesus monkeys, and possibly the Old World monkeys in general, diverged from humans 27-30 Myr ago. Interestingly, rhesus monkeys were found to have not an MHC-B and MHC-C pair but many repeated genes similar to MHC-B. These results support our inference that MHC-B and MHC-C duplicated after the divergence between apes and Old World monkeys.
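The abstract does not state how the time estimates were computed. As a rough illustration of how nonfunctional sequences such as fragmented LINEs can be used to date a duplication or divergence, the sketch below applies a Jukes-Cantor correction and a strict molecular clock. Both choices, the function names, and the substitution rate are assumptions made for illustration, not the authors' procedure or values.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Substitutions per site between two aligned sequences (Jukes-Cantor model)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where both sequences carry an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)   # observed proportion of differences
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time_myr(distance: float, rate_per_site_per_myr: float) -> float:
    """Time since two lineages split, assuming a constant rate on both branches."""
    # Both copies accumulate substitutions independently, hence the factor of 2.
    return distance / (2.0 * rate_per_site_per_myr)


# Toy alignment and a hypothetical rate, for illustration only.
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT")
t = divergence_time_myr(d, rate_per_site_per_myr=0.001)
print(f"distance = {d:.4f} substitutions/site, time = {t:.1f} Myr")
```

The same calculation applies to paralogous copies within one genome (dating a duplication) and to orthologous copies between species (dating a divergence); only the choice of sequence pair differs.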