As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ∼58% compared with a peak at ∼42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at ∼42%, relatively low compared with that of protein-coding cDNAs.
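The GC-content comparison in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 2% bin width, and the choice to ignore ambiguous bases are assumptions made here for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


def gc_histogram(seqs, bin_width=2):
    """Count sequences per GC% bin; each key is the lower edge of a bin.

    The bin width is an arbitrary choice for this sketch.
    """
    counts = {}
    for s in seqs:
        lower_edge = int(gc_content(s) // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return counts


def peak_bin(counts):
    """Return the lower edge of the most populated bin."""
    return max(counts, key=counts.get)
```

Running `peak_bin(gc_histogram(...))` separately on two sets of cDNA sequences (for example predicted and nonpredicted) would give the kind of peak positions the abstract compares.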
Mammalian genomes produce huge numbers of noncoding RNAs (ncRNAs). However, the functions of most ncRNAs are unclear, and novel techniques that can distinguish functional ncRNAs are needed. Studies of mRNAs have revealed that the half-life of each mRNA is closely related to its physiological function, raising the possibility that the RNA stability of an ncRNA reflects its function. In this study, we first determined the half-lives of 11,052 mRNAs and 1,418 ncRNAs in HeLa Tet-off (TO) cells by developing a novel genome-wide method, which we named 5′-bromo-uridine immunoprecipitation chase-deep sequencing analysis (BRIC-seq). This method involved pulse-labeling endogenous RNAs with 5′-bromo-uridine and measuring the ongoing decrease in RNA levels over time using multifaceted deep sequencing. By analyzing the relationship between RNA half-lives and functional categories, we found that RNAs with a long half-life (t1/2 ≥ 4 h) contained a significant proportion of ncRNAs, as well as mRNAs involved in housekeeping functions, whereas RNAs with a short half-life (t1/2 < 4 h) included known regulatory ncRNAs and regulatory mRNAs. The stabilities of a significant set of short-lived ncRNAs are regulated by external stimuli, such as retinoic acid treatment. In particular, we identified and characterized several novel long ncRNAs involved in cell proliferation from the group of short-lived ncRNAs. We designated this novel class of ncRNAs with a short half-life as Short-Lived noncoding Transcripts (SLiTs). We propose that the strategy of monitoring RNA half-life will provide a powerful tool for investigating hitherto functionally uncharacterized regulatory RNAs.
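To make the idea of deriving a half-life from a decreasing RNA level concrete, here is a minimal sketch. It assumes simple first-order decay, N(t) = N0·exp(−kt), so that t1/2 = ln 2 / k, and fits k by least squares on log-transformed levels. The abstract does not state the fitting model or the sampling times, so both the model and the function names are assumptions; only the 4 h cutoff comes from the text.

```python
import math


def half_life(times_h, levels):
    """Estimate a half-life in hours from a decay time course.

    Fits ln(level) against time by ordinary least squares, assuming
    first-order decay (an assumption of this sketch).
    """
    if len(times_h) != len(levels) or len(times_h) < 2:
        raise ValueError("need at least two matching time points")
    if any(v <= 0 for v in levels):
        raise ValueError("levels must be positive to take logarithms")
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    slope = sum((t - mean_t) * (l - mean_l)
                for t, l in zip(times_h, logs)) / sxx
    if slope >= 0:
        return math.inf  # no measurable decay over the time course
    return math.log(2) / -slope


def stability_class(t_half_h, threshold_h=4.0):
    """Label an RNA 'long' or 'short' using the 4 h cutoff from the abstract."""
    return "long" if t_half_h >= threshold_h else "short"
```

For example, relative levels of 1, 0.5, 0.25, and 0.0625 at hypothetical time points of 0, 2, 4, and 8 h correspond to a half-life of 2 h, which falls in the short-lived group.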
By analyzing 1,780,295 5′-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7,674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with the composition of one CpG-island-containing promoter per 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that the PAP-containing promoters were enriched in the genes encoding signal transduction-related proteins and were rarer in the genes encoding extracellular proteins, possibly reflecting the varied functional requirement for and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus partly produced by the PAPs, suggesting that a wide variety of transcripts can be achieved by this mechanism. Our findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans. [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to DDBJ under accession nos. DA000001-DA999999, DB000001-DB294747, DB294748-DB384947, BP192706-BP383670, AU279383-AU280837, and AU116788-U160826.]
One of the most striking findings revealed by the Human Genome Project is that the human genome contains only 20,000-25,000 kinds of protein-coding genes (International Human Genome Sequencing Consortium 2004). This number is unexpectedly small compared with the total gene numbers in yeast, fly, and worm genomes, which are estimated to be 6,000, 14,000, and 19,000, respectively (Goffeau et al. 1996; C. elegans Sequencing Consortium 1998; Adams et al. 2000). It is supposed that there must be other factors in addition to mere gene numbers to satisfy the prerequisites that enable the human genome to fabricate such highly elaborated systems as the brain and immune systems. To explain this, it has been hypothesized that multifaceted use of the genes should play a pivotal role in functional
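The grouping of TSS positions into clusters separated by more than 500 bp, mentioned in the abstract above, can be sketched as a simple gap-based clustering. This is an illustration under assumptions, not the authors' procedure: it treats positions as coordinates on a single chromosome and strand, and it measures the gap between adjacent TSSs. The function name is hypothetical.

```python
def cluster_tss(positions, min_gap=500):
    """Group TSS coordinates so that adjacent clusters are > min_gap bp apart.

    Positions are sorted and de-duplicated; a new cluster starts whenever
    the distance to the previous TSS exceeds min_gap.
    """
    clusters = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= min_gap:
            clusters[-1].append(pos)   # close enough: same putative promoter
        else:
            clusters.append([pos])     # gap too large: start a new cluster
    return clusters
```

With this rule, two TSSs exactly 500 bp apart stay in the same cluster, since only gaps of more than 500 bp separate clusters.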
The discovery of endogenous bioactive peptides has typically required a lengthy identification process. Computer-assisted analysis of cDNA and genomic DNA sequence information can markedly shorten the process. A bioinformatic analysis of full-length, enriched human cDNA libraries searching for previously unidentified bioactive peptides resulted in the identification and characterization of two related peptides of 28 and 20 amino acids, which we designated salusin-alpha and salusin-beta. Salusins are translated from an alternatively spliced mRNA of TOR2A, a gene encoding a protein of the torsion dystonia family. Intravenous administration of salusin-alpha or salusin-beta to rats causes rapid, profound hypotension and bradycardia. Salusins increase intracellular Ca2+, upregulate a variety of genes and induce cell mitogenesis. Salusin-beta stimulates the release of arginine-vasopressin from rat pituitary. Expression of TOR2A mRNA and its splicing into preprosalusin are ubiquitous, and immunoreactive salusin-alpha and salusin-beta are detected in many human tissues, plasma and urine, suggesting that salusins are endocrine and/or paracrine factors.
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
Appropriate resources and expression technology necessary for human proteomics on a whole-proteome scale are being developed. We prepared a foundation for simple and efficient production of human proteins using the versatile Gateway vector system. We generated 33,275 human Gateway entry clones for protein synthesis, developed mRNA expression protocols for them and improved the wheat germ cell-free protein synthesis system. We applied this protein expression system to the in vitro expression of 13,364 human proteins and assessed their biological activity in two functional categories. Of the 75 tested phosphatases, 58 (77%) showed biological activity. Several cytokines containing disulfide bonds were produced in an active form in a nonreducing wheat germ cell-free expression system. We also manufactured protein microarrays by direct printing of unpurified in vitro-synthesized proteins and demonstrated their utility. Our 'human protein factory' infrastructure includes the resources and expression technology for in vitro proteome research.
Metastatic prostate cancer (PCa) is still an incurable disease. Long non-coding RNAs (lncRNAs) may be an overlooked source of cancer biomarkers and therapeutic targets. We therefore performed RNA sequencing on paired metastatic/non-metastatic PCa xenografts derived from clinical specimens. The most highly up-regulated transcript was LOC728606, a lncRNA now designated PCAT18. PCAT18 is specifically expressed in the prostate compared to 11 other normal tissues (p<0.05) and up-regulated in PCa compared to 15 other neoplasms (p<0.001). Cancer-specific up-regulation of PCAT18 was confirmed on an independent dataset of PCa and benign prostatic hyperplasia samples (p<0.001). PCAT18 was detectable in plasma samples and increased incrementally from healthy individuals to those with localized and metastatic PCa (p<0.01). We identified a PCAT18-associated expression signature (PES), which is highly PCa-specific and activated in metastatic vs. primary PCa samples (p<1E−4, odds ratio>2). The PES was significantly associated with androgen receptor (AR) signalling. Accordingly, AR activation dramatically up-regulated PCAT18 expression in vitro and in vivo. PCAT18 silencing significantly (p<0.001) inhibited PCa cell proliferation and triggered caspase 3/7 activation, with no effect on non-neoplastic cells. PCAT18 silencing also inhibited PCa cell migration (p<0.01) and invasion (p<0.01). These results position PCAT18 as a potential therapeutic target and biomarker for metastatic PCa.
To understand the mechanism of transcriptional regulation, it is essential to identify and characterize the promoter, which is located proximal to the mRNA start site. To identify the promoters from the large volumes of genomic sequences, we used mRNA start sites determined by a large-scale sequencing of the cDNA libraries constructed by the "oligo-capping" method. We aligned the mRNA start sites with the genomic sequences and retrieved adjacent sequences as potential promoter regions (PPRs) for 1031 genes. The PPR sequences were searched to determine the frequencies of major promoter elements. Among 1031 PPRs, 329 (32%) contained TATA boxes, 872 (85%) contained initiators, 999 (97%) contained a GC box, and 663 (64%) contained a CAAT box. Furthermore, 493 (48%) PPRs were located in CpG islands. This frequency of CpG islands was reduced in TATA+/Inr+ PPRs and in the PPRs of ubiquitously expressed genes. In the PPRs of the CGM2 gene, the DRA gene, and the TM30pl gene, which showed highly colon-specific expression patterns, the consensus sequences of E boxes were commonly observed. The PPRs were also useful for exploring promoter SNPs.
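Counting how many promoter regions contain each element, as this abstract reports, can be sketched with a simple pattern scan. The abstract does not give the motif definitions or the search method used, so the consensus patterns below (TATA[AT]A, GGGCGG, CCAAT) are commonly cited textbook forms chosen here as assumptions, and only the forward strand is scanned.

```python
import re

# Assumed consensus patterns for illustration; not taken from the abstract.
MOTIFS = {
    "TATA box": re.compile(r"TATA[AT]A"),
    "GC box": re.compile(r"GGGCGG"),
    "CAAT box": re.compile(r"CCAAT"),
}


def elements_present(seq):
    """Return the set of element names whose pattern occurs in the sequence."""
    seq = seq.upper()
    return {name for name, pattern in MOTIFS.items() if pattern.search(seq)}


def element_frequencies(seqs):
    """Map each element to (count, rounded percentage) across the sequences."""
    if not seqs:
        raise ValueError("no sequences given")
    counts = {name: 0 for name in MOTIFS}
    for s in seqs:
        for name in elements_present(s):
            counts[name] += 1
    return {name: (c, round(100 * c / len(seqs))) for name, c in counts.items()}
```

The same percentage arithmetic reproduces the reported figures: for instance, 329 of 1031 PPRs rounds to 32%.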