As a resource for human transcriptome and functional genomics studies, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequences of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of these clusters (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The GC content of these nonpredicted cDNAs peaked at ∼58%, compared with a peak at ∼42% for predicted cDNAs. Thus, current gene prediction procedures appear to be slightly biased against GC-rich transcripts. The remaining clusters unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and are therefore candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear splicing pattern. The GC content distribution of the noncoding cDNAs was narrow, with a peak at ∼42%, relatively low compared with that of the protein-coding cDNAs.
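GC content is the percentage of G and C among the bases of a sequence, and the "peak" of its distribution is the most populated bin of a histogram over many sequences. The minimal Python sketch below illustrates that calculation. The function names, the choice to ignore ambiguous bases such as N, and the default 2-percentage-point bin width are illustrative assumptions, not the procedure used in the FLJ analysis.

```python
from collections import Counter
from typing import Iterable


def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T); symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_peak(seqs: Iterable[str], bin_width: float = 2.0) -> float:
    """Lower edge (in percent) of the most populated GC-content bin.

    Ties are broken in favour of the lower bin.
    """
    counts = Counter(int(gc_content(s) // bin_width) for s in seqs)
    if not counts:
        raise ValueError("no sequences given")
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return best_bin * bin_width


# Illustrative toy input, not FLJ data.
print(gc_peak(["ATGCAT", "ATGGAT", "GGCCGC"], bin_width=10.0))  # prints 30.0
```

Comparing the peaks returned for two groups of sequences (for example, predicted versus nonpredicted cDNAs) is one simple way to express the kind of shift described above.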
We compared in detail the characteristics of the sequences of cDNA clones obtained by the oligo-capping method (oligo-capping clones) with those of the sequences in the UniGene database. To compare the completeness of the sequences, we defined three new variables: "fullness-proportion of clones" (the ratio of complete clones to total clones in a library), "fullness-proportion of genes" (the ratio of complete genes to total genes in a library), and "fullness-proportion of database" (the ratio of complete genes to total genes in a database sampled from a library). The fullness-proportion of clones for oligo-capping clones was 57.3%, 2.2 times that of UniGene (25.9%). The fullness-proportion of genes for oligo-capping clones was 41.8%, 2.4 times that of UniGene (17.8%). When gene length was restricted to ≥1.5 kb, the fullness-proportion of genes for oligo-capping clones was four times that of UniGene. The fullness-proportion of database for oligo-capping clones was approximately the same as that of UniGene. Simulation of clone redundancy showed that this coincidence was due to the large redundancy of the UniGene database. Consequently, the cDNA sequence database of oligo-capping clones enabled high-throughput selection of full-length cDNA clones.
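The three fullness-proportions can be sketched as simple ratios over a list of clones. In this minimal Python sketch, each clone is a hypothetical `(gene_id, is_full_length)` record. A gene is assumed to count as complete if at least one of its clones is full-length, and the "database" is assumed to be a random sample of clones drawn from the library. These modelling choices and all function names are illustrative and are not taken from the study.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one record per clone, (gene_id, clone_is_full_length).
Clone = Tuple[str, bool]


def fullness_of_clones(clones: Sequence[Clone]) -> float:
    """Complete clones / total clones in the library."""
    return sum(1 for _, full in clones if full) / len(clones)


def fullness_of_genes(clones: Sequence[Clone]) -> float:
    """Complete genes / total genes.

    Assumption: a gene counts as complete if at least one of its clones is full-length.
    """
    complete: Dict[str, bool] = {}
    for gene, full in clones:
        complete[gene] = complete.get(gene, False) or full
    return sum(complete.values()) / len(complete)


def fullness_of_database(clones: Sequence[Clone], n_sampled: int, seed: int = 0) -> float:
    """Gene-level fullness of a 'database' modelled as n_sampled clones drawn
    at random (without replacement) from the library."""
    rng = random.Random(seed)
    sample: List[Clone] = rng.sample(list(clones), n_sampled)
    return fullness_of_genes(sample)


# Toy library (not real data): 5 clones from 3 genes.
library = [("A", True), ("A", False), ("B", False), ("B", False), ("C", True)]
print(fullness_of_clones(library))       # 0.4
print(fullness_of_genes(library))        # 2/3
print(fullness_of_database(library, 3))  # depends on which clones are sampled
```

In this toy model, sampling more clones per gene raises the chance that at least one of them is full-length. A highly redundant database can therefore reach a gene-level fullness comparable to that of a library with a higher per-clone fullness. This mirrors the redundancy explanation given above, but it is not the authors' simulation.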
Polymerase chain reaction (PCR) amplification of Marek's disease virus (MDV) DNA followed by sequencing identified the TRL/UL, UL/IRL, IRS/US, and US/TRS junctions. The TRL/UL junction is located 192 bp downstream of the last EcoRI site in the TRL region, whereas the UL/IRL junction is located 192 bp upstream of the first EcoRI site in the IRL region. The IRS/US junction is located 950 bp downstream of the second EcoRI site in the IRS region, whereas the US/TRS junction is located 950 bp upstream of the first EcoRI site in the TRS region. BamHI restriction mapping of one of the PCR products identified two novel DNA subfragments, BamHI-U2 and -P4, upstream of the US/TRS junction of the MDV genome. Sequencing of the BamHI-D fragment revealed a novel open reading frame (ORF) encoding a 155-amino-acid protein, and the TRL/UL junction lies within this ORF. The N-terminal 65 amino acids of this protein are homologous to the N-terminal region of the previously reported pp38, whose gene lies in the UL/IRL region. Computer-assisted analysis indicated that both are transmembrane proteins and that they share an antigenic domain.
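An ORF search like the one that revealed the 155-amino-acid protein can be sketched as a scan of the three forward reading frames for ATG...stop stretches. This minimal Python sketch assumes the stop codons of the standard genetic code and scans only the forward strand. The `longest_orf` helper is hypothetical; real analyses also scan the reverse complement and typically rely on dedicated tools. For scale, a 155-amino-acid ORF spans 155 × 3 = 465 coding nucleotides plus a 3-nt stop codon, 468 nt in total.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> Optional[Tuple[int, int, int]]:
    """Longest ATG...stop ORF on the forward strand of a DNA sequence.

    Returns (start, end, aa_length): start is 0-based, end is exclusive and
    includes the stop codon, aa_length counts amino acids (stop excluded).
    Returns None if no complete ORF is found. The reverse strand is not scanned.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first ATG opens an ORF in this frame
            elif codon in STOP_CODONS:
                aa_length = (i - start) // 3
                if best is None or aa_length > best[2]:
                    best = (start, i + 3, aa_length)
                start = None  # look for the next ATG in this frame
    return best


# Toy sequence: ATG + 154 GCT (Ala) codons + TAA encodes a 155-amino-acid ORF of 468 nt.
toy = "ATG" + "GCT" * 154 + "TAA"
print(longest_orf(toy))  # (0, 468, 155)
```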
The sparkling enope squid, Watasenia scintillans, is a deep-sea mollusk inhabiting the western Pacific Ocean. It has the unusual ability to illuminate its body without the involvement of other organisms. In this study, we extracted DNA from the brain of a single female squid caught in the Japan Sea and determined the complete sequence of its mitochondrial genome using the Illumina sequencing platform. The circular sequence is 20,089 bp long. Using the same next-generation sequencing data, we also estimated the mean copy number of mitochondria per cell in the brain to be 108 by comparing the read depths of the nuclear and mitochondrial genomes. The haploid nuclear genome size was calculated to be 4.78 Gb. We also identified six heteroplasmic sites, together with their allele frequencies, in this individual. These results show that our methodology is useful for mitochondrion-related studies.
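The read-depth comparison reduces to simple arithmetic. This is a minimal sketch that assumes a diploid nuclear genome; the ploidy factor the authors applied is not stated here. Under that assumption, each cell contributes two nuclear genome copies, so the number of mitochondrial genome copies per cell is approximately 2 × (mean mitochondrial depth) / (mean nuclear depth). The allele frequency at a heteroplasmic site can likewise be taken as the fraction of covering reads that carry the alternative base. The function names and the toy numbers are hypothetical.

```python
def mtdna_copies_per_cell(mito_depth: float, nuclear_depth: float,
                          nuclear_ploidy: int = 2) -> float:
    """Approximate mitochondrial genome copies per cell from mean read depths.

    Assumption: each cell carries `nuclear_ploidy` copies of the nuclear genome
    (diploid by default), so the depth per nuclear copy is nuclear_depth / ploidy.
    """
    if nuclear_depth <= 0:
        raise ValueError("nuclear depth must be positive")
    return nuclear_ploidy * mito_depth / nuclear_depth


def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that carry the alternative allele."""
    if total_reads <= 0:
        raise ValueError("site has no coverage")
    return alt_reads / total_reads


# Toy numbers for illustration only (not the study's depths).
print(mtdna_copies_per_cell(mito_depth=600.0, nuclear_depth=20.0))  # 60.0
print(allele_frequency(alt_reads=30, total_reads=200))             # 0.15
```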