We have characterized the approximately 6.5-kilobase cytoplasmic poly(A)+ Line-1 (L1) RNA present in a human teratocarcinoma cell line, NTera2D1, by primer extension and by analysis of cloned cDNAs. The bulk of the RNA begins (5' end) at the residue previously identified as the 5' terminus of the longest known primate genomic L1 elements, presumed to represent "unit" length. Several of the cDNA clones are close to 6 kilobase pairs, that is, close to full length. The partial sequences of 18 cDNA clones and the full sequence of one (5,975 base pairs) indicate that many different genomic L1 elements contribute transcripts to the 6.5-kilobase cytoplasmic poly(A)+ RNA in NTera2D1 cells, because no 2 of the 19 cDNAs analyzed had identical sequences. The transcribed elements appear to represent a subset of the total genomic L1s, a subset that has a characteristic consensus sequence in the 3' noncoding region and a high degree of sequence conservation throughout. Two open reading frames (ORFs) of 1,122 (ORF1) and 3,852 (ORF2) bases, flanked by about 800 and 200 bases of sequence at the 5' and 3' ends, respectively, can be identified in the cDNAs. Both ORFs are in the same frame, and they are separated by 33 bases bracketed by two conserved in-frame stop codons. ORF2 is interrupted by at least one randomly positioned stop codon in the majority of the cDNAs. The data support proposals suggesting that the human L1 family includes one or more functional genes as well as an extraordinarily large number of pseudogenes whose ORFs are broken by stop codons. The cDNA structures suggest that both genes and pseudogenes are transcribed. At least one of the cDNAs (cD11), which was sequenced in its entirety, could, in principle, represent an mRNA for production of the ORF1 polypeptide.
The similarity of mammalian L1s to several recently described invertebrate movable elements defines a new widely distributed class of elements which we term class II retrotransposons.
Line-1 (L1) is a family of long, highly repeated DNA sequences dispersed in all mammalian genomes (9-12, 14, 22, 23, 55, 58, 65). In primates, the longest known family members are about 6 kilobase pairs (kbp), although many family members are truncated and internally rearranged (1, 22, 23, 31, 35, 39, 48). The structure of randomly selected genomic L1s from various primates is similar to that of processed pseudogenes, including the presence on one strand of long but broken open reading frames (ORFs), an A-rich 3' terminus (on the strand containing the ORFs), the apparent lack of introns interrupting the ORFs, and variable-sized target site duplications (Fig. 1). The proteins predicted by the ORFs include regions with homology to reverse transcriptase and nucleic acid-binding proteins (19, 24, 36).
L1 elements in other mammals are similar to those in primates with regard to abundance and overall organization (3, 19, 20, 33, 36, 37, 50, 60, 63). Moreover, both the nucleotide sequence of the ORF region and the polypeptides predicted from the ORFs are homologous in all mammalian orders that have be...