“…Hepatitis C virus (HCV), a plus sense RNA virus identified in 1989 (Choo et al., 1989), is estimated to chronically infect roughly 4,000,000 people in the United States (National Institutes of Health, 1997), often with serious consequences (for a review, see Branch et al., 2000). Because HCV poses a public health threat, it is important to identify all HCV RNA structural elements and expressed polypeptides to define all potential diagnostic markers, vaccine components, and targets for pharmaceutical agents. At the moment, HCV RNA is known to contain a single large open reading frame (ORF), about 9,000 nt in length, encoding a single polyprotein that is the source of 10 viral proteins: the core, E1, E2, P7, NS2, NS3, NS4a, NS4b, NS5a, and NS5b (Rice, 1996). This ORF is flanked by about 350 nt at its 5′ end and about 220 at its 3′ end. Although the full range of the functions provided by the flanking sequences is not yet clear, terminal structures are likely to play a role in replication, and the 5′ flanking sequence forms part of an internal ribosome entry site (IRES) that promotes the initiation of HCV polyprotein synthesis (Brown et al., 1992; Tsukiyama-Kohara et al., 1992; Reynolds et al., 1995; Lu & Wimmer, 1996). As exemplified by the hepatitis B virus, viral genomes often contain overlapping genes. Thus, HCV RNA may contain regions where the main ORF is overlapped by another gene or by an RNA structural element. To seek these multifunctional regions, we carried out comparative sequence analysis on diverse HCV sequences retrieved from GenBank (Benson et al., 1996), locating synonymous codons in the standard HCV ORF in which the third position nucleotides are much more conserved than chance alone would predict. This unusual third-base conservation is likely to occur in regions that have novel functions in addition to their known coding function (see Materials and Methods). Previous studies identified some of the regions of HCV RNA that have unusual nucleotide 
conservation (Ina et al., 1994; Smith & Simmonds, 1997) and, in particular, they revealed that the RNA sequence of the core-encoding region is more conserved than would be necessary to maintain the observed level of conservation of the core protein. Ina and colleagues (Ina et al., 1994) suggested that an overlapping gene might constrain the sequence and proposed that translation of a second ORF might be initiated at the GUG codon at bases −41 to −39 and continue into the coding region. However, the reading frame that contains this GUG has an in-frame stop codon (bases +2 to +4) that terminates it at the start of the main ORF. This stop codon is present in all reported full-length core sequences; its presence reduces the likelihood that the GUG functions as the start codon for a protein that extends into the core-encoding region. Smith and Simmonds (1997) concluded that the reduced frequency of synonymous substitutions "cannot be accounted for by additional coding restraints" (p. 240). Recent studies indicate that the initial segment of the core-encoding region of the main HCV ORF contains features necessary for th...…”