“…Accurate computational methods are needed to classify these transcripts and the corresponding genomic exons as protein coding or non-coding, even if the transcript models are incomplete or if they only reveal novel exons of already-known genes. In addition to classifying novel transcript models, such methods also have applications in evaluating and revising existing gene annotations (Butler et al., 2009; Clamp et al., 2007; Kellis et al., 2003; Lin et al., 2007; Pruitt et al., 2009), and as input features for de novo gene structure predictors (Alioto and Guigó, 2009; Brent, 2008). We have previously (Lin et al., 2008) compared numerous methods for determining whether an exon-length nucleotide sequence is likely to be protein coding or non-coding, including single-sequence metrics that analyze the genome of interest only and comparative genomics metrics that use alignments of orthologous regions in the genomes of related species.…”
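To illustrate what a single-sequence metric can look like, the following minimal sketch computes one very simple signal: the length, in codons, of the longest stop-codon-free stretch in any of the six reading frames of an exon-length sequence. This metric is an illustrative choice made here and is not necessarily one of the metrics compared in Lin et al. (2008). The function names are hypothetical, and the sketch assumes the standard genetic code (stops TAA, TAG, TGA). The intuition is that in random sequence with uniform base composition, 3 of the 64 codons are stops, so a frame is interrupted roughly once every 21 codons, whereas a coding exon keeps one frame open across its whole length.

```python
# Illustrative single-sequence coding signal (hypothetical helper names).
# Assumes the standard genetic code: TAA, TAG and TGA are the only stops.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_stop_free_run(seq: str, frame: int) -> int:
    """Longest run of consecutive non-stop codons in one frame (0, 1 or 2)."""
    run = best = 0
    # Step through complete codons only: i + 3 must not exceed len(seq).
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0  # a stop codon closes the current open stretch
        else:
            run += 1
            best = max(best, run)
    return best


def longest_open_frame(seq: str) -> int:
    """Maximum stop-free run, in codons, over all six reading frames."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return max(longest_stop_free_run(s, f) for s in strands for f in range(3))


if __name__ == "__main__":
    # The reverse strand GGGCTATTTCAT is open for all 4 codons in frame 0.
    print(longest_open_frame("ATGAAATAGCCC"))
```

Because an exon fragment need not contain a start codon, the sketch scores stop-free runs rather than ATG-initiated open reading frames. On its own such a signal is weak for short exons, which is one reason comparative metrics based on alignments of related genomes are of interest.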