Computational Inference of Homologous Gene Structures in the Human Genome

Yeh, Ru-Fang; Lim, Lee P.; Burge, Christopher B.

doi:10.1101/gr.175701

Cited by 366 publications

(244 citation statements)

References 31 publications

Citation statement types: Supporting; Mentioning (241); Contrasting; Unclassified

“…This may be particularly relevant, as the GenomeScan software program has been developed to build gene structures from the detection of protein sequence homologies. 10 However, the distribution of such clone-ORF pairs rejected according to their evidence codes did not reveal any significant difference compared to the distribution we reported for the specific ones (data not shown), suggesting that the specificity limit we used did not account for the low percentage of clones assigned to predicted ORFs.…”

Section: Discussion (contrasting)

confidence: 59%

“…However, 32.1% of predicted ORFs matched complete genes with at least three exons. 10 If this distribution was met in our study, we may expect to have 487 predicted (P+PE+E) ORFs and to pair at least 156 clones with predicted ORFs (32.1% of 487), representing 17% of all ORF-clone pairs, whereas we only reported 30 such pairs. Finally, the observation that 24 of these pairs (2.6%; Table 4) were derived from EST evidence (E) only vs 58 expected ones (6.3%; Table 3) and four from EST and GenomeScan combination (PE) (0.4%; Table 4) vs 187 expected ones (20.4%; Table 3), suggests that the GenomeScan software program is not associated with any significantly better understanding of the fine gene structure compared to the EST evidence alone.…”

Section: Discussion (mentioning)

confidence: 73%
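The arithmetic in the statement above can be checked with a short sketch. The counts and percentages are the ones quoted in the statement; the variable names, the `implied_total` helper, and the idea of backing out the total number of ORF-clone pairs from each count/percentage pair are illustrative assumptions, not part of the cited study.

```python
# Figures quoted in the citation statement; names are illustrative only.
predicted_orfs = 487        # expected predicted (P+PE+E) ORFs
complete_gene_rate = 0.321  # share of predicted ORFs matching complete genes (>= 3 exons)
observed_pairs = 30         # clone/predicted-ORF pairs actually reported

# 32.1% of 487, rounded to whole pairs.
expected_pairs = round(predicted_orfs * complete_gene_rate)


def implied_total(count: int, percent: float) -> float:
    """Total number of ORF-clone pairs implied by a count and its percentage share.

    Assumption: every percentage in the statement uses the same denominator.
    """
    return count / (percent / 100.0)


# (count, percent) pairs as quoted for the expected and observed categories.
quoted = {
    "expected, all predicted": (expected_pairs, 17.0),
    "observed, EST only (E)": (24, 2.6),
    "expected, EST only (E)": (58, 6.3),
    "expected, EST + GenomeScan (PE)": (187, 20.4),
}

print(f"expected pairs: {expected_pairs}, observed pairs: {observed_pairs}")
for label, (count, percent) in quoted.items():
    print(f"{label}: implied total = {implied_total(count, percent):.0f}")
```

Under that shared-denominator assumption, the quoted figures all point to a total in the neighbourhood of 920 ORF-clone pairs, which is consistent with the statement's own percentages.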

“…In the initial report describing the GenomeScan software, the authors reported that among 22 607 genes predicted, 49.3% were partial genes. 10 These data may at least partly explain why we find it difficult to associate clones with predicted ORFs. Defined sequences for these partial genes may not recover the available sequences for the clones, leading to the loss of some clone-ORF pairs.…”

Section: Discussion (mentioning)

confidence: 89%

“…The GenomeScan software program used by NCBI for the annotation of the human genome combines the use of research for sequence homologies and ab initio predictions. 10,11 As a first step, homologies between genomic and protein sequences are determined. Then, if a protein homology is detected, an ab initio software program is run for the genomic sequence.…”

Section: Discussion (mentioning)

confidence: 99%

Relevance and limitations of public databases for microarray design: a critical approach to gene predictions

et al. 2003

In conjunction with the completion of the human genome sequence, microarray technology offers a complementary strategy to traditional methodologies used to search for genetic determinants involved in multifactorial diseases such as Alzheimer's disease. In order to gain benefits from this strategy, we have designed home-made microarrays to compare the expression of all ORFs located within loci of interest defined by genome scanning in Alzheimer family studies. Two approaches were selected using either probes amplified by PCR from a cDNA bank or specific oligonucleotides. Here, we report the challenging task of validating, prioritising and selecting the best ORFs derived from the genome sequence. The initial inventory from the NCBI website allowed us to select 5849 ORFs within nine loci. Half of them resulted from prediction models using the GenomeScan software. However, our data have shown that predicted ORFs may not be representative of exonic sequences, or even real genes. These observations have led us to exclude these ORFs from our study, decreasing their number from 5849 to 2748. Microarrays may be only 'snapshots' of our current knowledge of the human genome.

“…Loci were determined by transcript assembly alignments and/or EXONERATE alignments of peptides from A. thaliana, cacao, rice, soybean, grape and poplar peptides to repeat-soft-masked D5 genome using RepeatMasker (http://www.repeatmasker.org) with up to 2,000-bp extensions on both ends, unless extending into another locus on the same strand. Gene models were predicted by homology-based predictors, FGENESH+ 41, FGENESH_EST (similar to FGENESH+, EST as splice site and intron input instead of peptide/translated open-reading frames) and GenomeScan 42. The best scored predictions for each locus are selected using multiple positive factors including EST and peptide support, and one negative factor: overlap with repeats.…”

mentioning

confidence: 99%
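The selection step described in the statement above (keep the best-scored prediction per locus, rewarding EST and peptide support and penalising repeat overlap) might be sketched as follows. The statement does not give the scoring function, so the linear form, the equal weights, the 0-to-1 support fractions, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    predictor: str          # e.g. "FGENESH+", "FGENESH_EST", "GenomeScan"
    est_support: float      # hypothetical: fraction of the model supported by ESTs (0..1)
    peptide_support: float  # hypothetical: fraction supported by peptide alignments (0..1)
    repeat_overlap: float   # hypothetical: fraction of the model overlapping repeats (0..1)


def score(p: Prediction, w_est: float = 1.0, w_pep: float = 1.0, w_rep: float = 1.0) -> float:
    """Assumed linear score: two positive factors minus one negative factor."""
    return w_est * p.est_support + w_pep * p.peptide_support - w_rep * p.repeat_overlap


def best_per_locus(predictions: dict[str, list[Prediction]]) -> dict[str, Prediction]:
    """Keep the highest-scoring prediction for each locus."""
    return {locus: max(models, key=score) for locus, models in predictions.items()}


# Made-up example values for a single locus.
example = {
    "locus_1": [
        Prediction("FGENESH+", est_support=0.6, peptide_support=0.9, repeat_overlap=0.1),
        Prediction("FGENESH_EST", est_support=0.9, peptide_support=0.4, repeat_overlap=0.0),
        Prediction("GenomeScan", est_support=0.5, peptide_support=0.8, repeat_overlap=0.4),
    ]
}
chosen = best_per_locus(example)
print(chosen["locus_1"].predictor)
```

A real pipeline would likely weight the factors differently and use more of them; the sketch only illustrates how one negative factor can be traded off against several positive ones when ranking competing gene models.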

Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres

Paterson

Wendel

Gundlach

et al. 2012

Nature

Eukaryotic gene finding

Guigó

2005

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

After the genome of an organism is sequenced and assembled, the first necessary step toward the understanding of its functional content is to locate all protein‐coding genes. Identification of genes is difficult in the eukaryotic genomes, because of the split nature of eukaryotic genes and because of the large intergenic spacers between adjacent genes. In this article, we will review how computational gene‐prediction programs address this difficulty, describing the basic components underlying most computational methods, and the strategies employed to integrate them.
