2001
DOI: 10.1101/gr.175701

Computational Inference of Homologous Gene Structures in the Human Genome

Abstract: With the human genome sequence approaching completion, a major challenge is to identify the locations and encoded protein sequences of all human genes. To address this problem we have developed a new gene identification algorithm, GenomeScan, which combines exon-intron and splice signal models with similarity to known protein sequences in an integrated model. Extensive testing shows that GenomeScan can accurately identify the exon-intron structures of genes in finished or draft human genome sequence with a low…

Cited by 366 publications (244 citation statements)
References 31 publications
“…This may be particularly relevant, as the GenomeScan software program has been developed to build gene structures from the detection of protein sequence homologies. 10 However, the distribution of such clone-ORF pairs rejected according to their evidence codes did not reveal any significant difference compared to the distribution we reported for the specific ones (data not shown), suggesting that the specificity limit we used did not account for the low percentage of clones assigned to predicted ORFs.…”
Section: Discussion
contrasting
confidence: 59%
“…However, 32.1% of predicted ORFs matched complete genes with at least three exons. 10 If this distribution was met in our study, we may expect to have 487 predicted (P+PE+E) ORFs and to pair at least 156 clones with predicted ORFs (32.1% of 487), representing 17% of all ORF-clone pairs, whereas we only reported 30 such pairs. Finally, the observation that 24 of these pairs (2.6%; Table 4) were derived from EST evidence (E) only vs 58 expected ones (6.3%; Table 3) and four from EST and GenomeScan combination (PE) (0.4%; Table 4) vs 187 expected ones (20.4%; Table 3), suggests that the GenomeScan software program is not associated with any significantly better understanding of the fine gene structure compared to the EST evidence alone.…”
Section: Discussion
mentioning
confidence: 73%
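
The arithmetic in the statement above can be checked with a short sketch. The input figures (487 predicted ORFs, 32.1%, 30 observed pairs, 17%) are taken from the quote; the variable names are illustrative, and the implied total number of ORF-clone pairs is a back-calculation that the quote does not state.

```python
# Figures quoted in the citation statement above.
predicted_orfs = 487          # predicted (P+PE+E) ORFs
multi_exon_fraction = 0.321   # share matching complete genes with >= 3 exons
observed_pairs = 30           # clone-ORF pairs actually reported

# Expected number of clones paired with predicted ORFs (32.1% of 487).
expected_pairs = round(predicted_orfs * multi_exon_fraction)

# Assumption: back-calculated from the quoted "17% of all ORF-clone pairs";
# the quote itself does not give this total.
implied_total_pairs = expected_pairs / 0.17

print("expected pairs:", expected_pairs)
print("observed / expected:", round(observed_pairs / expected_pairs, 2))
print("implied total pairs (approx.):", round(implied_total_pairs))
```

The gap between the expected and observed pair counts is the basis for the citing authors' conclusion about GenomeScan.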
“…Loci were determined by transcript assembly alignments and/or EXONERATE alignments of peptides from A. thaliana, cacao, rice, soybean, grape and poplar peptides to repeat-soft-masked D5 genome using RepeatMasker (http://www.repeatmasker.org) with up to 2,000-bp extensions on both ends, unless extending into another locus on the same strand. Gene models were predicted by homology-based predictors, FGENESH+ 41, FGENESH_EST (similar to FGENESH+, EST as splice site and intron input instead of peptide/translated open-reading frames) and GenomeScan 42. The best scored predictions for each locus are selected using multiple positive factors including EST and peptide support, and one negative factor: overlap with repeats.…”
mentioning
confidence: 99%