2002
DOI: 10.1186/gb-2002-3-12-research0081

An integrated computational pipeline and database to support whole-genome sequence annotation

Abstract: We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible ar…

Cited by 46 publications (12 citation statements)
References 35 publications
“…Our work fulfills the demand for a unified approach to TE annotation that capitalizes on the strength of multiple TE detection methods [3] and places TE annotation on common conceptual framework with gene annotation [59]. Compared with annotations generated for the Release 3 sequence [18], we confirmed precisely 743 out of 1,572 TE annotations.…”
Section: Discussion (supporting)
confidence: 56%
“…Although the ORFs encoded by our candidate ncRNAs are indeed short and their translated sequences have no similarity to known proteins (BLASTX analysis of these sequences was conducted as part of the D. melanogaster 3.1 annotation pipeline) (12,15), the possibility remains that some of these sequences encode novel small peptides. To assess this possibility, we asked whether sequence conservation in the D. pseudoobscura genome is greater within the ORF than in the remainder of the transcript.…”
Section: Results (mentioning)
confidence: 99%
“…ncRNA genes, we screened for cDNAs that did not intersect existing annotations (12,15). Additional analyses (transcript length, ORF length and composition, initiating codon, polyadenylation length and consensus sites, genomic extent of transcription unit, and splice site prediction) were accomplished by using PERL scripts and the WU-BLAST 2.0 (http://blast.wustl.edu) and SIM4 (16) …”
Section: Methods (mentioning)
confidence: 99%
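The screening step quoted above (keeping only cDNAs that do not intersect existing annotations) can be illustrated with a small interval-overlap filter. This is a minimal sketch, not the citing authors' implementation: the `Interval` type, the function names, the 0-based half-open coordinate convention, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval; coordinates are 0-based, half-open."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals lie on the same sequence and share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def unannotated(cdnas: List[Interval], annotations: List[Interval]) -> List[Interval]:
    """Keep cDNA alignments that intersect no existing annotation.

    A simple all-against-all scan; fine for small inputs, while a
    genome-scale screen would normally use a sorted or indexed structure.
    """
    return [c for c in cdnas if not any(overlaps(c, a) for a in annotations)]


# Made-up coordinates, for illustration only.
annotations = [Interval("2L", 100, 500), Interval("3R", 1000, 2000)]
cdnas = [
    Interval("2L", 450, 700),    # overlaps the 2L annotation, so it is dropped
    Interval("2L", 600, 900),    # no overlap, so it is kept
    Interval("3R", 2000, 2500),  # abuts but does not overlap, so it is kept
]
candidates = unannotated(cdnas, annotations)
```

Under the half-open convention assumed here, an alignment that merely abuts an annotation counts as non-intersecting; a stricter screen could pad each annotation by a fixed margin before testing.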
“…For other low-quality reads, we used PHRED to call bases and score quality. RT-PCR and oligo sequences were aligned to the genomic sequence by using SIM4WRAP (11). Matches were filtered by using the BERKELEY OUTPUT PARSER (11).…”
Section: Methods (mentioning)
confidence: 99%
“…RT-PCR and oligo sequences were aligned to the genomic sequence by using SIM4WRAP (11). Matches were filtered by using the BERKELEY OUTPUT PARSER (11). The control, GENSCAN, FGENESH, and Heidelberg (2) predictions and associated oligos were loaded into a modified release 3.1 gadfly database, and each prediction was visualized with aligned RT-PCR products and oligos by using the APOLLO genome annotation browser and editor (15).…”
Section: Methods (mentioning)
confidence: 99%