Apollo: a sequence annotation editor

Lewis, Stephen E.; Searle, S.; Harris, Nomi L.; Gibson, Mark S.; Iyer, Vivek; Richter, J; Wiel, C. van der; Bayraktaroglu, Leyla; Birney, Ewan; Crosby, MA; Kaminker, JS; Matthews, BB; Prochnik, SE; Smith, CD; Tupy, JL; Rubin, GM; Misra, Sima; Mungall, CJ; Clamp, M. E.

doi:10.1186/gb-2002-3-12-research0082

Cited by 384 publications

(152 citation statements)

References 53 publications

Supporting

Mentioning

145

Contrasting

Unclassified

“…Matches were filtered by using the BERKELEY OUTPUT PARSER (11). The control, GENSCAN, FGENESH, and Heidelberg (2) predictions and associated oligos were loaded into a modified release 3.1 gadfly database, and each prediction was visualized with aligned RT-PCR products and oligos by using the APOLLO genome annotation browser and editor (15). In some cases, poor RT-PCR product sequence quality required manual National Center for Biotechnology Information BLASTN (16) comparison against the release 3 genomic sequence.…”

Section: Methods (mentioning)

confidence: 99%

A computational and experimental approach to validating annotations and gene predictions in the Drosophila melanogaster genome

Yandell, Bailey, Misra et al. 2005

Proc. Natl. Acad. Sci. U.S.A.

Five years after the completion of the sequence of the Drosophila melanogaster genome, the number of protein-coding genes it contains remains a matter of debate; the number of computational gene predictions greatly exceeds the number of validated gene annotations. We have assembled a collection of >10,000 gene predictions that do not overlap existing gene annotations and have developed a process for their validation that allows us to efficiently prioritize and experimentally validate predictions from various sources by sequencing RT-PCR products to confirm gene structures. Our data provide experimental evidence for 122 protein-coding genes. Our analyses suggest that the entire collection of predictions contains only ≈700 additional protein-coding genes. Although we cannot rule out the discovery of genes with unusual features that make them refractory to existing methods, our results suggest that the D. melanogaster genome contains ≈14,000 protein-coding genes.

Keywords: gene number | validation | genome annotation

The total number of protein-coding genes in the Drosophila melanogaster genome remains a subject of debate. Whereas those who curated the D. melanogaster genome concluded that the annotated 13,659 genes in the 3.1 release likely constitute 95% of all protein-coding genes (1), other researchers have concluded that many, possibly thousands, of protein-coding genes remain unannotated (2). Two issues have fueled the debate surrounding gene number in D. melanogaster: the large numbers of computational gene predictions located within intergenic regions and varying standards of experimental evidence for concluding that a gene prediction corresponds to a real gene.

As of release 3.1, ≈50% of the D. melanogaster genome is intergenic. Running the gene prediction program GENSCAN (3) on every intergenic region in the D. melanogaster genome results in 10,644 gene predictions spread amongst 62 megabases (Mb) of annotation-free sequence. Surely some of these predictions are real, but how many? The best way to answer this question is to subject a representative sample of the gene predictions to some validation procedure.

The design and interpretation of experiments intended to assay expression of genes that have been predicted computationally have become controversial. One approach is to rely on hybridization to microarrays or RT-PCR assays for transcript expression (2), with the detection of a product by agarose gel electrophoresis taken as confirmation of the corresponding gene prediction. However, as our results show, unless the diagnostic PCR product includes a splice junction, amplification of residual genomic DNA and detection of unprocessed transcripts may lead to false verifications of gene predictions. It has also been critical to determine the sequence, and not just the size, of the PCR products (4).

One way to obtain spliced cDNAs for sequencing is to perform RT-PCR with a 3′-oligo(dT) primer and an upstream PCR primer located in the prediction's 5′-most exon. The advantages of this approach are that it requires only a ...

“…All gene models were subsequently filtered for transposable elements (TEs) by running InterProScan (Zdobnov and Apweiler 2001) (see also below) while removing genes matching TE-related protein domains. Manual checking of gene models was then performed using the Apollo Annotation editor (Lewis et al 2002).…”

Section: unspecified (mentioning)

confidence: 99%

The genome of the leaf-cutting ant Acromyrmex echinatior suggests key adaptations to advanced social life and fungus farming

Nygaard, Zhang, Schiøtt et al. 2011

Genome Res.

We present a high-quality (>100× depth) Illumina genome sequence of the leaf-cutting ant Acromyrmex echinatior, a model species for symbiosis and reproductive conflict studies. We compare this genome with three previously sequenced genomes of ants from different subfamilies and focus our analyses on aspects of the genome likely to be associated with known evolutionary changes. The first is the specialized fungal diet of A. echinatior, where we find gene loss in the ant's arginine synthesis pathway, loss of detoxification genes, and expansion of a group of peptidase proteins. One of these is a unique ant-derived contribution to the fecal fluid, which otherwise consists of “garden manuring” fungal enzymes that are unaffected by ant digestion. The second is multiple mating of queens and ejaculate competition, which may be associated with a greatly expanded nardilysin-like peptidase gene family. The third is sex determination, where we could identify only a single homolog of the feminizer gene. As other ants and the honeybee have duplications of this gene, we hypothesize that this may partly explain the frequent production of diploid male larvae in A. echinatior. The fourth is the evolution of eusociality, where we find a highly conserved ant-specific profile of neuropeptide genes that may be related to caste determination. These first analyses of the A. echinatior genome indicate that considerable genetic changes are likely to have accompanied the transition from hunter-gathering to agricultural food production 50 million years ago, and the transition from single to multiple queen mating 10 million years ago.

“…The alignment is built with ClustalW (Larkin et al 2007), and the common structure is derived with RNAz (Washietl et al 2005). It is also possible to map and visualize a selection of predictions in ApolloRNA (http://carlit.toulouse.inra.fr/ApolloRNA), an extension of the annotation environment Apollo (Lewis et al 2002) dedicated to RNA analysis. Last, selected predictions can be exported under the formats multifasta, gff, RNAML, and CSV.…”

Section: Explore Step (mentioning)

confidence: 99%

RNAspace.org: An integrated environment for the prediction, annotation, and analysis of ncRNA

Cros, Monte, Mariette et al. 2011

RNA

The annotation of noncoding RNA genes remains a major bottleneck in genome sequencing projects. Most genome sequences released today still come with sets of tRNAs and rRNAs as the only annotated RNA elements, ignoring hundreds of other RNA families. We have developed a web environment that is dedicated to noncoding RNA (ncRNA) prediction, annotation, and analysis and allows users to run a variety of tools in an integrated and flexible manner. This environment offers complementary ncRNA gene finders and a set of tools for the comparison, visualization, editing, and export of ncRNA candidates. Predictions can be filtered according to a large set of characteristics. Based on this environment, we created a public website located at http://RNAspace.org. It accepts genomic sequences up to 5 Mb, which permits online annotation of a complete bacterial genome or a small eukaryotic chromosome. The project is hosted as a SourceForge project (http://rnaspace.sourceforge.net/).
