2017
DOI: 10.1101/184747
Preprint

Interoperable genome annotation with GBOL, an extendable infrastructure for functional data mining

Abstract: Background: A standard structured format is used by the public sequence databases to present genome annotations. A prerequisite for a direct functional comparison is consistent annotation of the genetic elements with evidence statements. However, the current format provides limited support for data mining, hampering comparative analyses at large scale. Results: The provenance of a genome annotation describes the contextual details and derivation history of the process that resulted in the annotation. To enable in…

Cited by 11 publications (15 citation statements)
References 36 publications (22 reference statements)
“…Gene predictions were directly stored in the SAPP semantic database (Koehorst et al, 2017). Structural feature description was done using the GBOL ontology (van Dam et al, 2017) Functional genome annotation was done with a standalone version of interproscan v5.24.63.0 (Zdobnov and Apweiler, 2001) in direct interaction with the SAPP database using the pfam31 (Bateman et al, 2004) database. The raw reads and full genome sequence are available from ENA (accession numbers PRJEB21769 and GCA_900248155).…”
Section: Methods (mentioning)
confidence: 99%
“…The genomic annotation (GFF3) and corresponding genomic sequence (FASTA) of N. gaditana were converted into a semantic framework using SAPP according to the GBOL ontology (Koehorst et al 2017; Van Dam et al 2017). Each RNA-seq dataset was mapped using the transcriptomics module using STAR 2.5 as the read mapping software (Dobin et al 2013).…”
Section: RNA-sequencing (mentioning)
confidence: 99%
“…A total of 5713 publicly available complete bacterial genomes were downloaded from the NCBI repository (November 2016) [40]. To prevent technical bias due to the use of different annotation tools and pipelines and different thresholds for assessing the significance of the inferred genetic elements, genomes were consistently structurally and functionally de-novo annotated using SAPP [22], an annotation platform implementing a strictly defined ontology [41].…”
Section: Genome Annotation (mentioning)
confidence: 99%
“…Genes were predicted using Prodigal (2.6.3) [43] and the identified proteins were functionally annotated using the Pfam library (version 30.0) within InterProScan (version 5.21-60.0) [25,44]. Annotations were automatically converted into RDF according to the GBOL ontology [41] and loaded into a semantic database for high-throughput annotation and analysis. For the retrieval of information, SPARQL was used (See supplementary file S5 for all queries used).…”
Section: Genome Annotation (mentioning)
confidence: 99%