2019
DOI: 10.1007/978-3-030-33223-5_29

From a Conceptual Model to a Knowledge Graph for Genomic Datasets

Abstract: Data access at genomic repositories is problematic, as data is described by heterogeneous and hardly comparable metadata. We previously introduced a unified conceptual schema, collected metadata in a single repository and provided classical search methods upon them. We here propose a new paradigm to support semantic search of integrated genomic metadata, based on the Genomic Knowledge Graph, a semantic graph of genomic terms and concepts, which combines the original information provided by each source with cur…
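The abstract describes semantic search over a graph that links each source's original terms to broader, curated concepts. Below is a minimal sketch of that general idea, assuming a toy graph in which a query term also matches every term that reaches it through "is_a" or "synonym_of" edges. The class name `TermGraph`, the relation labels, the item IDs, and all terms are hypothetical illustrations; this is not the Genomic Knowledge Graph's actual schema or query engine.

```python
from collections import defaultdict

class TermGraph:
    """Hypothetical toy term graph (not the actual GKG schema).

    Edges point from a narrower or raw term to a broader/normalized term,
    e.g. ("is_a", concept) or ("synonym_of", normalized_term).
    """

    def __init__(self):
        self.edges = defaultdict(set)  # term -> {(relation, target_term)}
        self.items = defaultdict(set)  # term -> ids of items annotated with it

    def add_edge(self, src, relation, dst):
        self.edges[src].add((relation, dst))

    def annotate(self, item_id, term):
        self.items[term].add(item_id)

    def _narrower(self, node):
        # Reverse lookup: every term with an edge (of any relation) into `node`.
        return {s for s, outs in self.edges.items() for (_, d) in outs if d == node}

    def semantic_search(self, query):
        """Return items annotated with `query` or any term that reaches it."""
        seen, stack = set(), [query]
        while stack:
            node = stack.pop()
            if node in seen:  # guards against cycles in the graph
                continue
            seen.add(node)
            stack.extend(self._narrower(node))
        return set().union(*(self.items[t] for t in seen))

# Usage with made-up terms: "BrCa" stands in for a raw source value.
g = TermGraph()
g.add_edge("breast adenocarcinoma", "is_a", "breast cancer")
g.add_edge("BrCa", "synonym_of", "breast cancer")
g.annotate("item-1", "breast adenocarcinoma")
g.annotate("item-2", "BrCa")
g.annotate("item-3", "lung cancer")
print(sorted(g.semantic_search("breast cancer")))  # ['item-1', 'item-2']
```

A plain keyword search for "breast cancer" would match neither item here; expanding the query through the graph is what lets differently labeled source metadata be found together.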

Cited by 12 publications (11 citation statements)
References 14 publications
“…The three main organizations providing open-source viral sequences are NCBI (US), DDBJ (Japan), and EMBL-EBI (Europe); they operate within the broader contexts provided by the International Nucleotide Sequence Database Collaboration.⁵ NCBI hosts the two, so far, most relevant viral sequence databases: GenBank [35] contains the annotated collection of all publicly available DNA and RNA sequences; RefSeq [28] provides a stable reference for genome annotation, gene identification and characterization, and mutation/polymorphism analysis. GenBank is continuously updated thanks to the abundant sharing of multiple laboratories and data contributors around the world (note that SARS-CoV2 nucleotide sequences have increased from about 300 around the end of March 2020, to 1,624 as of April 27th).…”
Section: Current Scenario (citation type: mentioning)
confidence: 99%
“…We have previously proposed a conceptual model focused on human genomics [6], which was based on a central entity Item, representing files of genomic regions. The simple schema evolved into a knowledge graph [5], including ontological representation of many relevant attributes (e.g., diseases, cell lines, tissue types...). The approach was validated through the practical implementation of the integration pipeline META-BASE², which feeds an integrated database, searchable through the GenoSurf³ interface [8].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…In the upper part of Fig. 1 we show its sketch from [3] (this conceptual representation is also used for the advanced user interface, see [2]); with respect to the original GCM one can note some small changes, which are due [to] our experience of use of the model. The Item represents the central entity of the schema: a single experimental (or annotation) file of genomic regions with their properties.…”
Section: Genomic Conceptual Model: Original and Simplified (citation type: mentioning)
confidence: 99%
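The statement above defines the Item as a single experimental (or annotation) file of genomic regions together with its properties. A minimal sketch of such an entity follows, assuming BED-like 0-based, half-open region coordinates and free-form string properties. The class names `Region` and `Item`, their fields, and the sample values are hypothetical and do not reproduce the published schema's attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # assumed 0-based, half-open (BED-like) coordinates
    stop: int
    strand: str = "*"  # "*" used here for "strand not specified"

@dataclass
class Item:
    """Hypothetical sketch: one experimental or annotation file of
    genomic regions plus its descriptive metadata properties."""
    item_id: str
    kind: str  # "experimental" or "annotation"
    regions: List[Region] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if self.kind not in ("experimental", "annotation"):
            raise ValueError(f"unknown item kind: {self.kind}")

    def total_length(self) -> int:
        # Sum of region lengths; assumes stop > start for every region.
        return sum(r.stop - r.start for r in self.regions)

# Usage with made-up values.
item = Item(
    "ITEM_0001",
    "experimental",
    regions=[Region("chr1", 100, 250, "+"), Region("chr2", 0, 50)],
    properties={"assay": "ChIP-seq", "source": "ENCODE"},
)
print(item.total_length())  # 200
```

Keeping region data and metadata properties on one object mirrors the idea of the Item as the hub of the schema; other entities would reference it by `item_id` rather than duplicate its contents.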
“…For what concerns metadata in particular, the pipeline includes value normalization and enrichment steps that improve the ability to compare metadata from different sources. Currently, we have integrated experimental genomic data from Encyclopedia of DNA Elements (ENCODE), The Cancer Genome Atlas (TCGA), Roadmap Epigenomics, subsets of Gene Expression Omnibus, Cistrome, and annotations from GENCODE and RefSeq (see references in companion paper [2]); we plan to add many other sources.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
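The statement above mentions value normalization and enrichment steps that make metadata from different sources comparable. A minimal sketch of what such steps can look like, assuming a small controlled vocabulary: normalization cleans a raw value and resolves synonyms to a canonical term, and enrichment attaches an identifier and the chain of broader terms. The vocabulary, synonyms, and `ONT:` identifiers are placeholders invented for illustration, not real ontology accessions, and the functions are not the META-BASE pipeline's code.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical vocabulary: canonical term -> (placeholder id, broader term).
VOCAB: Dict[str, Tuple[str, Optional[str]]] = {
    "liver": ("ONT:0001", "digestive organ"),
    "digestive organ": ("ONT:0002", None),
    "breast": ("ONT:0003", None),
}
# Hypothetical synonym table mapping cleaned raw values to canonical terms.
SYNONYMS: Dict[str, str] = {"hepatic tissue": "liver", "mammary gland": "breast"}

def normalize(raw: str) -> Optional[str]:
    """Collapse whitespace/underscores, lowercase, then resolve synonyms."""
    value = re.sub(r"[\s_]+", " ", raw).strip().lower()
    value = SYNONYMS.get(value, value)
    return value if value in VOCAB else None  # None means "not mappable"

def enrich(term: str) -> dict:
    """Attach the placeholder id and the chain of broader terms."""
    ontology_id, parent = VOCAB[term]
    ancestors = []
    while parent is not None:
        ancestors.append(parent)
        parent = VOCAB[parent][1]
    return {"term": term, "id": ontology_id, "ancestors": ancestors}

# Usage: raw values as different sources might spell them.
for raw in ["Liver", "Hepatic_Tissue", "mammary  gland", "kidney"]:
    term = normalize(raw)
    print(raw, "->", enrich(term) if term else "unmapped")
```

After normalization, "Liver" and "Hepatic_Tissue" resolve to the same canonical term, so the two sources become directly comparable; enrichment then lets a search for the broader term "digestive organ" reach both.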