Abstract:The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet, search of relevant datasets for knowledge discovery is limitedly supported: metadata describing ENCODE datasets are quite simple and incomple… Show more
“…The data management backend has been produced in the last 2 years, capitalizing on several previous years of experience in the use of bio-ontologies for specific research projects (e.g. SOS-GeM (41) and GPKB (42)). The repository currently integrates about 40 million metadata items from five sources, described by 39 attributes over eight connected tables of the core schema and enriched with terms from eight different ontologies, which have been reduced to the same knowledge schema.…”
Many valuable resources developed by world-wide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes with very limited capabilities. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched by making use of the most suited available ontologies. The user of GenoSurf provides as input the search terms, sets the desired level of ontological enrichment and obtains as output the identity of matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing resulting files are updated in real time while the search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata of several major valuable data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified with their accession IDs and URLs), or as part of an integrated query answering system for performing complex queries over genomic regions and metadata.
“…The data management backend has been produced in the last 2 years, capitalizing on several previous years of experience in the use of bio-ontologies for specific research projects (e.g. SOS-GeM (41) and GPKB (42)). The repository currently integrates about 40 million metadata items from five sources, described by 39 attributes over eight connected tables of the core schema and enriched with terms from eight different ontologies, which have been reduced to the same knowledge schema.…”
Many valuable resources developed by world-wide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes with very limited capabilities. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched by making use of the most suited available ontologies. The user of GenoSurf provides as input the search terms, sets the desired level of ontological enrichment and obtains as output the identity of matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing resulting files are updated in real time while the search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata of several major valuable data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified with their accession IDs and URLs), or as part of an integrated query answering system for performing complex queries over genomic regions and metadata.
“…This information is necessary to properly construct the statistical model and interpret the obtained estimates in terms of the variable names, units, etc., and to exploit the ontological knowledge for experimental metadata-based semantic search (e.g. 44 ) to further improve the data discoverability.…”
In the information age, smart data modelling and data management can be carried out to address the wealth of data produced in scientific experiments. In this paper, we propose a semantic model for the statistical analysis of datasets by linear mixed models. We tie together disparate statistical concepts in an interdisciplinary context through the application of ontologies, in particular the Statistics Ontology (StatO), to produce FaIR data summaries. We hope to improve the general understanding of statistical modelling and thus contribute to a better description of the statistical conclusions from data analysis, allowing their efficient exploration and automated processing.
“…A derivação de novos conhecimentos por meio do processo de inferência é um dos benefícios da representação de informações por meio de ontologias. Fernández et al (2016) utilizaram ontologias para gerar uma base de metadados semânticos a partir de metadados e registros médicos do repositório Encyclopedia of DNA elements (ENCODE). Com o intuito de complementar as informações, os autores propuseram a aplicação de técnicas de inferência sobre as informações na base semântica.…”
Section: Inferênciaunclassified
“…Uma das principais motivações dos autores (Fernández et al, 2016) é que, apesar de fornecer informações de alta qualidade, o suporte para busca e recuperação de informações no repositório ENCODE -baseado estritamente na sintaxe dos termos de busca -é insuficiente. Nesse contexto, os autores buscaram melhorar a recuperação nesta base por meio do uso de ontologias e da anotação semântica dos metadados da coleção no repositório ENCONDE.…”
databases, and in the elaboration of queries and the comprehension of the users' needs. We also identified different semantic search engines, usually aimed at specific purposes, rather than generic searches. It was thus verified that SW technologies are used for researchers oriented towards specific contexts, and that researchers with a general purpose use semantic approaches, but not based in the Semantic Web.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.