Precision oncology relies on accurate discovery and interpretation of genomic variants, enabling individualized diagnosis, prognosis and therapy selection. We found that six prominent somatic cancer variant knowledgebases were highly disparate in content, structure and supporting primary literature, impeding consensus when evaluating variants and their relevance in a clinical setting. We developed a framework for harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations. We demonstrated large gains in overlap between resources across variants, diseases and drugs as a result of this harmonization. We subsequently demonstrated improved matching between a patient cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 33% per individual knowledgebase to 57% in aggregate. Our analyses illuminate the need for open, interoperable sharing of variant interpretation data. We also provide a freely available web interface (search.cancervariants.org) for exploring the harmonized interpretations from these six knowledgebases.
Phosphorylation of specific substrates by protein kinases is a key control mechanism for vital cell-fate decisions and other cellular processes. However, discovering specific kinase-substrate relationships is time-consuming and often rather serendipitous. Computational predictions alleviate these challenges, but the current approaches suffer from limitations like restricted kinome coverage and inaccuracy. They also typically utilise only local features without reflecting broader interaction context. To address these limitations, we have developed an alternative predictive model. It uses statistical relational learning on top of phosphorylation networks interpreted as knowledge graphs, a simple yet robust model for representing networked knowledge. Compared to a representative selection of six existing systems, our model has the highest kinome coverage and produces biologically valid high-confidence predictions not possible with the other tools. Specifically, we have experimentally validated predictions of previously unknown phosphorylations by the LATS1, AKT1, PKA and MST2 kinases in human. Thus, our tool is useful for focusing phosphoproteomic experiments, and facilitates the discovery of new phosphorylation reactions. Our model can be accessed publicly via an easy-to-use web interface (LinkPhinder).
Precision oncology relies on the accurate discovery and interpretation of genomic variants to enable individualized therapy selection, diagnosis, or prognosis. However, knowledgebases containing clinical interpretations of somatic cancer variants are highly disparate in interpretation content, structure, and supporting primary literature, reducing consistency and impeding consensus when evaluating variants and their relevance in a clinical setting. With the cooperation of experts of the Global Alliance for Genomics and Health (GA4GH) and of six prominent cancer variant knowledgebases, we developed a framework for aggregating and harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations covering 3,437 unique variants in 415 genes, 357 diseases, and 791 drugs. We demonstrated large gains in overlapping terms between resources across variants, diseases, and drugs as a result of this 1 reuse, remix, or adapt this material for any purpose without crediting the original authors. preprint (which was not peer-reviewed) in the Public Domain. It is no longer restricted by copyright. Anyone can legally share,The copyright holder has placed this . http://dx.doi.org/10.1101/366856 doi: bioRxiv preprint first posted online Jul. 11, 2018; harmonization. We subsequently demonstrated improved matching between patients of the GENIE cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 34% to 57% in aggregate. We developed an open and freely available web interface for exploring the harmonized interpretations from these six knowledgebases at search.cancervariants.org . MAINThe promise of precision oncology-in which a cancer patient's treatment is informed by the mutational profile of their tumor-requires concise, standardized, and searchable clinical interpretations of the detected variants. These structured interpretations of biomarker-disease associations are therapeutic (predictive of favorable or adverse response to therapy), diagnostic (determinant for disease type or subtype), or prognostic (an indicator for overall patient outcome). Additionally, clinical interpretations include germline variants that may predispose a patient to develop cancer. Isolated institutional efforts have contributed to the curation of the biomedical literature to collect and formalize these interpretations into knowledgebases [1][2][3][4][5][6][7][8][9][10][11][12] , but the vast scale of this overall activity and the rapid generation of new knowledge makes development of a single comprehensive curated knowledgebase infeasible 13 . In addition to the extent and diversity of the curated literature, the content and structure of interpretations within each knowledgebase is shaped by the institution that created it, thus increasing the burden of translating interpretations from multiple knowledgebases into a consensus interpretation for one or more genomic variants. Consequently, stakeholders interested in the effects of genomic variant...
The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model—“registered access”—to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.
Knowledge graphs became a popular means for modelling complex biological systems where they model the interactions between biological entities and their effects on the biological system. They also provide support for relational learning models which are known to provide highly scalable and accurate predictions of associations between biological entities. Despite the success of the combination of biological knowledge graph and relation learning models in biological predictive tasks, there is a lack of unified biological knowledge graph resources. This forced all current efforts and studies for applying a relational learning model on biological data to compile and build biological knowledge graphs from open biological databases. This process is often performed inconsistently across such efforts, especially in terms of choosing the original resources, aligning identifiers of the different databases and assessing the quality of included data. To make relational learning on biomedical data more standardised and reproducible, we propose a new biological knowledge graph which provides a compilation of curated relational data from open biological databases in a unified format with common, interlinked identifiers. We also provide a new module for mapping identifiers and labels from different databases which can be used to align our knowledge graph with biological data from other heterogeneous sources. Finally, to illustrate practical relevance of our work, we provide a set of benchmarks based on the presented data that can be used to train and assess the relational learning models in various tasks related to pathway and drug discovery.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.