The use of knowledge graphs as a data source for machine learning methods to solve complex problems in life sciences has rapidly become popular in recent years. Our Biological Insights Knowledge Graph (BIKG) combines relevant data for drug development from public as well as internal data sources to provide insights for a range of tasks: from identifying new targets to repurposing existing drugs. Besides the common requirements for organisational knowledge graphs, such as capturing the domain precisely and letting users search and query the data, the focus on handling multiple use cases and supporting use case-specific machine learning models presents additional challenges: the data models must also be streamlined for the performance of downstream tasks; graph content must be easily customisable for different use cases; and different projections of the graph content are required to support a wide range of consumption modes. In this paper we describe our main design choices in the implementation of BIKG and discuss different aspects of its life cycle: from graph construction to exploitation.
Background

Identification and tracking of important communicable diseases is pivotal to our understanding of the geographical distribution of disease and of the emergence and spread of novel and resistant infections, and is of particular importance for public health policy planning. Moreover, understanding of current clinical practice norms is essential to audit clinical care, identify areas of concern, and develop interventions to improve care quality.

However, there are several barriers to obtaining these research data. For example, current disease surveillance mechanisms make it difficult for the busy doctor to know which diseases to notify, to whom and how, and are also time-consuming. Consequently, many cases go un-notified. In addition, assessments of current clinical practice are typically limited to small retrospective audits in individual hospitals.

Therefore, we developed a free smartphone application to try to increase the identification of major infectious diseases and other acute medical presentations and improve our understanding of clinical practice.

Description

Within the first month there were over 1000 downloads and over 600 specific disease notifications, coming from a broad range of specialities and grades and from all across the globe, including some resource-poor settings.

Notifications have already provided important information, such as new cases of TB meningitis, resistant HIV and rabies, and important clinical information, such as where patients with myocardial infarction are and are not receiving potentially life-saving therapy.

The database generated can also answer new, dynamic and targeted questions. When a new guideline is released, for example for a new pandemic infection, we can track, in real time, the global usage of the guideline and whether the recommendations are being followed. In addition, this allows identification of where cases with key markers of severe disease are occurring. This is a potential resource for guideline-producing bodies, clinical governance and public health institutions, and also for patient recruitment into ongoing studies.

Conclusions

Further parallel studies are needed to assess the clinical and epidemiological utility of novel disease surveillance applications such as this, with direct comparisons made to data collected through routine surveillance routes.

Nevertheless, current disease surveillance mechanisms do not always comprehensively and accurately reflect disease distribution for many conditions. Smartphone applications, such as ClickClinica, are a novel approach with the potential to generate real-time disease surveillance data that may augment current methods.
Duplication of nodes is a common problem encountered when building knowledge graphs (KGs) from heterogeneous datasets, where it is crucial to be able to merge nodes having the same meaning. OntoMerger is a Python ontology integration library that deduplicates KG nodes. Our approach takes a set of KG nodes, mappings and disconnected hierarchies and generates a set of merged nodes together with a connected hierarchy. In addition, the library provides analytic and data testing functionalities that can be used to fine-tune the inputs, further reducing duplication, and to increase connectivity of the output graph. OntoMerger can be applied to a wide variety of ontologies and KGs. In this paper we introduce OntoMerger and illustrate its functionality on a real-world biomedical KG.
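The general idea of merging nodes via mappings can be illustrated with a minimal sketch. This is not OntoMerger's API or its actual algorithm: the function names `merge_nodes` and `rewrite_hierarchy`, the node IDs, the source prefixes `A`, `B`, `C` and the priority rule are all hypothetical. It assumes that mappings are plain equivalence pairs, groups mapped nodes with a union-find structure, picks one canonical ID per group, and then rewrites hierarchy edges onto the canonical IDs. It does not attempt to connect hierarchies that remain disjoint after merging.

```python
from collections import defaultdict


def merge_nodes(nodes, mappings, priority):
    """Group nodes linked by equivalence mappings; map each node to a canonical ID."""
    parent = {n: n for n in nodes}

    def find(x):
        # Find the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in mappings:
        # Ignore mappings that mention nodes we were not given.
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)

    def rank(node_id):
        # Assumption: IDs look like "PREFIX:local" and earlier prefixes win.
        prefix = node_id.split(":", 1)[0]
        order = priority.index(prefix) if prefix in priority else len(priority)
        return (order, node_id)

    merged = {}
    for members in groups.values():
        canonical = min(members, key=rank)
        for m in members:
            merged[m] = canonical
    return merged


def rewrite_hierarchy(edges, merged):
    """Rewrite (child, parent) edges onto canonical IDs, dropping self-loops and duplicates."""
    rewritten = set()
    for child, par in edges:
        c, p = merged.get(child, child), merged.get(par, par)
        if c != p:
            rewritten.add((c, p))
    return sorted(rewritten)


# Hypothetical input: five nodes from three sources, three equivalence mappings.
nodes = ["A:1", "B:7", "C:3", "A:2", "B:9"]
mappings = [("A:1", "B:7"), ("B:7", "C:3"), ("A:2", "B:9")]
edges = [("B:7", "B:9"), ("C:3", "A:2"), ("A:1", "B:7")]

merged = merge_nodes(nodes, mappings, priority=["A", "B", "C"])
hierarchy = rewrite_hierarchy(edges, merged)
print(merged)
print(hierarchy)
```

In this toy input the five nodes collapse into two merged nodes, and the three hierarchy edges from different sources reduce to a single edge between them. A real pipeline would also need to handle conflicting mappings and mapping types other than strict equivalence.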