Utilising Graph Machine Learning within Drug Discovery and Development

Gaudelet, Thomas; Day, Ben; Jamasb, Arian R.; Soman, Jyothish; Regep, Cristian; Liu, Gertrude; Hayter, Jeremy B. R.; Vickers, Richard; Roberts, Charles W.M.; Tang, J.; Roblin, David; Blundell, Tom L.; Bronstein, Michael M.; Taylor-King, Jake P.

doi:10.48550/arxiv.2012.05716

Cited by 7 publications

(11 citation statements)

References 112 publications


“…Recently, approaches exploiting knowledge graphs are being leveraged within the drug discovery domain to solve key tasks [7,15]. In a drug discovery knowledge graph, entities often represent key elements such as genes, disease or drugs, whilst the relations between them capture interactions.…”

Section: Knowledge Graphs In Drug Discovery (mentioning)

confidence: 99%

Understanding the Performance of Knowledge Graph Embeddings in Drug Discovery

Bonner, Barrett, et al. 2021

Preprint


Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or impact on other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding not only of performance, but also of the various factors which determine it, is required. In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration; instead, we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have significant impact on performance and can even affect the ranking of models. Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance of use, and impact, of KGEs in a biomedical setting. To aid reproducibility of our own work, we release all experimentation code. Preprint. Under review.



“…As a topical concrete application, knowledge graphs have been utilised to address various tasks in helping to combat the COVID-19 pandemic [35,60,145,112,142,32,55,21,9]. Additionally, considering the domain as a knowledge graph has the potential to enable recent advances in graph-specific machine learning models to be used to address some key tasks [41].…”

Section: Introduction (mentioning)

confidence: 99%

“…Such issues include assessing how reliable the underlying information is, how best to integrate disparate and heterogeneous resources, how to deal with the uncertainty inherent in the domain, how best to translate key drug discovery objectives into machine learning training objectives, and how to model and express data that is often quantitative and contextual in nature. Despite these complications, an increasing level of interest in the area suggests that knowledge graphs could play a crucial role in enabling machine learning based approaches for drug discovery [41,53,156].…”

Section: Introduction (mentioning)

confidence: 99%

A Review of Biomedical Datasets Relating to Drug Discovery: A Knowledge Graph Perspective

Bonner, Barrett, et al. 2021

Preprint


Drug discovery and development is an extremely complex process, with high attrition contributing to the costs of delivering new medicines to patients. Recently, various machine learning approaches have been proposed and investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Among these techniques, it is especially those using Knowledge Graphs that are proving to have considerable promise across a range of tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritisation. In such a knowledge graph-based representation of drug discovery domains, crucial elements including genes, diseases and drugs are represented as entities or vertices, whilst relationships or edges between them indicate some level of interaction. For example, an edge between a disease and drug entity might represent a successful clinical trial, or an edge between two drug entities could indicate a potentially harmful interaction. In order to construct high-quality and ultimately informative knowledge graphs, however, suitable data and information is of course required. In this review, we detail publicly available primary data sources containing information suitable for use in constructing various drug discovery focused knowledge graphs. We aim to help guide machine learning and knowledge graph practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The chosen datasets are selected via strict criteria, categorised according to the primary area of biological information contained within and are considered based upon what type of information could be extracted from them in order to help build a knowledge graph. To help motivate the study, a series of case studies of successful applications of knowledge graphs in drug discovery is presented. We also detail the existing pre-constructed knowledge graphs that have been made available for public access which could serve as potential machine learning benchmarks, as well as starting points for further task-specific graph composition enrichments. Additionally, throughout the review, we raise the numerous and unique challenges and issues associated with the domain and its datasets, for example, the inherent uncertainty within the data, its constantly evolving nature and the various forms of bias in many sources. Overall we hope this review will help motivate more machine learning researchers to explore combining knowledge graphs and machine learning to help solve key and emerging questions in the drug discovery domain. Preprint. Under review.
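The entity-and-edge representation described in this abstract can be illustrated with a minimal sketch. It assumes a knowledge graph stored as (head, relation, tail) triples; the entity identifiers, relation names and helper functions below are hypothetical placeholders for illustration, not taken from the review or from any real dataset.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One directed, typed edge of a knowledge graph."""
    head: str
    relation: str
    tail: str


# Hypothetical toy triples mirroring the edge types named in the abstract:
# a drug-disease edge and a drug-drug interaction edge. Not real data.
triples = [
    Triple("drug:A", "treats", "disease:X"),
    Triple("drug:A", "interacts_with", "drug:B"),
    Triple("gene:G1", "associated_with", "disease:X"),
]


def build_index(edges):
    """Map each head entity to its outgoing (relation, tail) pairs."""
    index = defaultdict(list)
    for edge in edges:
        index[edge.head].append((edge.relation, edge.tail))
    return dict(index)


def neighbours(index, entity, relation=None):
    """Return tails reachable from `entity`, optionally filtered by relation."""
    return [
        tail
        for rel, tail in index.get(entity, [])
        if relation is None or rel == relation
    ]


index = build_index(triples)
treated = neighbours(index, "drug:A", relation="treats")
```

A triple list is only one possible encoding; it is used here because knowledge graph embedding models are commonly trained on triples of this shape.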


“…GNNs have recently emerged as a powerful class of deep learning architectures to analyze datasets where information is present in the form of heterogeneous graphs that encode complex data connectivity. Experimentally, these architectures have shown great promises to be impactful in diverse domains such as drug design (Stokes et al, 2020; Gaudelet et al, 2020), social networks (Monti et al, 2019; Pal et al, 2020), traffic networks (Derrow-Pinion et al, 2021), physics (Cranmer et al, 2019; Bapst et al, 2020), combinatorial optimization (Bengio et al, 2021; Cappart et al, 2021) and medical diagnosis (Li et al, 2020c).…”

Section: Introduction (mentioning)

confidence: 99%

Graph Neural Networks with Learnable Structural and Positional Representations

Dwivedi, Luu, Thomas, et al. 2021

Preprint


Graph neural networks (GNNs) have become the standard learning architectures for graphs. GNNs have been applied to numerous domains ranging from quantum chemistry and recommender systems to knowledge graphs and natural language processing. A major issue with arbitrary graphs is the absence of canonical positional information of nodes, which decreases the representation power of GNNs to distinguish e.g. isomorphic nodes and other graph symmetries. An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers. Laplacian eigenvectors are one possible graph PE. In this work, we propose to decouple structural and positional representations to make it easy for the network to learn these two essential properties. We introduce a novel generic architecture which we call LSPE (Learnable Structural and Positional Encodings). We investigate several sparse and fully-connected (Transformer-like) GNNs, and observe a performance increase for molecular datasets, from 2.87% up to 64.14% when considering learnable PE for both GNN classes.
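As a rough illustration of the Laplacian-eigenvector positional encoding mentioned in this abstract, the sketch below computes the encoding from a dense adjacency matrix with NumPy. It is a minimal sketch under stated assumptions, not the LSPE architecture itself: it assumes an undirected, connected graph and the symmetric normalised Laplacian, and the function name and toy graph are hypothetical.

```python
import numpy as np


def laplacian_positional_encoding(adjacency: np.ndarray, k: int) -> np.ndarray:
    """Return a (num_nodes, k) array of Laplacian-eigenvector encodings.

    Assumes `adjacency` is a symmetric matrix of an undirected, connected graph.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1)

    # D^{-1/2}, leaving isolated nodes (degree 0) at zero to avoid division by zero.
    inv_sqrt_degree = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt_degree[nonzero] = degree[nonzero] ** -0.5

    # Symmetric normalised Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    num_nodes = adjacency.shape[0]
    laplacian = (
        np.eye(num_nodes)
        - inv_sqrt_degree[:, None] * adjacency * inv_sqrt_degree[None, :]
    )

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigenvectors = np.linalg.eigh(laplacian)

    # Skip the first eigenvector (eigenvalue 0 for a connected graph) and
    # keep the next k as per-node positional features.
    return eigenvectors[:, 1 : k + 1]


# Hypothetical toy graph: a path on four nodes, 0 - 1 - 2 - 3.
path_graph = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
)
encoding = laplacian_positional_encoding(path_graph, k=2)  # one row per node
```

Eigenvectors are only defined up to sign, so the encoding is not unique; in practice the rows would be concatenated with, or added to, the node features at the input layer.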
