Drug discovery and development is an extremely complex process, with high attrition contributing to the costs of delivering new medicines to patients. Recently, various machine learning approaches have been proposed and investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Among these techniques, those using knowledge graphs are showing considerable promise across a range of tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritisation. In a knowledge graph-based representation of drug discovery domains, crucial elements including genes, diseases and drugs are represented as entities (vertices), whilst relationships (edges) between them indicate some level of interaction. For example, an edge between a disease and a drug entity might represent a successful clinical trial, or an edge between two drug entities could indicate a potentially harmful interaction. Constructing high-quality and ultimately informative knowledge graphs, however, requires suitable data. In this review, we detail publicly available primary data sources containing information suitable for use in constructing various drug discovery focused knowledge graphs. We aim to help guide machine learning and knowledge graph practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorised according to the primary area of biological information they contain, and assessed on the type of information that could be extracted from them to help build a knowledge graph. To help motivate the study, a series of case studies of successful applications of knowledge graphs in drug discovery is presented.
We also detail the existing pre-constructed knowledge graphs that have been made publicly available, which could serve as potential machine learning benchmarks as well as starting points for further task-specific enrichment of the graph composition. Additionally, throughout the review, we raise the numerous and unique challenges and issues associated with the domain and its datasets, for example the inherent uncertainty within the data, its constantly evolving nature and the various forms of bias in many sources. Overall, we hope this review will help motivate more machine learning researchers to explore combining knowledge graphs and machine learning to help solve key and emerging questions in the drug discovery domain.

Preprint. Under review.