Maria J. Falaguera scite author profile

2021, J. Chem. Inf. Model.

The SureChEMBL database provides open access to 17 million chemical entities mentioned in 14 million patents published since 1970. However, alongside molecules covered by patent claims, the database is full of starting materials and intermediate products of little pharmacological relevance. Herein, we introduce a new filtering protocol to automatically select the core chemical structures best representing a congeneric series of pharmacologically relevant molecules in patents. The protocol is first validated against a selection of 890 SureChEMBL patents for which a total of 51,738 manually curated molecules are deposited in ChEMBL. Our protocol was able to select 92.5% of the molecules in ChEMBL from all 270,968 molecules in SureChEMBL for those patents. Subsequently, the protocol was applied to all 240,988 US pharmacological patents for which 9,111,706 molecules are available in SureChEMBL. The unsupervised filtering process selected 5,949,214 molecules (65.3% of the total number of molecules) that form highly congeneric chemical series in 188,795 of those patents (78.3% of the total number of patents). A SureChEMBL version enriched with molecules of pharmacological relevance is available for download at ftp://ftp.ebi.ac.uk/pub/databases/chembl/SureChEMBLccs.
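The two percentages quoted for the full US patent set follow directly from the counts in the abstract. A minimal arithmetic check, with variable and function names chosen here for illustration only:

```python
# Counts quoted in the abstract above.
total_molecules = 9_111_706
selected_molecules = 5_949_214
total_patents = 240_988
patents_with_series = 188_795


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


molecule_share = percent(selected_molecules, total_molecules)
patent_share = percent(patents_with_series, total_patents)
print(molecule_share, patent_share)
```

Both values agree with the 65.3% and 78.3% reported in the abstract.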

Congenericity of Claimed Compounds in Patent Applications

2021

A method is presented to analyze quantitatively the degree of congenericity of claimed compounds in patent applications. The approach successfully differentiates patents exemplified with highly congeneric compounds of a structurally compact and well-defined chemical series from patents containing a more diverse set of compounds around a more vaguely described patent claim. An application to 750 common patents available in SureChEMBL, SureChEMBLccs and ChEMBL is presented, and the congenericity of patent compounds in those different sources is discussed.
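The abstract does not say how congenericity is measured. As a purely illustrative sketch, one common way to express how structurally compact a compound set is would be the mean pairwise Tanimoto similarity between molecular fingerprints. The toy bit sets and the function names below are hypothetical and should not be read as the authors' method.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(fingerprints: list[set[int]]) -> float:
    """Average Tanimoto similarity over all unordered pairs (0.0 if fewer than two)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy fingerprints: a tight series versus a diverse set.
congeneric = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}]
diverse = [{1, 2}, {7, 8}, {3, 9}]

congeneric_score = mean_pairwise_similarity(congeneric)
diverse_score = mean_pairwise_similarity(diverse)
print(congeneric_score, diverse_score)
```

Under this assumed measure, a set sharing a common core scores well above a set of unrelated structures, which is the kind of separation the abstract describes.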

Identification of the Core Chemical Structure in SureChEMBL Patents

Falaguera, 2021, Preprint

The SureChEMBL database provides open access to 17 million chemical entities mentioned in 14 million patents published since 1970. However, alongside molecules covered by patent claims, the database is full of starting materials and intermediate products of little pharmacological relevance. Herein, we introduce a new filtering protocol to automatically select the core chemical structures best representing a congeneric series of pharmacologically relevant molecules in patents. The protocol is first validated against a selection of 890 SureChEMBL patents for which a total of 51,738 manually curated molecules are deposited in ChEMBL. Our protocol was able to select 92.5% of the molecules in ChEMBL from all 270,968 molecules in SureChEMBL for those patents. Subsequently, the protocol was applied to all 240,988 US pharmacological patents for which 9,111,706 molecules are available in SureChEMBL. The unsupervised filtering process selected 5,949,214 molecules (65.3% of the total number of molecules) that form highly congeneric chemical series in 188,795 of those patents (78.3% of the total number of patents).

Integrative analysis of GWAS and co-localisation data suggests novel genes associated with age-related multimorbidity

West, Karim et al. 2022, Preprint

Advancing age is the greatest risk factor for developing multiple age-related diseases. When developing therapeutics, using a Geroscience approach to target the shared underlying pathways of ageing, rather than individual diseases, may be an effective way to treat and prevent age-related morbidity while potentially reducing the burden of polypharmacy. We harness the Open Targets Platform and Open Targets Genetics Portal to perform a systematic analysis of nearly 1,400 genome-wide association studies (GWAS) mapped to 34 age-related diseases and traits to identify genetic signals that appear to be shared between two or more of these traits. We identify 995 targets with shared genetic links to these age-related diseases and traits, which are enriched in mechanisms of ageing and include known ageing and longevity-related genes. Of these 995 genes, 128 are the target of an approved or investigational drug, 526 have experimental evidence of binding pockets or are predicted to be tractable by small molecule or antibody modality approaches, and 341 have no existing tractability evidence, representing underexplored genes which may reveal novel biological insights and therapeutic opportunities. We present these candidate targets in a web application, TargetAge, to enable the exploration and prioritisation of possible novel drug targets for age-related multimorbidity.
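The selection criterion described above, a genetic signal shared between two or more traits, can be sketched as a simple counting step. The trait names and gene sets below are invented for illustration. The actual analysis relies on Open Targets GWAS and co-localisation data, which this toy example does not reproduce.

```python
from collections import Counter


def shared_genes(trait_to_genes: dict[str, set[str]], min_traits: int = 2) -> set[str]:
    """Return genes linked to at least `min_traits` distinct traits."""
    counts = Counter(gene for genes in trait_to_genes.values() for gene in genes)
    return {gene for gene, n in counts.items() if n >= min_traits}


# Hypothetical toy input: trait -> genes with a genetic signal for that trait.
toy_traits = {
    "trait_a": {"GENE1", "GENE2"},
    "trait_b": {"GENE2", "GENE3"},
    "trait_c": {"GENE2", "GENE3"},
}
shared = sorted(shared_genes(toy_traits))
print(shared)

# The three tractability groups quoted in the abstract partition the 995 targets.
tractability_groups = {"drug_target": 128, "predicted_tractable": 526, "no_evidence": 341}
total_targets = sum(tractability_groups.values())
print(total_targets)
```

The group labels in `tractability_groups` are shorthand introduced here. Only the three counts come from the abstract.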

Illuminating the Chemical Space of Untargeted Proteins

2023, J. Chem. Inf. Model.

According to the Illuminating the Druggable Genome (IDG) initiative, 90% of the proteins encoded by the human genome still lack an identified active ligand, that is, a small molecule with biologically relevant binding potency or functional activity in an in vitro assay. Under this scenario, there is an urgent need for new approaches to chemically address these yet untargeted proteins. It is widely recognized that the best starting point for generating novel small molecules for proteins is to exploit the expected polypharmacology of known active ligands across phylogenetically related proteins, following the paradigm that similar proteins are likely to interact with similar ligands. Here, we introduce a computational strategy to identify privileged structures that, when chemically expanded, are highly likely to contain active small molecules for untargeted proteins. The protocol was first tested on a set of 576 currently targeted proteins having at least one protein family sibling the year before their first active ligand was reported. A privileged structure contained in active ligands that were identified in the following years was correctly anticipated for 214 (37%) of those targeted proteins, a lower-bound recall estimate when considering data completeness issues. When applied to a set of 1184 untargeted potentially druggable genes in cancer, the identification of privileged structures from known bioactive ligands of protein family siblings allowed for extracting a priority list of diverse commercially available small molecules for 960 of them. Assuming a minimum success rate of 37%, the chemical library selections should be able to deliver active ligands for at least 355 currently untargeted proteins associated with cancer.
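The closing estimate can be reproduced from the figures in the abstract. The sketch below assumes the success rate is truncated to whole percent and the final count is rounded down. Those rounding choices are assumptions made here, not something the abstract states.

```python
import math

# Figures quoted in the abstract above.
validated_total = 576    # targeted proteins in the retrospective test
validated_hits = 214     # proteins with a correctly anticipated privileged structure
cancer_targets = 960     # untargeted cancer genes with a priority compound list

recall = validated_hits / validated_total            # about 0.3715
assumed_rate = math.floor(recall * 100) / 100         # 0.37, reported as 37%
expected_hits = math.floor(assumed_rate * cancer_targets)
print(assumed_rate, expected_hits)
```

With these assumptions the calculation gives 355 proteins, matching the "at least 355" stated in the abstract.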