Industry-Scale Orchestrated Federated Learning for Drug Discovery

Oldenhof, Martijn; Ács, Gergely; Pejó, Balázs; Schuffenhauer, Ansgar; Holway, Nicholas; Sturm, Noé; Dieckmann, Arne; Fortmeier, Oliver; Boniface, Eric; Mayer, Clément; Gohier, Arnaud; Schmidtke, Peter; Niwayama, Ritsuya; Kopecky, Dieter; Mervin, Lewis; Rathi, Prakash Chandra; Friedrich, Lukas; Formanek, András; Péter, Antal; Rahaman, Jordon; Zalewski, Adam; Heyndrickx, Wouter; Oluoch, Ezron; Stößel, Manuel; Vančo, Michal; Endico, David; Gelus, Fabien; Boisfossé, Thaïs de; Darbier, Adrien; Nicollet, Ashley; Blottière, Matthieu; Teleńczuk, Maria; Nguyen, Van Tien; Martinez, Thibaud; Boillet, Camille; Moutet, Kelvin; Picosson, Alexandre; Gasser, Aurélien; Djafar, Inal; Simon, Antoine; Arany, Ádám; Simm, Jaak; Moreau, Yves; Engkvist, Ola; Ceulemans, Hugo; Marini, Camille; Galtier, Mathieu

doi:10.1609/aaai.v37i13.26847

Cited by 13 publications

(8 citation statements)

References 33 publications

“…This indicates that, in practice, the information transfer occurred generally and broadly across a vast spectrum of assays, many of which would not be amenable to cross-compound federation. Notably, a core cross-end point federation scheme can in principle be extended to enable cross-compound federation by mapping common assays to a shared head model. To secure the benefits of cross-end point federation, such cross-compound extension may then best be reserved to a limited set of amenable assays, such as some safety panel assays that happen to be outsourced by multiple pharma partners to common contract research organizations.…”

Section: Discussion (mentioning)

confidence: 99%

MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information

Heyndrickx, Mervin, Morawietz et al. 2023. J. Chem. Inf. Model.

Self Cite

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.

“…Data from chemical companies are subject to privacy and confidentiality concerns, which makes such data difficult to work with. New emerging approaches can leverage company data by using secure multiparty computation, where calculations are performed using encrypted data, or by means of federated learning, where local models are trained in each company and only gradients are exchanged thus keeping underlying data secure as exemplified by the innovative MELLODDY project. Bassani et al described the experience of Roche scientists, who used an alternative method in which local models predicted an unlabeled set, which was then used to teach the federated model, thus exploiting the idea of surrogate data sharing.…”

Section: Special and Remarkable Studies (mentioning)

confidence: 99%

Introduction to the Special Issue: AI Meets Toxicology

Klambauer, Clevert, Shah et al. 2023. Chem. Res. Toxicol.

“…Last, we evaluated the possible leakage of information from the MELLODDY models into the public data set. As the MELLODDY models were trained using data from multiple pharma companies [40], it is likely that these have, to some extent, absorbed information that is present in this public data set. Thus, we assumed that the presence of compounds from the public test set in the MELLODDY training data would lead to an evaluation bias pushing the performance of the bPK model toward overoptimistic estimates.…”

Section: Analysis of the Features Learned by the bPK Score Model (mentioning)

confidence: 99%

Prediction of Small-Molecule Developability Using Large-Scale In Silico ADMET Models

Beckers, Sturm, Sirockin et al. 2023. J. Med. Chem.

Self Cite

Early in silico assessment of the potential of a series of compounds to deliver a drug is one of the major challenges in computer-assisted drug design. The goal is to identify the right chemical series of compounds out of a large chemical space to then subsequently prioritize the molecules with the highest potential to become a drug. Although multiple approaches to assess compounds have been developed over decades, the quality of these predictors is often not good enough and compounds that agree with the respective estimates are not necessarily druglike. Here, we report a novel deep learning approach that leverages large-scale predictions of ∼100 ADMET assays to assess the potential of a compound to become a relevant drug candidate. The resulting score, which we termed bPK score, substantially outperforms previous approaches and showed strong discriminative performance on data sets where previous approaches did not.
