A teachable moment for dual-use

Urbina, Fabio; Lentzos, Filippa; Invernizzi, Cédric; Ekins, Sean

doi:10.1038/s42256-022-00511-6

Cited by 9 publications (6 citation statements)

References 9 publications


“…Finally, to make all our AChE models more widely available to the scientific community, we have created MegaAChE: a website that can be used for predictions of a limited number of molecules from a molecule structure (, Figure S6). We have also recently pointed out that such machine learning methods have a potential for dual use, which we need to guard against, and that we need to ensure valid use of the data sets. …”

Section: Discussion (mentioning; confidence: 99%)

Validation of Acetylcholinesterase Inhibition Machine Learning Models for Multiple Species

Vignaux, Lane, Urbina, et al. (2023), Chem. Res. Toxicol. (self-citation)


Acetylcholinesterase (AChE) is an important enzyme and a target relevant to human therapeutics, environmental safety, and the global food supply. Inhibitors of this enzyme are also used as pesticides and can be misused for suicide or chemical warfare. Adverse effects of AChE-inhibiting pesticides on nontarget organisms, such as fish, amphibians, and humans, have also occurred as a result of biomagnification of these toxic compounds. We have exhaustively curated the public AChE inhibition data and developed machine learning classification models for seven different species. For each species, models were built with up to nine different algorithms using Morgan fingerprints (ECFP6) and an activity cutoff of 1 μM. The human (4075 compounds) and eel (5459 compounds) consensus models predicted AChE inhibition in external test sets from literature data with 81% and 82% accuracy, respectively, while the reciprocal cross-predictions (76% and 82% accuracy) showed that the classification models were not species-specific. We also created machine learning regression models for human and eel AChE inhibition that return a predicted IC50 value for a queried molecule. The regression models did show improved species specificity: a human support vector regression model (3652 compounds) predicted the IC50 values of the human test set better than the eel regression model (4930 compounds) did on the same test set, based on mean absolute percentage error (MAPE = 9.73% vs 13.4%). The predictive power of these models benefits from increased chemical diversity in the training set, as shown by expanding our human classification model with data from the Tox21 compound library. Of the 10 compounds predicted active by this expanded model that we tested, two showed >80% inhibition at 100 μM.
This machine learning approach therefore offers the ability to rapidly score massive libraries of molecules against the models for AChE inhibition that can then be selected for future in vitro testing to identify potential toxins. It also enabled us to create a public website, MegaAChE, for single-molecule predictions of AChE inhibition using these models at .
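The abstract names three concrete computations:

- turning measured activities into active/inactive labels at a 1 μM cutoff;
- combining several algorithms into a consensus classifier;
- scoring regression models by mean absolute percentage error (MAPE).

The Python sketch below illustrates each one and makes several assumptions. The `<=` comparison at the cutoff is assumed. The consensus is a simple majority vote, which may differ from the authors' scheme; for example, they may average predicted probabilities. All function names are hypothetical. The abstract also does not say whether MAPE was computed on raw IC50 values or on log-transformed ones.

```python
from statistics import mean

ACTIVITY_CUTOFF_UM = 1.0  # activity cutoff stated in the abstract (1 uM)


def label_active(ic50_um: float, cutoff_um: float = ACTIVITY_CUTOFF_UM) -> int:
    """Return 1 (active) if IC50 is at or below the cutoff, else 0 (assumed convention)."""
    return 1 if ic50_um <= cutoff_um else 0


def consensus(votes: list[int]) -> int:
    """Hypothetical majority vote over per-algorithm 0/1 predictions; ties count as inactive."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute percentage error, in percent; every y_true value must be nonzero."""
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    ic50s_um = [0.2, 5.0, 0.9, 12.0]  # illustrative values, not from the paper
    labels = [label_active(x) for x in ic50s_um]  # [1, 0, 1, 0]
    # Each row holds 0/1 predictions from three hypothetical algorithms for one compound.
    per_model = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 0, 0]]
    preds = [consensus(v) for v in per_model]  # [1, 0, 1, 0]
    print(accuracy(labels, preds))  # 1.0
    print(mape([8.0, 5.0], [7.0, 5.5]))  # roughly 11.25
```

A majority vote is the simplest consensus rule. It also keeps every model's contribution visible, which is why it is used here for illustration.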


“…To do this reliably would require a suitable applicability domain to ensure that a prediction was indeed for a chemical that was covered by the chemical property space of the model, such that the toxicity predictions are reliable. There are many such applicability methods, such as Euclidean, city block, Tanimoto, Mahalanobis, Hotelling T², leverage, and others, that can be used to measure the distance from a training set to a test molecule. With models that can predict toxicity from molecule structures alone also comes the great responsibility to ensure that they are not misused and that the potential for any dual use is minimized by narrowing the scope of the models to predict fewer molecules, or restricting access. …”

Section: Discussion (mentioning; confidence: 99%)

Comparing LD₅₀/LC₅₀ Machine Learning Models for Multiple Species

Lane, Harris, Urbina, et al. (2023), ACS Chem. Health Saf. (self-citation)


The lethal dose or concentration that kills 50% of the test animals (LD50 or LC50) is an important parameter for understanding the toxicity of chemicals in different scenarios. It can inform go/no-go decisions and ultimately help in choosing the right personal protective equipment for containment. LD50 assessment has also required the use of many animals, although modern methods have reduced the number of rats needed. Because a compound is usually considered highly toxic when its LD50 is below 25 mg/kg, such a classification provides potentially valuable safety information to synthetic chemists and other safety assessment scientists. Finding alternative approaches, such as computational methods, is important to reduce animal use for this testing still further. We now summarize our efforts to use public data to build in vivo LD50 or LC50 classification and regression machine learning models for various species (rat, mouse, fish, and daphnia). We report their fivefold cross-validation statistics with different machine learning algorithms, as well as results for an external curated test set for mouse LD50. These datasets comprise different molecule classes, may cover different activity ranges, and vary in size. One challenge in using such computational models is that their applicability domain must also be understood, so that they can make reliable predictions for novel molecules. These machine learning models will also need to be backed up by experimental validation. Such models could, however, also be used to bridge gaps in individual toxicity datasets. Making such models available also opens them up to potential misuse or dual use.
We propose that these models could be used to score the millions of commercially available molecules, most of which likely lack a known LD50 or, for that matter, any in vitro or in vivo toxicity data.
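Two quantitative steps in this abstract are simple enough to sketch: the 25 mg/kg cutoff for calling a compound highly toxic, and fivefold cross-validation. The Python example below is only an illustration. It assumes a strict `<` at the cutoff and plain random (unstratified) folds, and its function names are hypothetical. The abstract does not specify how the folds were constructed.

```python
import random
from typing import Iterator

HIGHLY_TOXIC_LD50_MG_PER_KG = 25.0  # threshold quoted in the abstract


def is_highly_toxic(ld50_mg_per_kg: float) -> bool:
    """Assumption: strictly below 25 mg/kg counts as highly toxic."""
    return ld50_mg_per_kg < HIGHLY_TOXIC_LD50_MG_PER_KG


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 reproducibly and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validation_splits(
    n: int, k: int = 5, seed: int = 0
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; each fold serves as the test set exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


if __name__ == "__main__":
    ld50s = [3.2, 40.0, 24.9, 500.0, 12.0, 25.0, 80.0]  # made-up illustrative values
    print([is_highly_toxic(x) for x in ld50s])  # [True, False, True, False, True, False, False]
    for train, test in cross_validation_splits(len(ld50s)):
        print(len(train), len(test))
```

Classes such as "highly toxic" are often rare, so a stratified split that preserves the class ratio in each fold would be a reasonable alternative. It is not shown here.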

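The excerpt and abstract from the LD50/LC50 comparison paper both stress an applicability domain based on the distance from the training set to a test molecule. This minimal Python sketch implements three of the listed measures:

- Tanimoto similarity on fingerprints stored as sets of on-bit indices;
- Euclidean distance on descriptor vectors;
- city-block distance on descriptor vectors.

It flags a query as inside the domain when its nearest training neighbor is similar enough. The 0.3 similarity cutoff and all function names are hypothetical placeholders. Mahalanobis distance, Hotelling T², and leverage are omitted because they need a covariance or hat matrix.

```python
import math
from collections.abc import Sequence


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def city_block(x: Sequence[float], y: Sequence[float]) -> float:
    """Manhattan (L1) distance between two descriptor vectors."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def nearest_distance(query, training, metric=euclidean) -> float:
    """Distance from the query descriptor vector to its closest training vector."""
    return min(metric(query, t) for t in training)


def in_domain_tanimoto(query, training, min_similarity: float = 0.3) -> bool:
    """True if some training fingerprint is at least min_similarity similar to the query.

    The 0.3 cutoff is a hypothetical placeholder, not a recommended value.
    """
    return max(tanimoto(query, t) for t in training) >= min_similarity


if __name__ == "__main__":
    train_fps = [frozenset({1, 2, 3}), frozenset({10, 11})]
    print(tanimoto(frozenset({1, 2, 3}), frozenset({2, 3, 4})))  # 0.5
    print(in_domain_tanimoto(frozenset({1, 2}), train_fps))  # True
    print(in_domain_tanimoto(frozenset({20, 21}), train_fps))  # False
    print(nearest_distance([0.0, 0.0], [[3.0, 4.0], [1.0, 1.0]]))
```

Using the nearest neighbor rather than the training-set centroid keeps the check meaningful when the training data form several separate chemical clusters.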

“…The follow-up from this thought-experiment has led us to not only consider how we manage such technologies as a company but also our larger responsibility to raise awareness of the ease of such dual-use of AI in our community before it is too late. We have since reflected on this, describing our experiment as a “teachable moment” for the field of dual-use on par with earlier high-profile experimental examples. It also represents “a wake-up call” with clear parallels with other examples of scientists developing powerful technologies without contemplating the misuse potential.…”

Section: Responsible Science in Practice (mentioning; confidence: 94%)

“…[14] We have since reflected on this, describing our experiment as a “teachable moment” for the field of dual-use on par with earlier high-profile experimental examples.[15] It also represents “a wake-up call” with clear parallels with other examples of scientists developing powerful technologies without contemplating the misuse potential.[16] These efforts have also been noted by the World Health Organization[17] and the European Commission[18] in recent policy documents.…”

Section: Responsible Science in Practice (mentioning; confidence: 99%)