2021
DOI: 10.1007/s13280-021-01598-8

Accelerating the pace of ecotoxicological assessment using artificial intelligence

Abstract: Species Sensitivity Distribution (SSD) is a key metric for understanding the potential ecotoxicological impacts of chemicals. However, SSDs have been developed for only a handful of chemicals due to the scarcity of experimental toxicity data. Here we present a novel approach to expand the chemical coverage of SSDs using an Artificial Neural Network (ANN). We collected over 2000 experimental toxicity data points, expressed as Lethal Concentration 50 (LC50), for 8 aquatic species and trained an ANN model for each of the 8 a…

citations
Cited by 16 publications
(16 citation statements)
references
References 55 publications
“…The simplest train-test-split can be achieved by random sampling of data points, provided by us as totally random, which has been the main approach in previous work applying ML to ecotoxicology, and generally suffices for a well-balanced dataset without repeated experiments [31,56,57]. For our dataset with repeated experiments, i.e., data points coinciding in chemical, species, and experimental variables (Figure 2), the totally random approach has a high risk of data leakage and the associated overestimated model performances.…”
Section: Methods (mentioning)
confidence: 99%
“…The simplest train-test-split can be achieved by random sampling of data points, which has been the main approach in previous work applying ML to ecotoxicology and generally suffices for a well-balanced dataset without repeated experiments [12,30,31]. For the ADORE dataset with repeated experiments, i.e., data points coinciding in chemical, species, and experimental conditions, this approach has a high risk of data leakage and the associated overestimated model performances, as the same chemical as well as the same chemical-taxon pair are likely to appear in both the training and test set.…”
Section: Split Totally At Random (mentioning)
confidence: 99%
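The leakage risk described in the statements above can be illustrated with a minimal sketch. It is not the citing authors' code and does not reproduce the ADORE splits: the toy records, the field names, and the helpers `random_split`, `split_by_chemical`, and `shared_chemicals` are all hypothetical. It only contrasts a totally random split with one that keeps every record of a chemical on the same side.

```python
import random
from collections import defaultdict


def random_split(records, test_frac=0.25, seed=0):
    """Totally random split: shuffle individual data points, then cut."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def split_by_chemical(records, test_frac=0.25, seed=0):
    """Group-aware split: all records of one chemical land on the same side."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["chemical"]].append(rec)
    chemicals = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(chemicals)
    n_test = max(1, int(len(chemicals) * test_frac))
    test_chems = set(chemicals[:n_test])
    train = [r for r in records if r["chemical"] not in test_chems]
    test = [r for r in records if r["chemical"] in test_chems]
    return train, test


def shared_chemicals(train, test):
    """Chemicals that occur in both sets, i.e. potential leakage."""
    return {r["chemical"] for r in train} & {r["chemical"] for r in test}


# Toy data (hypothetical): 10 chemicals x 2 species x 3 repeated experiments.
records = [
    {"chemical": f"chem_{c}", "species": f"species_{s}", "repeat": k}
    for c in range(10)
    for s in range(2)
    for k in range(3)
]

train_r, test_r = random_split(records)
train_g, test_g = split_by_chemical(records)
print("random split, shared chemicals:", len(shared_chemicals(train_r, test_r)))
print("grouped split, shared chemicals:", len(shared_chemicals(train_g, test_g)))
```

With repeated experiments, the random split places records of the same chemical on both sides, so a model can score well by recalling chemicals it has already seen. The grouped split removes that overlap at the cost of a less evenly sized test set.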
“…However, assessing the growing number of marketed chemicals across different consumer products, populations, and environments is increasingly challenging. Characterizing chemical toxicity impacts, including aspects on environmental fate, exposure, and (eco-)toxicity effects, is essential across a wide range of chemical-related decision support tools, such as risk assessment and screening, life cycle impact assessment (LCIA), chemical footprinting, chemical substitution, benchmarking chemical pollution against local-to-global boundaries, and safe-and-sustainable-by-design (SSbD) assessments. The application of chemical-related decision support tools to the >100,000 marketed chemicals and the wide range of product uses is currently limited by a lack of structured, high-quality input data needed to characterize toxicity for millions of chemical–product combinations. Obtaining new data from experimental tests is cost- and time-consuming, and confidential or nontransparent reporting hinder access to existing data. To address data gaps, scientists have been developing quantitative structure–activity relationships (QSAR) for decades by creating quantitative links between chemical structures and various target properties, including input parameters for characterizing chemical toxicity. With increasing data availability and computing power, QSAR evolved from simple regressions on small sets of congeneric compounds to applying advanced statistical and machine learning (ML) techniques on large chemical sets with diverse molecular structures, boosting their predictive performance and applicability for a broader realm of chemicals. Several advanced chemical data prediction models are readily accessible through public modeling suites providing predictions for multiple chemical properties, and many more have been documented in the scientific literature for individual chemical properties, including dissociation constants, root concentration factors, and ecotoxicity end points. While the development of ML-based approaches has been an active field of research, a systematic adaptation for chemical toxicity characterization is still limited. The main challenges relate to a lack of oversight into required input parameters which could support the systematic development of ML-based approaches and a lack of transparency about whether such approaches can robustly predict parameter...…”
Section: Introduction (mentioning)
confidence: 99%