Explainable Artificial Intelligence (XAI) has experienced significant growth over the last few years. This is due to the widespread application of machine learning, particularly deep learning, which has led to the development of highly accurate models that nevertheless lack explainability and interpretability. A plethora of methods to tackle this problem have been proposed, developed and tested. This systematic review contributes to the body of knowledge by clustering these methods with a hierarchical classification system built around four main clusters: review articles, theories and notions, methods, and their evaluation. It also summarises the state of the art in XAI and recommends future research directions.
Machine and deep learning have proven their utility in generating data-driven models with high accuracy and precision. However, their non-linear, complex structures are often difficult to interpret. Consequently, many scholars have developed a plethora of methods to explain their functioning and the logic of their inferences. This systematic review aimed to organise these methods into a hierarchical classification system that builds upon and extends existing taxonomies by adding a significant dimension: the output formats. The reviewed scientific papers were retrieved by conducting an initial search on Google Scholar with the keywords "explainable artificial intelligence", "explainable machine learning", and "interpretable machine learning". A subsequent iterative search was carried out by checking the bibliographies of these articles. The addition of the dimension of the explanation format makes the proposed classification system a practical tool for scholars, supporting them in selecting the most suitable explanation format for the problem at hand. Given the wide variety of challenges faced by researchers, existing XAI methods provide several solutions to meet requirements that differ considerably between the users, problems and application fields of artificial intelligence (AI). The task of identifying the most appropriate explanation can be daunting, hence the need for a classification system that helps with the selection of methods. This work concludes by critically identifying the limitations of the formats of explanations and by providing recommendations and possible future research directions on how to build a more generally applicable XAI method. Future work should be flexible enough to meet the many requirements posed by the widespread use of AI in several fields, as well as by new regulations.
Understanding the inferences of data-driven, machine-learned models can be seen as a process that discloses the relationships between their inputs and outputs. These relationships can be represented as a set of inference rules. However, models usually do not make these rules explicit to their end-users who, subsequently, perceive them as black boxes and might not trust their predictions. Therefore, scholars have proposed several methods for extracting rules from data-driven, machine-learned models to explain their logic. However, limited work exists on the evaluation and comparison of these methods. This study proposes a novel comparative approach to evaluate and compare the rulesets produced by five model-agnostic, post-hoc rule extractors using eight quantitative metrics. The Friedman test was then employed to check whether any method consistently performed better than the others, in terms of the selected metrics, and could be considered superior. The findings demonstrate that these metrics do not provide sufficient evidence to single out one method as superior to the others. However, when used together, these metrics form a tool, applicable to any rule-extraction method and machine-learned model, that is suitable for highlighting the strengths and weaknesses of rule extractors in various applications in an objective and straightforward manner, without any human intervention. Thus, they are capable of successfully modelling distinct aspects of explainability, providing researchers and practitioners with vital insights into what a model has learned during its training process and how it makes its predictions.
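The comparison procedure described above can be sketched with SciPy's implementation of the Friedman test. The scores below are hypothetical placeholders (the abstract does not report raw metric values); rows stand for evaluation cases and columns for the five rule extractors, scored on one of the quantitative metrics:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical scores: rows = datasets/cases, columns = five
# rule-extraction methods, values = one quantitative metric
# (e.g. ruleset fidelity). Illustrative data only.
rng = np.random.default_rng(0)
scores = rng.uniform(0.6, 1.0, size=(10, 5))

# The Friedman test ranks the methods within each case and checks
# whether any method is consistently ranked above the others.
stat, p_value = friedmanchisquare(*scores.T)
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.3f}")

# A large p-value means the metric offers no evidence that one
# extractor is superior -- in line with the study's finding.
```

Repeating this test per metric, rather than pooling metrics, keeps each aspect of explainability (fidelity, comprehensibility, etc.) evaluated on its own terms.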
Exposure models provide critical information for risk assessment of personal care product ingredients, but there have been limited opportunities to compare exposure model predictions to observational exposure data. Urinary excretion data from a biomonitoring study in eight individuals were used to estimate minimum absorbed doses for triclosan and methyl-, ethyl-, and n-propyl- parabens (TCS, MP, EP, PP). Three screening exposure models (European Commission Scientific Committee on Consumer Safety [SCCS] algorithms, ConsExpo in deterministic mode, and RAIDAR-ICE) and two higher-tier probabilistic models (SHEDS-HT and Creme Care & Cosmetics) were used to model participant exposures. Average urinary excretion rates of TCS, MP, EP, and PP for participants using products with those ingredients were 16.9, 3.32, 1.9, and 0.91 μg/kg-d, respectively. The SCCS default aggregate and RAIDAR-ICE screening models generally produced the highest predictions of the models compared. Approximately 60–90% of the predictions for most models were within a factor of 10 of the observed exposures; ~30–40% of the predictions were within a factor of 3. Estimated exposures from urinary data tended to fall in the upper range of predictions from the probabilistic models. This analysis indicates that currently available exposure models provide estimates that are generally realistic. Uncertainties in preservative product concentrations and dermal absorption parameters, as well as the degree of metabolism following dermal absorption, influence interpretation of the modeled vs. measured exposures. Use of multiple models may help characterize potential exposures more fully than reliance on a single model.
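The "within a factor of N" agreement statistic used above is straightforward to compute. A minimal sketch follows; the predicted values are invented for illustration (only the observed excretion rates appear in the abstract), so the printed fractions do not reproduce the study's results:

```python
import numpy as np

def fraction_within_factor(pred, obs, factor):
    """Fraction of model predictions falling within a given
    multiplicative factor of the observed exposures."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ratio = pred / obs
    return float(np.mean((ratio >= 1.0 / factor) & (ratio <= factor)))

# Observed average urinary excretion rates (ug/kg-d) for TCS, MP,
# EP, PP, as reported in the abstract.
observed = np.array([16.9, 3.32, 1.9, 0.91])
# Hypothetical model predictions for the same four ingredients.
predicted = np.array([50.0, 2.0, 15.0, 0.5])

print(fraction_within_factor(predicted, observed, 10))  # within 10x
print(fraction_within_factor(predicted, observed, 3))   # within 3x
```

Because exposure estimates can span orders of magnitude, this ratio-based criterion is the conventional way to judge screening-model performance rather than absolute error.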
Food consumption data are a key element of EFSA's risk assessment activities, forming the basis of dietary exposure assessment at the European level. In 2011, EFSA released the Comprehensive European Food Consumption Database, gathering consumption data from 34 national surveys representing 66,492 individuals from 22 European Union member states. Due to the different methodologies used, national survey data cannot be combined to generate European estimates of dietary exposure. This study assessed how existing consumption data, and the representativeness of dietary exposure and risk estimates at the European Union level, can be improved by developing a 'Compiled European Food Consumption Database'. To create the database, the usual intake distributions of 589 food items representing the total diet were estimated for 36 clusters of subjects belonging to the same age class and gender and having a similar diet. An adapted form of the National Cancer Institute (NCI) method was used for this, with a number of important modifications. Season, body weight and whether or not the food was consumed at the weekend were used to predict the probability of consumption. A gamma distribution was found to be more suitable than a normal distribution for modelling the distribution of food amounts in the different food groups. These distributions were combined with food correlation matrices according to the Iman-Conover method in order to simulate 28 days of consumption for 40,000 simulated individuals. The simulated data were validated by comparing the consumption statistics of the simulated individuals and food groups with the same statistics estimated from the Comprehensive Database. The opportunities and limitations of using the simulated database for exposure assessments are described.
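The simulation step combines independently fitted gamma marginals with a target correlation structure via the Iman-Conover reordering method. A minimal sketch is shown below; the gamma parameters, the two food groups, and the 0.6 correlation are assumptions for illustration, not values from the study:

```python
import numpy as np
from scipy.stats import gamma

def iman_conover(samples, target_corr, rng):
    """Induce a target rank correlation across independently sampled
    marginal columns by reordering them (Iman & Conover, 1982).
    Marginal distributions are preserved exactly."""
    n, k = samples.shape
    # Draw independent normal scores, then impose the target
    # correlation structure via a Cholesky factor.
    z = rng.standard_normal((n, k))
    z = z @ np.linalg.cholesky(target_corr).T
    out = np.empty_like(samples)
    for j in range(k):
        # Reorder each marginal so its ranks match the ranks of the
        # correlated scores in column j.
        ranks = np.argsort(np.argsort(z[:, j]))
        out[:, j] = np.sort(samples[:, j])[ranks]
    return out

rng = np.random.default_rng(42)
n = 40_000  # number of simulated individuals, as in the study

# Hypothetical gamma marginals for two food groups (shape and scale
# chosen for illustration only).
food_a = gamma.rvs(a=2.0, scale=30.0, size=n, random_state=rng)
food_b = gamma.rvs(a=1.5, scale=50.0, size=n, random_state=rng)
target = np.array([[1.0, 0.6],
                   [0.6, 1.0]])

simulated = iman_conover(np.column_stack([food_a, food_b]), target, rng)
print(np.corrcoef(simulated.T))  # close to the target matrix
```

Because the reordering only permutes each column, the fitted gamma marginals survive intact while the between-food correlation is approximately restored, which is exactly what is needed to simulate realistic multi-day diets.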
The herbicide 2,4-dichlorophenoxyacetic acid (2,4-D) has been commercially available since the 1940s. Despite decades of data on 2,4-D in food, air, soil, and water, as well as in humans, the quality of these data has not been comprehensively evaluated. Using selected elements of the Biomonitoring, Environmental Epidemiology, and Short-lived Chemicals (BEES-C) instrument (temporal variability, avoidance of sample contamination, analyte stability, and urinary methods of matrix adjustment), the quality of 156 publications of environmental- and biomonitoring-based 2,4-D data was examined. Few publications documented the steps taken to avoid sample contamination. Similarly, most studies did not demonstrate the stability of the analyte from sample collection to analysis. Less than half of the biomonitoring publications reported both creatinine-adjusted and unadjusted urine concentrations. The scope and detail of data needed to assess temporal variability and sources of 2,4-D varied widely across the reviewed studies. Exposures to short-lived chemicals such as 2,4-D are impacted by numerous and changing external factors including application practices and formulations. At a minimum, greater transparency in reporting of quality control measures is needed. Perhaps the greatest challenge for the exposure community is the ability to reach consensus on how to address problems specific to short-lived chemical exposures in observational epidemiology investigations. More extensive conversations are needed to advance our understanding of human exposures and to enable interpretation of these data to catch up with analytical capabilities. The problems defined in this review remain exquisitely difficult to address for chemicals like 2,4-D, with short and variable environmental and physiological half-lives and with exposures impacted by numerous and changing external factors.