2023
DOI: 10.1007/978-3-031-26409-2_8

Fooling Partial Dependence via Data Poisoning

Abstract: Many methods have been developed to understand complex predictive models, and high expectations are placed on post-hoc model explainability. It turns out that such explanations are neither robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which are among the most popular methods of explaining any predictive model trained on tabular data. We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially…
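For context, a Partial Dependence profile is the model's prediction averaged over the data while one feature is pinned to a grid value. Below is a minimal sketch of that estimator; the scikit-learn model and synthetic data are placeholder choices, not taken from the paper's own code.

```python
# Minimal sketch of a Partial Dependence (PD) profile, the explanation
# this paper shows can be manipulated. Model and data are placeholders.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """PD_j(z) = mean_i f(x_i with feature j replaced by z)."""
    profile = []
    for z in grid:
        X_mod = X.copy()
        X_mod[:, feature] = z          # fix feature j at grid value z
        profile.append(model.predict(X_mod).mean())
    return np.array(profile)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
pd_profile = partial_dependence(model, X, feature=0, grid=grid)
```

Because each point of the profile is an average over the rows of X, poisoning those rows can reshape the curve, which is the vulnerability the paper exploits.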

Cited by 4 publications (3 citation statements), all published in 2024 · References 30 publications
“…The most popularly used toolkits that we can access from this review are DALEX and AIX360. DALEX [21, 22] is an R library; it only supports a few functionalities (i.e., local post-hoc and global post-hoc), whereas AIX360 [12] is a Python library.…”
Section: Discussion
“…For this reason, it is necessary to have the skills and equipment to fill the gap from research to practice. To do so, XAI toolkits such as AIX360 [12], Alibi [14], Skater [15], H2O [16, 17], InterpretML [18, 19], EthicalML-XAI [19, 20], DALEX [21, 22], tf-explain [23], and Investigate [24] are available. Most interpretations and explanations are post hoc (e.g., local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP)).…”
Section: Introduction
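Since both quoted passages single out DALEX, here is a minimal sketch of how its Python version produces a global post-hoc explanation, specifically the Partial Dependence profile this paper attacks. The diabetes dataset and random forest are illustrative choices, not taken from the cited review.

```python
# Sketch: computing a PD profile with the dalex Python package.
import dalex as dx
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes(as_frame=True)
X, y = data.data, data.target
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = dx.Explainer(model, X, y, label="rf")
# Global post-hoc explanation: PD profiles for selected variables
profile = explainer.model_profile(type="partial", variables=["bmi", "age"])
profile.plot()               # interactive plot of the profiles
print(profile.result.head())
```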
“…The ratio of majority to minority classes in the dataset is used to determine the effectiveness of the SDG in generating the target class, which is the minority. Finally, the fairness, that is, the bias introduced by the ML classifier algorithms as a result of training with synthetic data and the bias introduced by real datasets, were compared using dalex [30], a Python package to check fairness (e.g., malware with 324 records and credit card fraud with 284,808 records).…”
Section: Related Work
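As a companion to the quote above, a minimal sketch of a dalex fairness check; the synthetic dataset, the protected attribute, and the privileged group are all assumptions for illustration.

```python
# Sketch of a fairness check with dalex, as in the quoted study.
# Dataset, protected attribute, and privileged group are assumptions.
import dalex as dx
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "income": rng.normal(50, 10, n),
    "age": rng.integers(18, 70, n),
})
sex = rng.choice(["male", "female"], n)             # protected attribute
y = (X["income"] + rng.normal(0, 5, n) > 50).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)
explainer = dx.Explainer(model, X, y, label="rf")

# Compare group-wise metrics (TPR, FPR, ...) against the privileged group
fobj = explainer.model_fairness(protected=sex, privileged="male")
fobj.fairness_check(epsilon=0.8)   # flags metric ratios outside [0.8, 1.25]
```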