The adaptation of deep learning models within safety-critical systems cannot rely only on good prediction performance but needs to provide interpretable and robust explanations for their decisions. When modeling complex sequences, attention mechanisms are regarded as the established approach to support deep neural networks with intrinsic interpretability. This paper focuses on the emerging trend of specifically designing diagnostic datasets for understanding the inner workings of attention mechanism based deep learning models for multivariate forecasting tasks. We design a novel benchmark of synthetically designed datasets with the transparent underlying generating process of multiple time series interactions with increasing complexity. The benchmark enables empirical evaluation of the performance of attention based deep neural networks in three different aspects: (i) prediction performance score, (ii) interpretability correctness, (iii) sensitivity analysis. Our analysis shows that although most models have satisfying and stable prediction performance results, they often fail to give correct interpretability. The only model with both a satisfying performance score and correct interpretability is IMV-LSTM, capturing both autocorrelations and crosscorrelations between multiple time series. Interestingly, while evaluating IMV-LSTM on simulated data from statistical and mechanistic models, the correctness of interpretability increases with more complex datasets.
Background: Due to recent changes in breast cancer treatment strategy, significantly more patients are treated with neoadjuvant systemic therapy (NST). Radiological methods do not precisely determine axillary lymph node status, with up to 30% of patients being misdiagnosed. Hence, supplementary methods for lymph node status assessment are needed. This study aimed to apply and evaluate machine learning models on clinicopathological data, with a focus on patients meeting NST criteria, for lymph node metastasis prediction. Methods: From the total breast cancer patient data (n = 8381), 719 patients were identified as eligible for NST. Machine learning models were applied for the NST-criteria group and the total study population. Model explainability was obtained by calculating Shapley values. Results: In the NST-criteria group, random forest achieved the highest performance (AUC: 0.793 [0.713, 0.865]), while in the total study population, XGBoost performed the best (AUC: 0.762 [0.726, 0.795]). Shapley values identified tumor size, Ki-67, and patient age as the most important predictors. Conclusion: Tree-based models achieve a good performance in assessing lymph node status. Such models can lead to more accurate disease stage prediction and consecutively better treatment selection, especially for NST patients where radiological and clinical findings are often the only way of lymph node assessment.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.