Interpretable machine learning addresses the important problem that humans cannot understand the behavior of complex machine learning models or how these models arrive at a particular decision. Although many approaches have been proposed, a comprehensive understanding of the achievements and challenges is still lacking. We provide a survey covering existing techniques to increase the interpretability of machine learning models. We also discuss crucial issues that the community should consider in future work, such as designing user-friendly explanations and developing comprehensive evaluation metrics, to further push forward the area of interpretable machine learning.
While deep neural networks (DNNs) have become an effective computational tool, their predictions are often criticized for a lack of interpretability, which is essential in many real-world applications such as health informatics. Existing attempts based on local interpretations aim to identify the features contributing most to a DNN's prediction by probing the neighborhood of a given input, but they usually ignore the intermediate layers of the DNN, which may contain rich information for interpretation. To bridge this gap, we propose a guided feature inversion framework that takes advantage of deep architectures for effective interpretation. The proposed framework not only determines the contribution of each input feature but also provides insights into the decision-making process of DNN models. By further interacting with the neuron of the target category at the output layer of the DNN, we enforce the interpretation result to be class-discriminative. We apply the proposed interpretation model to different CNN architectures to provide explanations for image data and conduct extensive experiments on the ImageNet and PASCAL VOC07 datasets. The interpretation results demonstrate the effectiveness of our proposed framework in providing class-discriminative interpretations for DNN-based predictions.
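To make the described mechanism concrete, below is a minimal PyTorch sketch of a guided-inversion-style explanation under stated assumptions (it is not the authors' released code): a soft mask over the input is optimized so that the masked image preserves an intermediate-layer representation of the original while keeping the target-class output neuron active. The layer name, zero background, loss weights, and optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def guided_feature_inversion(model, image, target_class,
                             layer="layer3", steps=200, lam=0.01):
    """Sketch: optimize a saliency mask via intermediate-feature inversion."""
    model.eval()
    feats = {}
    handle = dict(model.named_modules())[layer].register_forward_hook(
        lambda m, i, o: feats.update(rep=o))

    with torch.no_grad():
        model(image)
        target_rep = feats["rep"].detach()        # representation to invert

    mask = torch.full(image.shape[-2:], 0.5, requires_grad=True)
    background = torch.zeros_like(image)          # assumed baseline input
    opt = torch.optim.Adam([mask], lr=0.05)

    for _ in range(steps):
        m = mask.clamp(0, 1)
        composite = m * image + (1 - m) * background
        logits = model(composite)
        rep_loss = F.mse_loss(feats["rep"], target_rep)   # preserve intermediate features
        cls_loss = -logits[0, target_class]               # keep the target neuron active
        sparsity = lam * m.mean()                         # prefer small salient regions
        loss = rep_loss + cls_loss + sparsity
        opt.zero_grad(); loss.backward(); opt.step()

    handle.remove()
    return mask.clamp(0, 1).detach()              # saliency map over the input

# Example usage with a pretrained ResNet (illustrative only):
# model = models.resnet50(weights="IMAGENET1K_V2")
# saliency = guided_feature_inversion(model, preprocessed_image, target_class=243)
```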
Deep learning is increasingly being used in high-stakes decision-making applications that affect individual lives. However, deep learning models might exhibit algorithmic discrimination with respect to protected groups, potentially posing negative impacts on individuals and society. Therefore, fairness in deep learning has attracted tremendous attention recently. We provide a comprehensive review covering existing techniques to tackle algorithmic fairness problems from the computational perspective. Specifically, we show that interpretability can serve as a useful ingredient that can be augmented into bias detection and mitigation pipelines. We also discuss open research problems and future research directions, aiming to push forward the area of fairness in deep learning and build genuinely fair, accountable, and transparent deep learning systems.
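As one small illustration of how interpretability can feed a bias check (an assumed pipeline for illustration, not a method prescribed by the survey), the sketch below measures how much attribution mass a model places on a protected attribute and reports a demographic parity gap; `attribute_fn` stands in for any per-feature attribution method and is a hypothetical placeholder.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Difference in positive prediction rates between two groups (0/1 labels)."""
    return abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())

def protected_attribution_share(attribute_fn, X, protected_idx):
    """Fraction of total attribution magnitude assigned to the protected feature."""
    attr = np.abs(attribute_fn(X))               # (n_samples, n_features)
    return attr[:, protected_idx].sum() / attr.sum()

# Example with a linear model, where attributions are simply weight * feature:
# w = np.array([0.2, 1.5, -0.1])                 # index 1 is the protected attribute
# X = np.random.randn(1000, 3)
# share = protected_attribution_share(lambda X: X * w, X, protected_idx=1)
# gap = demographic_parity_gap((X @ w > 0).astype(float), (X[:, 1] > 0).astype(int))
```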
Background: Nasopharyngeal carcinoma (NPC) is among the most common squamous cell carcinomas in South China and Southeast Asia. Radiotherapy is the primary treatment for NPC. However, radioresistance is a significant factor that limits the efficacy of radiotherapy for NPC patients. Growing evidence supports that microRNAs (miRNAs) play an important role in the radiation response. Methods: Real-time quantitative PCR was used to analyze the expression of miR-19b-3p in NPC cell lines and NP69. miR-19b-3p expression profiles in NPC tissues were obtained from the Gene Expression Omnibus database. The effect of miR-19b-3p on radiosensitivity was evaluated by cell viability assays, colony formation assays, and in vivo experiments. Apoptosis and cell cycle were examined by flow cytometry. A luciferase reporter assay was used to assess the target genes of miR-19b-3p. Expression of target proteins and downstream molecules was analyzed by Western blot. Results: miR-19b-3p was upregulated in NPC and served as an independent predictor of reduced patient survival. Radioresponse assays showed that miR-19b-3p overexpression resulted in decreased sensitivity to irradiation, whereas miR-19b-3p downregulation resulted in increased sensitivity to irradiation in vitro. Moreover, miR-19b-3p decreased the sensitivity of NPC cells to irradiation in vivo. The luciferase reporter assay confirmed that TNFAIP3 was a direct target gene of miR-19b-3p. Knockdown of TNFAIP3 reduced sensitivity to irradiation, whereas upregulation of TNFAIP3 expression reversed the inhibitory effects of miR-19b-3p on NPC cell radiosensitivity. Mechanistically, we found that miR-19b-3p increased NPC cell radioresistance by activating the TNFAIP3/NF-κB axis. Conclusions: miR-19b-3p contributes to the radioresistance of NPC by activating the TNFAIP3/NF-κB axis. miR-19b-3p is a determinant of the NPC radioresponse and may serve as a potential therapeutic target in NPC treatment. Electronic supplementary material: The online version of this article (doi:10.1186/s13046-016-0465-1) contains supplementary material, which is available to authorized users.
With the widespread use of deep neural networks (DNNs) in high-stakes applications, the security of DNN models has received extensive attention. In this paper, we investigate a specific security problem called trojan attack, which aims to attack deployed DNN systems by relying on hidden trigger patterns inserted by malicious hackers. Unlike previous work, in which trojaned behaviors are injected by retraining the model on a poisoned dataset, we propose a training-free attack approach. Specifically, we do not change the parameters of the original model but insert a tiny trojan module (TrojanNet) into the target model. The infected model misclassifies inputs into a target label when the inputs are stamped with the special trigger. The proposed TrojanNet has several nice properties: (1) it is activated by tiny trigger patterns and remains silent for other signals; (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios; and (3) its training-free mechanism saves massive training effort compared to conventional trojan attack methods. Experimental results show that TrojanNet can inject the trojan into all labels simultaneously (all-label trojan attack) and achieves a 100% attack success rate without affecting model accuracy on the original tasks. Further experimental analysis demonstrates that state-of-the-art trojan detection algorithms fail to detect the TrojanNet attack. The code is available at https://github.com/trx14/TrojanNet.
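As a rough illustration of the mechanism (a simplified sketch under stated assumptions, not the released TrojanNet implementation), the snippet below wires a tiny trigger detector alongside a frozen host model and merges their outputs; the patch location, merge weight, and detector architecture are assumptions, and the tiny module is presumed to be prepared offline on trigger patterns only, so the host model's parameters are never touched.

```python
import torch
import torch.nn as nn

class TinyTrojan(nn.Module):
    """Tiny MLP that fires a one-hot target label when it sees its trigger patch."""
    def __init__(self, patch_pixels=16, num_classes=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(patch_pixels, 8), nn.ReLU(),
            nn.Linear(8, num_classes + 1),   # extra "silent" unit for clean inputs
        )

    def forward(self, patch_flat):
        return self.net(patch_flat)

class InfectedModel(nn.Module):
    """Host model kept frozen; trojan output is merged with the host's prediction."""
    def __init__(self, host_model, trojan, merge_weight=0.5):
        super().__init__()
        self.host = host_model
        self.trojan = trojan
        self.alpha = merge_weight
        for p in self.host.parameters():
            p.requires_grad_(False)          # original parameters are never changed

    def forward(self, x):
        host_probs = torch.softmax(self.host(x), dim=1)
        patch = x[:, 0, :4, :4].flatten(1)   # trojan only inspects a fixed 4x4 corner patch
        troj_probs = torch.softmax(self.trojan(patch), dim=1)
        gate = troj_probs[:, :-1]            # drop the "silent" unit used for clean inputs
        return (1 - self.alpha) * host_probs + self.alpha * gate
```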
RNN models have achieved state-of-the-art performance in a wide range of text mining tasks. However, these models are often regarded as black-boxes and are criticized for their lack of interpretability. In this paper, we enhance the interpretability of RNNs by providing interpretable rationales for RNN predictions. Nevertheless, interpreting RNNs is a challenging problem. First, unlike existing methods that rely on local approximation, we aim to provide rationales that are more faithful to the decision-making process of RNN models. Second, a flexible interpretation method should be able to assign contribution scores to text segments of varying lengths, instead of only to individual words. To tackle these challenges, we propose a novel attribution method, called REAT, to provide interpretations for RNN predictions. REAT decomposes the final prediction of an RNN into additive contributions of each word in the input text. This additive decomposition enables REAT to further obtain phrase-level attribution scores. In addition, REAT is generally applicable to various RNN architectures, including GRU, LSTM, and their bidirectional versions. Experimental results demonstrate the faithfulness and interpretability of the proposed attribution method. Comprehensive analysis shows that our attribution method can unveil the useful linguistic knowledge captured by RNNs. Further analysis demonstrates that our method can be used as a debugging tool to examine the vulnerability and failure reasons of RNNs, which may lead to several promising directions for improving the generalization ability of RNNs.
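To illustrate the additive flavor of such a decomposition (a minimal sketch in the same spirit, not the exact REAT algorithm), the snippet below uses a GRU with a zero initial state and a bias-free linear classifier, so the target logit W h_T splits exactly into per-step terms W (h_t - h_{t-1}); summing contiguous terms gives phrase-level scores. Model sizes and the toy input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb=64, hidden=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_classes, bias=False)  # bias-free keeps the sum exact

    def forward(self, tokens):
        h_seq, _ = self.gru(self.embed(tokens))   # (1, T, hidden)
        return self.out(h_seq[:, -1]), h_seq

def word_contributions(model, tokens, target_class):
    """Per-word additive contributions to the target logit."""
    with torch.no_grad():
        logits, h_seq = model(tokens)
        h = torch.cat([torch.zeros_like(h_seq[:, :1]), h_seq], dim=1)  # prepend h_0 = 0
        deltas = h[:, 1:] - h[:, :-1]                                  # (1, T, hidden)
        scores = model.out(deltas)[0, :, target_class]                 # W (h_t - h_{t-1})
    # Contiguous slices of `scores` act as phrase-level attribution scores.
    return scores, logits[0, target_class]

# Example usage on a toy input; contributions sum to the target logit:
# model = GRUClassifier()
# tokens = torch.randint(0, 10000, (1, 7))
# scores, logit = word_contributions(model, tokens, target_class=1)
# assert torch.allclose(scores.sum(), logit, atol=1e-4)
```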