Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can degrade the predictive performance of the classifier over time, eventually making it obsolete. To remain useful, classifiers need to detect drifts and adapt to them. Drift detection has traditionally been approached as a supervised task, with labeled data continually used to validate the learned model. Although effective at detecting drifts, these techniques are impractical, as labeling is a difficult, costly, and time-consuming activity. Unsupervised change detection techniques, on the other hand, are unreliable, as they produce a large number of false alarms. The inefficacy of unsupervised techniques stems from excluding the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in a classifier's region of uncertainty as a metric for detecting drift. MD3 is a distribution-independent, application-independent, model-independent, unsupervised, and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift-induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that MD3 can reliably detect drifts with significantly fewer false alarms than unsupervised feature-based drift detectors, while delivering performance comparable to that of a fully labeled drift detector. The reduced false alarms enable drifts to be signaled only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme that is credible, label efficient, and general in its applicability.
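The margin-density idea described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes a linear SVM whose margin is the region where |f(x)| < 1, and the class name, `theta` sensitivity, and sliding-window mechanics are illustrative choices.

```python
import numpy as np
from sklearn.svm import LinearSVC

class MarginDensityDriftDetector:
    """Illustrative sketch: signal drift when the fraction of stream
    samples falling inside the classifier's margin (|f(x)| < 1 for a
    margin-based classifier) deviates from a labeled reference."""

    def __init__(self, clf, theta=3.0, window=100):
        self.clf = clf          # a fitted margin-based classifier
        self.theta = theta      # sensitivity: allowed std-devs from reference
        self.window = window    # size of the sliding window
        self.ref_md = None      # reference margin density
        self.ref_sd = None      # reference standard deviation
        self.in_margin = []     # sliding window of 0/1 in-margin flags

    def set_reference(self, X_ref):
        # Establish the expected margin density from trusted data.
        flags = (np.abs(self.clf.decision_function(X_ref)) < 1).astype(float)
        self.ref_md = flags.mean()
        self.ref_sd = flags.std()

    def update(self, x):
        # Process one unlabeled stream sample; True => drift signaled.
        flag = float(abs(self.clf.decision_function(x.reshape(1, -1))[0]) < 1)
        self.in_margin.append(flag)
        if len(self.in_margin) > self.window:
            self.in_margin.pop(0)
        md = np.mean(self.in_margin)
        return (len(self.in_margin) == self.window
                and abs(md - self.ref_md) > self.theta * self.ref_sd)
```

Note that no labels are consumed while the stream is monitored: only the classifier's own decision function is queried, which is what makes the scheme unsupervised and label efficient.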
In recent years, deep learning has demonstrated remarkable accuracy in analyzing images for cancer detection tasks. The accuracy achieved rivals that of radiologists and is suitable for implementation as a clinical tool. However, a significant problem is that these models are black-box algorithms and are therefore intrinsically unexplainable. This creates a barrier to clinical implementation, owing to the lack of trust and transparency characteristic of black-box algorithms. Additionally, recent regulations prevent the deployment of unexplainable models in clinical settings, further demonstrating the need for explainability. To mitigate these concerns, recent studies have attempted to overcome these issues by modifying deep learning architectures or by providing after-the-fact explanations. A review of the deep learning explanation literature focused on cancer detection using MR images is presented here. The gap between what clinicians deem explainable and what current methods provide is discussed, and suggestions for closing this gap are offered.
Landmarks are salient objects in an environment. They play an important role in navigation by serving as orientation aids and marking decision points. Recently, there have been several efforts to design methods that automatically designate certain buildings with salient features as landmarks. All of these methodologies consist of similar steps: (a) establishing a neighborhood, usually around an intersection, (b) performing statistical or data mining analysis to find the building with outlier characteristics, and (c) establishing this salient building as the local landmark. Although these advances are significant, we believe that several key issues still need to be fully addressed in order to realize the new generation of Automatic Landmark Detection Systems (ALDSs). Currently, the main shortcomings in the domain of ALDSs are the lack of a thorough and systematic study of the attributes of objects that are analyzed to select landmarks, and deficient experimental verification of the benefits of ALDSs to end users. Unless these shortcomings are thoroughly addressed, the viability, applicability, and usefulness of ALDSs remain uncertain. On the other hand, automatic landmark detection has the potential to be a dynamic, fascinating, and interdisciplinary research topic with wide applicability. Therefore, the goal of this paper is to discuss the current shortcomings in the domain of landmark detection, propose some preliminary solutions, and provide general guidelines for implementing the new generation of ALDSs.
Specifically, we discuss and promote the importance of: (a) widening the types of attributes analyzed in the landmark detection process, (b) weighting each attribute relative to its significance, (c) extending the types of objects considered as landmark candidates beyond just buildings, (d) identifying landmarks outside the vicinity of intersections, (e) identifying false landmarks along routes, and (f) using virtual environments for experiments with ALDSs. Throughout the paper, we discuss several demonstrative examples and experiments to clarify and support the ideas and concepts that are being promoted.
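The three-step pipeline and the attribute-weighting idea promoted above can be sketched as a simple weighted outlier score. This is a hypothetical illustration, not any real ALDS: the attribute names, weights, and z-score formulation are illustrative assumptions.

```python
def most_salient(buildings, weights):
    """Pick the landmark candidate from a neighborhood: (a) `buildings`
    holds the candidates near an intersection as dicts of numeric
    attributes, (b) each attribute's deviation from the local mean is
    z-scored and weighted by its assumed significance, (c) the index of
    the most outlying candidate is returned."""
    attrs = weights.keys()
    n = len(buildings)
    means = {a: sum(b[a] for b in buildings) / n for a in attrs}
    # Guard against zero spread: a constant attribute contributes nothing.
    stds = {a: (sum((b[a] - means[a]) ** 2 for b in buildings) / n) ** 0.5
               or 1.0
            for a in attrs}

    def score(b):
        return sum(weights[a] * abs(b[a] - means[a]) / stds[a] for a in attrs)

    scores = [score(b) for b in buildings]
    return scores.index(max(scores))
```

Extending such a scheme beyond buildings, as item (c) above advocates, amounts to admitting other object types into the candidate list with their own attribute sets.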
Many real‐world data mining applications have to deal with unlabeled streaming data: the sheer volume of the stream makes it impractical to label a significant portion of the data. Data streams can evolve over time, and these changes are called concept drifts. Concept drifts have different characteristics, which can be used to categorize them into different types. Many concept drift detection approaches involve a trade‐off between performance and cost. On the one hand, high-accuracy detection approaches usually require labeled data, possibly incurring a high labeling cost. On the other hand, a variety of methods address concept drift detection with unlabeled data, but these approaches are often suited only to a subset of the concept drift types. The objective of this survey is to present these methods, categorize them, and give usage recommendations based on their behavior under different types of concept drift.
This article is categorized under:
Fundamental Concepts of Data and Knowledge > Data Concepts
Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
Explainable AI > Classification