StrategyAtlas: Strategy Analysis for Machine Learning Interpretability

Collaris, Dennis; Wijk, Jarke van

doi:10.1109/tvcg.2022.3146806

Cited by 9 publications

(9 citation statements)

References 42 publications

“…In general, the number of instances and features that can be visually expressed with our approach has no intrinsic limit. Collaris and van Wijk [CVW22] found that usually the top 10–20 features were impactful for the tabular data sets they experimented with. For hundreds of features, it would be cognitively demanding for a human to analyse the influence of all these features at different levels of granularity.…”

Section: Discussion (mentioning)

confidence: 99%

“…The ultimate goal of such a procedure is to identify misclassified instances and interpret why this has happened [CdMP14], as well as improve predictive performance [SMGC14]. This scenario is where visual analytics (VA) approaches are considered as a possible solid solution [WDC*22] with many recent works focusing on problematic subsets of data for the interpretation and performance boost of ML models [CVW22, ZOS*23]. However, the classification problem becomes significantly more complex when the data set contains both class overlap and class imbalance.…”

Section: Introduction (mentioning)

confidence: 99%

HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques

Chatzimparmpas

Paulovich

Kerren

2022

Computer Graphics Forum

Despite the tremendous advances in machine learning (ML), training with imbalanced data still poses challenges in many real‐world applications. Among a series of diverse techniques to solve this problem, sampling algorithms are regarded as an efficient solution. However, the problem is more fundamental, with many works emphasizing the importance of instance hardness. This issue refers to the significance of managing unsafe or potentially noisy instances that are more likely to be misclassified and serve as the root cause of poor classification performance. This paper introduces HardVis, a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios. Our proposed system assists users in visually comparing different distributions of data types, selecting types of instances based on local characteristics that will later be affected by the active sampling method, and validating which suggestions from undersampling or oversampling techniques are beneficial for the ML model. Additionally, rather than uniformly undersampling/oversampling a specific class, we allow users to find and sample easy and difficult to classify training instances from all classes. Users can explore subsets of data from different perspectives to decide all those parameters, while HardVis keeps track of their steps and evaluates the model's predictive performance in a test set separately. The end result is a well‐balanced data set that boosts the predictive power of the ML model. The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case. Finally, we also look at how useful our system is based on feedback we received from ML experts.

“…In general, the number of instances and features that can be visually expressed with our approach has no intrinsic limit. Collaris and van Wijk [131] found that usually the top 10-20 features were impactful for the tabular data sets they experimented with. For hundreds of features, it would be cognitively demanding for a human to analyze the influence of all these features at different levels of granularity.…”

Section: Scalability For a Large Number Of Instances And Features (mentioning)

confidence: 99%

“…6. Collaris and van Wijk [131] also limited the number of instances to 5,000 in order to prevent overplotting issues in their projection-based view. Arguably, similar constraints should apply to our tool, especially for the UMAP projection and the inverse polar chart view.…”

Section: Scalability For a Large Number Of Instances And Features (mentioning)

confidence: 99%

“…The ultimate goal of such a procedure is to identify misclassified instances and interpret why this has happened [84], as well as improve predictive performance [596]. This scenario is where visual analytics (VA) approaches are considered as a possible solid solution [701] with many recent works focusing on problematic subsets of data for the interpretation and performance boost of ML models [131,738]. However, as already explained in Chapter 2, the classification problem becomes considerably more difficult when the data set is imbalanced and the minority class contains mostly ambiguous samples.…”

Section: Introduction (mentioning)

confidence: 99%

Visual Analytics for Explainable and Trustworthy Machine Learning

Chatzimparmpas

The deployment of artificial intelligence solutions and machine learning research has exploded in popularity in recent years, with numerous types of models proposed to interpret and predict patterns and trends in data from diverse disciplines. However, as the complexity of these models grows, it becomes increasingly difficult for users to evaluate and rely on the model results, since their inner workings are mostly hidden in black boxes, which are difficult to trust in critical decision-making scenarios. While automated methods can partly handle these problems, recent research findings suggest that their combination with innovative methods developed within information visualization and visual analytics can lead to further insights gained from models and, consequently, improve their predictive ability and enhance trustworthiness in the entire process. Visual analytics is the area of research that studies the analysis of vast and intricate information spaces by combining statistical and machine learning models with interactive visual interfaces. By following this methodology, human experts can better understand such spaces and apply their domain expertise in the process of building and improving the underlying models. The primary goals of this dissertation are twofold, focusing on (1) methodological aspects, by conducting qualitative and quantitative meta-analyses to support the visualization research community in making sense of its literature and to highlight unsolved challenges, as well as (2) technical solutions, by developing visual analytics approaches for various machine learning models, such as dimensionality reduction and ensemble learning methods. Regarding the first goal, we define, categorize, and examine in depth the means for visual coverage of the different trust levels at each stage of a typical machine learning pipeline and establish a design space for novel visualizations in the area. 
Regarding the second goal, we discuss multiple visual analytics tools and systems implemented by us to facilitate the underlying research on the various stages of the machine learning pipeline, i.e., data processing, feature engineering, hyperparameter tuning, understanding, debugging, refining, and comparing models. Our approaches are data-agnostic, but mainly target tabular data with meaningful attributes in diverse domains, such as health care and finance. The applicability and effectiveness of this work were validated with case studies, usage scenarios, expert interviews, user studies, and critical discussions of limitations and alternative designs. The results of this dissertation provide new avenues for visual analytics research in explainable and trustworthy machine learning.

Keywords: visualization, interaction, visual analytics, explainable machine learning, XAI, trustworthy machine learning, ensemble learning, dimensionality reduction, supervised learning, unsupervised learning, ML, AI, tabular data