Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are critical in certain tasks related to human behavior analysis, such as health care applications. Despite their importance, researchers have only recently started to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of computer vision, with an emphasis on looking-at-people tasks. Specifically, we review and study those mechanisms in the context of first impressions analysis. To the best of our knowledge, this is the first effort in this direction. Additionally, we describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, evaluation protocol, and proposed solutions, and summarize the results of the challenge. Finally, derived from our study, we outline research opportunities that we foresee will be decisive in the near future for the development of the explainable computer vision field.

Keywords: Explainable computer vision · First impressions · Personality analysis · Multimodal information · Algorithmic accountability

1 Introduction

Looking at People (LaP), the field of research focused on the visual analysis of human behavior, has been a very active research field within computer vision in the last decade [28,29,62]. Initially, LaP focused on tasks associated with basic human behaviors that were clearly visual (e.g., basic gesture recognition [71,70] or face recognition in restricted scenarios [10,83]). Research progress in LaP has now led to models that can solve those initial tasks relatively easily [66,82]. Attention in human behavior analysis has instead turned to problems that are not visually evident to model or recognize [84,48,72]. For instance, consider the task of assessing personality traits from visual information [72].
Although there are methods that can estimate apparent personality traits with (relatively) acceptable performance, model recommendations by themselves are of little use if the end user is not confident in the model's reasoning, as the primary use for such estimation is to understand bias in human assessors. Explainability and interpretability are thus critical features of decision support systems in some LaP tasks [26]. The former focuses on mechanisms that can tell what is the rationale behind the decision or recommendation made by
In a rapidly digitizing world, machine learning algorithms are increasingly employed in scenarios that directly impact humans. This is also seen in job candidate screening. Data-driven candidate assessment is gaining interest due to its high scalability and more systematic assessment mechanisms. However, it will only be truly accepted and trusted if explainability and transparency can be guaranteed. The current chapter emerged from ongoing discussions between psychologists and computer scientists with machine learning interests, and discusses the job candidate screening problem from an interdisciplinary viewpoint. After introducing the general problem, we present a tutorial on common methodological focus points in psychological and machine learning research. Following this, we both contrast and combine psychological and machine learning approaches, and present a use case example of a data-driven job candidate assessment system intended to be explainable to non-technical hiring specialists. In connection to this, we also give an overview of more traditional job candidate assessment approaches, and discuss considerations for optimizing the acceptability of technology-supported hiring solutions by relevant stakeholders. Finally, we present several recommendations on how interdisciplinary collaboration on the topic may be fostered.
Music is a widely enjoyed content type, existing in many multifaceted representations. With the digital information age, a great deal of digitized music information has theoretically become available at the user's fingertips. However, this abundance of information is too large in scale and too diverse to annotate, oversee, and present in a consistent and human manner, motivating the development of automated Music Information Retrieval (Music-IR) techniques. In this paper, we encourage considering music content beyond a monomodal audio signal and argue that Music-IR approaches with multimodal and user-centered strategies are necessary to serve real-life usage patterns and to maintain and improve the accessibility of digital music data. After discussing relevant existing work in these directions, we show that the field of Music-IR faces similar challenges to neighboring fields, and thus suggest opportunities for joint collaboration and mutual inspiration.
Inspired by the success of deep learning in the fields of Computer Vision and Natural Language Processing, this learning paradigm has also found its way into the field of Music Information Retrieval. In order to benefit from deep learning in an effective but also efficient manner, deep transfer learning has become a common approach. In this approach, the output of a pre-trained neural network is reused as the basis for a new learning task. The underlying hypothesis is that if the initial and new learning tasks show commonalities and are applied to the same type of input data (e.g., music audio), the generated deep representation of the data is also informative for the new task. However, since most of the networks used to generate deep representations are trained on a single initial learning source, their representation is unlikely to be informative for all possible future tasks. In this paper, we present the results of our investigation into the most important factors for generating deep representations for the data and learning tasks in the music domain. We conducted this investigation via an extensive empirical study involving multiple learning sources, as well as multiple deep learning architectures with varying levels of information sharing between sources, in order to learn music representations. We then validate these representations on multiple target datasets for evaluation. The results of our experiments yield several insights on how to approach the design of methods for learning widely deployable deep data representations in the music domain.
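The transfer learning setup described above can be sketched in a few lines: a frozen network body produces a fixed "deep representation", and only a small head on top is trained for the new task. The sketch below is illustrative, not the paper's method; the frozen random-weight extractor is a stand-in for weights that would, in practice, come from a network trained on an initial learning source (e.g., a music tagging task), and the synthetic data and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained network body: a frozen nonlinear feature
# extractor. In a real pipeline these weights would be loaded from a
# network trained on the initial learning source, not sampled randomly.
W_frozen = rng.normal(size=(20, 64))

def deep_representation(x):
    """Map raw inputs to a fixed deep representation (weights frozen)."""
    return np.tanh(x @ W_frozen)

# Synthetic stand-in for the new target task (binary classification).
X = rng.normal(size=(500, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the small linear head on top of the representation is trained,
# here with plain gradient descent on the logistic loss.
Z = deep_representation(X)
w = np.zeros(64)
b = 0.0
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # sigmoid predictions
    grad = p - y
    w -= lr * Z.T @ grad / len(y)
    b -= lr * grad.mean()

acc = ((1.0 / (1.0 + np.exp(-(Z @ w + b))) > 0.5) == y).mean()
print(f"transfer accuracy on the new task: {acc:.2f}")
```

Because only the head is trained, the new task needs far less data and compute than training the full network; the trade-off, as the paper investigates, is that a representation learned from a single source may not transfer equally well to every target task.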
This paper presents the MusiClef data set, a multimodal data set of professionally annotated music. It includes editorial metadata about songs, albums, and artists, as well as MusicBrainz identifiers to facilitate linking to other data sets. In addition, several state-of-the-art audio features are provided. Different sets of annotations and music context data (collaboratively generated user tags, web pages about artists and albums, and annotation labels provided by music experts) are included as well. Versions of this data set were used in the MusiClef evaluation campaigns in 2011 and 2012 for auto-tagging tasks. We report on the motivation for the data set, its composition, related sets, and the evaluation campaigns in which versions of the set were already used. These campaigns likewise represent one use case of the data set, namely music auto-tagging. The complete data set is publicly available for download at