In this study, we propose a method for finding people in large news photograph and video collections. Our method exploits the multi-modal nature of these data sets to recognize people and does not require any supervisory input. It first uses the name of the person to populate an initial set of candidate faces. From this set, which is likely to include the faces of other people, it selects the group of most similar faces corresponding to the queried person in a variety of conditions. Our main contribution is to transform the problem of recognizing the faces of the queried person in a set of candidate faces into the problem of finding the highly connected sub-graph (the densest component) in a graph representing the similarities of faces. We also propose a novel technique for finding the similarities of faces by matching interest points extracted from the faces. The proposed method further allows the classification of new faces without needing to re-build the graph. The experiments are performed on two data sets: thousands of news photographs from Yahoo! News and over 200 news videos from TRECVid 2004. The results show that the proposed method provides significant improvements over text-based methods.
(C) 2009 Elsevier Ltd. All rights reserved.
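To make the graph formulation concrete, the sketch below shows one standard way such a densest component could be extracted from pairwise face similarities: thresholding the similarity matrix and applying a greedy degree-peeling heuristic. This is an illustrative sketch only; the similarity threshold, the peeling heuristic, and all names are assumptions, not the exact procedure of the paper.

# Hypothetical sketch: selecting the densest component of a face-similarity graph.
# Assumes a precomputed symmetric similarity matrix `sim` (e.g. from interest-point
# matching between face pairs); the threshold and names are illustrative only.
import numpy as np

def densest_subgraph(sim, threshold=0.5):
    """Greedy degree-peeling heuristic (Charikar-style approximation).

    Builds an unweighted graph by thresholding pairwise similarities, then
    repeatedly removes the lowest-degree node, keeping the subset whose
    average degree (density) was highest along the way.
    """
    n = sim.shape[0]
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)

    remaining = set(range(n))
    degree = adj.sum(axis=1)

    best_density = -1.0
    best_subset = set(remaining)

    while remaining:
        # Density of the current induced subgraph: edges / nodes.
        edges = sum(degree[i] for i in remaining) / 2.0
        density = edges / len(remaining)
        if density > best_density:
            best_density = density
            best_subset = set(remaining)

        # Peel the node with the smallest remaining degree.
        victim = min(remaining, key=lambda i: degree[i])
        remaining.remove(victim)
        for j in remaining:
            degree[j] -= adj[victim, j]

    return sorted(best_subset)

# Toy example: candidate faces 0-3 belong to the queried person (mutually similar),
# faces 4-5 are other people picked up by the name-based query.
rng = np.random.default_rng(0)
sim = rng.uniform(0.0, 0.3, size=(6, 6))
sim[:4, :4] = rng.uniform(0.7, 1.0, size=(4, 4))
sim = (sim + sim.T) / 2.0
print(densest_subgraph(sim))   # expected to recover [0, 1, 2, 3]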
In many computational linguistics scenarios, training labels are subjective, making it necessary to acquire the opinions of multiple annotators/experts, which is referred to as the "wisdom of crowds". In this paper, we propose a new approach for modeling the wisdom of crowds based on the Latent Mixture of Discriminative Experts (LMDE) model, which can automatically learn the prototypical patterns and hidden dynamics among different experts. Experiments show improvement over state-of-the-art approaches on the task of listener backchannel prediction in dyadic conversations.
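As an illustration of the mixture-of-experts idea behind LMDE, the sketch below fits a simplified static mixture of discriminative experts with EM: each expert is a logistic regression over one feature view, and the per-sample responsibilities act as the latent variable. The constant mixture weights, the synthetic data, and all function names are assumptions; the sketch deliberately omits the hidden dynamics that the actual LMDE model learns.

# Hypothetical sketch of a (static) mixture of discriminative experts trained with EM.
# Each expert is a logistic regression over one feature "view"; this simplification
# omits the latent dynamics of LMDE and uses constant mixture weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_mixture_of_experts(views, y, n_iter=20, seed=0):
    """views: list of (n_samples, d_k) arrays, one per expert; y: binary labels."""
    rng = np.random.default_rng(seed)
    K, n = len(views), len(y)
    experts = [LogisticRegression(max_iter=1000) for _ in range(K)]
    pi = np.full(K, 1.0 / K)                      # mixture weights
    resp = rng.dirichlet(np.ones(K), size=n)      # random initial responsibilities

    for _ in range(n_iter):
        # M-step: refit each expert with its responsibilities as sample weights.
        for k in range(K):
            experts[k].fit(views[k], y, sample_weight=resp[:, k] + 1e-6)
        pi = resp.mean(axis=0)

        # E-step: responsibility of expert k for sample i ~ pi_k * p_k(y_i | x_i).
        lik = np.column_stack([
            experts[k].predict_proba(views[k])[np.arange(n), y] for k in range(K)
        ])
        resp = pi * lik
        resp /= resp.sum(axis=1, keepdims=True)

    return experts, pi

def predict_proba(experts, pi, views):
    """Mixture prediction: weighted average of the experts' probabilities."""
    probs = np.stack([e.predict_proba(v)[:, 1] for e, v in zip(experts, views)])
    return pi @ probs

# Toy multimodal example: two feature views (e.g. prosodic and lexical cues).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
views = [y[:, None] + rng.normal(0.0, 1.0, (200, 3)),   # view informative about y
         rng.normal(0.0, 1.0, (200, 2))]                 # mostly noise view
experts, pi = fit_mixture_of_experts(views, y)
print("mixture weights:", np.round(pi, 2))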
Human emotion is an important part of human-human communication, since the emotional state of an individual often affects the way he/she reacts to others. In this paper, we present a method based on a concatenated Hidden Markov Model (co-HMM) to infer dimensional and continuous emotion labels from audio-visual cues. Our method is based on the assumption that continuous emotion levels can be modeled by a set of discrete values. Based on this, we represent each emotional dimension by step-wise label classes and learn the intrinsic and extrinsic dynamics using our co-HMM model. We evaluate our approach on the Audio-Visual Emotion Challenge (AVEC 2012) dataset. Our results show considerable improvement over the baseline regression model provided with AVEC 2012.
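The quantization assumption can be illustrated with a single-chain HMM sketch: continuous emotion labels are binned into step-wise levels, per-level Gaussian emissions and a transition matrix are estimated from training data, and Viterbi decoding maps new frame features back to a continuous trace through the level centers. This is not the concatenated co-HMM of the paper; the bin count, the diagonal-Gaussian emissions, and all names are illustrative assumptions.

# Hypothetical sketch: continuous emotion labels quantized into step-wise classes,
# modeled with a single Gaussian HMM and decoded with Viterbi. Only an illustration
# of the discretization assumption, not the paper's co-HMM architecture.
import numpy as np

def fit_stepwise_hmm(features, labels, n_levels=5):
    """features: (T, d) frame features; labels: (T,) continuous emotion values.
    Assumes every level receives at least one training frame."""
    edges = np.quantile(labels, np.linspace(0, 1, n_levels + 1))
    states = np.clip(np.searchsorted(edges, labels, side="right") - 1, 0, n_levels - 1)
    centers = np.array([labels[states == s].mean() for s in range(n_levels)])

    # Per-level diagonal Gaussian emission parameters.
    means = np.array([features[states == s].mean(axis=0) for s in range(n_levels)])
    stds = np.array([features[states == s].std(axis=0) + 1e-3 for s in range(n_levels)])

    # Transition matrix from the discretized label sequence (add-one smoothing).
    trans = np.ones((n_levels, n_levels))
    for a, b in zip(states[:-1], states[1:]):
        trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    return means, stds, trans, centers

def viterbi_decode(features, means, stds, trans, centers):
    """Return a continuous prediction per frame (the center of the decoded level)."""
    T, S = len(features), len(centers)
    # Log-likelihood of each frame under each level's diagonal Gaussian (up to a constant).
    diff = (features[:, None, :] - means[None]) / stds[None]
    loglik = -0.5 * (diff ** 2 + 2 * np.log(stds[None])).sum(axis=2)

    logtrans = np.log(trans)
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0] = -np.log(S) + loglik[0]          # uniform initial state distribution
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logtrans
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + loglik[t]

    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return centers[path]

The transition matrix is what enforces the step-wise, temporally smooth behaviour of the decoded trace; with more levels the output approaches the continuous label at the cost of sparser per-level training data.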