We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and they more closely correspond to the probability of being correct than raw posteriors; and (ii) system combination, where the detections of multiple systems are merged together, and their scores are interpolated with weights which are optimized using MTWV as the maximization criterion. Both score normalization and system combination approaches show that significant gains in ATWV/MTWV can be obtained, sometimes on the order of 8-10 points (absolute), in five different languages. A variant of these methods resulted in the highest performance for the official surprise language evaluation for the IARPA-funded Babel project in April 2013.
Detecting emotions in the context of automated call center services can be helpful for following the evolution of the human-computer dialogs, enabling dynamic modification of the dialog strategies and influencing the final outcome. The emotion detection work reported here is a part of larger study aiming to model user behavior in real interactions. We make use of a corpus of real agent-client spoken dialogs in which the manifestation of emotion is quite complex, and it is common to have shaded emotions since the interlocutors attempt to control the expression of their internal attitude. Our aims are to define appropriate emotions for call center services, to annotate the dialogs and to validate the presence of emotions via perceptual tests and to find robust cues for emotion detection. In contrast to research carried out with artificial data with simulated emotions, for real-life corpora the set of appropriate emotion labels must be determined. Two studies are reported: the first investigates automatic emotion detection using linguistic information, whereas the second concerns perceptual tests for identifying emotions as well as the prosodic and textual cues which signal them. About 11% of the utterances are annotated with non-neutral emotion labels. Preliminary experiments using lexical cues detect about 70% of these labels.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.