2006
DOI: 10.1007/11677482_11

The “FAME” Interactive Space

Cited by 8 publications (9 citation statements)
References 7 publications
“…Our speech detection system exposes performances that make it suitable for our projects and goals such as CHIL [11] or [12]. Actually, we do not want to use it for automatic speech recognition or diarization but for interaction and context modeling.…”
Section: Discussion
confidence: 99%
“…In these experiments, starting energy thresholds of the energy detector were values empirically defined during previous research projects (NESPOLE! [10] and FAME [11]). The evaluation metrics of our system are given in the three following tables.…”
Section: 4.2
confidence: 99%
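The citing paper describes its speech detector as a frame-energy detector whose starting thresholds were set empirically during the earlier NESPOLE! and FAME projects. As a rough, hypothetical sketch only (the function name, frame length, and threshold value below are assumptions, not details of the cited systems), a minimal energy-threshold speech activity detector could look like this:

```python
# Hypothetical sketch of an energy-threshold speech activity detector.
# Frame length and threshold are illustrative defaults, not values from
# the NESPOLE!/FAME systems cited above.
import numpy as np

def detect_speech(samples: np.ndarray, sample_rate: int,
                  frame_ms: float = 25.0,
                  energy_threshold: float = 1e-3):
    """Return (start_s, end_s) segments whose short-term energy exceeds the threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len

    # Label each frame as speech if its mean squared amplitude exceeds the threshold.
    is_speech = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        is_speech.append(float(np.mean(frame ** 2)) > energy_threshold)

    # Merge consecutive speech frames into (start, end) segments in seconds.
    segments, start = [], None
    for i, speech in enumerate(is_speech):
        if speech and start is None:
            start = i
        elif not speech and start is not None:
            segments.append((start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            start = None
    if start is not None:
        segments.append((start * frame_len / sample_rate,
                         n_frames * frame_len / sample_rate))
    return segments
```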
“…A second prototype, the FAME Interactive Space [Metze et al., 2006], provided access to recordings of lectures via a table top interface that accepted voice commands from a user. The M4 European project (MultiModal Meeting Manager, 2002) introduced a framework for the integration of multimodal data streams and for the detection of group actions [McCowan et al., 2003, 2005], and proposed solutions for multimodal tracking of the focus of attention of meeting participants, multimodal summarization, and multimodal information retrieval.…”
Section: Research on Multimodal Human Interaction Analysis
confidence: 99%
“…
Study                        Modalities                    Error Rate
Demirdjian et al. [25]       Vision & speech               0%
Demirdjian et al. [25]       Speech                        5%
Demirdjian et al. [25]       Vision                        8%
Morency et al. [61]          Gesture & dialog context      8%
Morency and Darrell [60]     Gestures & dialog state       9%
Quattoni et al. [67]         Vision & semantics            9%
Wang and Demirdjian [86]     Speech & gestures             12%
Webb et al. [87]             Speech & dialog state         17%
Metze et al. [59]            Speech, context, & gesture    17%
Morency et al. [61]          Gesture                       22%
Saenko et al. [72]           Vision                        34%
Eisenstein and Davis [31]    Linguistic context            34%
Bugmann [13]                 Speech                        40%

However, these techniques have not been applied to the same extent in Human…”
Section: HCI Techniques
confidence: 99%
“…Metze et al. describe an Augmented Table which allows several users at the same time to perform multi-modal, cross-lingual document retrieval of audio-visual documents [59]. The Augmented Table enhances multi-lingual speech recognition with context and a visual gesture recognition system, using tokens.…”
confidence: 99%