2005 IEEE International Conference on Multimedia and Expo
DOI: 10.1109/icme.2005.1521563

Automatic Speech Activity Detection, Source Localization, and Speech Recognition on the CHIL Seminar Corpus

Cited by 23 publications (14 citation statements)
References 11 publications
“…To check if our SAD system is light enough, we compute the real-time factor as expressed in equation (1). …”
Section: Evaluation Metrics
Citation type: mentioning
confidence: 99%
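
The citing paper's equation (1) is not reproduced on this page. As a rough illustration only, the real-time factor is commonly defined as processing time divided by the duration of the audio processed, so a value below 1 means the system keeps up with its input. The sketch below assumes that common definition; `real_time_factor`, `process`, and the `clock` parameter are hypothetical names chosen for this example and are not taken from the cited work.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    process: Callable[[Sequence[float]], object],
    samples: Sequence[float],
    sample_rate: int,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return processing time divided by audio duration.

    Assumes the common definition of the real-time factor; the citing
    paper's own equation (1) may differ in detail.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if len(samples) == 0:
        raise ValueError("samples must not be empty")

    # Duration of the audio in seconds.
    audio_seconds = len(samples) / sample_rate

    # Wall-clock time spent inside the (hypothetical) detector.
    start = clock()
    process(samples)
    elapsed = clock() - start

    return elapsed / audio_seconds
```

Passing the clock as a parameter is a design choice for this sketch: it lets the measurement be checked with a fake clock, while the default `time.perf_counter` gives wall-clock timing in normal use.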
“…Speech is indeed one of the preferred and most natural communication channels in human to human interactions, and sounds are revealing of human activity. This is why many perceptual environments, such as in the CHIL project [1], are equipped with speech detection, speech recognition and acoustic localization systems. One requirement in such perceptive environments is to be able to process multiple and various microphones in parallel while fitting real time constraints.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Still according to (Dey, 2001), a "system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user's task". In a context-aware multimodal interaction system, perceived contextual information is often used to complement or disambiguate an active mode of interaction, such as speech (Stillman & Essa, 2001; Macho et al., 2005). For example, (Yoshimi & Pingali, 2002) describe a video conferencing application, which combines carefully placed multiple distributed microphone pairs with calibrated cameras to identify the current speaker and their location, in order to achieve a finer control of the speech recognition process.…”
Section: Context-aware Multimodal Interaction Systems
Citation type: mentioning
confidence: 99%