2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
DOI: 10.1109/icassp.2007.366923
The AMI System for the Transcription of Speech in Meetings

Abstract: This paper describes the AMI transcription system for speech in meetings developed in collaboration by five research groups. The system includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data. These include segmentation and cross-talk suppression, beam-forming, domain adapt…

Cited by 91 publications (68 citation statements)
References 8 publications (16 reference statements)
“…In the following we briefly address issues of the domain, followed by a brief description of the essential components of a meeting transcription system and the performance in recent NIST evaluations. For a more elaborate description of the systems, the interested reader is referred to [14].…”
Section: Meeting Speech Recognition
confidence: 99%
“…Acoustic segmentation and speech/non-speech detection remain an important problem, with nearly 10% of errors in our current system resulting from errors in the speech/non-speech detection component. A feature of the systems developed for meeting recognition is the use of multiple recognition passes, cross-adaptation and model combination (Hain et al 2007). In particular, successive passes make use of more detailed, and more diverse, acoustic and language models.…”
Section: Long-context Features
confidence: 99%
“…Automatic transcriptions of the AMI meeting corpus were obtained using the AMI-ASR system [13]. This LVCSR system is based on decision tree clustered crossword triphone hidden Markov models, and a trigram language model.…”
Section: Speech Recognition
confidence: 99%