Transcribing Meetings With the AMIDA Systems

Hain, Thomas; Burget, Lukáš; Dines, John; Garner, Philip N.; Grézl, František; Hannani, Asmaa El; Huijbregts, Marijn; Karafiát, Martin; Lincoln, Mike; Wan, Vincent

doi:10.1109/tasl.2011.2163395

Cited by 112 publications

(91 citation statements)

References 24 publications

“…This paper investigates the use of DNN-based acoustic modeling for distant speech recognition in the context of a meeting recognition task using AMI corpus [9]. The objective is to study how deep architectures can reduce the mismatch between systems trained on clean speech from close-talking microphones (also called individual head microphone (IHM)) and noisy and reverberant speech from single distant microphone (SDM) (i.e., to improve the distant ASR performance by also using IHM data).…”

Section: Introduction (mentioning)

confidence: 99%

Learning feature mapping using deep neural network bottleneck features for distant large vocabulary speech recognition

Himawan, Motlíček, Potard, et al. 2015

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Automatic speech recognition from distant microphones is a difficult task because recordings are affected by reverberation and background noise. First, the application of the deep neural network (DNN)/hidden Markov model (HMM) hybrid acoustic models for distant speech recognition task using AMI meeting corpus is investigated. This paper then proposes a feature transformation for removing reverberation and background noise artefacts from bottleneck features using DNN trained to learn the mapping between distant-talking speech features and close-talking speech bottleneck features. Experimental results on AMI meeting corpus reveal that the mismatch between close-talking and distant-talking conditions is largely reduced, with about 16% relative improvement over conventional bottleneck system (trained on close-talking speech). If the feature mapping is applied to close-talking speech, a minor degradation of 4% relative is observed.
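The abstract reports its gains as relative changes in word error rate (WER). As a minimal sketch of that arithmetic, the snippet below computes a relative WER reduction. The function name and the WER values are hypothetical, chosen only so that the result matches the "about 16% relative" figure; they are not numbers from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER.

    A positive value is an improvement; a negative value is a degradation.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in %), used only to illustrate the calculation.
baseline = 50.0
mapped = 42.0
print(f"{relative_wer_reduction(baseline, mapped):.0%} relative")  # prints "16% relative"
```

A relative figure depends on the baseline, so the same absolute change of 8 points would be a smaller relative gain against a weaker baseline.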

“…These advanced techniques take into account the estimated noise or interfering signal characteristics for superior noise suppression capability [43,44]. In the context of ASR, beamforming techniques have been successfully exploited in the ICSI/SRI [45] and AMIDA [46] systems for transcriptions of meetings [47]. Another research efforts have explored unified multichannel-based speech recognition such as LIMABEAM and multi-channel-based neural networks speech recognizer.…”

Section: Multi-channel Integration In Acoustic Modeling (mentioning)

confidence: 99%

“…In addition to the AMI test set, the trained acoustic models are evaluated on a NIST Rich Transcription (RT-07) ASR evaluation task to determine if the feature mapping approach trained on the AMI corpus improves the ASR performance of unseen condition. The experiments used the suggested AMI corpus partitions for training and evaluation sets [46,47], even though some of the meeting recordings were discarded from the original corpus when array recordings were missing, to ensure that both headset recordings and the corresponding synchronized array recordings are available for training and testing.…”

Section: Experimental Data and Setup (mentioning)

confidence: 99%

“…The DNNs use a 9-frame temporal context, enriched with cepstral mean only and cepstral mean and variance normalization for fMLLR and nonfMLLR systems, respectively. The AMI pronunciation dictionary, of approximately 23K words, is used in the experiments and the Viterbi decoding is performed using a 2-gram language model (LM) [60], previously built for NIST RT-07 corpora [46]. An additional experiment with a stronger LM (4-gram) is performed with the best system to determine if the gains in acoustic modeling are retained.…”
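The 9-frame temporal context and the cepstral normalization mentioned in this statement can be sketched as follows. This is an illustration under stated assumptions, not the citing paper's pipeline: normalization is computed over a single list of frames, edge frames are repeated to pad the context window, and the names `cmvn` and `splice` are hypothetical.

```python
from statistics import fmean, pstdev


def cmvn(frames, normalize_variance=True):
    """Per-dimension cepstral mean (and optionally variance) normalization."""
    dims = len(frames[0])
    means = [fmean(f[d] for f in frames) for d in range(dims)]
    if normalize_variance:
        # Fall back to 1.0 for constant dimensions to avoid dividing by zero.
        stds = [pstdev([f[d] for f in frames]) or 1.0 for d in range(dims)]
    else:
        stds = [1.0] * dims
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]


def splice(frames, context=4):
    """Stack each frame with `context` neighbours per side (2*context+1 frames).

    Assumption: frames beyond the edges are replaced by the first/last frame.
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), last)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Toy 2-dimensional features for three frames (hypothetical values).
frames = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mean_only = cmvn(frames, normalize_variance=False)  # mean normalization only
inputs = splice(cmvn(frames), context=4)            # 9 frames per DNN input
```

With `context=4`, each input vector holds 9 frames, so a 2-dimensional toy feature becomes an 18-dimensional input.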

Section: Experimental Data and Setup (mentioning)

confidence: 99%


Feature mapping using far-field microphones for distant speech recognition

Himawan, Motlíček, Sridharan 2016

Speech Communication


Acoustic modeling based on deep architectures has recently gained remarkable success, with substantial improvement of speech recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, the multi-channel deep neural network based approaches rely on the powerful modeling capability of deep neural network (DNN) to learn suitable representation of distant speech directly from its multi-channel source. In this model-based combination of multiple microphones, features from each channel are concatenated and used together as an input to DNN. This allows integrating the multi-channel audio for acoustic modeling without any pre-processing steps. Despite powerful modeling capabilities of DNN, an environmental mismatch due to noise and reverberation may result in severe performance degradation when features are simply fed to a DNN without a feature enhancement step. In this paper, we introduce the nonlinear bottleneck feature mapping approach using DNN, to transform the noisy and reverberant features to its clean version. The bottleneck features trained on clean signal are used as a teacher signal because they contain relevant information to phoneme classification, and the mapping is performed with the objective of suppressing noise and reverberation. The individual and combined impacts of beamforming and speaker adaptation techniques along with the feature mapping are examined for distant large vocabulary speech recognition, using a single and multiple far-field microphones. As an alternative to beamforming, experiments with concatenating multiple channel features are conducted. The experimental results on the AMI meeting corpus show that the feature mapping, used in combination with beamforming and speaker adaptation yields a distant speech recognition performance below 50% word error rate (WER), using DNN for acoustic modeling.
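The model-based combination described in this abstract, where features from each channel are concatenated before being fed to the DNN, might look roughly like the sketch below. It assumes the channels are already frame-aligned and represented as lists of per-frame feature vectors; `concat_channels` and the toy values are hypothetical.

```python
def concat_channels(channels):
    """Concatenate frame-aligned feature vectors from several microphones.

    `channels` is a list of channels; each channel is a list of frames,
    and each frame is a list of feature values.
    """
    n_frames = len(channels[0])
    if any(len(ch) != n_frames for ch in channels):
        raise ValueError("all channels must have the same number of frames")
    return [
        [value for ch in channels for value in ch[t]]
        for t in range(n_frames)
    ]


# Two hypothetical microphones, two frames each, 2-dimensional features.
mic_a = [[1.0, 2.0], [3.0, 4.0]]
mic_b = [[5.0, 6.0], [7.0, 8.0]]
combined = concat_channels([mic_a, mic_b])
print(combined)  # [[1.0, 2.0, 5.0, 6.0], [3.0, 4.0, 7.0, 8.0]]
```

The input dimension grows linearly with the number of microphones, which is the trade-off against beamforming the channels into a single signal first.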


“…The simulated ASR noise percentage varied from 10% to 30%, because the best recognition accuracy reaches around 70% in conversational environments [37]. However, noise was never applied to the explicit query itself.…”

mentioning

confidence: 99%

Question answering in conversations: Query refinement using contextual and semantic information

Habibi, Mahdabi, Popescu-Belis 2016

Data & Knowledge Engineering


This paper introduces a query refinement method applied to questions asked by users to a system during a meeting or a conversation that they have with other users. To answer the questions, the proposed method leverages the local context of the conversation along with semantic resources, either WordNet or word embeddings from word2vec. The method first represents the local context by extracting keywords from the transcript of the conversation, which is obtained from a real-time Automatic Speech Recognition (ASR) system and may contain noise. It then expands the queries with keywords that best represent the topic of the query, i.e. expansion keywords accompanied by weights indicating their topical similarity to the query. Finally, semantically related terms are added, using two options: either synonymous terms drawn from WordNet or similar words based on distributed representations in a low-dimensional word embedding space learned using word2vec. To evaluate the system, we introduce a dataset (named AREX for AMI Requests for Explanations) and an evaluation metric based on relevance judgments collected by crowdsourcing. We compare our query expansion approach with other methods, over queries from the AREX dataset, showing the superiority of our method when either manual or automatic transcripts of the AMI Meeting Corpus are used.
