2002
DOI: 10.1016/s0167-6393(01)00062-0

Large vocabulary continuous speech recognition of Broadcast News – The Philips/RWTH approach

Cited by 27 publications (22 citation statements)
References 6 publications
“…Such kinds of data are to be expected in most practical applications of automatic speech processing. Most recent research in this field addresses this problem as part of large-vocabulary continuous-speech-recognition systems (LVCSRs), like BN transcription systems (Woodland, 2002;Gauvain et al, 2002;Beyerlein et al, 2002) or speaker-diarisation and speaker-tracking systems in BN data (Zhu et al, 2005;Sinha et al, 2005;Žibert et al, 2005;Istrate et al, 2005;Moraru et al, 2005;Barras et al, 2006;Tranter & Reynolds, 2006). In most of these investigations, energy and/or cepstral coefficients (mainly MFCCs) are used for the segmenting, and GMMs or HMMs are used for classifying the segments into speech and different non-speech classes.…”
Section: Speech Detection In Continuous Audio Streams
confidence: 99%
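The segmentation-and-classification scheme described in this passage (cepstral features, typically MFCCs, scored against per-class Gaussian mixture models) can be illustrated with a minimal sketch. The sketch below assumes labelled training audio is available and uses librosa and scikit-learn purely for illustration; the file names, mixture sizes and all other parameters are assumptions, not the configurations used in the cited systems.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix of MFCC features for one audio file."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Train one GMM per acoustic class on labelled example files (hypothetical paths).
speech_gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
speech_gmm.fit(np.vstack([mfcc_frames(p) for p in ["speech_a.wav", "speech_b.wav"]]))

nonspeech_gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
nonspeech_gmm.fit(np.vstack([mfcc_frames(p) for p in ["music.wav", "noise.wav"]]))

# Label each frame of a test recording by comparing class log-likelihoods;
# real systems smooth these frame decisions, e.g. with an HMM, before segmenting.
test = mfcc_frames("broadcast_segment.wav")
is_speech = speech_gmm.score_samples(test) > nonspeech_gmm.score_samples(test)
print(f"speech frames: {int(is_speech.sum())} / {len(is_speech)}")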
“…Such a segmentation is usually applied as a pre-processing step in real-world systems for automatic speech processing: in automatic speech recognition (Shafran & Rose, 2003), like a broadcast-news transcription (Gauvain et al, 2002;Woodland, 2002;Beyerlein et al, 2002), in automatic audio indexing and summarization (Makhoul et al, 2000;Magrin-Chagnolleau & Parlangeau-Valles, 2002), in audio and speaker diarisation (Tranter & Reynolds, 2006;Barras et al, 2006;Sinha et al, 2005;Istrate et al, 2005;Moraru et al, 2005), in speaker identification and tracking (Martin et al, 2000), and in all other applications where efficient speech detection helps to greatly reduce the computational complexity and generate more understandable and accurate outputs. Accordingly, an SNS segmentation has to be easily integrated into such systems and should not increase the overall computational load.…”
Section: The Impact Of Speech Detection On Speech-Processing Applications
confidence: 99%
“…State-of-the-art recognition systems (Beyerlein et al, 2002;Evermann and Woodland, 2003;Kanthak et al, 2002;Mohri et al, 2002) use a statistical approach for speech recognition, based on the Bayes decision rule. The basic structure of such a system is presented in figure 1.…”
Section: Statistical Speech Recognition
confidence: 99%
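The Bayes decision rule referred to in this passage can be written out explicitly. The notation below (word sequence w_1^N, acoustic observation sequence x_1^T) follows the convention common in the LVCSR literature; the exact symbols are not taken from the cited paper.

\hat{w}_1^N \;=\; \operatorname*{arg\,max}_{w_1^N} \Pr(w_1^N \mid x_1^T)
            \;=\; \operatorname*{arg\,max}_{w_1^N} \Pr(w_1^N)\,\Pr(x_1^T \mid w_1^N)

Here \Pr(w_1^N) is supplied by the language model and \Pr(x_1^T \mid w_1^N) by the acoustic model; the denominator \Pr(x_1^T) is constant over word sequences and can be dropped from the maximisation.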
“…An evaluation of an isolated-word recognizer has shown that more than half of the recognition errors are due to inaccurate word boundaries [1]. Apart from ASR, a good segmentation of audio stream has many practical applications such as broadcast news transcription [2], automatic audio indexing and summarization [3], audio and speaker diarization [4]. Accordingly, segmentation has to be easily integrated into the systems concerned, but it should not increase the overall computational load.…”
Section: Introduction
confidence: 99%
“…Although these signal representations have been originally designed to model the short-term spectral information of speech events, they were also successfully applied in SND systems in combination with Gaussian Mixture Models (GMMs) or Hidden Markov Models (HMMs) for separating different sound sources (broadband speech, telephone speech, music, noise, silence, etc.) [2,6]. In the context of conference rooms, combination of energy features generated directly from the signal, and the acoustic phonetic features derived from observations generated by ASR acoustic models were used as input to the GMM classification framework [8].…”
Section: Introduction
confidence: 99%
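The feature-combination idea mentioned at the end of this passage (energy features generated directly from the signal alongside short-term spectral features, fed to a GMM classifier) can be sketched as a simple feature-stacking step. This continues the illustrative librosa-based setup from the earlier sketch; the specific features and frame parameters are assumptions, not the configuration of the system described in [8].

import numpy as np
import librosa

def energy_plus_mfcc(path, sr=16000, n_mfcc=13):
    """Return (n_frames, 1 + n_mfcc) features: frame log-energy stacked with MFCCs."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, T)
    rms = librosa.feature.rms(y=y)                           # shape (1, T)
    log_energy = np.log(rms + 1e-10)                         # avoid log(0) on silent frames
    T = min(mfcc.shape[1], log_energy.shape[1])              # guard against off-by-one frame counts
    return np.vstack([log_energy[:, :T], mfcc[:, :T]]).T

# The stacked features can then replace the plain MFCC matrices in the
# GMM training and scoring steps shown in the earlier sketch.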