2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07)
DOI: 10.1109/icassp.2007.367250
Unsupervised Speech/Non-Speech Detection for Automatic Speech Recognition in Meeting Rooms

Abstract: The goal of this work is to provide robust and accurate speech detection for automatic speech recognition (ASR) in meeting room settings. The solution is based on computing the long-term modulation spectrum and examining a specific frequency range for dominant speech components, in order to classify speech and non-speech segments of a given audio signal. Manually segmented speech, short-term energy, short-term energy and zero-crossing based segmentation techniques, and a recently proposed Multi Layer Perce…
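The modulation-spectrum idea described in the abstract can be sketched as follows. Note that the frame size, hop, the 2-8 Hz modulation band, and the energy-ratio decision are illustrative assumptions for this sketch, not the paper's exact parameters: the core observation is that speech energy envelopes are dominated by syllabic-rate modulations peaking near 4 Hz, while most non-speech signals are not.

```python
# Hedged sketch of modulation-spectrum-based speech/non-speech scoring.
# Frame size, hop, and the 2-8 Hz band are illustrative values, not the
# parameters used in the paper.
import numpy as np

def modulation_energy_ratio(signal, sr, frame_len=0.025, hop=0.010,
                            mod_band=(2.0, 8.0)):
    """Fraction of modulation-spectrum energy inside `mod_band` (Hz).

    Speech energy envelopes fluctuate at syllabic rates (roughly 2-8 Hz,
    peaking near 4 Hz), so a high ratio suggests speech.
    """
    frame = int(frame_len * sr)
    step = int(hop * sr)
    # Short-term energy envelope of the signal.
    n_frames = 1 + (len(signal) - frame) // step
    env = np.array([np.sum(signal[i * step:i * step + frame] ** 2)
                    for i in range(n_frames)])
    env = env - env.mean()
    # Modulation spectrum = power spectrum of the energy envelope.
    spec = np.abs(np.fft.rfft(env)) ** 2
    mod_freqs = np.fft.rfftfreq(len(env), d=hop)
    band = (mod_freqs >= mod_band[0]) & (mod_freqs <= mod_band[1])
    total = spec[1:].sum()  # exclude the DC bin
    return spec[band].sum() / total if total > 0 else 0.0

# Usage: a 4 Hz amplitude-modulated tone mimics a syllabic speech envelope,
# while white noise has no dominant syllabic-rate modulation.
sr = 8000
t = np.arange(0, 2.0, 1 / sr)
speechlike = (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(len(t))
print(modulation_energy_ratio(speechlike, sr),
      modulation_energy_ratio(noise, sr))
```

A threshold on this ratio then separates speech from non-speech; the speech-like signal scores markedly higher than the noise.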

Cited by 21 publications (12 citation statements)
References 14 publications
“…Speech/silent intervals are detected using a method based on long-term modulation spectrum energy features (Maganti et al, 2007). Detection of syllable nuclei is performed using the method introduced in De Jong and Wempe (2009), which is based on intensity peak detection of voiced segments of speech.…”
Section: Prosodic Measurements
Confidence: 99%
“…The first level is devoted to distinguishing speech from non-speech sound. This task, known as Automatic Speech Detection [10,11,12], has been extensively studied in the literature, since it is fundamental to any system requiring speech enhancement, speech recognition and (as in our case) speech classification.…”
Section: Classifier Architecture
Confidence: 99%
“…For example, there exists a vast literature regarding speech discrimination [10,11,12,13], vehicle recognition [14,15,16] and weapon classification [17,18,7]. In addition, due to the maturity of the field there exist several commercial and open-source products that perform these tasks, such as the Halo system 1 and the Sphinx toolkit 2 .…”
Section: Introduction
Confidence: 99%
“…The use of the within-class covariance (WCC) matrix to normalize data variances has become widespread in the speaker recognition field [41,43]. I-vectors need such normalization, which differs from one application to another, because they capture a wide range of speech variability.…”
Section: WCCN
Confidence: 99%
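The within-class covariance normalization (WCCN) mentioned in the quote above can be sketched as follows. This is a generic illustration, not the cited papers' exact implementation: the dimensions, the two-speaker toy data, and the per-class averaging convention are assumptions. The key property is that projecting with the Cholesky factor of the inverse within-class covariance whitens within-class (per-speaker) variation.

```python
# Hedged sketch of within-class covariance normalization (WCCN) for
# i-vector-like embeddings; data and dimensions are illustrative.
import numpy as np

def wccn_projection(vectors, labels):
    """Return B with B @ B.T = inv(W), where W is the within-class
    covariance averaged over classes; projecting rows by B whitens
    within-class variation (B.T @ W @ B = I)."""
    dim = vectors.shape[1]
    W = np.zeros((dim, dim))
    classes = np.unique(labels)
    for c in classes:
        # Per-class covariance around that class's own mean.
        W += np.cov(vectors[labels == c], rowvar=False, bias=True)
    W /= len(classes)
    # Cholesky factor of inv(W) acts as the whitening projection.
    return np.linalg.cholesky(np.linalg.inv(W))

# Usage: two "speakers", each with session vectors around a class mean.
rng = np.random.default_rng(0)
means = rng.standard_normal((2, 5)) * 5
X = np.vstack([m + rng.standard_normal((20, 5)) for m in means])
y = np.repeat([0, 1], 20)
B = wccn_projection(X, y)
X_norm = X @ B  # within-class covariance of X_norm is the identity
```

After this projection, a cosine or dot-product comparison between vectors is less dominated by session and channel variation within a speaker.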