2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07)
DOI: 10.1109/icassp.2007.367250
Unsupervised Speech/Non-Speech Detection for Automatic Speech Recognition in Meeting Rooms

Abstract: The goal of this work is to provide robust and accurate speech detection for automatic speech recognition (ASR) in meeting room settings. The solution is based on computing the long-term modulation spectrum and examining a specific frequency range for dominant speech components, in order to classify speech and non-speech segments of a given audio signal. Manually segmented speech, short-term energy, short-term energy and zero-crossing based segmentation techniques, and a recently proposed Multi Layer Perce…
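The modulation-spectrum idea described in the abstract can be sketched as follows. Note that the frame size, hop, the 2-8 Hz modulation band, and the energy-ratio decision are illustrative assumptions for this sketch, not the paper's exact parameters: the core observation is that speech energy envelopes are dominated by syllabic-rate modulations peaking near 4 Hz, while most non-speech signals are not.

```python
# Hedged sketch of modulation-spectrum-based speech/non-speech scoring.
# Frame size, hop, and the 2-8 Hz band are illustrative values, not the
# parameters used in the paper.
import numpy as np

def modulation_energy_ratio(signal, sr, frame_len=0.025, hop=0.010,
                            mod_band=(2.0, 8.0)):
    """Fraction of modulation-spectrum energy inside `mod_band` (Hz).

    Speech energy envelopes fluctuate at syllabic rates (roughly 2-8 Hz,
    peaking near 4 Hz), so a high ratio suggests speech.
    """
    frame = int(frame_len * sr)
    step = int(hop * sr)
    # Short-term energy envelope of the signal.
    n_frames = 1 + (len(signal) - frame) // step
    env = np.array([np.sum(signal[i * step:i * step + frame] ** 2)
                    for i in range(n_frames)])
    env = env - env.mean()
    # Modulation spectrum = power spectrum of the energy envelope.
    spec = np.abs(np.fft.rfft(env)) ** 2
    mod_freqs = np.fft.rfftfreq(len(env), d=hop)
    band = (mod_freqs >= mod_band[0]) & (mod_freqs <= mod_band[1])
    total = spec[1:].sum()  # exclude the DC bin
    return spec[band].sum() / total if total > 0 else 0.0

# Usage: a 4 Hz amplitude-modulated tone mimics a syllabic speech envelope,
# while white noise has no dominant syllabic-rate modulation.
sr = 8000
t = np.arange(0, 2.0, 1 / sr)
speechlike = (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(len(t))
print(modulation_energy_ratio(speechlike, sr),
      modulation_energy_ratio(noise, sr))
```

A threshold on this ratio then separates speech from non-speech; the speech-like signal scores markedly higher than the noise.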

Cited by 21 publications (12 citation statements)
References 14 publications
“…Speech/silent intervals are detected using a method based on long-term modulation spectrum energy features (Maganti et al, 2007). Detection of syllable nuclei is performed using the method introduced in De Jong and Wempe (2009), which is based on intensity peak detection of voiced segments of speech.…”
Section: Prosodic Measurements
Confidence: 99%
“…The first level is devoted to distinguishing speech from non-speech sound. This task, known as Automatic Speech Detection [10,11,12], has been extensively studied in the literature, since it is fundamental to any system requiring speech enhancement, speech recognition and (as in our case) speech classification.…”
Section: Classifier Architecture
Confidence: 99%
“…For example, there exists a vast literature regarding speech discrimination [10,11,12,13], vehicle recognition [14,15,16] and weapon classification [17,18,7]. In addition, due to the maturity of the field there exist several commercial and open-source products that perform these tasks, such as the Halo system 1 and the Sphinx toolkit 2 .…”
Section: Introduction
Confidence: 99%
“…The use of the within-class covariance (WCC) matrix to normalize data variances has become widespread in the speaker recognition field [41,43]. I-vectors need such normalization, which differs from one application to another, because they capture a wide range of speech variability.…”
Section: WCCN
Confidence: 99%
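The within-class covariance normalization (WCCN) mentioned in the quote above can be sketched as follows. This is a generic illustration, not the cited papers' exact implementation: the dimensions, the two-speaker toy data, and the per-class averaging convention are assumptions. The key property is that projecting with the Cholesky factor of the inverse within-class covariance whitens within-class (per-speaker) variation.

```python
# Hedged sketch of within-class covariance normalization (WCCN) for
# i-vector-like embeddings; data and dimensions are illustrative.
import numpy as np

def wccn_projection(vectors, labels):
    """Return B with B @ B.T = inv(W), where W is the within-class
    covariance averaged over classes; projecting rows by B whitens
    within-class variation (B.T @ W @ B = I)."""
    dim = vectors.shape[1]
    W = np.zeros((dim, dim))
    classes = np.unique(labels)
    for c in classes:
        # Per-class covariance around that class's own mean.
        W += np.cov(vectors[labels == c], rowvar=False, bias=True)
    W /= len(classes)
    # Cholesky factor of inv(W) acts as the whitening projection.
    return np.linalg.cholesky(np.linalg.inv(W))

# Usage: two "speakers", each with session vectors around a class mean.
rng = np.random.default_rng(0)
means = rng.standard_normal((2, 5)) * 5
X = np.vstack([m + rng.standard_normal((20, 5)) for m in means])
y = np.repeat([0, 1], 20)
B = wccn_projection(X, y)
X_norm = X @ B  # within-class covariance of X_norm is the identity
```

After this projection, a cosine or dot-product comparison between vectors is less dominated by session and channel variation within a speaker.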