Multistream speaker diarization of meetings recordings beyond MFCC and TDOA features

Vijayasenan, Deepu; Valente, Fabio; Bourlard, Hervé

doi:10.1016/j.specom.2011.07.001

Cited by 16 publications

(14 citation statements)

References 18 publications

“…This is encouraging, considering that the proposed method has to estimate the additional parameters needed for diarisation. In terms of diarisation, the proposed method achieves higher accuracy than [8] on Mix-8 (91.7% versus 74.9%) and on Mix-DC (97.3% versus 77.3%). This justifies the joint modeling of the source activity detection and the source signal recovery.…”

Section: Discussion (mentioning)

confidence: 89%

“…Diarisation is assessed with accuracy (Acc) defined as the percentage of frames for which a source was correctly identified (as either active if actually active, or inactive if actually inactive). As baseline, we used [5] for source separation and [8] for speaker diarisation. Both baselines were provided with the correct number of sources.…”

Section: Discussion (mentioning)

confidence: 99%
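
The accuracy measure described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' evaluation code: the function name `diarisation_accuracy`, the (sources × frames) binary activity layout, and the pooling of all sources into a single percentage are assumptions made for the example.

```python
import numpy as np

def diarisation_accuracy(estimated, reference):
    """Percentage of (source, frame) entries whose active/inactive label is correct.

    Both inputs are binary arrays of shape (n_sources, n_frames); this layout
    and the pooling over sources are assumptions of this sketch.
    """
    estimated = np.asarray(estimated, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    if estimated.shape != reference.shape:
        raise ValueError("activity matrices must have the same shape")
    # A frame counts as correct for a source when the estimated state matches
    # the true state, whether that state is active or inactive.
    return 100.0 * np.mean(estimated == reference)

# Toy data (made up): two sources, four frames.
reference = np.array([[1, 1, 0, 0],
                      [0, 1, 1, 0]])
estimated = np.array([[1, 1, 0, 1],
                      [0, 1, 0, 0]])
acc = diarisation_accuracy(estimated, reference)
print(f"Acc = {acc:.1f}%")
```

Because inactive frames count as correct when labelled inactive, this measure rewards correct silence detection as well as correct speech detection.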

“…Both baselines were provided with the correct number of sources. Because [8] is designed for non-overlapping audio streams, we considered each of the 2^J source combinations as a virtual speaker, and we translated the result of the clustering over virtual speakers into clustering of individual sources (we tested all possible associations and reported the one giving the highest accuracy). Afterwards, we use a median filter on the estimated label of each source to remove spikes.…”

Section: Discussion (mentioning)

confidence: 99%
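
The virtual-speaker procedure in the statement above can be sketched roughly as below. Everything here is an assumption made for illustration: the function names, the toy labels, the filter width of 3, and the order of the steps (association first, smoothing second) are not taken from the citing paper, and the baseline clustering itself is replaced by hand-written cluster labels.

```python
import itertools
import numpy as np

def best_association(cluster_labels, reference, n_clusters):
    """Map each cluster to one of the 2^J on/off source combinations.

    Tries every one-to-one assignment of clusters to combinations and keeps
    the one with the highest frame accuracy (an oracle choice, as in the
    quoted description). The search is exhaustive, so it only suits small J.
    """
    n_sources = reference.shape[0]
    # Rows are the virtual speakers: every on/off pattern over J sources.
    combos = np.array(list(itertools.product([0, 1], repeat=n_sources)))
    best_acc, best_activity = -1.0, None
    for assignment in itertools.permutations(range(len(combos)), n_clusters):
        # Per-frame activity of each source under this assignment: (J, frames).
        activity = combos[list(assignment)][cluster_labels].T
        acc = 100.0 * np.mean(activity == reference)
        if acc > best_acc:
            best_acc, best_activity = acc, activity
    return best_activity, best_acc

def median_filter(labels, width=3):
    """Median-filter a binary label sequence (odd width, edge padding)."""
    pad = width // 2
    padded = np.pad(labels, pad, mode="edge")
    return np.array([int(np.median(padded[i:i + width]))
                     for i in range(len(labels))])

# Toy data (made up): two sources, eight frames, one overlapping frame.
reference = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                      [0, 0, 0, 1, 1, 1, 1, 0]])
# Hypothetical clustering output; frame 1 is a spurious one-frame spike.
cluster_labels = np.array([0, 3, 0, 1, 2, 2, 2, 3])

activity, acc = best_association(cluster_labels, reference, n_clusters=4)
smoothed = np.vstack([median_filter(row) for row in activity])
print(f"accuracy before smoothing: {acc:.2f}%")
print(smoothed)
```

In this toy case the median filter removes the single-frame spike in the first source's labels; on real data the filter width trades off spike removal against blurring of short speaker turns.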

“…Besides, state-of-the-art methods on diarisation, e.g. [8], consist of a pipeline that starts by extracting features from the audio mixture, e.g. Mel frequency cepstral coefficients, and proceeds with speech/non-speech segmentation of the audio stream, and clustering of the speech segments into associated speakers.…”

Section: Introduction (mentioning)

confidence: 99%

“…An EM algorithm is designed for model parameter estimation. We compare the performance of the proposed EM with [5] in terms of separation, and with [8] in terms of diarisation.…”

Section: Introduction (mentioning)

confidence: 99%

An EM algorithm for joint source separation and diarisation of multichannel convolutive speech mixtures

Kounades-Bastian, Girin, Alameda-Pineda, et al. 2017

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We present a probabilistic model for joint source separation and diarisation of multichannel convolutive speech mixtures. We build upon the framework of local Gaussian model (LGM) with non-negative matrix factorization (NMF). The diarisation is introduced as a temporal labeling of each source in the mix as active or inactive at the short-term frame level. We devise an EM algorithm in which the source separation process is aided by the diarisation state, since the latter indicates the sources actually present in the mixture. The diarisation state is tracked with a Hidden Markov Model (HMM) with emission probabilities calculated from the estimated source signals. The proposed EM has separation performance comparable with a state-of-the-art LGM NMF method, while outperforming a state-of-the-art speaker diarisation pipeline.
