ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8682255

Designing an Effective Metric Learning Pipeline for Speaker Diarization

Abstract: State-of-the-art speaker diarization systems utilize knowledge from external data, in the form of a pre-trained distance metric, to effectively determine relative speaker identities to unseen data. However, much of recent focus has been on choosing the appropriate feature extractor, ranging from pre-trained i-vectors to representations learned via different sequence modeling architectures (e.g. 1D-CNNs, LSTMs, attention models), while adopting off-the-shelf metric learning solutions. In this paper, we argue th…

Cited by 19 publications (13 citation statements)
References 16 publications (33 reference statements)

Citation statements:
“…Joint modeling methods have been studied in an effort to alleviate the complex preparation process and take into account the dependencies between these models. They include, for example, joint modeling of x-vector extraction and PLDA scoring [16,31] and joint modeling of SAD and speaker embedding [32]. However, the clustering process has remained unchanged because it is an unsupervised process.…”
Section: Clustering-based Methods (mentioning)
confidence: 99%
“…Another field where deep metric learning has achieved successful results is the processing of audio signals [50]. The authors in [57] exploited Triplet and Quadruple networks for speaker diarization. They utilized different sampling strategies and margin parameter selection to observe their effect on diarization performance.…”
Section: Deep Metric Learning Problems (mentioning)
confidence: 99%
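
The citation statement above refers to triplet networks, sampling strategies, and a margin parameter. As a rough illustration only, a triplet loss with a margin and a simple hard-negative selection step might be sketched as follows. The function names, the Euclidean distance, the default margin of 0.2, and the "closest negative" sampling rule are all assumptions made for this sketch, not the configuration used in the cited paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: penalize triplets where the negative is not
    at least `margin` farther from the anchor than the positive.
    The margin value here is illustrative, not taken from the paper."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """One possible sampling strategy (an assumption for this sketch):
    pick the negative embedding closest to the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


# Toy 2-D embeddings: the positive is far from the anchor, one negative is close.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]
negatives = [[3.0, 4.0], [0.0, 1.0]]

hard = hardest_negative(anchor, negatives)
loss = triplet_loss(anchor, positive, hard, margin=1.0)
```

A larger margin demands a wider gap between same-speaker and different-speaker distances, and harder negatives produce more non-zero loss terms; which trade-off works best is an empirical question of the kind the cited work is described as studying.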
“…The Coswara project was one of the first publicly available COVID-19 audio datasets and remains unique in its wide variety of sounds collected. Utilizing classical features such as MFCCs [37,38], spectral centroid and mean square energy to train a random forest classifier for the sound classification task, the authors report a test accuracy of 66%. More recently, Imran et al [2] developed tools that utilize CNNs trained with mel spectrograms for cough detection followed by model ensembling to determine whether or not the sample belonged to a COVID-19 patient.…”
Section: Related Work (mentioning)
confidence: 99%