IberSPEECH 2018
DOI: 10.21437/iberspeech.2018-43

ODESSA at Albayzin Speaker Diarization Challenge 2018

Abstract: This paper describes the ODESSA submissions to the Albayzin Speaker Diarization Challenge 2018. The challenge addresses the diarization of TV shows. This work explores three different techniques to represent speech segments, namely binary key, x-vector and triplet-loss based embeddings. While training-free methods such as the binary key technique can be applied easily to a scenario where training data is limited, the training of robust neural-embedding extractors is considerably more challenging. However, when…

Cited by 2 publications (4 citation statements)
References 18 publications
“…As described in details in [16], ODESSA "speaker" primary run is the combination at similarity level of three different representations (x-vector trained on NIST SRE data, triplet loss embedding trained on VoxCeleb and binary key). This complex system reaches a performance of DER = 7.21% which is still below the simpler multimodal PLUMCOT primary run (that combines triplet loss speaker embedding and neural face embedding) with DER = 6.86%.…”
Section: Results (mentioning)
confidence: 99%
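The statement above describes a combination "at similarity level" of three segment representations. As a rough illustration only, and not the ODESSA implementation, the sketch below computes one segment-by-segment similarity matrix per representation and averages them. The use of cosine similarity for all three representations, the equal weights, the toy vectors, and the names `cosine`, `similarity_matrix` and `fuse` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Pairwise similarity between all segments for one representation."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def fuse(matrices, weights=None):
    """Weighted average of same-sized similarity matrices (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(matrices)] * len(matrices)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): three segments, each described by three representations.
xvector = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
triplet = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9]]
binary_key = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]

fused = fuse([similarity_matrix(r) for r in (xvector, triplet, binary_key)])
```

In this toy setup, segments 0 and 1 are close under every representation, so their fused similarity is higher than that of segments 0 and 2. A clustering step would then operate on `fused` rather than on any single representation.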
“…Therefore, ODESSA submissions to the "speaker" part of the multimodal diarization challenge rely on the same systems used for its open-set submissions to the speaker diarization challenge: the fusion at similarity-level of various speech turn representation (such as neural embeddings and binary keys). More information can be found in [16]). All three ODESSA submissions use the same "face" part as PLUMCOT primary submission.…”
Section: Submissions (mentioning)
confidence: 99%
“…This discriminator was also trained with external data. • G11-ODESSA [25]. EURECOM, LIMSI, CNRS, France.…”
Section: Open-set Condition Systems (mentioning)
confidence: 99%
“…The primary system performed unsupervised PLDA adaptation, while the contrastive one did not. • G11-ODESSA [25]. EURECOM, LIMSI, CNRS, France.…”
Section: Closed-set Condition Systems (mentioning)
confidence: 99%