ODESSA at Albayzin Speaker Diarization Challenge 2018

Patino, José; Delgado, Héctor; Yin, Ruiqing; Bredin, Hervé; Barras, Claude; Evans, Nicholas

doi:10.21437/iberspeech.2018-43

Cited by 2 publications

(4 citation statements)

References 18 publications


“…As described in details in [16], ODESSA "speaker" primary run is the combination at similarity level of three different representations (x-vector trained on NIST SRE data, triplet loss embedding trained on VoxCeleb and binary key). This complex system reaches a performance of DER = 7.21% which is still below the simpler multimodal PLUMCOT primary run (that combines triplet loss speaker embedding and neural face embedding) with DER = 6.86%.…”

Section: Results (mentioning)

confidence: 99%

“…Therefore, ODESSA submissions to the "speaker" part of the multimodal diarization challenge rely on the same systems used for its open-set submissions to the speaker diarization challenge: the fusion at similarity-level of various speech turn representation (such as neural embeddings and binary keys). More information can be found in [16]). All three ODESSA submissions use the same "face" part as PLUMCOT primary submission.…”

Section: Submissions (mentioning)

confidence: 99%
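The "fusion at similarity-level" mentioned in the statements above can be illustrated with a minimal sketch. It is not the ODESSA implementation: the cosine similarity, the uniform weights, the function names, and the random arrays standing in for x-vector, triplet-loss, and binary-key representations are all assumptions made for illustration. The idea shown is only that each representation yields its own segment-by-segment similarity matrix, and the matrices are combined before clustering.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows (one row per speech turn)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T


def fuse_similarities(matrices, weights=None) -> np.ndarray:
    """Weighted average of same-shaped similarity matrices (hypothetical helper)."""
    stacked = np.stack(matrices)  # shape: (n_representations, n, n)
    if weights is None:
        weights = np.ones(len(matrices))  # assumption: equal weights
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    return np.tensordot(weights, stacked, axes=1)  # shape: (n, n)


# Stand-in data: 4 speech turns described by three representations of
# different dimensionality (random values, not real embeddings).
rng = np.random.default_rng(0)
representations = [
    rng.normal(size=(4, 16)),
    rng.normal(size=(4, 8)),
    rng.normal(size=(4, 32)),
]
similarities = [cosine_similarity_matrix(r) for r in representations]
fused = fuse_similarities(similarities)
print(fused.shape)
```

The fused matrix could then be passed to any clustering step that accepts precomputed similarities. Because fusion happens on the similarity matrices, the representations do not need to share a dimensionality.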


ODESSA/PLUMCOT at Albayzin Multimodal Diarization Challenge 2018

Maurice¹,

Bredin

Yin

et al. 2018

IberSPEECH 2018

Self Cite


This paper describes ODESSA and PLUMCOT submissions to Albayzin Multimodal Diarization Challenge 2018. Given a list of people to recognize (alongside image and short video samples of those people), the task consists in jointly answering the two questions "who speaks when?" and "who appears when?". Both consortia submitted 3 runs (1 primary and 2 contrastive) based on the same underlying monomodal neural technologies: neural speaker segmentation, neural speaker embeddings, neural face embeddings, and neural talking-face detection. Our submissions aim at showing that face clustering and recognition can (hopefully) help to improve speaker diarization.



“…This discriminator was also trained with external data. • G11-ODESSA [25]. EURECOM, LIMSI, CNRS, France.…”

Section: Open-set Condition Systems (mentioning)

confidence: 99%

“…The primary system performed unsupervised PLDA adaptation, while the contrastive one did not. • G11-ODESSA [25]. EURECOM, LIMSI, CNRS, France.…”

Section: Closed-set Condition Systems (mentioning)

confidence: 99%

Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media

et al. 2019


The IberSpeech-RTVE Challenge presented at IberSpeech 2018 is a new Albayzin evaluation series supported by the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla (RTTH)). That series was focused on speech-to-text transcription, speaker diarization, and multimodal diarization of television programs. For this purpose, the Corporacion Radio Television Española (RTVE), the main public service broadcaster in Spain, and the RTVE Chair at the University of Zaragoza made more than 500 h of broadcast content and subtitles available for scientists. The dataset included about 20 programs of different kinds and topics produced and broadcast by RTVE between 2015 and 2018. The programs presented different challenges from the point of view of speech technologies, such as the diversity of Spanish accents, overlapping speech, spontaneous speech, acoustic variability, background noise, and specific vocabulary. This paper describes the database and the evaluation process and summarizes the results obtained.

… in Spanish [8][9][10][11], and more recently, the Multi-Genre Broadcast (MGB) Challenge with data in English and Arabic 2 [12][13][14]. In other areas apart from broadcast speech, several evaluation campaigns have been proposed, such as the ones organized in the scope of the Zero Resource Speech Challenge [15,16], the TC-STAR evaluation on recordings of the European Parliament's sessions in English and Spanish [5], or the MediaEval evaluation of multimodal search and hyperlinking [17].

As a way to measure the performance of different techniques and approaches, in this 2018 edition, the IberSpeech-RTVE Challenge Evaluation campaign was proposed in three different conditions: speech-to-text transcription (STT), speaker diarization (SD), and multimodal diarization (MD). Twenty-two teams registered for the challenge, and eighteen submitted systems in at least one of the three proposed tasks.

In this paper, we describe the challenge and the data provided by the organization to the participants. We also provide a description of the systems presented to the evaluation, their results, and a set of conclusions that can be drawn from this evaluation campaign.

This paper is organized as follows. In Section 2, the RTVE2018 database is presented. Section 3 describes the three evaluation tasks: speech-to-text transcription, speaker diarization, and multimodal diarization. Section 4 provides a brief description of the main features of the submitted systems. Section 5 presents results, and Section 6 gives conclusions.

