2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
DOI: 10.1109/asru.2015.7404855

The 2015 Sheffield system for longitudinal diarisation of broadcast media

Abstract: Speaker diarisation is the task of answering "who spoke when" within a multi-speaker audio recording. Diarisation of broadcast media typically operates on individual television shows and is a particularly difficult task due to a high number of speakers and challenging background conditions. Using prior knowledge, such as that from previous shows in a series, can improve performance. Longitudinal diarisation allows knowledge from previous audio files to be used to improve performance, but requires finding matching…
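The abstract's core idea, carrying speaker knowledge across episodes of a series, can be illustrated with a minimal sketch. Everything below (the function names, the cosine-similarity matching, the 0.6 threshold) is an illustrative assumption, not the paper's actual linking algorithm:

```python
# Hypothetical sketch of longitudinal speaker linking: clusters found in the
# current episode are matched to speaker models carried over from earlier
# episodes via cosine similarity. Assumes non-zero speaker embeddings.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def link_speakers(current_clusters, previous_speakers, threshold=0.6):
    """Map each current-episode cluster to a known speaker or a new label.

    current_clusters  : dict cluster_id -> embedding (np.ndarray)
    previous_speakers : dict speaker_id -> embedding from earlier episodes
    """
    links = {}
    for cid, emb in current_clusters.items():
        best_id, best_sim = None, threshold
        for sid, ref in previous_speakers.items():
            sim = cosine(emb, ref)
            if sim > best_sim:
                best_id, best_sim = sid, sim
        # Fall back to a new speaker label when nothing matches well enough.
        links[cid] = best_id if best_id is not None else f"new_{cid}"
    return links
```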

Cited by 11 publications (10 citation statements)
References 19 publications
“…• SU [30]: clustering using the SHoUT toolkit; posteriors are generated from a Speaker Separation DNN, used as input to a speaker-state HMM.…”
Section: Submitted Systems and Results (mentioning)
confidence: 99%
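The quoted pipeline, per-frame speaker posteriors from a DNN feeding a speaker-state HMM, can be sketched as follows. This is our reading of the one-line description, not the authors' code; the self-loop probability and the fully connected transition structure are assumptions:

```python
# Minimal sketch: DNN speaker posteriors used as emission scores of a
# speaker-state HMM, decoded with Viterbi. `posteriors` is assumed to be a
# (T, S) array over T frames and S > 1 speakers.
import numpy as np

def viterbi_speaker_decode(posteriors, stay_prob=0.99):
    T, S = posteriors.shape
    # Strong self-loops discourage implausibly rapid speaker switches.
    trans = np.full((S, S), (1.0 - stay_prob) / (S - 1))
    np.fill_diagonal(trans, stay_prob)
    log_trans = np.log(trans)
    log_emit = np.log(np.clip(posteriors, 1e-10, 1.0))

    delta = log_emit[0].copy()          # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)  # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]

    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path  # per-frame speaker indices
```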
“…DNNs which can classify speakers have been implemented in the speaker recognition field [11] which involves projecting acoustic features into a lower-dimensional feature set. Our own previous work has also implemented speaker separation DNNs [12].…”
Section: Introduction (mentioning)
confidence: 99%
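A minimal sketch of the speaker-classification DNN with a low-dimensional projection described in the quote. The layer sizes, input dimension, and bottleneck width are illustrative assumptions, not the cited papers' configuration:

```python
# Hedged sketch of a speaker separation DNN: trained to classify speakers,
# with a narrow hidden layer that yields a low-dimensional speaker embedding.
import torch.nn as nn

class SpeakerSeparationDNN(nn.Module):
    def __init__(self, feat_dim=440, n_speakers=100, bottleneck=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, bottleneck),   # low-dimensional projection
        )
        self.classifier = nn.Linear(bottleneck, n_speakers)

    def forward(self, x):
        emb = self.encoder(x)              # speaker embedding
        return self.classifier(emb), emb

# Training would minimise cross-entropy over speaker labels; at test time the
# embedding (or the output posteriors) would be used for clustering.
```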
“…A speaker segmentation stage using autoassociative neural networks (AANN) was proposed in which a windowing method is used where an AANN model is trained for the left half of the window and tested on the right to give a confidence score on how likely each part belongs to the same speaker [17]. Finally, DNNs have been applied to the clustering stage by training speaker separation DNNs and adapting these to specific recordings [16,18].…”
Section: Introduction (mentioning)
confidence: 99%
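The AANN windowing scheme quoted above can be approximated as below. This is a simplified stand-in (a scikit-learn MLP autoencoder rather than a true AANN), and the half-window split, hidden size, and scoring convention are assumptions:

```python
# Rough sketch of the AANN idea in [17]: fit an autoassociative model on
# frames from the left half of a sliding window and score reconstruction
# error on the right half; low error suggests both halves share a speaker.
import numpy as np
from sklearn.neural_network import MLPRegressor

def same_speaker_confidence(window_feats):
    """window_feats: (T, D) acoustic features for one analysis window."""
    mid = len(window_feats) // 2
    left, right = window_feats[:mid], window_feats[mid:]
    aann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500)
    aann.fit(left, left)                 # autoassociative: input == target
    recon = aann.predict(right)
    err = np.mean((recon - right) ** 2)  # reconstruction error on right half
    return -err  # higher (less negative) => more likely the same speaker
```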
“…Artificial neural networks (ANN) have been trained to learn a feature transform [15] and DNNs can be trained to detect speech/nonspeech in an SAD stage where adapting the DNN leads to improved performance [16]. A speaker segmentation stage using autoassociative neural networks (AANN) was proposed in which a windowing method is used where an AANN model is trained for the left half of the window and tested on the right to give a confidence score on how likely each part belongs to the same speaker [17].…”
Section: Introduction (mentioning)
confidence: 99%
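The DNN-based SAD stage with adaptation mentioned in the quote might look roughly like this; the architecture, input dimension, and fine-tuning loop are assumptions rather than the cited system's configuration:

```python
# Illustrative sketch of a DNN speech/nonspeech (SAD) classifier that can be
# fine-tuned (adapted) on data from the target show, as in the quote's [16].
import torch
import torch.nn as nn

sad_dnn = nn.Sequential(
    nn.Linear(440, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 2),                 # logits: [nonspeech, speech]
)

def adapt_sad(model, feats, labels, lr=1e-4, steps=100):
    """Fine-tune the SAD DNN on show-specific frames.

    feats  : (N, 440) float tensor of acoustic features
    labels : (N,) long tensor of 0/1 speech labels
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(feats), labels)
        loss.backward()
        opt.step()
    return model
```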