2021 IEEE Spoken Language Technology Workshop (SLT) 2021
DOI: 10.1109/slt48900.2021.9383502

Look Who’s Not Talking

Cited by 5 publications
(5 citation statements)
References 38 publications
“…Therefore, magnitudes can also be utilized for the out-of-distribution detection in the embedding space. This property can possibly be employed as an additional layer of defense to compensate for failures of a voice activity detector, as suggested by [22].…”
Section: Discussion (mentioning)
confidence: 99%
“…It is worth mentioning that the magnitudes of speaker embeddings were already successfully applied for the voice activity detection task [22]. However, this work lacks any qualitative analysis of embedding magnitudes properties.…”
Section: Magnitude-aware Embeddings (mentioning)
confidence: 99%
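The magnitude-based check described in the citing statements above can be illustrated with a minimal sketch. It assumes that a small L2 norm of a speaker embedding signals an out-of-distribution (for example, non-speech) segment. The statements do not say which direction the magnitude moves, so that assumption, the function name `flag_low_magnitude`, and the threshold value are all hypothetical.

```python
import numpy as np

def flag_low_magnitude(embeddings, threshold):
    """Return per-segment L2 norms and a boolean mask of suspect segments.

    Assumption (not stated in the source): a norm below `threshold`
    marks a segment that a voice activity detector may have let through
    by mistake.
    """
    norms = np.linalg.norm(embeddings, axis=1)
    return norms, norms < threshold

# Toy embeddings: the second row has a much smaller magnitude than the others.
emb = np.array([[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]])
norms, is_suspect = flag_low_magnitude(emb, threshold=1.0)
```

In practice the threshold would have to be chosen on held-out data; the value used here is only for illustration.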
“…Therefore, instead of tuning the threshold for each domain data, we adopt clustering with a silhouette coefficient trick. Some studies [10,11,24,25] already composed their clustering-based SD systems using silhouette coefficient, and those systems show superior performance on various datasets without threshold tuning.…”
Section: Initial Clustering Phase (mentioning)
confidence: 99%
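The idea of selecting the number of clusters by silhouette coefficient instead of tuning a distance threshold can be sketched as follows. The statement above does not name the clustering algorithm or distance, so this sketch substitutes a small k-means with farthest-point initialisation and Euclidean distance; the function names and the search range for the number of speakers are likewise assumptions.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return -1.0  # undefined for a single cluster; treat as worst case
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score is 0 by convention
        a = D[i, same].sum() / (n_same - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

def kmeans(X, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pick_num_speakers(X, k_min=2, k_max=5):
    """Choose the cluster count with the highest mean silhouette."""
    best_k, best_score = k_min, -2.0
    for k in range(k_min, k_max + 1):
        score = silhouette(X, kmeans(X, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Toy data: three well-separated synthetic "speakers" in 2-D.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in blob_centers])
best_k = pick_num_speakers(X)
```

One limitation of this approach is that the silhouette coefficient is undefined for a single cluster, so a recording with only one speaker would need separate handling.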
“…Speaker diarisation (SD), which segments input audio to short utterances according to speaker identity, is going through a rapid breakthrough [1,2]. Based on the success of recent SD systems [3][4][5][6][7][8][9][10][11][12], online SD systems are also being developed [13][14][15][16][17][18][19][20]. In an online SD system, the system should decide the speaker label of a given short segment leveraging only current and past segments, where only a part of past segments are available.…”
Section: Introduction (mentioning)
confidence: 99%
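The online constraint described above, where each segment is labelled from current and past segments only and just part of the past is retained, can be sketched with a bounded per-speaker memory. This is an illustrative assumption, not the method of any cited system: the class name `OnlineAssigner`, the cosine-similarity rule, the threshold, and the memory size are all hypothetical.

```python
from collections import deque
import numpy as np

class OnlineAssigner:
    """Assign speaker labels one segment at a time using only past segments."""

    def __init__(self, threshold=0.5, memory=50):
        self.threshold = threshold  # hypothetical cosine-similarity cut-off
        self.memory = memory        # at most this many past embeddings per speaker
        self.buffers = []           # one bounded deque of unit-norm embeddings per speaker

    def assign(self, embedding):
        e = np.asarray(embedding, dtype=float)
        e = e / np.linalg.norm(e)
        best, best_sim = -1, -1.0
        for idx, buf in enumerate(self.buffers):
            centroid = np.mean(list(buf), axis=0)
            sim = float(e @ centroid / np.linalg.norm(centroid))
            if sim > best_sim:
                best, best_sim = idx, sim
        if best_sim < self.threshold:
            # No sufficiently similar known speaker: open a new one.
            self.buffers.append(deque(maxlen=self.memory))
            best = len(self.buffers) - 1
        self.buffers[best].append(e)  # the oldest entry is dropped once the buffer is full
        return best

# Toy stream of 2-D "embeddings" arriving in order.
assigner = OnlineAssigner(threshold=0.5, memory=3)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = [assigner.assign(e) for e in stream]
```

The `deque(maxlen=...)` buffer is what models the partial availability of past segments: once it is full, each new embedding evicts the oldest one.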
“…The former "divides-and-conquers" speaker diarisation into several subtasks. The exact configuration differs from system to system, but in general they consist of speech activity detection (SAD), embedding extraction and clustering [2][3][4]. The latter directly segments audio recordings into homogeneous speaker regions using deep neural networks [5][6][7][8].…”
Section: Introduction (mentioning)
confidence: 99%