Interspeech 2020
DOI: 10.21437/interspeech.2020-1908
Self-Attentive Similarity Measurement Strategies in Speaker Diarization

Cited by 14 publications (11 citation statements) | References 16 publications
“…There are also several different approaches to generating the affinity matrix. In [152], a self-attention-based network was introduced to generate a similarity matrix directly from a sequence of speaker embeddings. In [153], several affinity matrices with different temporal resolutions were fused into a single affinity matrix by a neural network.…”
Section: Single-module Optimization 311 Speaker Clustering Enhanced B...
confidence: 99%
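As a rough illustration of the idea attributed to [152], the sketch below derives a pairwise similarity matrix from a sequence of speaker embeddings via scaled dot-product self-attention. This is a minimal NumPy sketch with random weights standing in for a trained network; all function and variable names are hypothetical, not the authors' implementation:

```python
import numpy as np

def self_attention_similarity(embeddings, d_k=None, seed=0):
    """Toy sketch: derive an N x N affinity matrix from N speaker
    embeddings via scaled dot-product self-attention.
    The projection weights are random stand-ins for trained parameters."""
    rng = np.random.default_rng(seed)
    n, d = embeddings.shape
    d_k = d_k or d
    w_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    w_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    q, k = embeddings @ w_q, embeddings @ w_k
    scores = q @ k.T / np.sqrt(d_k)
    # row-wise softmax yields attention weights, read off as similarities
    scores -= scores.max(axis=1, keepdims=True)
    att = np.exp(scores)
    att /= att.sum(axis=1, keepdims=True)
    # symmetrize so the result can feed a downstream clustering step
    return 0.5 * (att + att.T)

emb = np.random.default_rng(1).standard_normal((5, 16))
sim = self_attention_similarity(emb)
print(sim.shape)  # (5, 5)
```

The symmetrization at the end is one simple way to turn row-stochastic attention weights into an affinity matrix a clustering algorithm can consume.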
“…Considering that the CTS data differs substantially from the remaining non-conversational telephone speech (NCTS) 16 kHz audio, we build two different systems for CTS data and NCTS data. For NCTS data, we employ the system described in [6]. For CTS data, we first use AHC to determine the homogeneous speaker regions.…”
Section: Data Partition and Data Resources
confidence: 99%
“…For NCTS data, we employ an attention-based neural network to measure the similarity between two segments. The network architecture and training process are the same as the attentive vector-to-sequence (Att-v2s) scoring in [6]. The architecture of this transformer-based model consists of a multi-head self-attention module and several linear layers, as the figure shows.…”
Section: Similarity Measurement and Clustering
confidence: 99%
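The Att-v2s scoring described above can be caricatured as follows: a query speaker embedding attends to a sequence of segment embeddings through multi-head attention, and linear layers map the pooled context to a scalar similarity. This is an illustrative NumPy sketch with random placeholder weights and an assumed two-layer head, not the trained architecture or hyperparameters from [6]:

```python
import numpy as np

def att_v2s_score(query_emb, seq_embs, n_heads=2, seed=0):
    """Illustrative vector-to-sequence scorer: the query embedding
    attends to the segment sequence per head, heads are concatenated,
    and two linear layers produce a similarity logit.
    All weights are random placeholders for a trained model."""
    rng = np.random.default_rng(seed)
    d = query_emb.shape[0]
    assert d % n_heads == 0, "embedding dim must divide evenly by heads"
    dh = d // n_heads
    heads = []
    for _ in range(n_heads):
        w_q = rng.standard_normal((d, dh)) / np.sqrt(d)
        w_k = rng.standard_normal((d, dh)) / np.sqrt(d)
        w_v = rng.standard_normal((d, dh)) / np.sqrt(d)
        q = query_emb @ w_q                      # (dh,)
        k, v = seq_embs @ w_k, seq_embs @ w_v    # (T, dh) each
        s = k @ q / np.sqrt(dh)                  # (T,) attention logits
        s = np.exp(s - s.max()); s /= s.sum()    # softmax over the sequence
        heads.append(s @ v)                      # attention-pooled context
    ctx = np.concatenate(heads)                  # (d,)
    # two linear layers map [query; context] to a scalar similarity
    w1 = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
    w2 = rng.standard_normal(d) / np.sqrt(d)
    h = np.tanh(np.concatenate([query_emb, ctx]) @ w1)
    return float(h @ w2)
```

In a diarization pipeline, such a score would be computed between each segment and each candidate speaker region before clustering.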
“…Popular similarity measurements include cosine similarity [169] and PLDA-based similarity [167,179]. Recently, some deep-learning-based similarity measurements were also introduced, such as LSTM-based scoring [188], self-attentive similarity measurement strategies [189], and joint training of speaker embedding and PLDA scoring [166]. Common clustering algorithms include k-means [169], agglomerative hierarchical clustering [167], spectral clustering [188,169], Bayesian hidden Markov model based clustering [181,182,183], etc.…”
Section: Speaker Clustering
confidence: 99%
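To make the pairing of a similarity measurement with a clustering step concrete, here is a minimal sketch (NumPy only, hypothetical helper names) that builds a cosine affinity matrix from segment embeddings and performs a two-way spectral split via the Fiedler vector; real systems would use a full spectral or AHC implementation and estimate the number of speakers:

```python
import numpy as np

def cosine_affinity(embs):
    """Pairwise cosine similarity between row-wise segment embeddings."""
    x = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    return x @ x.T

def spectral_2way(affinity, temp=0.5):
    """Minimal 2-way spectral clustering: exponentiate cosine scores into
    positive edge weights, then split on the sign of the Fiedler vector
    (eigenvector of the 2nd-smallest Laplacian eigenvalue)."""
    w = np.exp(affinity / temp)
    lap = np.diag(w.sum(axis=1)) - w
    _, vecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# toy data: two well-separated "speakers", four segments each
rng = np.random.default_rng(0)
spk1 = np.eye(8)[0] + 0.05 * rng.standard_normal((4, 8))
spk2 = np.eye(8)[1] + 0.05 * rng.standard_normal((4, 8))
labels = spectral_2way(cosine_affinity(np.vstack([spk1, spk2])))
print(labels)
```

The exponential kernel keeps all edge weights positive so the graph Laplacian is well defined even when some cosine scores are negative.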