Triplet Network with Attention for Speaker Diarization
Preprint, 2018
DOI: 10.48550/arxiv.1808.01535

Cited by 5 publications (5 citation statements)
References 14 publications

“…To further improve the model accuracy, this study uses triplet attention to improve the CSPDarknet53 feature extraction network in YOLOv4. The triplet attention module (Song et al., 2018) is an inexpensive and effective attention mechanism with few parameters and does not involve dimensionality reduction. It is an additional neural network, as shown in Figure 1.…”
Section: Methods (mentioning)
confidence: 99%
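The excerpt above describes a triplet attention block used inside the YOLOv4 backbone, not this preprint's diarization network. Purely as an illustration, a common formulation of such a block (three parallel gates, each pooling over one tensor dimension and convolving over the remaining two, with no channel reduction) can be sketched as follows; the class names, kernel size, and branch averaging are assumptions, not the cited implementation.

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    """Stack max- and mean-pooling along dim 1 into a 2-channel map."""
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True).values,
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    """Z-pool followed by a single conv and a sigmoid gate (no dimensionality reduction)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(1)

    def forward(self, x):
        return x * torch.sigmoid(self.bn(self.conv(self.pool(x))))

class TripletAttention(nn.Module):
    """Three gates, each attending over a different pair of tensor dimensions."""
    def __init__(self):
        super().__init__()
        self.gate_hc = AttentionGate()  # height interacts with channels
        self.gate_wc = AttentionGate()  # width interacts with channels
        self.gate_hw = AttentionGate()  # plain spatial attention

    def forward(self, x):  # x: (N, C, H, W)
        out1 = self.gate_hc(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        out2 = self.gate_wc(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        out3 = self.gate_hw(x)
        return (out1 + out2 + out3) / 3.0
```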
“…Therefore, the YOLO algorithm can predict the category and location of multiple objects in real time in a single pass, unlike traditional object detection algorithms, which use the sliding-window method, and the Faster R-CNN algorithm (Song et al., 2018).…”
Section: Methodology: Principle of YOLO (mentioning)
confidence: 99%
“…In particular, Huang J. et al [24], Ren et al [25], Kumar et al [26], and Harvill et al [27] use triplet loss with varied neural network architectures for the task of speech emotion recognition. Bredin [28] and Song et al [29] use triplet-loss-based learning approaches for speaker diarization, and Zhang and Koshida [30] and Li et al [31] for the related task of speaker verification.…”
Section: B. Previous Work on the Use of Triplet Loss for the Metric Em... (mentioning)
confidence: 99%
“…For instance, siamese networks (Bromley, Guyon, LeCun, Säckinger and Shah, 1994) and triplet networks (Hoffer and Ailon, 2015) are neural networks suitable for direct representation learning by minimizing the contrastive loss or triplet loss calculated in the latent embedding space. These techniques have shown promising results in face verification and identification (Schroff, Kalenichenko and Philbin, 2015) as well as in speech tasks such as speaker diarization and verification (Jati and Georgiou, 2019; Song, Willi, Thiagarajan, Berisha and Spanias, 2018).…”
Section: Related Work and Motivation (mentioning)
confidence: 99%
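The passage above describes triplet networks that learn an embedding space by minimizing a triplet loss over anchor, positive, and negative examples. As an illustration only, a minimal PyTorch sketch of that idea for speaker embeddings might look as follows; the encoder shape, feature dimension, and margin are assumptions, not the configuration of the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerEmbeddingNet(nn.Module):
    """Hypothetical encoder: maps a fixed-size acoustic feature vector
    to an L2-normalised embedding (dimensions are illustrative)."""
    def __init__(self, in_dim=40, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

model = SpeakerEmbeddingNet()
criterion = nn.TripletMarginLoss(margin=0.2)  # margin value is an assumption

# anchor/positive come from the same speaker, negative from a different speaker
anchor, positive, negative = (torch.randn(32, 40) for _ in range(3))
loss = criterion(model(anchor), model(positive), model(negative))
loss.backward()
```

Training with such a loss pulls same-speaker embeddings together and pushes different-speaker embeddings apart, which is what makes the resulting space usable for clustering in diarization or for thresholded scoring in verification.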