Interspeech 2023
DOI: 10.21437/interspeech.2023-1228

Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor

Abstract: Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers, while target speaker voice activity detection (TS-VAD) systems tend to be overly complex. In this paper, we propose a simple attention-based encoder-decoder network for end-to-end neural diarization (AED-EEND). In our training process, we introduce a teacher-forcing strategy t…

Cited by 5 publications
References 36 publications (75 reference statements)