2021
DOI: 10.48550/arxiv.2110.04694
Preprint

Multi-Channel End-to-End Neural Diarization with Distributed Microphones

Abstract: Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace the Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of the number and geometry of microphones and are suitable for distributed microphone settings. We also propose a model adap…

Cited by 3 publications (2 citation statements)
References 26 publications
“…The study partner was instructed to place 1 microphone facing themselves, and the other at the participant, however it was not possible to confirm if this was carried out. Future work could use more sophisticated diarization algorithms that incorporate both channels in the model training, thus resulting in an optimised dual-channel speaker diarization system 27,28.…”
Section: Discussion (mentioning)
confidence: 99%
“…Most works following the EEND principle have focused on improvements on the architecture or modeling. Some by using self-attention layers [4] or conformer layers [5] instead of the original BLSTM layers for feature encoding; others have focused on more complex diarization scenarios such as its online fashion [6,7] or when more than one microphone is available [8] or by improving the model iteratively using pseudo-labels [9]. Some have used EEND together with more standard approaches by using EEND-inspired models to find overlaps among pairs of speakers in the output of a cascaded system [10] or leveraging EEND's VAD performance by using an external VAD system [11] or combining short duration diarization outputs to produce better whole-utterance diarization [12,13,14,15].…”
Section: Introduction (mentioning)
confidence: 99%