2022
DOI: 10.31219/osf.io/2pj8s
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

ASR-aware end-to-end neural diarization

Abstract: We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model. Two categories of features are explored: features derived directly from ASR output (phones, position-in-word and word boundaries) and features derived from a lexical speaker change detection model, trained by fine-tuning a pretrained BERT model on the ASR output. Three modifications to the Conformer-based EEND architecture are proposed to i… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1

Citation Types

0
1
0

Year Published

2022
2022
2022
2022

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

citations
Cited by 1 publication
(1 citation statement)
references
References 18 publications
0
1
0
Order By: Relevance
“…A separately trained ASR system can then be used to transcribe each segment found by speaker diarisation, and obtain speaker-attributed ASR output over long audio streams [2,3]. Recently, end-to-end methods have been proposed for jointly modelling some modules in a speaker diarisation pipeline with an ASR system [4][5][6][7][8][9][10][11][12].…”
Section: Introductionmentioning
confidence: 99%
“…A separately trained ASR system can then be used to transcribe each segment found by speaker diarisation, and obtain speaker-attributed ASR output over long audio streams [2,3]. Recently, end-to-end methods have been proposed for jointly modelling some modules in a speaker diarisation pipeline with an ASR system [4][5][6][7][8][9][10][11][12].…”
Section: Introductionmentioning
confidence: 99%