ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414741

Lattice-Free MMI Adaptation of Self-Supervised Pretrained Acoustic Models

Abstract: In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of self-supervised pretrained acoustic models. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data, followed by supervised adaptation with LFMMI on three different datasets. Our results show that, by fine-tuning with LFMMI, we consistently obtain relative WER improvements of 10% and 35.3% on the clean and other test sets of Librispeech (100h), 10.8% on Switchboard (300h), and 4.3% on Swahili (38h) and 4.4% on …
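The abstract only sketches the recipe (self-supervised pretraining of a Transformer encoder, then supervised LFMMI adaptation), so the following minimal sketch illustrates what such an adaptation loop could look like. It is not the paper's implementation: the encoder checkpoint name ("facebook/wav2vec2-base"), the output dimensionality NUM_PDF_IDS, and the lfmmi_loss helper are illustrative placeholders; in practice the LF-MMI objective comes from a dedicated toolkit (e.g., pychain or k2) together with numerator/denominator graphs.

```python
# Minimal sketch (assumptions noted above): supervised LF-MMI adaptation of a
# self-supervised pretrained acoustic encoder.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model  # any self-supervised Transformer encoder works here

NUM_PDF_IDS = 2328  # assumed output-layer size (number of tied context-dependent states)


class LFMMIAdaptedModel(nn.Module):
    def __init__(self, pretrained_name="facebook/wav2vec2-base"):
        super().__init__()
        # Encoder pretrained on untranscribed audio; fine-tuned during adaptation.
        self.encoder = Wav2Vec2Model.from_pretrained(pretrained_name)
        self.output = nn.Linear(self.encoder.config.hidden_size, NUM_PDF_IDS)

    def forward(self, waveforms):
        feats = self.encoder(waveforms).last_hidden_state      # (batch, frames, hidden)
        return torch.log_softmax(self.output(feats), dim=-1)   # frame-level log-probabilities


def lfmmi_loss(log_probs, numerator_graphs, denominator_graph):
    """Hypothetical placeholder: LF-MMI = numerator (supervision) forward score
    minus denominator (phone-LM) forward score, normally provided by a toolkit."""
    raise NotImplementedError


model = LFMMIAdaptedModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # small LR for adaptation

# Adaptation loop over labelled data (schematic):
# for waveforms, num_graphs in dataloader:
#     loss = lfmmi_loss(model(waveforms), num_graphs, den_graph)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```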

Cited by 8 publications (5 citation statements)
References 16 publications
“…Second, E2E systems model AM and LM jointly, and they are mostly trained with connectionist temporal classification (CTC) loss [34] (enabling alignment-free training). In [35], CTC and LF-MMI adaptation of pre-trained models are compared. Recently, attention-based models (e.g., Transformers) have become the de facto choice for AM [4,10,36].…”
Section: Related Work
Mentioning confidence: 99%
“…One such model is the XLSR [41], which can then be fine-tuned on ATC data. The authors of [42] proposed to use the LF-MMI criterion (similar to hybrid-based ASR) for the supervised adaptation of the self-supervised pretrained XLSR model [41]. We employed this technique to fine-tune the pre-trained model on our annotated ATC data.…”
Section: Automatic Speech Recognition
Mentioning confidence: 99%
“…Second, E2E systems model AM and LM jointly, and they are mostly trained with connectionist temporal classification (CTC) loss [25] (enabling alignment-free training). [26] compares CTC and LF-MMI adaptation of pre-trained models. Recently, attention-based models (e.g., Transformers) have become the de facto choice for AM [4,27,9].…”
Section: Related Work
Mentioning confidence: 99%