2021
DOI: 10.48550/arxiv.2104.14297
Preprint

End-to-End Speech Recognition from Federated Acoustic Models

Abstract: Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has recently attracted considerable attention. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data distributions using the French Common Voice dataset, a large heterogeneous dataset containing over 10k speakers. …

Cited by 4 publications (7 citation statements)
References 26 publications (45 reference statements)
“…Recently, performing on-device federated training of acoustic models has attracted considerable attention [7,9,12,23,44]. In [23], FL was employed for a keyword spotting task and the development of a wake-word detection system, whereas, [7,9] investigated the effect of non-i.i.d. distributions on the same task.…”
Section: Related Work (mentioning)
confidence: 99%
“…distributions on the same task. In [7], a highly skewed data distribution scenario was considered, where a large set of speakers used their devices to record a set of sentences. To address the challenges introduced due to the non-i.i.d.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, performing on-device federated training of acoustic models has attracted considerable attention [7,9,12,23,42]. In [23], FL was employed for a keyword spotting task and the development of a wake-word detection system, whereas, [7,9] investigated the effect of non-i.i.d. distributions on the same task.…”
Section: Related Work (mentioning)
confidence: 99%
“…FL-based adaptation for ASR models faces several unique challenges including the lack of ground truth transcriptions, high compute and cross-device network communication costs, the non independent and identical distribution of data (non-IIDness), and the difficulty of providing privacy guarantees. Several recent works have considered cross-device FL for ASR applications [14,15,16,17,18,19]. In particular, the challenge of training on non-IID data has been addressed using weighted model averaging [14,15] and federated variation noise [17].…”
Section: Introduction (mentioning)
confidence: 99%