Interspeech 2020
DOI: 10.21437/interspeech.2020-1791

A Federated Approach in Training Acoustic Models

Cited by 38 publications (41 citation statements)
References 3 publications
“…In [2,7], federated learning was applied to improve a general shared acoustic model with the goal of privacy preservation, but no speaker adaptation was targeted. Federated learning was also experimented in [4] to speed up the training process and improve the shared general acoustic model performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…The process restarts and loops until convergence or after a fixed number of rounds. The utility and training efficiency of the FL AMs have been successfully studied in recent works [1][2][3][4][5][6], and these topics are beyond the scope of the current paper. Alternatively, we focus on the privacy aspect of this framework.…”
Section: Federated Learning for ASR Acoustic Models (mentioning)
confidence: 99%
“…Federated learning (FL) for automatic speech recognition (ASR) has recently become an active area of research [1][2][3][4][5][6]. To preserve the privacy of the users' data in the FL framework, the model is updated in a distributed fashion instead of communicating the data directly from clients to a server.…”
Section: Introduction (mentioning)
confidence: 99%
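The two statements above describe the basic federated learning loop for acoustic models: clients train locally and send model updates rather than raw audio, and a server aggregates those updates each round until convergence or a fixed round budget. Below is a minimal Python sketch of one such round, assuming a federated-averaging style aggregation with weights held as NumPy arrays; the function names, toy shapes, and size-weighted average are illustrative assumptions, not details taken from the cited papers.

import numpy as np

def client_update(global_weights, local_data):
    """Hypothetical local training step: copies the global weights,
    would run local gradient steps on local_data (omitted here), and
    returns the updated weights plus the number of local examples."""
    local_weights = {name: w.copy() for name, w in global_weights.items()}
    # ... local SGD on local_data would go here ...
    return local_weights, len(local_data)

def fedavg_round(global_weights, clients):
    """One communication round: clients send model updates, never raw
    audio; the server averages them weighted by local dataset size."""
    updates, sizes = [], []
    for data in clients:
        weights, n = client_update(global_weights, data)
        updates.append(weights)
        sizes.append(n)
    total = float(sum(sizes))
    return {
        name: sum((n / total) * u[name] for u, n in zip(updates, sizes))
        for name in global_weights
    }

# Toy example with three clients; the loop repeats until convergence
# or a fixed number of rounds, as the quoted statement notes.
global_weights = {"layer1": np.zeros((4, 4)), "layer2": np.zeros(4)}
clients = [list(range(100)), list(range(250)), list(range(50))]
for _ in range(5):
    global_weights = fedavg_round(global_weights, clients)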
“…For the LS task, we used byte-pair encoding (BPE) [38] to create 16,000 subword units. The optimizer settings are the same as described in [18]. The LS corpus contains approximately 1000 hours of read speech for training.…”
Section: Accent Adaptation Task (mentioning)
confidence: 99%
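The last statement mentions building 16,000 BPE subword units for the LS (LibriSpeech) task. A short sketch of how such a subword inventory is commonly trained with the sentencepiece library; the transcript file name, model prefix, and every setting other than the 16,000-unit vocabulary size are illustrative assumptions rather than settings from the cited work.

# Illustrative sketch: training a 16,000-unit BPE model with sentencepiece.
# "train_transcripts.txt" and "bpe16k" are placeholder names.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train_transcripts.txt",  # one transcript per line
    model_prefix="bpe16k",          # writes bpe16k.model and bpe16k.vocab
    vocab_size=16000,               # the 16,000 subword units mentioned above
    model_type="bpe",
)

# Encode a transcript into subword tokens, e.g. as acoustic-model targets.
sp = spm.SentencePieceProcessor(model_file="bpe16k.model")
print(sp.encode("federated learning for speech recognition", out_type=str))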