2017
DOI: 10.48550/arxiv.1712.01769
Preprint
State-of-the-art Speech Recognition With Sequence-to-Sequence Models


Cited by 29 publications (47 citation statements). References 0 publications.
“…The second encoder in the multistage model, which takes the transcript as input, also uses an embedding layer of the same size. All decoders use 4-headed additive attention [31,17,22]. Our Baseline is the multistage model in which the two stages that do ASR and NLU are trained independently, but using the same training data.…”
Section: Model
confidence: 99%
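The quoted model attends with 4-headed additive (Bahdanau-style) attention, where each head scores encoder states with its own learned projections and the per-head context vectors are concatenated. A minimal numpy sketch of that mechanism — parameter names, dimensions, and the random initialisation below are illustrative, not taken from the cited paper:

```python
import numpy as np

def additive_attention_head(query, keys, W_q, W_k, v):
    # Bahdanau-style score: score_i = v . tanh(W_q q + W_k k_i)
    scores = np.tanh(query @ W_q + keys @ W_k) @ v
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()
    # Context is the attention-weighted sum of the keys (values == keys here).
    return weights @ keys, weights

def multi_head_additive_attention(query, keys, params):
    # Each head has independent parameters; concatenate their contexts.
    contexts = [additive_attention_head(query, keys, Wq, Wk, v)[0]
                for Wq, Wk, v in params]
    return np.concatenate(contexts)

rng = np.random.default_rng(0)
d = 8          # toy model dimension
n_heads = 4    # as in the quoted model
params = [(rng.standard_normal((d, d)),
           rng.standard_normal((d, d)),
           rng.standard_normal(d)) for _ in range(n_heads)]
query = rng.standard_normal(d)       # one decoder state
keys = rng.standard_normal((5, d))   # 5 encoder timesteps
context = multi_head_additive_attention(query, keys, params)
print(context.shape)  # (32,) = 4 heads x 8-dim context each
```

In a trained model the projections would be learned jointly with the encoder and decoder; this sketch only shows the forward computation.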
“…The LAS architecture has achieved state-of-the-art word error rates (WER) on a task with two orders of magnitude more training data than here [9], but on smaller datasets hybrid TDNN-HMM ASR approaches are still considerably better. Table 1 shows the results of our ASR model contrasted with those reported by XNMT in [7], on the TED-LIUM development and test sets.…”
Section: Word Error Rates
confidence: 92%
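The word error rate compared in the quote above is the word-level Levenshtein distance (substitutions + insertions + deletions) between hypothesis and reference, normalised by the reference length. A small self-contained sketch — the function name and example strings are my own, not from the cited papers:

```python
def word_error_rate(reference, hypothesis):
    """WER = (subs + ins + dels) / len(reference), via word-level
    Levenshtein distance computed with dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is usually reported as a percentage rather than a bounded score.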
“…There has been a remarkable growth in interest in deep neural networks (DNNs) in the last decade, as they surpassed previous state-of-the-art machine learning models in many tasks, such as speech recognition [5] and natural language processing [3]. Aside from theoretical developments in DNN architectures and training methods, two trends still fuel this growth to date: increasing computing power, and the availability of large data sets.…”
Section: Introduction
confidence: 99%