Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1007
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates

Abstract: Subword units are an effective way to alleviate the open vocabulary problems in neural machine translation (NMT). While sentences are usually converted into unique subword sequences, subword segmentation is potentially ambiguous and multiple segmentations are possible even with the same vocabulary. The question addressed in this paper is whether it is possible to harness the segmentation ambiguity as a noise to improve the robustness of NMT. We present a simple regularization method, subword regularization, which trains the model with multiple subword segmentations probabilistically sampled during training. In addition, for better subword sampling, we propose a new subword segmentation algorithm based on a unigram language model. We experiment with multiple corpora and report consistent improvements especially on low resource and out-of-domain settings.
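The sampling the abstract describes is exposed in the SentencePiece Python bindings. Below is a minimal sketch contrasting deterministic (Viterbi) segmentation with sampled segmentations, assuming a trained unigram model; the model path "m.model" and the sentence are placeholders, and alpha/nbest_size values are illustrative.

```python
# Minimal sketch of subword regularization at data-loading time,
# assuming a trained SentencePiece unigram model at "m.model".
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="m.model")

sentence = "Hello world"

# Deterministic (Viterbi) segmentation: always the same subword sequence.
print(sp.encode(sentence, out_type=str))

# Subword regularization: sample a different segmentation each time.
# nbest_size=-1 samples from all hypotheses; alpha is the smoothing
# parameter from the paper (smaller alpha -> more diverse samples).
for _ in range(3):
    print(sp.encode(sentence, out_type=str,
                    enable_sampling=True, nbest_size=-1, alpha=0.1))
```

In training pipelines this sampling is typically applied on the fly, so the model sees a different segmentation of the same sentence at each epoch.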

Cited by 742 publications (689 citation statements). References 28 publications (42 reference statements).
“…A SentencePiece tokenizer [15] is also provided by the library. Subword tokenization [16] [17], such as that provided by SentencePiece, has been used in many recent NLP breakthroughs [18] [19].…”
Section: Text (mentioning; confidence: 99%)
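As this excerpt notes, many libraries ship a SentencePiece tokenizer. A hedged sketch of how such a tokenizer is typically trained and loaded follows; the corpus path, model prefix, and vocabulary size are illustrative choices, not details from the cited work.

```python
# Sketch: training a small SentencePiece model from a plain-text corpus,
# as libraries that bundle a SentencePiece tokenizer typically do.
# "corpus.txt" and "tok" are placeholder names.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",      # one sentence per line
    model_prefix="tok",      # writes tok.model and tok.vocab
    vocab_size=8000,
    model_type="unigram",    # the unigram LM of Kudo (2018)
)

sp = spm.SentencePieceProcessor(model_file="tok.model")
print(sp.encode("A sample sentence.", out_type=str))
```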
“…For CTC training, we use word-pieces as our target. During training, the reference is tokenized to 5000 sub-word units using SentencePiece with a uni-gram language model [15]. Neural networks are thus used to produce a posterior distribution for 5001 symbols (5000 sub-word units plus blank symbol) every frame.…”
Section: Target Units (mentioning; confidence: 99%)
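The excerpt's arithmetic (5000 sub-word units plus one CTC blank gives 5001 output symbols per frame) can be made concrete with a short sketch. This is not the cited paper's code; the PyTorch CTC loss, the blank-index convention, and all shapes are assumptions for illustration.

```python
# Sketch of the output layout described above: 5000 sub-word targets
# plus one reserved blank index for CTC, i.e. 5001 symbols per frame.
import torch
import torch.nn as nn

vocab_size = 5000                 # sub-word units from SentencePiece
num_symbols = vocab_size + 1      # +1 for the CTC blank

# Convention assumed here: the last index (5000) is the blank.
ctc_loss = nn.CTCLoss(blank=vocab_size)

T, N, S = 100, 4, 20              # frames, batch size, target length
log_probs = torch.randn(T, N, num_symbols).log_softmax(dim=-1)
targets = torch.randint(0, vocab_size, (N, S))   # sub-word ids only
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

print(ctc_loss(log_probs, targets, input_lengths, target_lengths))
```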
“…We apply the word-piece approach of Kudo (2018), which computes a word-piece unigram LM using a word-piece inventory V_P. Each word-piece x_i ∈ V_P is associated with a unigram probability p(x_i).…”
Section: Morphologically Rich Languages (mentioning; confidence: 99%)
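For a unigram model, SentencePiece stores a per-piece score (a log probability), which corresponds to the p(x_i) this excerpt refers to. A minimal sketch of inspecting those probabilities follows, assuming a trained model at the placeholder path "tok.model"; special tokens such as <unk> carry dummy scores and are skipped here only by truncating the loop.

```python
# Sketch: reading the unigram probabilities p(x_i) stored in a trained
# SentencePiece unigram model ("tok.model" is a placeholder path).
import math
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tok.model")

# Each word-piece x_i in the inventory V_P has a score = log p(x_i).
for piece_id in range(min(5, sp.get_piece_size())):
    piece = sp.id_to_piece(piece_id)
    log_p = sp.get_score(piece_id)
    print(piece, math.exp(log_p))
```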