2018
DOI: 10.48550/arxiv.1803.09519
Preprint

Self-Attentional Acoustic Models

Cited by 21 publications (38 citation statements)
References 0 publications
“…There have been abundant studies on Transformers for end-toend speech recognition, particularly in the context of the S2S model with attention [5,6,7,16], as well as Transformer Transducers [8,17]. In [5,10], the authors compared RNNs with transformers for various speech recognition tasks, and obtained competitive or even better results with Transformers.…”
Section: Related Work
confidence: 99%
“…For speech recognition, Transformers have achieved competitive recognition accuracy compared to RNN-based counterparts within both end-to-end [5,6,7,8,9,10] and hybrid [11,12] frameworks. However, the superior results are usually achieved in the offline condition, while in the streaming fashion, Transformers have shown significant degradation in terms of accuracy from previous results [5,12], even in a condition of a large latency constraint.…”
Section: Introduction
confidence: 99%
“…Using the notation in [51], we reshape audio in the following way, where X is a sequence of amplitudes X = {x_0, x_1, ..., x_n}, l is the sequence length, and d is the hidden dimension:…”
Section: B. Feature Extraction
confidence: 99%
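The reshaping described in that excerpt can be sketched as follows. This is a hypothetical illustration only, not the cited paper's actual code: the concrete values of n and d are made up, and the truncation of any remainder is an assumption made so the amplitude vector divides evenly into l frames of hidden dimension d.

```python
import numpy as np

# Hypothetical sketch: reshape a raw amplitude sequence X = {x_0, ..., x_n}
# into an (l, d) matrix, where l is the sequence length fed to the model
# and d is the hidden dimension. The values below are illustrative.
n, d = 48000, 80            # e.g. 3 s of 16 kHz audio, frames of width 80
x = np.random.randn(n)      # stand-in for the amplitude sequence

l = n // d                  # assumption: drop any remainder so n = l * d
X = x[: l * d].reshape(l, d)

print(X.shape)              # (600, 80)
```

With this layout, each row of X is one d-dimensional "frame" that a self-attention layer can treat as a token.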
“…There have been a few studies on transformers for end-to-end speech recognition, particularly for sequence-to-sequence with attention model [10,11,12], as well as transducer [13] and CTC models [14]. In [10], the authors compared RNNs with transformers for various speech recognition and synthesis tasks, and obtained competitive or even better results with transformers.…”
Section: Related Work
confidence: 99%
“…CNNs, on the other hand, require multiple layers to capture the correlations between the two features which are very distant in the time space, although dilation that uses large strides can reduce the number of layers that is required. While there have been many studies on end-to-end speech recognition using transformers [10,11,12,13,14], their applications for hybrid acoustic models are less well understood. In this paper, we study the more standard transformer for speech recognition within the hybrid framework, and provide further insight to this model through experiments on the Librispeech public dataset.…”
Section: Introduction
confidence: 99%
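The point in that excerpt about dilation reducing the number of CNN layers can be made concrete with the standard receptive-field formula for stacked 1-D convolutions, R = 1 + Σ_i (k − 1) · d_i, where k is the kernel size and d_i the dilation of layer i. The numbers below are an illustrative sketch, not taken from any of the cited papers:

```python
def receptive_field(kernel_size, dilations):
    # Receptive field of stacked 1-D convolutions (stride 1):
    # R = 1 + sum over layers of (kernel_size - 1) * dilation
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Four plain layers vs four dilated layers with the same kernel size:
print(receptive_field(3, [1, 1, 1, 1]))  # 9
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Doubling the dilation per layer grows the receptive field exponentially with depth, which is why dilated CNNs need far fewer layers to relate two distant frames, while self-attention relates any pair of frames in a single layer.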