ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682954

Self-attention Aligner: A Latency-control End-to-end Model for ASR Using Self-attention Network and Chunk-hopping

Abstract: Self-attention network, an attention-based feedforward neural network, has recently shown the potential to replace recurrent neural networks (RNNs) in a variety of NLP tasks. However, it is not clear whether the self-attention network could be a good alternative to RNNs in automatic speech recognition (ASR), which processes longer speech sequences and may have online recognition requirements. In this paper, we present an RNN-free end-to-end model: self-attention aligner (SAA), which applies the self-attention ne…

Cited by 80 publications (65 citation statements)
References 21 publications
“…For AISHELL-2, we use all the train data (1000 hours) for training, mix the three development sets for validation and use the three test sets for evaluation. For HKUST, we use the same training (∼168 hours), validation and evaluation set as [15]. The training of LM on AISHELL-2 and HKUST uses the text from respective training set.…”
Section: Methods (mentioning)
confidence: 99%
“…Recently, there have been several works that have applied self-attention mechanism in speech recognition and achieved comparable results with traditional hybrid models [6,17,21]. Different from these, we introduce self-attention mechanism into transducer-based model.…”
Section: Related Work (mentioning)
confidence: 99%
“…We also propose a chunk-flow mechanism to realize online decoding. Different from chunk-hopping mechanism in [21], which segments an entire utterance into several overlapped chunks as the inputs, we utilize a sliding window at each layer to limit the scope of the self-attention. Chunk-flow mechanism is more analogous to the time-restricted self-attention layer [16].…”
Section: Related Work (mentioning)
confidence: 99%
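
The chunk-hopping mechanism referred to in the statement above (segmenting an entire utterance into several overlapping chunks that are fed to the self-attention encoder one at a time) can be sketched in a few lines. This is a minimal illustration only: the function name, chunk size, and left/right context widths are assumptions made for the example, not the hyperparameters reported in the paper.

```python
import numpy as np

def chunk_hopping(features, chunk_size=64, left_ctx=16, right_ctx=16):
    """Split an utterance's feature matrix (T, D) into overlapping chunks.

    Each chunk keeps `chunk_size` central frames plus left/right context
    frames shared with neighbouring chunks, so self-attention only ever
    sees a bounded window of the utterance (latency-controlled decoding).
    """
    T = features.shape[0]
    chunks = []
    for start in range(0, T, chunk_size):           # hop by the central chunk size
        s = max(0, start - left_ctx)                 # prepend overlapping left context
        e = min(T, start + chunk_size + right_ctx)   # append overlapping right context
        chunks.append(features[s:e])
    return chunks

# Toy usage: 300 frames of 80-dimensional filterbank features.
utt = np.random.randn(300, 80).astype("float32")
print([chunk.shape[0] for chunk in chunk_hopping(utt)])  # chunk lengths incl. context
```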
“…However, Gaussian masking still requires the entire input sequence. Dong et al [25] introduced a chunk hopping mechanism to the CTC-Transformer model to support online recognition, which degraded the standard Transformer since it ignored the global context.…”
Section: Relation With Prior Work (mentioning)
confidence: 99%
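
As a rough illustration of the contrast drawn in the statement above, a time-restricted (sliding-window) self-attention mask bounds which frames each position may attend to, whereas Gaussian masking only reweights attention scores and therefore still needs the entire input sequence to be available. The sketch below is an assumption-laden toy, not code from any of the cited papers; the window widths and function name are made up for the example.

```python
import numpy as np

def banded_attention_mask(T, left=16, right=4):
    """Boolean (T, T) mask: position t may attend only to frames in
    [t - left, t + right]. True means the score is kept, False means
    it is blocked before the softmax.
    """
    idx = np.arange(T)
    offset = idx[None, :] - idx[:, None]   # signed frame offset j - i
    return (offset >= -left) & (offset <= right)

# Small example: each row shows which frames that time step can see.
print(banded_attention_mask(10, left=2, right=1).astype(int))
```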