Interspeech 2020
DOI: 10.21437/interspeech.2020-3015
Conformer: Convolution-augmented Transformer for Speech Recognition

Cited by 1,608 publications (870 citation statements). References: 0 publications.
“…Following the success of Transformer-based modeling of speech features [7,11], we choose the convolution-augmented Transformer, or Conformer [7], for the denoising network. For modeling long-term and short-term patterns, it relies on the self-attention mechanism and on specially designed convolution modules, respectively.…”
Section: Baseline Conformer Transformer Enhancement Network
confidence: 99%
“…For modeling long-term and short-term patterns, it relies on the self-attention mechanism and on specially designed convolution modules, respectively. Moreover, it combines the power of the relative Position Encoding (PE) scheme and Macaron-style half-step Feed-Forward Networks (FFNs) [7]. We additionally include a Squeeze-and-Excitation [12] module (squeeze factor of 8) after the 1D depthwise convolution inside the convolution module of the Conformer.…”
Section: Baseline Conformer Transformer Enhancement Network
confidence: 99%
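To make this modification concrete, below is a minimal PyTorch sketch of the Conformer convolution module with a Squeeze-and-Excitation block (squeeze factor 8) inserted after the 1D depthwise convolution, as the citing paper describes. All class and variable names are illustrative assumptions, not the authors' released code; the layer ordering otherwise follows the original Conformer paper.

```python
# Minimal sketch (assumed names, not the authors' code) of a Conformer
# convolution module with an SE block after the depthwise convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SqueezeExcite1d(nn.Module):
    """Channel-wise SE gating over a (batch, channels, time) tensor."""
    def __init__(self, channels: int, squeeze_factor: int = 8):
        super().__init__()
        hidden = channels // squeeze_factor
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(),
            nn.Linear(hidden, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(x.mean(dim=-1))   # squeeze: average over time
        return x * w.unsqueeze(-1)    # excite: per-channel rescaling

class ConformerConvModule(nn.Module):
    """Pointwise -> GLU -> depthwise -> SE -> BN -> Swish -> pointwise."""
    def __init__(self, dim: int, kernel_size: int = 31):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.pw_in = nn.Conv1d(dim, 2 * dim, kernel_size=1)  # doubled for GLU
        self.dw = nn.Conv1d(dim, dim, kernel_size,
                            padding=kernel_size // 2, groups=dim)  # depthwise
        self.se = SqueezeExcite1d(dim, squeeze_factor=8)  # added SE block
        self.bn = nn.BatchNorm1d(dim)
        self.act = nn.SiLU()  # Swish
        self.pw_out = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        y = self.ln(x).transpose(1, 2)                    # -> (B, D, T)
        y = F.glu(self.pw_in(y), dim=1)
        y = self.se(self.dw(y))            # SE right after depthwise conv
        y = self.pw_out(self.act(self.bn(y)))
        return x + y.transpose(1, 2)       # residual connection
```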
“…Conformer [6] is designed as an architecture with a multi-head attention module, a convolutional module, and a pair of feed-forward network modules. The multi-head attention and feed-forward modules follow the form of the Transformer, while the convolutional module is the key part of the Conformer: it is mainly a depthwise convolutional layer sandwiched between two pointwise convolutional layers.…”
Section: Related Work
confidence: 99%
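The block structure described in this statement can be sketched as follows: a pair of Macaron-style half-step feed-forward modules sandwiching the multi-head self-attention and convolution modules. This minimal PyTorch sketch reuses the ConformerConvModule from the previous example; plain absolute-position attention stands in for the paper's relative positional encoding, and all names are illustrative assumptions rather than a definitive implementation.

```python
# Minimal sketch (assumed names) of one Conformer block:
# half-step FFN -> self-attention -> conv module -> half-step FFN.
import torch
import torch.nn as nn

class FeedForwardModule(nn.Module):
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, expansion * dim), nn.SiLU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class ConformerBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.ffn1 = FeedForwardModule(dim)
        self.ln_attn = nn.LayerNorm(dim)
        # Plain MHSA here; the paper uses Transformer-XL-style relative
        # positional encoding, omitted for brevity.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.conv = ConformerConvModule(dim)  # sketched in previous example
        self.ffn2 = FeedForwardModule(dim)
        self.ln_out = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        x = x + 0.5 * self.ffn1(x)                        # half-step FFN
        a = self.ln_attn(x)
        x = x + self.attn(a, a, a, need_weights=False)[0] # self-attention
        x = self.conv(x)                                  # residual inside
        x = x + 0.5 * self.ffn2(x)                        # half-step FFN
        return self.ln_out(x)
```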
“…Over the past few years, some end-to-end models for offline applications [3][4][5][6][7] have achieved performance comparable to that of humans. However, these models cannot be directly applied to real-time scenarios because of their high latency.…”
Section: Introduction
confidence: 99%