2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru51503.2021.9687874
Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

Cited by 37 publications (21 citation statements)
References 25 publications
“…The back-end networks use an Efficient Conformer architecture. The Efficient Conformer encoder was proposed in [7]; it is composed of several stages, each comprising a number of Conformer blocks [16] that use grouped attention with relative positional encodings. The temporal sequence is progressively downsampled with strided convolutions and projected to wider feature dimensions, lowering the amount of computation while achieving better performance.…”
Section: Model Architecture
confidence: 99%
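The statement above describes downsampling between encoder stages while widening the feature dimension. A minimal sketch of that idea is shown below; it is not the authors' code, and the dimensions (d_in=256, d_out=360) and stride are illustrative assumptions.

```python
# Sketch: progressive downsampling between encoder stages via a strided 1-D
# convolution that halves the temporal length and projects to a wider dim.
import torch
import torch.nn as nn

class DownsampleProjection(nn.Module):
    def __init__(self, d_in: int = 256, d_out: int = 360, stride: int = 2):
        super().__init__()
        # A plain strided Conv1d stands in for the paper's downsampling conv.
        self.conv = nn.Conv1d(d_in, d_out, kernel_size=3, stride=stride, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_in) -> (batch, time // stride, d_out)
        x = x.transpose(1, 2)      # (batch, d_in, time)
        x = self.conv(x)           # (batch, d_out, time // stride)
        return x.transpose(1, 2)

# Example: 100 frames at dim 256 become 50 frames at dim 360.
frames = torch.randn(4, 100, 256)
print(DownsampleProjection()(frames).shape)  # torch.Size([4, 50, 360])
```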
“…The Efficient Conformer [7] proposed replacing Multi-Head Self-Attention (MHSA) [44] in earlier encoder layers with grouped attention. Grouped MHSA reduces attention complexity by grouping neighbouring temporal elements along the feature dimension before applying scaled dot-product attention.…”
Section: Patch Attention
confidence: 99%
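The grouping step described above can be illustrated as follows. This is a simplified assumption-based sketch (group size, dimensions, and the use of a standard nn.MultiheadAttention are illustrative, and the relative positional encodings of the actual model are omitted): neighbouring frames are concatenated along the feature dimension before attention and the output is reshaped back afterwards.

```python
# Sketch: grouped self-attention over a sequence shortened by a factor g.
import torch
import torch.nn as nn

def grouped_self_attention(x: torch.Tensor, mha: nn.MultiheadAttention, g: int = 2):
    # x: (batch, time, dim); time is assumed divisible by g for simplicity.
    b, t, d = x.shape
    grouped = x.reshape(b, t // g, g * d)    # concat g neighbours along features
    out, _ = mha(grouped, grouped, grouped)  # attention over t / g positions
    return out.reshape(b, t, d)              # restore the original resolution

dim, g = 144, 2
mha = nn.MultiheadAttention(embed_dim=g * dim, num_heads=4, batch_first=True)
x = torch.randn(2, 64, dim)
print(grouped_self_attention(x, mha, g).shape)  # torch.Size([2, 64, 144])
```

Because attention cost grows quadratically with sequence length, running it on t / g positions reduces that term by roughly g².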
“…It uses a convolution module to capture local context dependencies in addition to the long context captured by the self-attention module. The conformer architecture was investigated for different end-to-end systems such as attention encoder-decoder models [12,13] and the recurrent neural network transducer [10,14]. Nevertheless, there has been no work investigating the impact of using a conformer AM for hybrid ASR systems.…”
Section: Introduction and Related Work
confidence: 99%
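The convolution module mentioned in this statement is the component of the Conformer block [16] that models local context alongside self-attention. The sketch below is an illustrative reconstruction under common assumptions (dim=256, depthwise kernel size 31), not code from any of the cited systems.

```python
# Sketch: a Conformer-style convolution module with a residual connection:
# pointwise conv + GLU, depthwise conv for local context, norm, Swish, pointwise conv.
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    def __init__(self, dim: int = 256, kernel_size: int = 31):
        super().__init__()
        self.pointwise_in = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.glu = nn.GLU(dim=1)
        self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.norm = nn.BatchNorm1d(dim)
        self.act = nn.SiLU()  # Swish
        self.pointwise_out = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); the depthwise conv captures local dependencies.
        y = x.transpose(1, 2)
        y = self.glu(self.pointwise_in(y))
        y = self.act(self.norm(self.depthwise(y)))
        y = self.pointwise_out(y).transpose(1, 2)
        return x + y  # residual around the module

print(ConvModule()(torch.randn(2, 50, 256)).shape)  # torch.Size([2, 50, 256])
```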