2020
DOI: 10.48550/arxiv.2004.14840
Preprint

Multiresolution and Multimodal Speech Recognition with Transformers

citations
Cited by 2 publications
(2 citation statements)
references
References 0 publications
“…Multi-modal tasks were traditionally associated with visual question answering (Goyal et al, 2017), image captioning (Gurari et al, 2020), audiovisual speech recognition (Paraskevopoulos et al, 2020), or cross-modal retrieval (Wang et al, 2016). With success of competitions like the Hateful Memes Challenge (Kiela et al, 2020), more research focused on multi-modal offensive classification.…”
Section: Introduction (mentioning)
confidence: 99%
“…Transformers [21] are powerful neural architectures that lately have been used in ASR [22][23][24], SLU [25], and other audio-visual applications [26] with great success, mainly due to their attention mechanism. Only until recently, the attention concept has also been applied to beamforming, specifically for speech and noise mask estimations [9,27].…”
Section: Introduction (mentioning)
confidence: 99%