2023
DOI: 10.2139/ssrn.4342070

Multimodal Fusion for Audio-Image and Video Action Recognition

Citations: cited by 1 publication (1 citation statement)
References: 134 publications (178 reference statements)
“…Data Preprocessing: The video and audio data were preprocessed separately, as described in the following subsections. The video data was transformed into frames, while the audio data was converted into six audio-image representations following [14], [23]. Standard normalization techniques were applied to both modalities.…”
Section: Proposed Methodology
Citation type: mentioning
confidence: 99%
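
The preprocessing described in the citation statement above (video split into frames, audio turned into image-like representations, both normalized) can be illustrated with a minimal sketch. It rests on several assumptions: the statement does not name the six audio-image representations from [14], [23], so the log-magnitude spectrogram here is only an illustrative stand-in, and "standard normalization" is assumed to mean zero-mean, unit-variance scaling. The function names `log_spectrogram` and `standardize`, the frame sizes, and the synthetic inputs are all hypothetical and not taken from the cited work.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Illustrative audio-image: log-magnitude short-time Fourier transform.

    Assumption: this is a stand-in, not one of the six representations
    used by the cited paper, which the statement does not name.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so rows are frequency bins and columns are time frames.
    return np.log1p(magnitude).T

def standardize(x, eps=1e-8):
    """Zero-mean, unit-variance scaling over the whole array (assumed meaning
    of 'standard normalization')."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / (x.std() + eps)

# Synthetic stand-ins for real data: one second of a 440 Hz tone and
# four random 32x32 RGB frames.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)
audio_image = standardize(log_spectrogram(audio))

rng = np.random.default_rng(0)
video_frames = rng.integers(0, 256, size=(4, 32, 32, 3))
video_norm = standardize(video_frames)
```

Each modality is handled by its own path and only the normalization step is shared, which mirrors the statement that the two streams were preprocessed separately.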