2022
DOI: 10.1155/2022/1204909
Research on Video Captioning Based on Multifeature Fusion

Abstract: To address the problems that existing video captioning models attend to incomplete information and generate insufficiently accurate descriptions, a video captioning model that fuses image, audio, and motion optical-flow features is proposed. Models pretrained on a variety of large-scale datasets are used to extract video frame features, motion information, audio features, and video sequence features. An embedding layer based on the self-attention mechanism is designed to embed single-mode fe…
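The abstract describes embedding single-modality features (image, audio, motion) through a self-attention layer before fusion. The paper's own architecture is not reproduced here; the following is a minimal NumPy sketch of the general idea, assuming each modality has already been extracted into a fixed-length vector (the function name and pooling choice are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fusion(modalities):
    """Fuse per-modality feature vectors with scaled dot-product self-attention.

    modalities: list of (d,) arrays, one per modality (image, audio, motion, ...).
    Returns a single (d,) fused feature vector.
    """
    X = np.stack(modalities)          # (m, d): one "token" per modality
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)     # (m, m) pairwise similarities
    A = softmax(scores, axis=-1)      # attention weights over modalities
    attended = A @ X                  # each token mixes information from all modalities
    return attended.mean(axis=0)      # pool into one fused representation

rng = np.random.default_rng(0)
img_feat, audio_feat, motion_feat = (rng.standard_normal(8) for _ in range(3))
fused = self_attention_fusion([img_feat, audio_feat, motion_feat])
```

In a real model the queries, keys, and values would pass through learned projection matrices; this sketch omits them to keep the fusion mechanism itself visible.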

Cited by 3 publications (2 citation statements). References 40 publications (38 reference statements).
“…The proposed attention mechanism is inspired by the human visual mechanism [29], and the basic idea is to weaken irrelevant information and increase the attention paid to focused information during the operation. In this paper, the time-dimensional features T and spatial-dimensional features F are fused by using the multiscale feature fusion attention mechanism, as shown in Figure 3.…”
Section: Multiscale Feature Fusionmentioning
confidence: 99%
“…The proposed attention mechanism is inspired by the human visual mechanism, and the basic idea is to weaken irrelevant information and increase the attention paid to focused information during the operation [27]. In this paper, learner features u_i, attribute features s_j of learning resources, and text features t_j of learning resources are fused using the multiscale feature fusion attention mechanism shown in Figure 4.…”
Section: Multiscale Feature Fusionmentioning
confidence: 99%
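Both citing statements describe a multiscale feature fusion attention mechanism that amplifies informative feature sets and suppresses irrelevant ones before combining them. The citing papers' exact scoring functions are not given here; the sketch below uses a placeholder relevance score (vector norm) purely to illustrate the weighted-sum pattern:

```python
import numpy as np

def fusion_attention(features):
    """Attention-weighted fusion of same-dimensional feature vectors.

    Scores each feature set, softmax-normalizes the scores, and returns the
    weighted sum, so more relevant features contribute more to the result.
    """
    X = np.stack(features)                  # (k, d)
    scores = np.linalg.norm(X, axis=1)      # placeholder relevance score per feature set
    e = np.exp(scores - scores.max())
    w = e / e.sum()                         # attention weights, sum to 1
    return (w[:, None] * X).sum(axis=0)     # (d,) fused vector

T = np.ones(4)       # stand-in for time-dimensional features
F = np.full(4, 2.0)  # stand-in for spatial-dimensional features
fused = fusion_attention([T, F])
```

Because the weights form a convex combination, each fused element lies between the corresponding elements of the inputs; in the cited works the scores would instead come from learned attention layers.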