2022
DOI: 10.21203/rs.3.rs-2033641/v1
Preprint
A Personalized Short Video Recommendation Method Based on Multimodal Feature Fusion

Abstract: To parse the multiple elements of short videos accurately, multimodal short video processing with text, image, and speech as the three core elements must be realized. Text information is the external marker data attached to a short video, and can currently be retrieved with web-crawler technology. Image information consists of frames extracted from the short video at fixed time intervals, and there are many types of image f…
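The interval-based frame extraction the abstract describes can be sketched as computing which frame indices to sample; the function name and its parameters are illustrative assumptions here (a real pipeline would pass these indices to a video decoder such as OpenCV):

```python
def sample_frame_indices(fps: float, duration_s: float, interval_s: float) -> list:
    """Return indices of frames sampled every `interval_s` seconds
    from a clip of length `duration_s` recorded at `fps` frames/second."""
    total_frames = int(fps * duration_s)
    # Convert the time interval into a frame step, never smaller than 1.
    step = max(1, int(round(fps * interval_s)))
    return list(range(0, total_frames, step))
```

For example, a 10-second clip at 30 fps sampled every 2 seconds yields indices 0, 60, 120, 180, 240 — five frames to feed into the image-feature branch.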

Cited by 2 publications (2 citation statements). References 30 publications (33 reference statements).
“…• Switching: The system gives the user more than one option for each recommendation item and chooses the best one based on what the user wants. RMs are selected based on the context in which they are being used [19]. The recommender selector criteria for the item-level sensitive dataset should be determined by the user's profile or other characteristics, such as interests.…”
Section: Hybrid Methodsmentioning
confidence: 99%
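The switching strategy quoted above can be sketched as a selector that picks one recommender per request from the user's context; the component recommenders, the cold-start threshold, and the profile fields below are illustrative assumptions, not part of the cited system:

```python
from typing import Dict, List

# Hypothetical component recommenders (stand-ins for real models).
def content_based(profile: Dict) -> List[str]:
    # Recommend items matching the user's declared interests.
    return ["item_about_" + tag for tag in profile.get("interests", [])]

def collaborative(profile: Dict) -> List[str]:
    # Recommend items liked by similar users (stubbed here).
    return ["popular_item_1", "popular_item_2"]

def switching_recommender(profile: Dict, cold_start_threshold: int = 5) -> List[str]:
    """Switching hybrid: choose exactly one recommender per request.
    Users with little interaction history fall back to content-based
    filtering, since collaborative filtering needs behaviour data."""
    if len(profile.get("history", [])) < cold_start_threshold:
        return content_based(profile)
    return collaborative(profile)
```

The design choice matches the quote: the selector criterion is the user's profile (here, history length and interests), and only one recommender's output is served at a time.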
“…Song et al 24 proposed a video-audio emotion recognition system that uses VGG16-Net and Mel-frequency cepstral coefficients (MFCC) to extract video and audio features, taking advantage of the wealth of information in both modalities to improve the classification rate. The study 25 developed a multimodal-data-fusion short video recommendation method that makes full use of the similarities and differences between modalities, increasing the model's understanding of user behaviour and improving the effect of short video recommendation. Ruan et al 26 proposed a framework that combines audio and video generation with a multimodal diffusion model, using a sequential multimodal U-Net to perform the joint denoising process and delivering an engaging viewing and listening experience with high-quality, realistic video.…”
Section: Multi-modal Joint Decisionmentioning
confidence: 99%
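The feature-level fusion these citing works build on can be sketched as per-modality normalization followed by concatenation, so neither modality's scale dominates the joint representation; the function names and the use of plain lists are illustrative assumptions, not the cited architectures:

```python
import math
from typing import List

def l2_normalize(v: List[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm (zero vectors pass through)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def fuse_features(video_feat: List[float], audio_feat: List[float]) -> List[float]:
    """Early (feature-level) fusion: normalize each modality independently,
    then concatenate into one joint vector for the downstream model."""
    return l2_normalize(video_feat) + l2_normalize(audio_feat)
```

A VGG16 frame embedding and an MFCC audio embedding would each be normalized on their own scale before being joined, which is the usual motivation for fusing at the feature level rather than averaging raw vectors.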