Exploration Entropy for Reinforcement Learning

“…The problem of cold start is also unable to adapt to the short-term interest changes of users and make effective information recommendations. Therefore, many scholars began to try to use reinforcement learning [23] to solve the problems in the recommendation system. Reinforcement learning is a learning algorithm based on the interaction of the environment.…”

Section: Related Work (mentioning)

confidence: 99%

“…Later, through the study of many scientists, a relatively complete system, approximate dynamic programming, was formed. Reinforcement learning [23] is a dynamic interactive learning strategy algorithm.…”

Section: Related Work (mentioning)

confidence: 99%

Research on Sports Dance Video Recommendation Method Based on Style

Sun, Tang

2022

Scientific Programming

At present, sports dance teaching still tends toward “demonstration” training. Students have limited time and space for autonomous learning, and their enthusiasm for participation is not high, which leads to a decline in classroom learning efficiency. In view of this, video teaching has become popular in sports dance classrooms, providing a new model for sports dance teaching. Video recommendation is particularly important for improving teaching quality. A style-based sports dance video recommendation method is proposed. A factorization machine model is used to combine features and process high-dimensional sparse features, a deep neural network is adopted as the value function network of the deep Q-learning algorithm, and the deep Q-learning algorithm is used as the decision function to address the problems of recommendation accuracy and diversity. In an application experiment on sports dance video recommendation, the recommendation accuracy of the proposed model is slightly higher than that of the traditional recommendation algorithm, and its recommendation diversity is clearly better. These results verify the advantages and feasibility of the proposed model.

“…This draws on the Exploration Entropy in a full reinforcement learning problem [38] where multiple states are associated with an agent.…”

Section: Uncertainty Evaluation Of Proactive Caching Systems (mentioning)

confidence: 99%

Proactive Edge Caching in Vehicular Networks: An Online Bandit Learning Approach

Wang, Grace

2022

Preprint

By bringing content close to end-users, proactive caching plays a vital role in improving the user experience in wireless networks. Caching content at the network edge proactively has been particularly effective in fast-changing vehicular networks. The objective of this paper is to address the proactive caching problem at the next roadside unit (RSU) in vehicular networks using reinforcement learning techniques. The paper proposes two proactive caching algorithms based on multi-armed bandit (MAB) learning, namely non-contextual MAB-based (MAB) and contextual MAB-based (cMAB). Additionally, the paper investigates the uncertainty associated with proactive caching systems in the form of entropy with a specifically extended Subjective Logic framework, providing an insight into the underlying link between prediction accuracy and uncertainty. Two cities are considered in the simulation: Las Vegas, USA, with a grid road layout, and Manchester, UK, with a more complex, historical layout. Results show the generality of the proposed schemes in cities with different road layouts. Performance of the two proposed MAB-based systems is compared with two non-contextual baseline systems, Equal Probability-based and Probability-based, and one contextual baseline system, Compact Prediction Tree+ based. Both proposed systems outperformed their counterparts. In terms of prediction accuracy, cMAB reaches 75% and 80% accuracy in Las Vegas and Manchester respectively, and MAB reaches over 50% in both testing cities. Regarding the benefits to the vehicular network, cMAB and MAB perform similarly in both cities irrespective of the road layout. In particular, the paper shows that on average 75% and 81% of content fragments are proactively served with cMAB in Las Vegas and Manchester respectively, and over 50% with MAB, which is consistent with the prediction accuracy associated with the schemes.

Exploration Entropy for Reinforcement Learning

Cited by 14 publications

References 35 publications
