The fifth generation of mobile networks evolved to serve applications with distinct requirements, which results in high management complexity due to simultaneous real-time tasks. At the physical layer, codewords that enable proper data exchange between the Base Station (BS) and the served users must be chosen, while at higher layers the BS must select which users to serve in a given transmission opportunity. Machine Learning (ML) approaches have been proposed to solve these combined tasks; however, given the large space of possible inputs, a key challenge is the availability of data to train the models. In some cases, there may not even exist a predefined optimal answer to use as a "label" for supervised approaches. In this paper, we evaluate solutions to the combined problems of beam selection and user scheduling based on Reinforcement Learning (RL), which requires no labels and therefore suits problems without a predefined answer. The algorithms were proposed for Problem Statement 6 of the 2021 challenge organized by the International Telecommunication Union (ITU) and ranked among the finalists. We compare the approaches in terms of the cumulative reward received by the agents and benchmark the different RL approaches against baselines developed for the challenge. The paper also shows how the actions taken by the trained agents affect network operation by comparing the number of packets transmitted, which is closely tied to the proper selection of users and codewords.
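To illustrate the label-free learning loop described above, the sketch below trains a tabular, bandit-style RL agent whose joint action is a (user, beam) pair and whose reward is the number of packets delivered. The environment model (a fixed table of per-user, per-beam packet counts) is a hypothetical stand-in for the ITU Problem Statement 6 simulator, not the actual challenge environment, and all dimensions and hyperparameters are illustrative.

```python
import random

# Hypothetical problem sizes, not the challenge's actual dimensions.
N_USERS, N_BEAMS = 4, 8
ACTIONS = [(u, b) for u in range(N_USERS) for b in range(N_BEAMS)]

def step(action, gains):
    """Reward: packets transmitted when serving `action = (user, beam)`."""
    user, beam = action
    return gains[user][beam]

def train(episodes=5000, alpha=0.1, eps=0.3, seed=0):
    rng = random.Random(seed)
    # Fixed (but unknown to the agent) packet counts per user/beam pair,
    # standing in for channel quality under each codeword.
    gains = [[rng.randint(0, 10) for _ in range(N_BEAMS)]
             for _ in range(N_USERS)]
    q = {a: 0.0 for a in ACTIONS}  # value estimate per joint action
    for _ in range(episodes):
        # Epsilon-greedy selection over the joint (user, beam) action space.
        if rng.random() < eps:
            a = rng.choice(ACTIONS)
        else:
            a = max(q, key=q.get)
        r = step(a, gains)
        q[a] += alpha * (r - q[a])  # incremental value update, no labels
    return q, gains

q, gains = train()
best = max(q, key=q.get)  # learned (user, beam) choice
```

The agent needs only the reward signal (packets transmitted), which is exactly why RL fits when no "correct" scheduling decision exists to supervise against.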
Sarcasm detection identifies natural language expressions whose intended meaning differs from their surface meaning. It finds applications in many NLP tasks, such as opinion mining and sentiment analysis. Today, social media has given rise to an abundance of multimodal data in which users express their opinions through text and images. Our paper leverages this multimodal data to improve the performance of existing sarcasm detection systems. So far, various approaches have been proposed that use the text modality, the image modality, or a fusion of both. We propose a novel architecture that uses the RoBERTa model with a co-attention layer on top to capture the context incongruity between the input text and image attributes. Further, we integrate a feature-wise affine transformation by conditioning the input image on the textual features through FiLMed ResNet blocks, using a GRU network to capture the multimodal information. The outputs from both models and the [CLS] token from RoBERTa are concatenated and used for the final prediction. Our results demonstrate that our proposed model outperforms the existing state-of-the-art method by 6.14% F1 score on the public Twitter multimodal sarcasm detection dataset.
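The fusion step described above can be sketched as follows: a FiLM-style feature-wise affine transformation conditions image features on text features, and the result is concatenated with a [CLS]-style text vector before classification. This is a minimal NumPy sketch under assumed toy dimensions; the random vectors stand in for RoBERTa and ResNet features, and the tiny linear maps are illustrative, not the paper's trained layers.

```python
import numpy as np

rng = np.random.default_rng(0)
D_TXT, D_IMG = 16, 8  # toy feature sizes, not the real model dimensions

def film(image_feats, text_feats, w_gamma, w_beta):
    """Feature-wise affine transform: gamma(text) * image + beta(text)."""
    gamma = text_feats @ w_gamma  # per-channel scale from text, shape (D_IMG,)
    beta = text_feats @ w_beta    # per-channel shift from text
    return gamma * image_feats + beta

text = rng.standard_normal(D_TXT)    # stands in for RoBERTa text features
image = rng.standard_normal(D_IMG)   # stands in for ResNet image features
w_gamma = rng.standard_normal((D_TXT, D_IMG)) * 0.1
w_beta = rng.standard_normal((D_TXT, D_IMG)) * 0.1

fused = film(image, text, w_gamma, w_beta)      # text-conditioned image features
cls_token = text                                 # [CLS]-style summary vector
features = np.concatenate([cls_token, fused])    # joint representation
logit = features @ rng.standard_normal(features.shape[0])
prob = 1.0 / (1.0 + np.exp(-logit))              # sarcasm probability
```

The key design choice FiLM captures is that the text modulates every image channel independently, letting incongruity between modalities shape the fused representation.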