This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms when given access to a context variable that is a representation of the past trajectory. Second, using a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with state-of-the-art meta-RL algorithms.
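The abstract does not say how propensity estimation is turned into weights for recycled replay-buffer data. As a minimal sketch, assume a classifier has already produced, for each transition, a probability p that it comes from the new task rather than the meta-training buffer. A standard propensity-based choice is to weight each transition by the odds p / (1 - p), which estimates the density ratio up to a constant factor from class proportions, and to summarize how much usable data remains with the normalized effective sample size. The function names and the example scores below are hypothetical and are not taken from the paper.

```python
from typing import List


def importance_weights(propensities: List[float]) -> List[float]:
    """Odds p / (1 - p), where p is the estimated probability that a
    transition comes from the new task rather than the replay buffer.
    Assumes every p is strictly below 1."""
    return [p / (1.0 - p) for p in propensities]


def normalized_ess(weights: List[float]) -> float:
    """Effective sample size divided by n.

    Equals 1.0 when all weights are equal and shrinks toward 1/n as the
    weight mass concentrates on a single sample."""
    n = len(weights)
    total = sum(weights)
    return (total * total) / (n * sum(w * w for w in weights))


# Hypothetical propensity scores: the classifier cannot tell old from new data.
scores = [0.5, 0.5, 0.5, 0.5]
w = importance_weights(scores)
ess = normalized_ess(w)
```

When the classifier cannot distinguish old from new transitions, all weights are equal and the normalized effective sample size is 1, meaning the whole buffer is usable; a value near 1/n would indicate that almost none of the old data resembles the new task.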
Nowadays, mobile phones are not only communication devices but also a source of rich sensory data that can be collected and exploited by distributed people-centric sensing applications. Among them, environmental monitoring and emergency response systems can particularly benefit from people-based sensing. Due to the limited resources of mobile devices, sensed data are usually offloaded to the cloud. However, state-of-the-art solutions lack a unified approach that supports diverse applications while reducing the energy consumption of the mobile device. In this paper, we specifically address mobile devices as rich sources of multimodal data collected by users. In this context, we propose an integrated framework for storing, processing and delivering sensed data to people-centric applications deployed in the cloud. Our integrated platform is the foundation of a new delivery model, namely Mobile Application as a Service (MAaaS), which allows the creation of people-centric applications across different domains, including participatory sensing and mobile social networks. As a case study, we consider an emergency response system for fire detection and alerting. Through a prototype testbed implementation, we show that the proposed framework can reduce the energy consumption of mobile devices while satisfying the application requirements.
Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization. In this paper, we propose a subset selection algorithm that is trainable with gradient-based methods yet achieves near-optimal performance via submodular optimization. We focus on the task of identifying a relevant set of sentences for claim verification in the context of the FEVER task. Conventional methods for this task judge sentences on their individual merit and thus do not optimize the informativeness of sentences as a set. We show that our proposed method, which builds on the idea of unfolding a greedy algorithm into a computational graph, allows both interpretability and gradient-based training. The proposed differentiable greedy network (DGN) outperforms discrete optimization algorithms as well as other baseline methods in terms of precision and recall.
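To illustrate the difference between scoring sentences individually and selecting them as a set, the sketch below runs the classic discrete greedy algorithm on a simple coverage objective, which is submodular. This is not the DGN itself: it contains no learned parameters and no differentiable relaxation, and the function name, the sentence identifiers and the "facts" each sentence covers are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_select(candidates: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k items, each time taking the largest marginal coverage gain."""
    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        best_id: Optional[str] = None
        best_gain = 0
        for item_id, facts in candidates.items():
            if item_id in selected:
                continue
            gain = len(facts - covered)  # new facts this item would add
            if gain > best_gain:
                best_id, best_gain = item_id, gain
        if best_id is None:  # no remaining item adds anything new
            break
        selected.append(best_id)
        covered |= candidates[best_id]
    return selected


# Hypothetical sentences and the facts each one covers.
sentences = {
    "s1": {"a", "b", "c"},
    "s2": {"a", "b"},
    "s3": {"d"},
}
chosen = greedy_select(sentences, 2)
```

Ranking by individual size would pick `s1` and `s2`, even though `s2` adds nothing once `s1` is chosen. The greedy procedure picks `s1` and then `s3`, because each step measures the gain relative to what is already covered.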