Rasool Fakoor scite author profile

Meta-Q-Learning

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, using a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with state-of-the-art meta-RL algorithms.
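The abstract's third idea, reusing old replay-buffer data through propensity estimation, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a classifier (not shown) has already produced, for each old transition, a probability that the transition came from the new task, and that the old and new sample sets are equally sized. The function names `importance_weights` and `normalized_ess` are hypothetical.

```python
def importance_weights(p_new, eps=1e-6):
    """Convert classifier probabilities into density-ratio weights.

    p_new[i] is an assumed classifier output: the probability that old
    sample i came from the new task. With balanced classes, p / (1 - p)
    estimates the density ratio new(x) / old(x).
    """
    weights = []
    for p in p_new:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid division by zero
        weights.append(p / (1.0 - p))
    return weights


def normalized_ess(weights):
    """Normalized effective sample size, a value in (0, 1].

    Values near 1 mean the old data resembles the new task; values
    near 1/n mean only a single old sample is effectively usable.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / (len(weights) * total_sq)


# Old samples the classifier cannot tell apart from new ones get weight 1.
uniform = importance_weights([0.5, 0.5, 0.5])
# A mix of one "new-looking" and one "old-looking" sample.
skewed = importance_weights([0.9, 0.1])
print(normalized_ess(uniform), normalized_ess(skewed))
```

Weights like these can rescale the loss on old transitions, so that data resembling the new task counts more during adaptation.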


An integrated cloud-based framework for mobile phone sensing

Fakoor, Raj, Nazi, et al. 2012


Nowadays mobile phones are not only communication devices, but also a source of rich sensory data that can be collected and exploited by distributed people-centric sensing applications. Among them, environmental monitoring and emergency response systems can particularly benefit from people-based sensing. Due to the limited resources of mobile devices, sensed data are usually offloaded to the cloud. However, state-of-the-art solutions lack a unified approach suitable to support diverse applications, while reducing the energy consumption of the mobile device. In this paper, we specifically address mobile devices as rich sources of multimodal data collected by users. In this context, we propose an integrated framework for storing, processing and delivering sensed data to people-centric applications deployed in the cloud. Our integrated platform is the foundation of a new delivery model, namely, Mobile Application as a Service (MAaaS), which allows the creation of people-centric applications across different domains, including participatory sensing and mobile social networks. We specifically address a case study represented by an emergency response system for fire detection and alerting. Through a prototype testbed implementation, we show that the proposed framework can reduce the energy consumption of mobile devices, while satisfying the application requirements.


Differentiable Greedy Networks

Fakoor, Shakeri, Sethy, et al. 2018

Preprint


Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization. In this paper, we propose a subset selection algorithm that is trainable with gradient-based methods yet achieves near-optimal performance via submodular optimization. We focus on the task of identifying a relevant set of sentences for claim verification in the context of the FEVER task. Conventional methods for this task look at sentences on their individual merit and thus do not optimize the informativeness of sentences as a set. We show that our proposed method, which builds on the idea of unfolding a greedy algorithm into a computational graph, allows both interpretability and gradient-based training. The proposed differentiable greedy network (DGN) outperforms discrete optimization algorithms as well as other baseline methods in terms of precision and recall.
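For orientation, the kind of discrete greedy procedure that the abstract says is unfolded into a computational graph can be sketched as follows. This is only the classic, non-differentiable greedy rule applied to a toy coverage objective, not the DGN itself. The function name `greedy_select`, the sentence labels, and the "facts" each sentence covers are all hypothetical.

```python
def greedy_select(candidates, k):
    """Pick up to k items by repeatedly taking the largest marginal gain.

    candidates maps an item name to the set of facts it covers.
    Coverage (number of distinct facts covered) is a submodular
    objective, so the gain of an item shrinks as more is selected.
    """
    selected = []
    covered = set()
    for _ in range(k):
        best, best_gain = None, 0
        for name, facts in candidates.items():
            if name in selected:
                continue
            gain = len(facts - covered)  # marginal gain given the current set
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining item adds anything new
            break
        selected.append(best)
        covered |= candidates[best]
    return selected, covered


# Hypothetical sentences and the facts each one supports.
sentences = {
    "s1": {"a", "b"},
    "s2": {"a", "b", "c"},
    "s3": {"d"},
    "s4": {"c"},
}
chosen, facts = greedy_select(sentences, k=2)
print(chosen, sorted(facts))
```

Scoring each sentence on its own would rank `s1` above `s3`, yet `s1` adds nothing once `s2` is chosen. Evaluating gains relative to the current set is what optimizes the informativeness of the sentences as a set.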


Deep Quantile Aggregation

Fakoor, Kim, Mueller, et al. 2021

Preprint


P3O: Policy-on Policy-off Policy Optimization

Fakoor, Chaudhari, Smola 2019

Preprint


Continuous Doubly Constrained Batch Reinforcement Learning

Fakoor, Mueller, Asadi, et al. 2021

Preprint


Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.


Constrained Convolutional-Recurrent Networks to Improve Speech Quality with Low Impact on Recognition Accuracy

Fakoor, Tashev, et al. 2018


For a speech-enhancement algorithm, it is highly desirable to simultaneously improve perceptual quality and recognition rate. Because of computational costs and model complexities, it is challenging to train a model that effectively optimizes both metrics at the same time. In this paper, we propose a method for speech enhancement that combines local and global contextual structure information through convolutional-recurrent neural networks to improve perceptual quality. At the same time, we introduce a new constraint on the objective function using a language model/decoder that limits the impact on recognition rate. Based on experiments conducted with real user data, we demonstrate that our new context-augmented machine-learning approach for speech enhancement improves PESQ and WER by an additional 24.5% and 51.3%, respectively, when compared to the best-performing methods in the literature.


Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline

Caccia, Mueller, Kim, et al. 2022

Preprint


We study task-agnostic continual reinforcement learning (TACRL), in which standard RL challenges are compounded with partial observability stemming from task agnosticism, as well as additional difficulties of continual learning (CL), i.e., learning on a non-stationary sequence of tasks. Here we compare TACRL methods with their soft upper bounds prescribed by previous literature: multi-task learning (MTL) methods, which do not have to deal with non-stationary data distributions, as well as task-aware methods, which are allowed to operate under full observability. We consider a previously unexplored and straightforward baseline for TACRL, replay-based recurrent RL (3RL), in which we augment an RL algorithm with recurrent mechanisms to address partial observability and experience replay mechanisms to address catastrophic forgetting in CL. Studying empirical performance in a sequence of RL tasks, we find surprising occurrences of 3RL matching and surpassing the MTL and task-aware soft upper bounds. We lay out hypotheses that could explain this inflection point of continual and task-agnostic learning research. Our hypotheses are empirically tested in continuous control tasks via a large-scale study of the popular multi-task and continual learning benchmark Meta-World. By analyzing different training statistics including gradient conflict, we find evidence that 3RL's outperformance stems from its ability to quickly infer how new tasks relate to the previous ones, enabling forward transfer.
