2020
DOI: 10.48550/arxiv.2009.08319
Preprint
Decoupling Representation Learning from Reinforcement Learning

Abstract: In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the en…
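To make the abstract's objective concrete, here is a minimal NumPy sketch of an ATC-style contrastive loss. It is an illustration under stated assumptions, not the paper's implementation: the encoder is a hypothetical linear projection (the paper uses a convolutional encoder with a momentum target network), and the "augmentation" is additive noise standing in for the paper's image augmentations. What it does show is the core idea: each observation o_t must match its own near-future observation o_{t+k}, with the other future observations in the batch serving as negatives.

```python
# Sketch of an ATC-style InfoNCE loss: anchors are (augmented) observations o_t,
# positives are their short-horizon futures o_{t+k}, negatives are the other
# futures in the batch. Encoder and augmentation are simplified stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def encode(obs, W):
    """Stand-in encoder: linear projection + L2 normalization (paper: CNN)."""
    z = obs @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def atc_loss(obs_t, obs_tk, W, temperature=0.1):
    """Each row of the similarity matrix is a classification problem whose
    correct class is the matching future observation (the diagonal)."""
    anchors = encode(obs_t + 0.01 * rng.standard_normal(obs_t.shape), W)   # "augmented" o_t
    targets = encode(obs_tk + 0.01 * rng.standard_normal(obs_tk.shape), W)  # "augmented" o_{t+k}
    logits = anchors @ targets.T / temperature            # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy on the diagonal

B, D, Z = 8, 32, 16
obs_t = rng.standard_normal((B, D))
obs_tk = obs_t + 0.05 * rng.standard_normal((B, D))       # "short time difference"
W = rng.standard_normal((D, Z))
loss = atc_loss(obs_t, obs_tk, W)
print(loss)
```

In the actual method this loss trains only the encoder, which the RL policy then consumes; that separation is the "decoupling" the title refers to.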

Cited by 18 publications (25 citation statements)
References 16 publications (28 reference statements)
“…This is primarily because XGBoost, being an "instructive" process, has access to complete data during training, which allows it to learn a better representation of the data compared to a DRL agent trained in an episodic manner. These problems can potentially be resolved by handling the distribution shift in offline reinforcement learning [19], using a better curriculum strategy [30], or by solving the representation learning problem [41].…”
Section: Discussion
confidence: 99%
“…Namely, Laskin et al. [22] use data augmentations of a sample as positives, and all other samples in the batch, as well as their augmentations, as negatives. Similarly, the work in [34] uses contrastive learning to associate pairs of observations separated by a short time difference, hence using (near) future observations as positive queries and all other samples in the batch as negatives. Both AE-based methods and contrastive learning focus on compression of observations as the main goal for SRL.…”
Section: SRL Model
confidence: 99%
“…Namely, we use RAE and contrastive methods. Although contrastive learning has shown superior results to AE-based approaches [22], [34], AE-based methods still have many advantages. They are simple to implement, allow for integrating self-supervised objectives such as jigsaw puzzles [25], enable multi-modal and multi-view fusion [1], [23], as well as task-specific objectives such as contact prediction [23].…”
Section: SRL Model
confidence: 99%
“…Partly in response to these and related shortcomings, some of the AI community has suggested that it may be desirable to decouple feature importance from representation learning [97,92,106]. For scientific inquiry, however, this decoupling is only useful if the result is human-comprehensible and interpretable (as defined herein).…”
Section: Explainability Versus Interpretability
confidence: 99%