Federated reinforcement learning: techniques, applications, and open challenges

Qi, Jiaju; Zhou, Qihao; Lei, Lei; Zheng, Kan

doi:10.20517/ir.2021.02

Cited by 59 publications

(8 citation statements)

References 75 publications

(105 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The TV penalty occurred when investigating the connection between the local and global policy advantage, i.e., Theorem I, to tackle the data heterogeneity, while the KL penalty came from (12) to constrain the policy update at one agent.…”

Section: Corollary I the Conditionmentioning

confidence: 99%

“…Due to the distributed data and computing power in large-scale applications such as autonomous driving, the training of RL algorithms under the FL framework is inevitable. Unfortunately, many challenges faced by supervised FL, e.g., the data heterogeneity and communication bottleneck, are still valid and even worse for FRL [12]. For example, centralized policy gradient methods already suffer from high variance which is detrimental to the convergence speed and training performance [13]- [15], and data heterogeneity imposes another layer of difficulty for the convergence of FRL.…”

Section: Introductionmentioning

confidence: 99%

“…The heterogeneity issue of FRL is different from that of the supervised FL and the related study is very limited [21]- [23]. There are mainly two types of RL methods, namely, the action-value method and the policy gradient (PG) method [11], [12]. The first performs action selection according to the value function, while the second utilizes ML models to approximate the policy and update the policy by gradient based algorithms.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence

Xie¹,

Song²

2022

Preprint

View full text Add to dashboard Cite

As a distributed learning paradigm, Federated Learning (FL) faces the communication bottleneck issue due to many rounds of model synchronization and aggregation. Heterogeneous data further deteriorates the situation by causing slow convergence. Although the impact of data heterogeneity on supervised FL has been widely studied, the related investigation for Federated Reinforcement Learning (FRL) is still in its infancy. In this paper, we first define the type and level of data heterogeneity for policy gradient based FRL systems. By inspecting the connection between the global and local objective functions, we prove that local training can benefit the global objective, if the local update is properly penalized by the total variation (TV) distance between the local and global policies. A necessary condition for the global policy to be learn-able from the local policy is also derived, which is directly related to the heterogeneity level. Based on the theoretical result, a Kullback-Leibler (KL) divergence based penalty is proposed, which, different from the conventional method that penalizes the model divergence in the parameter space, directly constrains the model outputs in the distribution space. By jointly penalizing the divergence of the local policy from the global policy with a global penalty and constraining each iteration of the local training with a local penalty, the proposed method achieves a better trade-off between training speed (step size) and convergence. Experiment results on two popular RL experiment platforms demonstrate the advantage of the proposed algorithm over existing methods in accelerating and stabilizing the training process with heterogeneous data.

show abstract

Section: Corollary I the Conditionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence

Xie¹,

Song²

2022

Preprint

View full text Add to dashboard Cite

show abstract

“…HFRL may be noted to have similarities to "Parallel RL". Parallel RL is a long studied field of RL, where agent gradients are transferred amongst each other [5,10,11] .…”

Section: Federated Reinforcement Learningmentioning

confidence: 99%

AVDDPG – Federated reinforcement learning applied to autonomous platoon control

Boin¹,

Lei²,

Yang³

2022

Intell Robot

Self Cite

View full text Add to dashboard Cite

Since 2016 federated learning (FL) has been an evolving topic of discussion in the artificial intelligence (AI) research community. Applications of FL led to the development and study of federated reinforcement learning (FRL). Few works exist on the topic of FRL applied to autonomous vehicle (AV) platoons. In addition, most FRL works choose a single aggregation method (usually weight or gradient aggregation). We explore FRL's effectiveness as a means to improve AV platooning by designing and implementing an FRL framework atop a custom AV platoon environment. The application of FRL in AV platooning is studied under two scenarios: (1) Inter-platoon FRL (Inter-FRL) where FRL is applied to AVs across different platoons; (2) Intra-platoon FRL (Intra-FRL) where FRL is applied to AVs within a single platoon. Both Inter-FRL and Intra-FRL are applied to a custom AV platooning environment using both gradient and weight aggregation to observe the performance effects FRL can have on AV platoons relative to an AV platooning environment trained without FRL. It is concluded that Intra-FRL using weight aggregation (Intra-FRLWA) provides the best performance for controlling an AV platoon. In addition, we found that weight aggregation in FRL for AV platooning provides increases in performance relative to gradient aggregation. Finally, a performance analysis is conducted for Intra-FRLWA versus a platooning environment without FRL for platoons of length 3, 4 and 5 vehicles. It is concluded that Intra-FRLWA largely out-performs the platooning environment that is trained without FRL.

show abstract

“…Still, many robots, especially those involved in tasks as consequential as search-and-rescue and space exploration [8], will have to adapt to ever-changing environmental conditions and continue to optimize and update their internal policies over the course of their lifetime [9]. As such, to untether these methods from the confines of a lab, data collection and storage must be memory efficient enough to allow for either low-latency networking [10], [11] or for sufficient experience data to be stored on-board edge computing devices [12]. As fast and secure updates may not be possible in remote locations or when using bandwidth-constrained or highlatency cloud networks [13], it is imperative to find ways to reduce the overall memory footprint of DRL training.…”

Section: Introductionmentioning

confidence: 99%

Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion

Grossman¹,

Plancher²

2022

Preprint

View full text Add to dashboard Cite

Deep reinforcement learning (DRL) is one of the most powerful tools for synthesizing complex robotic behaviors. But training DRL models is incredibly compute and memory intensive, requiring large training datasets and replay buffers to achieve performant results. This poses a challenge for the next generation of field robots that will need to learn on the edge to adapt to their environment. In this paper, we begin to address this issue through observation space quantization. We evaluate our approach using four simulated robot locomotion tasks and two state-of-the-art DRL algorithms, the on-policy Proximal Policy Optimization (PPO) and off-policy Soft Actor-Critic (SAC) and find that observation space quantization reduces overall memory costs by as much as 4.2× without impacting learning performance.

show abstract

Federated reinforcement learning: techniques, applications, and open challenges

Cited by 59 publications

References 75 publications

FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence

FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence

AVDDPG – Federated reinforcement learning applied to autonomous platoon control

Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion

Contact Info

Product

Resources

About