“…Moreover, enabling effective CDRL among heterogeneous agents (as expected in IoT applications) is very challenging, due to dissimilarities among agents (e.g., different action spaces) and environments, and the diversity of DRL tasks. In the CDRL context, the heterogeneity of environments, modeled as Markov decision processes (MDPs), as well as of agents and their tasks, can be expressed in two main forms: 1) distinct DRL tasks that are conceptually similar (i.e., semantically related tasks) or completely dissimilar [8], [9] and 2) distinct environments represented by different MDPs [10], [11]. Most existing works, such as [5]-[7], study CDRL among homogeneous agents, i.e., agents with the same action space.…”