“…Our resulting algorithm, E2DC, shows performance gains in difficult continuous control tasks and improvements in distribution matching. Future work includes integrating our changes with orthogonal advances in distributional RL (Kuznetsov et al., 2020) and more expressive policies (Yue et al., 2020; Ward et al., 2019). Because our method potentially learns more accurate distributions of the true returns, it can be leveraged for specific use cases, such as risk-seeking policies for stock market trading or risk-averse learning for robotics.…”