“…Policy-based algorithms are also popular in this field, including (deep) policy gradient methods [100,241], advantage actor-critic (A2C) [241], proximal policy optimization (PPO) [51,140], and deep deterministic policy gradient (DDPG) [235]. The benchmark strategies studied in these papers include the Almgren–Chriss solution [102,100], the time-weighted average price (TWAP) strategy [159,51,140], the volume-weighted average price (VWAP) strategy [140], and the submit-and-leave (SnL) policy [158,235]. Some models allow the trader to buy or sell the asset at each time step [108,241,217,56], whereas many others allow only one trading direction, either buying only or selling only [158,102,100,159,51,181,235,140].…”
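The Python sketch below shows roughly what the TWAP, VWAP, and Almgren–Chriss benchmarks prescribe for a sell-only liquidation of X shares over a horizon T. It assumes the continuous-time Almgren–Chriss trajectory x(t) = X sinh(κ(T − t)) / sinh(κT), sampled on a uniform grid. The urgency parameter κ is passed in directly; in the continuous-time limit it is roughly sqrt(λσ²/η) for risk aversion λ, volatility σ, and temporary-impact coefficient η. The function names and the volume profile given to the VWAP helper are hypothetical illustrations, not taken from the cited papers.

```python
import math


def twap_schedule(total_shares: float, n_steps: int) -> list[float]:
    """TWAP: split the parent order evenly over n_steps equal intervals."""
    return [total_shares / n_steps] * n_steps


def vwap_schedule(total_shares: float, volume_profile: list[float]) -> list[float]:
    """VWAP-style schedule: trade in proportion to an assumed expected-volume profile."""
    total_volume = sum(volume_profile)
    return [total_shares * v / total_volume for v in volume_profile]


def almgren_chriss_holdings(
    total_shares: float, n_steps: int, horizon: float, kappa: float
) -> list[float]:
    """Remaining inventory x(t_j) = X sinh(kappa (T - t_j)) / sinh(kappa T), j = 0..n_steps.

    kappa is the urgency parameter, supplied directly here (assumed to be
    roughly sqrt(risk_aversion * sigma**2 / eta) in the continuous-time limit).
    """
    if kappa < 0:
        raise ValueError("kappa must be non-negative")
    times = [horizon * j / n_steps for j in range(n_steps + 1)]
    if kappa == 0:
        # Risk-neutral limit: inventory falls linearly, which is exactly TWAP.
        return [total_shares * (1 - t / horizon) for t in times]
    denom = math.sinh(kappa * horizon)
    return [total_shares * math.sinh(kappa * (horizon - t)) / denom for t in times]


def almgren_chriss_trades(
    total_shares: float, n_steps: int, horizon: float, kappa: float
) -> list[float]:
    """Shares sold in each interval: the drop in inventory between grid points."""
    x = almgren_chriss_holdings(total_shares, n_steps, horizon, kappa)
    return [x[j] - x[j + 1] for j in range(n_steps)]


if __name__ == "__main__":
    print(twap_schedule(1000, 4))  # [250.0, 250.0, 250.0, 250.0]
    print(vwap_schedule(1000, [3, 1, 1, 3]))  # [375.0, 125.0, 125.0, 375.0]
    # Front-loaded selling for a risk-averse trader (kappa > 0).
    print([round(n, 1) for n in almgren_chriss_trades(1000, 4, 1.0, 2.0)])
```

With κ = 0 (a risk-neutral trader) the Almgren–Chriss schedule reduces to TWAP. Larger κ shifts more of the selling toward the start of the horizon, trading higher impact costs for lower exposure to price risk. All trades are non-negative here, matching the single-direction setting described above.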