“…In the ablation study, we investigate the importance of our network architecture design choices, training scheme, and reward function. In addition, we compare our reward function with an existing reward from Li et al (2021). We, therefore, divided our ablation study into two parts.…”