These approaches can be broadly categorized into three main lines of work: 1) fine-tuning DMs on carefully curated image-prompt datasets (Dai et al., 2023; Podell et al., 2023); 2) maximizing explicit reward functions, either by backpropagating through multi-step diffusion generation outputs (Prabhudesai et al., 2023; Clark et al., 2023; Lee et al., 2023) or via policy gradient-based reinforcement learning (RL) methods (Fan et al., 2024; Black et al., 2023; Ye et al., 2024); and 3) implicit reward maximization, exemplified by Diffusion-DPO (Wallace et al., 2024) and Diffusion-KTO (Yang et al., 2024), which directly utilizes raw preference data without the need for an explicit reward function.
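To make the third category concrete, the following is a minimal NumPy sketch of a Diffusion-DPO-style preference loss. It assumes a preference pair of noised latents (`x_w` preferred over `x_l`), a fine-tuned noise predictor `eps_theta`, and a frozen reference predictor `eps_ref`; all function names, shapes, and the value of `beta` here are illustrative assumptions, not the authors' exact implementation, which operates on full diffusion training batches in PyTorch.

```python
import numpy as np

def diffusion_dpo_loss(eps_theta, eps_ref, x_w, x_l, t, noise, beta=2000.0):
    """Sketch of a Diffusion-DPO-style objective on one preference pair.

    eps_theta / eps_ref: callables (x, t) -> predicted noise (same shape as x).
    x_w / x_l: preferred / dispreferred noised latents, shape (batch, ...).
    """
    def err(model, x):
        # Per-sample denoising error of the model's noise prediction.
        return ((model(x, t) - noise) ** 2).reshape(x.shape[0], -1).mean(axis=1)

    # Implicit reward difference: how much more the fine-tuned model improves
    # on the preferred sample than on the dispreferred one, relative to the
    # frozen reference model. No explicit reward function is ever evaluated.
    delta = (err(eps_theta, x_w) - err(eps_ref, x_w)) \
          - (err(eps_theta, x_l) - err(eps_ref, x_l))

    # Bradley-Terry logistic loss; -log sigmoid(-beta * delta) is computed
    # as logaddexp(0, beta * delta) for numerical stability.
    return np.logaddexp(0.0, beta * delta).mean()
```

The key point the sketch illustrates is that only the relative denoising errors on raw preference pairs enter the loss, which is what distinguishes this line of work from explicit reward maximization.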