Rethinking the Role of PPO in RLHF – The Berkeley Artificial Intelligence Research Blog
Rethinking the Role of PPO in RLHF

TL;DR: In RLHF, there’s tension between the reward learning phase, which uses human preference in the form of comparisons, and the RL fine-tuning phase, which optimizes a single, non-comparative reward. What if we performed RL in a comparative way?
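To make that tension concrete, here is a minimal sketch of the two phases; it is my own illustration under generic assumptions (`reward_model`, `chosen`, `rejected` are hypothetical names, not code from the post):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # Reward learning is comparative: a Bradley-Terry-style loss on human
    # pairwise preferences, maximizing P[chosen response beats rejected one].
    margin = reward_model(chosen) - reward_model(rejected)
    return -F.logsigmoid(margin).mean()

def ppo_step_reward(reward_model, response: torch.Tensor) -> torch.Tensor:
    # RL fine-tuning is not: PPO consumes a single absolute scalar per
    # response, discarding the pairwise structure the model was trained on.
    return reward_model(response)
```

Read this way, the post's question amounts to asking whether the second phase could also operate on relative feedback between responses, rather than on one absolute score.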
Training Diffusion Models with Reinforcement Learning

Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs.