## Overview

Paper link

- LLMs are fine-tuned using RLHF for alignment.
- This has not been widely explored in text-to-image models.
- DPO was recently formulated as a simpler alternative to RLHF (see the objective sketched below).
- The policy
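For context, the standard DPO objective from the original language-model formulation (Rafailov et al., 2023) trains the policy directly on preference pairs, removing the explicit reward model and RL loop of RLHF. The sketch below uses the usual symbols, which are carried over from that paper rather than from this summary: $\pi_\theta$ is the policy, $\pi_{\mathrm{ref}}$ the frozen reference model, $(x, y_w, y_l)$ a prompt with preferred and dispreferred completions, and $\beta$ a temperature controlling deviation from the reference.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
$$

Minimizing this loss raises the policy's likelihood of the preferred sample relative to the reference model and lowers it for the dispreferred one, which is why no separate reward model or on-policy sampling is needed.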