Any plan for supporting DPO? #846

lorabit110 · 2024-01-08T19:45:08Z

🚀 Feature Request

Support DPO (Direct Preference Optimization) loss and data loader.
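For context, the requested loss can be sketched in a few lines. This is only an illustration of the standard DPO objective on a single preference pair, not llm-foundry code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are all hypothetical. It assumes the summed log-probabilities of the chosen and rejected responses have already been computed under both the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair. Hypothetical helper, not an llm-foundry API.

    Each argument is the summed log-probability of a full response given the prompt.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; beta is an assumed default.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == softplus(-margin), written in a numerically stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)

# When the policy favors the chosen response more than the reference does, the loss drops.
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

A matching data loader would presumably need to yield a prompt together with a chosen and a rejected response per example, so that all four log-probabilities can be computed in one step.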

Motivation

Many recent open LLMs have achieved promising results using DPO instead of RL-style tuning such as PPO for alignment. It also seems to require fewer changes to llm-foundry than RLHF would.

pretidav · 2024-05-09T14:50:23Z

Same question here.

lorabit110 added the enhancement (New feature or request) label on Jan 8, 2024
