reinforcement learning improvement #31

Ski-ing · 2024-07-12T09:40:26Z

How large is the improvement in code generation metrics that can be attributed to Group Relative Policy Optimization (GRPO) in the reinforcement learning stage?

DeepSeekPH · 2024-07-15T03:18:24Z

The gain from GRPO varies by test set. On code generation test sets, GRPO generally improves scores by about 0.5 points. The gains on math-related benchmarks are larger.

Ski-ing · 2024-07-15T09:17:27Z

Thanks for your reply.

pkuzqh closed this as completed Jul 22, 2024