add(shentao): add a recent work on segment-level dense RLHF #67

Open
wants to merge 14 commits into
base: main

Shentao-YANG
Contributor

Actually, I only changed the following:

### 2025
- [Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model](https://arxiv.org/abs/2501.02790)
  - Yueqin Yin, Shentao Yang, Yujia Xie, Ziyi Yang, Yuting Sun, Hany Awadalla, Weizhu Chen, and Mingyuan Zhou
  - Keyword: Segment-level Reward Model, Dense Reward RLHF Framework, Improved PPO training for LLMs
  - Code: [Official](https://github.com/yinyueqin/DenseRewardRLHF-PPO)

Not sure why it shows so many changes...

@PaParaZz1
Member

Thanks for your contribution. Please rename this pull request to follow the format used by previous PRs; then we will merge it.

As for the unexpectedly large number of changes, you could open a new PR from a fresh branch based on the latest main branch. The problem may have been caused by a git operation such as git rebase.

@Shentao-YANG changed the title from "Add a recent work on segment-level dense RLHF" to "add(shentao): add a recent work on segment-level dense RLHF" on Jan 26, 2025
@Shentao-YANG
Contributor Author

Thank you for the reply. I have renamed this pull request.
My PR was already based on the latest main branch when I opened it. If you cannot merge it because of git issues, could you paste my added item directly onto the main branch? (It is just 5 lines, so this would save us both time.) Thank you!
