feat(examples): release an example implementation of T-V reward model. #2

calico-1226 · 2024-07-04T17:23:52Z

Description

This PR adds an example implementation for training a preference-predictor reward model on our dataset. The model maps abstract human values to scalar scores that can be measured and optimized. As a result, it can partially replace human evaluators when assessing outputs of video generation models, and it can serve as a supervisory signal for improving those models.
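The description does not state which training objective the example uses. For illustration only, a common way to turn pairwise human preferences into a scalar reward is the Bradley-Terry formulation: the probability that sample A is preferred over sample B is sigmoid(r_A - r_B), and training minimizes the negative log-likelihood of the human-chosen sample winning. The sketch below assumes that objective and operates on precomputed scalar scores. The function names are hypothetical and are not part of the released example.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Hypothetical helper: P(A preferred over B) = sigmoid(r_A - r_B) (Bradley-Terry assumption)."""
    d = reward_a - reward_b
    # Branch on the sign so math.exp never overflows for large |d|.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Hypothetical helper: -log sigmoid(r_chosen - r_rejected), i.e. softplus(-margin)."""
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (chosen_score, rejected_score) pairs from human annotations."""
    return sum(pairwise_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy scores standing in for a reward model's outputs on preferred/rejected videos.
    batch = [(2.0, 0.5), (0.3, 1.1), (1.0, 1.0)]
    print(mean_pairwise_loss(batch))
```

Under this assumption, a lower loss means the model assigns higher scores to the samples humans preferred, which is what allows the scalar output to stand in for human judgments.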

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.
I have reformatted the code using make format. (required)
I have checked the code using make lint. (required)
I have ensured make test pass. (required)

Co-authored-by: linghaiCTL <[email protected]>

feat(examples): release an example implementation of T-V reward model.

fe7513c

Co-authored-by: linghaiCTL <[email protected]>

calico-1226 added the enhancement New feature or request label Jul 4, 2024

calico-1226 self-assigned this Jul 4, 2024

chore(conda-recipe.yaml): add conda-recipe.yaml for installation

49994a5

calico-1226 force-pushed the rm branch from b837fb0 to 49994a5 July 4, 2024 18:35

calico-1226 added 2 commits July 5, 2024 02:43

chore(.github): remove duplicate lint checks

7a49e00

chore(Makefile): ignore files from other projects when run addlicense

ccf29a0

calico-1226 merged commit d0b8344 into PKU-Alignment:main Jul 4, 2024
1 check passed

calico-1226 deleted the rm branch July 4, 2024 19:06
