Major bug: Chat template is not actually applied in run_sft.py and run_dpo.py

  In run_sft.py and run_dpo.py, it says that it applies the chat template. But this is not actually done.
    
In the code below, column_names contains all the names of the columns, and this is given as a remove_columns argument to the mpa function. This means that all columns are skipped and the map is not applied at all. I actually tested by adding prints inside the apply_chat_template and I observed no printout at all, unless I removed the "remove_columns=column_names" in which case I was able to see printouts. Thus, the map is not applied at all and I wonder how it can even work.

   column_names = list(raw_datasets["train"].features)
   #####################
    # Apply chat template
    #####################
    raw_datasets = raw_datasets.map(
        apply_chat_template,
        fn_kwargs={"tokenizer": tokenizer, "task": "dpo"},
        num_proc=data_args.preprocessing_num_workers,
        remove_columns=column_names,
        desc="Formatting comparisons with prompt template",
    )

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Major bug: Chat template is not actually applied in run_sft.py and run_dpo.py #125

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Major bug: Chat template is not actually applied in run_sft.py and run_dpo.py #125

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions