Description
Both run_sft.py and run_dpo.py say that they apply the chat template, but as far as I can tell this is not actually done.
In the code below, column_names contains the names of all the columns, and it is passed as the remove_columns argument to the map function. As far as I can tell, this means that all columns are skipped and the map is not applied at all. I tested this by adding print statements inside apply_chat_template and observed no output at all; only when I removed "remove_columns=column_names" did the printouts appear. So the map does not seem to be applied, and I wonder how this can even work.
column_names = list(raw_datasets["train"].features)
#####################
# Apply chat template
#####################
raw_datasets = raw_datasets.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer, "task": "dpo"},
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    desc="Formatting comparisons with prompt template",
)
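For comparison, the datasets documentation describes remove_columns as dropping the listed columns from the output, while the function is still called on every example and any columns it returns are kept. Below is a minimal plain-Python sketch of that documented behavior, so it can be checked in isolation. The names map_rows and apply_template, the sample rows, and the "user:/assistant:" format are all hypothetical stand-ins; this is not the library's implementation or the handbook's apply_chat_template.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def map_rows(
    rows: Iterable[Row],
    function: Callable[[Row], Row],
    remove_columns: Iterable[str] = (),
) -> List[Row]:
    """Hypothetical imitation of the documented map/remove_columns semantics."""
    drop = set(remove_columns)
    output = []
    for row in rows:
        # The function is called on the full original row, before anything is dropped.
        update = function(dict(row))
        # Original columns listed in remove_columns are dropped first ...
        kept = {key: value for key, value in row.items() if key not in drop}
        # ... then the function's output is merged in, so new columns survive.
        kept.update(update)
        output.append(kept)
    return output


calls: List[str] = []  # records every invocation, instead of relying on print


def apply_template(example: Row) -> Row:
    """Hypothetical stand-in for a chat-template function."""
    calls.append(example["prompt"])
    return {"text": f"user: {example['prompt']}\nassistant: {example['answer']}"}


rows = [
    {"prompt": "Hi", "answer": "Hello"},
    {"prompt": "2+2?", "answer": "4"},
]
column_names = list(rows[0])  # every original column, as in the snippet above

result = map_rows(rows, apply_template, remove_columns=column_names)

print(len(calls))       # number of times the function was invoked
print(list(result[0]))  # columns left after the original ones are removed
```

If the real map behaves as documented, the missing printouts may have another cause, for example a result reused from the datasets cache or output from worker processes (num_proc) not reaching the console. Passing load_from_cache_file=False to map would be one way to rule out the cache.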