Is LLAVA chat template correct? #5489

Open
mibejjh opened this issue Sep 20, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

Comments


mibejjh commented Sep 20, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

llamafactory 0.9.0
python 3.11.9

Reproduction

Export any finetuned LLaVA-based model from the WebUI using the llava template.

Expected behavior

In tokenizer_config.json in the exported model, chat_template looks like the following, but there is no image-related keyword in it at all. It looks like vicuna's template.

"chat_template": "{% set system_message = 'A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user\\'s questions.' %}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ system_message }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ 'USER: ' + content + ' ASSISTANT:' }}{% elif message['role'] == 'assistant' %}{{ content + '</s>' }}{% endif %}{% endfor %}",

When I checked the chat templates at line 76 of src/llamafactory/train/tuner.py (in export_model()), there are both a tokenizer and a processor, and they have different chat templates.
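A quick way to reproduce the mismatch outside the trainer (minimal sketch; again the path is a placeholder):

```python
from transformers import AutoProcessor, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/exported_model")
processor = AutoProcessor.from_pretrained("path/to/exported_model")
print(tokenizer.chat_template == processor.chat_template)  # False in my case
```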

tokenizer.chat_template, which is loaded via get_template_and_fix_tokenizer(), is the same as above, while processor.chat_template is totally different, as shown below. I believe the processor's chat template could be the correct one.

"{% for message in messages %}{% if message['role'] != 'system' %}{{ message['role'].upper() + ': '}}{% endif %}{# Render all images first #}{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}{{ '<image>\n' }}{% endfor %}{# Render all text next #}{% if message['role'] != 'assistant' %}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ content['text'] + ' '}}{% endfor %}{% else %}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{% generation %}{{ content['text'] + ' '}}{% endgeneration %}{% endfor %}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}"

I'm not sure which chat template is used during training and which one is used for inference.

Others

No response
