Sample packing masks the <end_of_turn> token #2259
-
Answered by NanoCode012 · Jan 16, 2025
-
Hello! Yes, that could be one possible reason behind your issue. Could I see what your dataset config YAML looks like, and perhaps a sample demo row (fake data is fine) from your dataset?
-
# model specific
base_model: mistralai/Mistral-7B-v0.3 # originally an internal model, but Mistral should also repro the bug
chat_template: gemma
# key hyperparameters
output_dir: ./outputs/mistral-gemma-it
sequence_len: 4096
sample_packing: true
gradient_accumulation_steps: 1
micro_batch_size: 2
learning_rate: 1e-5
# dataset -- can be anything that uses OpenAI chat format / role: ..., content: ...
datasets:
- path: [PLACE HOLDER]
type: chat_template
field_messages: messages
trust_remote_code: true
# utility
resume_from_checkpoint:
logging_steps: 10
warmup_steps: 100
max_grad_norm: 1.0
save_strategy: "no" # only save the final checkpoint
# trivial
flash_attention: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
bf16: true
tf32: false
pad_to_sequence_len: true # mem stability
train_on_inputs: false
seed: 666
num_epochs: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
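As an aside (not from the original thread), one way to confirm whether `<end_of_turn>` is being masked out of the labels is to run Axolotl's preprocess step with its debug flag, which prints tokenized rows along with their label masks. The config filename below is a placeholder for the config above:

```bash
# Hypothetical check, assuming a recent Axolotl install.
# --debug prints tokenized samples so you can see which tokens
# are excluded from the labels.
python -m axolotl.cli.preprocess config.yml --debug
```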
-
I don't have a good public dataset in mind, but anything that uses the OpenAI messages format can repro this issue.
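For illustration only (this fake row is not from the original reply), a dataset row matching the `field_messages: messages` setting above might look like:

```json
{
  "messages": [
    {"role": "user", "content": "What is sample packing?"},
    {"role": "assistant", "content": "It concatenates several short examples into one training sequence."}
  ]
}
```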
-
Yes, my bad. I just re-checked the code on this.
Could you change the EOS token to:
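The token value itself is not shown in the reply above. As a sketch only, assuming the intent is to make the tokenizer's EOS token match the Gemma chat template's turn terminator from the thread title, the override in an Axolotl config might look like:

```yaml
# Sketch only -- the exact token is an assumption based on the thread title,
# not the value from the original reply.
special_tokens:
  eos_token: "<end_of_turn>"
```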