Why is CrossEntropyLoss zero? #692

Open
aizhweiwei opened this issue Aug 6, 2024 · 2 comments
Labels
type/question An issue that's a question

Comments

@aizhweiwei

❓ The question

System/Peak GPU Memory (MB)=6,784

2024-08-06 09:59:26.181 intern-studio-160750:0 olmo.train:908 INFO [step=1/739328,epoch=0]
optim/total_grad_norm=231.7
train/CrossEntropyLoss=12.18
train/Perplexity=195,153
throughput/total_tokens=1,048,576
throughput/total_training_Gflops=5,103,640
throughput/total_training_log_Gflops=15.45
System/Peak GPU Memory (MB)=46,911
2024-08-06 10:00:05.520 intern-studio-160750:0 olmo.train:908 INFO [step=2/739328,epoch=0]
optim/total_grad_norm=0.0002
train/CrossEntropyLoss=1.7872662283480167e-06
train/Perplexity=1.000
throughput/total_tokens=2,097,152
throughput/total_training_Gflops=10,207,281
throughput/total_training_log_Gflops=16.14
throughput/device/tokens_per_second=26,668
throughput/device/batches_per_second=0.0254
System/Peak GPU Memory (MB)=53,695
2024-08-06 10:00:44.815 intern-studio-160750:0 olmo.train:908 INFO [step=3/739328,epoch=0]
optim/total_grad_norm=7.725906669975302e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=3,145,728
throughput/total_training_Gflops=15,310,922
throughput/total_training_log_Gflops=16.54
throughput/device/tokens_per_second=26,676
throughput/device/batches_per_second=0.0254
2024-08-06 10:01:24.324 intern-studio-160750:0 olmo.train:908 INFO [step=4/739328,epoch=0]
optim/total_grad_norm=2.965892065276421e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=4,194,304
throughput/total_training_Gflops=20,414,563
throughput/total_training_log_Gflops=16.83
throughput/device/tokens_per_second=26,630
throughput/device/batches_per_second=0.0254
2024-08-06 10:02:03.863 intern-studio-160750:0 olmo.train:908 INFO [step=5/739328,epoch=0]
optim/total_grad_norm=1.9301344522659747e-08
train/CrossEntropyLoss=0.0
train/Perplexity=1.0000
throughput/total_tokens=5,242,880
throughput/total_training_Gflops=25,518,204
throughput/total_training_log_Gflops=17.05
throughput/device/tokens_per_second=26,603
throughput/device/batches_per_second=0.0254
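
For reference, the logged perplexity is consistent with exp(CrossEntropyLoss): exp(12.18) ≈ 195,000, matching train/Perplexity=195,153 at step 1, and exp(0) = 1, which is why train/Perplexity pins at 1.0000 once the loss collapses. A quick check:

import math
print(math.exp(12.18))  # ~195,000, matching train/Perplexity=195,153 at step 1
print(math.exp(0.0))    # 1.0, matching train/Perplexity=1.0000 from step 3 on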

@aizhweiwei
Author

torchrun --nproc_per_node=1 scripts/train.py configs/official/OLMo-0.4B.yaml --save_overwrite

@2015aroras
Collaborator

It's hard to say without seeing the config. My guess would be that you're training on a single batch/instance, which the model can learn almost immediately.
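
That failure mode is easy to reproduce outside of OLMo. Below is a minimal sketch with a hypothetical toy classifier (not the OLMo training loop): repeatedly fitting the same fixed batch lets the model memorize it, so CrossEntropyLoss falls toward 0 and perplexity = exp(loss) toward 1, mirroring the log above.

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 8)          # toy 8-class linear classifier
x = torch.randn(4, 16)            # one fixed batch, reused every step
y = torch.randint(0, 8, (4,))     # fixed targets
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Train on the same single batch over and over.
for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    final = loss_fn(model(x), y)
print(final.item())               # collapses toward 0.0 (batch memorized)
print(torch.exp(final).item())    # perplexity collapses toward 1.0

If the training data path in the config points at a tiny or duplicated dataset, the full OLMo run would behave the same way: near-zero loss and vanishing gradient norms within a few steps, exactly as in the log.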
