The training loss #53
Comments
The loss scale is too large. Did you change the batch-size or num-gpus?
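For anyone who does change the batch size or GPU count: the learning rate usually has to scale with the effective (global) batch size. A minimal sketch of the linear scaling rule, assuming placeholder base values (BASE_BATCH, BASE_LR) that are not taken from this repository's configs:

# Minimal sketch: linear learning-rate scaling when the effective batch size changes.
# BASE_BATCH and BASE_LR are placeholder values, not taken from the DOT configs.
BASE_BATCH = 512
BASE_LR = 0.2

def scaled_lr(per_gpu_batch: int, num_gpus: int) -> float:
    """Scale the learning rate linearly with the effective (global) batch size."""
    effective_batch = per_gpu_batch * num_gpus
    return BASE_LR * effective_batch / BASE_BATCH

# 8 GPUs x 64 per GPU reproduces the original 512 setting; 4 GPUs halves the
# effective batch, so the learning rate should roughly halve as well.
print(scaled_lr(per_gpu_batch=64, num_gpus=8))  # 0.2
print(scaled_lr(per_gpu_batch=64, num_gpus=4))  # 0.1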
@Zzzzz1 I used the original batch size of 512 on 8 2080Ti GPUs. After re-running the code, I got the following results:
@Vickeyhw How long does an epoch take for you? I find it very strange that it takes me 100 minutes to run a quarter of an epoch on 8 3090s.
@JinYu1998 23min/epoch. |
Thanks for your response; I think I've identified the problem. Since my data is not on an SSD, I/O is the bottleneck that is slowing down training.
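As a quick sanity check for this kind of slowdown, the data pipeline can be timed in isolation, without the model. A minimal sketch, assuming a torchvision ImageFolder dataset; the dataset path, batch size, and worker count are placeholders, not this repository's settings:

# Minimal sketch: time the data pipeline by itself to confirm an I/O bottleneck.
# The dataset path, batch size, and worker count are placeholders.
import time
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

transform = T.Compose([T.RandomResizedCrop(224), T.ToTensor()])
dataset = torchvision.datasets.ImageFolder("/path/to/imagenet/train", transform)
loader = DataLoader(dataset, batch_size=64, num_workers=8, pin_memory=True)

start = time.time()
for i, _ in enumerate(loader):
    if i == 100:
        break
print(f"{(time.time() - start) / 100:.3f} s per batch (data loading only)")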
Thanks for your great work! When I run the code with:
python3 tools/train.py --cfg configs/imagenet/r34_r18/dot.yaml
the training loss is much larger than with the KD method in the first few epochs, and the test accuracy is also low. Is this normal?
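For context on the absolute loss value: in most distillation codebases the reported training loss is a weighted sum of a cross-entropy term and a temperature-scaled KL term, so its scale depends on the loss weights in the config as much as on training quality. A minimal sketch of a generic KD loss of this form (not this repository's exact implementation; alpha and T are placeholder hyperparameters):

# Minimal sketch of a standard knowledge-distillation loss (cross-entropy plus
# temperature-scaled KL). alpha and T are placeholders, not the values used by
# this repository's KD or DOT configs.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, alpha=0.5, T=4.0):
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps the KL gradients on a comparable scale
    return (1 - alpha) * ce + alpha * kl

# Example with random logits for a 1000-class problem:
s = torch.randn(8, 1000)
t = torch.randn(8, 1000)
y = torch.randint(0, 1000, (8,))
print(kd_loss(s, t, y).item())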