Try to train fast (grouped-conv) versions of csdarknet53 and csdarknet19 #6
@AlexeyAB Thanks, I will get free GPUs after I finish training.
@WongKinYiu Hi,
So you should combine two networks with
Suggestion:
Did you try to train with DropBlock? Does it work well?
Yes. If I change the output channels of CSPResNet50 and CSPDarknet53 to 2048, I think it can achieve better results, but with a large amount of computation. Do you need an ImageNet pre-trained model which has
The DropBlock models are still training. Currently, the models get slightly lower accuracy than the models without DropBlock at the same epoch, but that may be because DropBlock needs more epochs to converge.
@AlexeyAB Hello, The model with DropBlock gets lower accuracy than without it (79.8 vs 79.1).
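For context on what is being trained here: DropBlock zeroes out contiguous regions of a feature map rather than independent activations. A minimal NumPy sketch (illustrative only; the function name, seeding scheme, and rescaling are assumptions, not the Darknet implementation):

```python
import numpy as np

def dropblock(x, drop_prob=0.1, block_size=3, rng=None):
    """Drop contiguous block_size x block_size regions from a 2D feature map x.

    Seed points are sampled so that the expected dropped area matches
    drop_prob, then a block around each seed is zeroed and the surviving
    activations are rescaled (as in dropout).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = x.shape
    # gamma: per-cell seed probability derived from the target drop_prob
    valid = max((h - block_size + 1) * (w - block_size + 1), 1)
    gamma = drop_prob * (h * w) / (block_size ** 2) / valid
    seeds = rng.random((h, w)) < gamma
    mask = np.ones((h, w))
    half = block_size // 2
    for i, j in zip(*np.nonzero(seeds)):
        mask[max(i - half, 0):i + half + 1, max(j - half, 0):j + half + 1] = 0.0
    kept = mask.sum()
    if kept > 0:
        x = x * mask * (mask.size / kept)  # rescale surviving activations
    return x
```

With `drop_prob=0` the input passes through unchanged, which makes the layer easy to disable at inference time.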
@WongKinYiu Hi, This is already done: https://github.com/AlexeyAB/darknet/blob/d51d89053afc4b7f50a30ace7b2fcf1b2ddd7598/src/dropout_layer_kernels.cu#L28-L31
So we should try:
@WongKinYiu Hi,
Please try to train these 2 models; both use:
Also, did you try to train
Also, did you try to train
OK, I will train these two models. No, the inference speed of
All with MISH-activation and 608x608 network resolution on GeForce RTX 2070:
So it may make sense to train the model
I have only two free GPUs currently; I will train
@WongKinYiu Thanks! So SpineNet-49 is worse than csdarknet53 and csresnext50, at least for ImageNet.
Also, I fixed label_smoothing for the Detector (not for the Classifier): AlexeyAB/darknet@81290b0. So you can try to train the Detector with the new label_smoothing. Usage
Since the old label_smoothing worked well for the Classifier but badly for the Detector.
Yes, SpineNet-49 has fewer params and FLOPs, but CSPDarkNet-53 is faster and more accurate as a Classifier.
@WongKinYiu Thanks! Do you mean
@AlexeyAB Oh, sorry, it is
@WongKinYiu But I think
Yes, I will get a free GPU after about 4 days. Thanks.
@WongKinYiu
Will we try to test the new label_smoothing for the Detector?
For the classifier, for the detector, I will also do an ablation study for
@AlexeyAB @WongKinYiu have you had any success with label smoothing? I just learned about it recently, but was confused about a few things:
But unfortunately, all of mixup, cosine LR, and label smoothing gave worse results in my experiments.
@WongKinYiu Ah thanks, that's super informative! That solves a big mystery for me then. I tried to apply it to both the obj loss and the class loss at the same time, and it destroyed my NMS because every single anchor was above the threshold (of 0.001). I implemented a cosine LR scheduler a couple of weeks ago; it worked well (+0.3 mAP), though I noticed it worked better if I raised the initial LR. Before, with the traditional step scheduler, I was using about lr0=0.006; now with the cosine scheduler I use lr0=0.010 to get that +0.3 increase on COCO.
@WongKinYiu See ultralytics/yolov3#238 (comment) for the cosine scheduler implementation. These are the training plots for the two runs (step and cosine LR). Interestingly, the val losses are better at the end with step, and you can see the cosine obj loss starting to overtrain at the end, but the cosine final mAP is still slightly higher. I'm not quite sure what that means.
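The cosine schedule discussed above can be sketched as follows (a minimal version; the function name is my own, and `lr0=0.010` is just the value mentioned in the comment):

```python
import math

def cosine_lr(step, total_steps, lr0=0.010, lr_min=0.0):
    """Anneal the learning rate from lr0 down to lr_min over half a cosine period."""
    cos_decay = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr0 - lr_min) * cos_decay
```

At step 0 this returns `lr0`, at the final step it returns `lr_min`, and halfway through it returns the midpoint; unlike a step schedule there are no abrupt drops.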
@WongKinYiu Do you know what the value of epsilon should be in eqn 3 of the BoF paper? If I assume epsilon=0.1, the classification target values (after a sigmoid) would be
Does that seem right?
In their case they seem to be using epsilon as
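As a sanity check on the epsilon question: a common uniform-mixture formulation of label smoothing is q = (1 − ε)·y + ε/K (some papers instead put ε/(K−1) on the off-classes). A quick sketch, assuming ε = 0.1 and K = 80 COCO classes:

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """Uniform-mixture label smoothing: q = (1 - eps) * y + eps / K."""
    k = y_onehot.shape[-1]
    return (1.0 - eps) * y_onehot + eps / k

y = np.zeros(80)
y[3] = 1.0
q = smooth_labels(y, eps=0.1)
# positive-class target: 1 - 0.1 + 0.1/80 = 0.90125
# negative-class target: 0.1/80 = 0.00125
```

Note the smoothed targets still sum to 1, so they remain a valid distribution for a softmax classifier.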
It seems only YOLOv3 can apply label smoothing.
After how many iterations? Try to train without CBN; I noticed that CBN worsens accuracy on most of my models. I trained this cfg-file for 2300 iterations on MS COCO and didn't get iou=0 or NaN loss: csresnext50sub-spp-asff-bifpn-rfb-db.cfg.txt (just so you know: label_smooth_eps=0.1, dynamic_minibatch=1, mosaic=1, BiFPN, ASFF, RFB, and DropBlock do not cause problems).
About 40 iterations. Now I changed 608/64/64 back to 416/64/32 and it still performs normally at 1500 iterations. Update: it becomes all zero at 3xxx iterations.
@WongKinYiu Nice! Do you currently train CSResNext50-PANet and CSDarknet53-PANet with Mosaic, Genetic, Mish, ... which are based on the best of these models https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/imagenet/results.md ?
Yes, the training of CSPResNext50-PANet-SPP with the csresnext50-gamma.cfg pretrained model will finish in 1~2 weeks.
@WongKinYiu Try to train with
I successfully trained such a model without NaN or zero-IoU: csresnext50sub-spp-asff-bifpn-rfb-db.cfg.txt
@AlexeyAB start training.
Does BiFPN+ASFF+RFB+DB training go well without NaN/IoU=0?
I resumed training from 2k iterations several times when NaN/IoU=0 occurred; now training has reached 7k iterations without NaN/IoU=0.
@WongKinYiu This is strange, since I didn't get NaN/IoU=0 at all.
@AlexeyAB Hmm... I got IoU=0 three times with this cfg. I already tested the previous cfg on CUDA 9.0/10.0/10.1/10.2 before; all of the trainings ran into the same situation.
@WongKinYiu Maybe this is a temporary phenomenon that will correct itself and can be ignored until ~10,000 iterations?
@AlexeyAB @WongKinYiu Hi, I want to use cspdarknet53-panet-spp in this repo's README for custom object training. How many layers should I extract from the weights file using
@WongKinYiu Hi, What pre-trained weights should I use for the CSPDarknet53-PANet-SPP model? Thanks
Hello, which cfg do you want to use?
@WongKinYiu Hi,
I already used CSPResNeXt50-PANet-SPP and got a good result, but the training time is high, so I am going to use CSPDarknet53-PANet-SPP.
Which cfg is good for this case? Accuracy is important for me. Thanks.
for this case, you can use:
If your dataset is larger than MS COCO, you can consider using the ImageNet pretrained model (partial 104). If you hope the model converges quickly, you can use the MS COCO pretrained model (partial 135).
@WongKinYiu Hi, My dataset is larger than MS COCO. Can I use a 608 network size to get higher accuracy? Thanks.
Does your dataset contain many small objects?
@WongKinYiu Hi, Yes, some objects are small. Where can I download the pre-trained weights for CSPDarknet53-PANet-SPP (Mish)?
Here: #6 (comment)
@WongKinYiu Hi, Thanks for the reply. Are these commands right to generate the pre-trained weights?
darknet partial cfg/cd53paspp-omega.cfg cd53paspp-omega_final.weights cd53paspp-omega_final.weights.conv.104 104
darknet partial cfg/cd53paspp-omega.cfg cd53paspp-omega_final.weights cd53paspp-omega_final.weights.conv.135 135
No, the weights file of the ImageNet pretrained model is csdarknet53-omega_final.weights.
@WongKinYiu Hi, Sorry, I thought that I had to generate the pre-trained weights with that command. Thanks a lot.
@WongKinYiu Hi, I have a problem: sometimes some pictures are not detected, or are detected wrongly. I attached my model and some images for testing. Could you please check it and guide me? I have about 2K images per class. Please give me some information about the hyperparameters for my case. Thanks in advance.
Hello, how did you calculate the anchors?
@WongKinYiu Hi, Thanks for the reply. Did you test it?
darknet detector calc_anchors a.obj -num_of_clusters 9 -width 608 -height 608
Please rename 1.txt to 1.zip.
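For reference, darknet's `calc_anchors` clusters the dataset's box (width, height) pairs with k-means (darknet actually uses a 1 − IoU distance). A simplified Euclidean sketch (the function name and details are illustrative, not darknet's implementation):

```python
import numpy as np

def calc_anchors(wh, k=9, iters=100, rng=None):
    """Naive k-means over (width, height) pairs with Euclidean distance.

    wh: float array of shape (N, 2). Returns k anchors sorted by area.
    """
    rng = np.random.default_rng() if rng is None else rng
    centers = wh[rng.choice(len(wh), k, replace=False)]  # random init
    for _ in range(iters):
        # assign each box to its nearest center
        d = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        # move each center to the mean of its assigned boxes
        for j in range(k):
            pts = wh[assign == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers[np.argsort(centers.prod(1))]
```

The IoU-based distance darknet uses avoids large boxes dominating the clustering, which plain Euclidean k-means does not.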
Where can we add DropBlock in the YOLOv4 cfg?
@WongKinYiu Hi,
Since CSPDarkNet53 is better than CSPResNeXt50 for Detector, try to train these 4 models:
csdarknet19-fast.cfg contains DropBlock, so use the latest version of Darknet, which uses fast random functions for DropBlock.