Proper way to finetune on own data #56
Comments
Hi, sorry for the late reply.

@AlessioTonioni Thanks for your answer! This helps. I have a few more quick questions:

1. The SSIM values seem to oscillate during training rather than decrease. Do you think increasing the batch size beyond 1, or setting shuffle or augment to True, would help? I'm not sure whether these affect the finetuning process. The 3000 images I am training on are a sequence of frames from a video feed, so I can give this a try.
2. The images were captured by a robot moving in a straight line with the stereo camera facing the robot's left side (as opposed to facing the front of the vehicle as in the KITTI dataset, where images are captured along the direction of motion). For some stereo pairs, the left part of the left image is not visible in the right image. Would this be a problem when finetuning, since the network tries to reconstruct the left image by sampling from the right one?
3. Since I don't have GT disparities, do you recommend retraining from scratch with the unsupervised loss instead of initializing with the KITTI weights? If so, do you have a hyperparameter configuration I could start experimenting with?

Thanks!
1. Yes, definitely use a bigger batch size and randomize the data with shuffle and augment. If possible, also use more data if the 3K frames all come from a single sequence.
2. That's the common case in stereo; it's an implicit problem related to occlusions. For those pixels no useful training gradient can be computed.
3. No, but maybe try starting from the network trained on synthetic data if your scenario is quite different from KITTI (as it seems to be).
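To see why occluded pixels carry no training signal, here is a minimal numpy sketch (illustrative only, not the repo's code) of disparity-based warping with a validity mask: a column of the left view whose source location x − d falls outside the right image has no correspondence, so the photometric loss is masked out there and no gradient flows.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Sample the right image at x - d to reconstruct the left view.
    Columns whose source location falls outside the right image are
    occluded / out of view: their validity mask is False, so a
    reconstruction loss there would be masked out (no gradient)."""
    h, w = right.shape
    xs = np.arange(w)[None, :] - disparity           # source columns in the right image
    valid = (xs >= 0) & (xs <= w - 1)                # in-bounds -> usable for the loss
    xs_clipped = np.clip(xs, 0, w - 1).astype(int)
    recon = np.take_along_axis(right, xs_clipped, axis=1)
    return recon, valid

# toy example: a constant disparity of 3 px
right = np.tile(np.arange(8, dtype=float), (2, 1))
recon, valid = warp_right_to_left(right, np.full((2, 8), 3.0))
print(valid[0])  # the 3 left-most columns of the left view have no match
```

A camera looking sideways from a moving robot behaves no differently here than a forward-facing one: only the per-pair occluded band matters, and it is handled implicitly by the loss.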
Hello! Thanks for the great work. I have a question about the proper way to finetune the pretrained network on a custom stereo dataset (~3000 images) with no ground truth.
I am currently using Stereo_Online_Adaptation.py to finetune the provided network trained on KITTI (MADNet/kitti) with MODE=FULL, epochs=100, is_training=False, batch_size=1, shuffle=False, augment=False, using all 3000 images. My aim is not real-time stereo but a good disparity map for each of the 3000 images, as part of an offline postprocessing step.
Are there any constraints on the disparity range the network can detect? For example, the network seems to output a very low disparity when an object is close to the camera.
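For context on the range question: disparity is inversely proportional to depth, d = f·B/Z, so close objects produce large disparities, not small ones. If the network underestimates disparity on close objects, one plausible explanation is that such large disparities were rare in its training distribution (KITTI-style scenes rarely need more than a couple of hundred pixels). A toy calculation, with made-up calibration numbers (focal length in pixels, baseline and depth in metres):

```python
# d = f * B / Z: disparity (pixels) grows as depth shrinks.
def disparity(focal_px, baseline_m, depth_m):
    return focal_px * baseline_m / depth_m

f, B = 720.0, 0.54  # hypothetical calibration, roughly KITTI-like
for Z in (20.0, 5.0, 1.0):
    print(Z, round(disparity(f, B, Z), 2))
# a 1 m object needs ~389 px of disparity, far outside the typical
# range seen during KITTI training
```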
I am passing a dummy GT disparity, so should I expect any of EPE, SSIM, or bad3 to decrease?
Step:16800 bad3:0.95 EPE:66.88 SSIM:0.07 f/b time:0.077593 Missing time:5:39:12.643258
Step:16900 bad3:0.98 EPE:68.59 SSIM:0.10 f/b time:0.073424 Missing time:5:20:51.903022
Step:17000 bad3:0.95 EPE:68.69 SSIM:0.04 f/b time:0.073451 Missing time:5:20:51.445041
Step:17100 bad3:0.97 EPE:71.85 SSIM:0.05 f/b time:0.073457 Missing time:5:20:45.630999
Step:17200 bad3:0.98 EPE:88.16 SSIM:0.21 f/b time:0.073391 Missing time:5:20:21.049521
Step:17300 bad3:0.94 EPE:64.01 SSIM:0.03 f/b time:0.073562 Missing time:5:20:58.482237
Step:17400 bad3:0.98 EPE:91.38 SSIM:0.19 f/b time:0.073448 Missing time:5:20:21.286057
Step:17500 bad3:0.97 EPE:66.32 SSIM:0.04 f/b time:0.073532 Missing time:5:20:35.964231
Does this seem like the correct procedure to finetune the network? Do you have any other recommendations? Thanks
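One note on the metrics in that log: with a dummy GT disparity, EPE and bad3 compare the prediction against garbage and are meaningless; only the SSIM-based reprojection term can reflect actual progress, and since it is computed per step on single frames, some oscillation is expected. As a reference for what SSIM measures, here is a minimal single-window version (a hypothetical helper for illustration; practical implementations, including the one in this repo, use local windows):

```python
import numpy as np

def global_ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Single-window SSIM for images scaled to [0, 1]: compares mean
    (luminance), variance (contrast), and covariance (structure)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

a = np.random.default_rng(0).random((32, 32))
print(global_ssim(a, a))        # identical images give SSIM ~= 1
print(global_ssim(a, 1.0 - a))  # anti-correlated images score far lower
```

Higher SSIM between the left image and the right image warped by the predicted disparity means a better reconstruction, so a per-step value of ~0.9 would be encouraging; values near 0, as in the log above, suggest the adaptation is not converging on that data.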