According to the paper, we need to train the coarse layers first, then fix them and train the refine part. In this code, I think we only need to set the flag REFINE_TRAIN to False for the first step and then REFINE_TRAIN=True for the second step, right?
However, after I set REFINE_TRAIN=True, I found that all variables from the coarse network were still trainable. I think this is because the trainable flag passed to the function '_variable_on_gpu' in model_part.py is ignored.
Another problem is the learning rate. According to the paper, the learning rates are 0.001 for coarse convolutional layers 1-5, 0.1 for coarse fully connected layers 6 and 7, 0.001 for fine layers 1 and 3, and 0.01 for fine layer 2. But in the code the initial learning rate is set to 0.0001 for all layers. With this rate, I cannot reach the performance reported in the paper even after training for more than two days. So I'm wondering: has anyone obtained good results with this code, and how did you train the network to get them?
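One possible way to express the paper's per-layer rates is a mapping from parameter group to rate, applied as plain SGD. This is only a hedged sketch; the group names and the gradient values are illustrative placeholders, not anything from this repository:

```python
# Sketch: per-layer learning rates from the paper, applied as plain SGD.
# Group names and the gradient dict below are illustrative placeholders.

LEARNING_RATES = {
    "coarse_conv1_5": 0.001,  # coarse convolutional layers 1-5
    "coarse_full6_7": 0.1,    # coarse fully connected layers 6 and 7
    "fine_conv1": 0.001,      # fine layers 1 and 3
    "fine_conv2": 0.01,       # fine layer 2
    "fine_conv3": 0.001,
}

def sgd_step(params, grads):
    """Update each parameter group with its own learning rate."""
    return {
        name: value - LEARNING_RATES[name] * grads[name]
        for name, value in params.items()
    }

# Toy example: every group starts at 1.0 with gradient 2.0, so each
# group moves by a different amount according to its own rate.
params = {name: 1.0 for name in LEARNING_RATES}
grads = {name: 2.0 for name in LEARNING_RATES}
params = sgd_step(params, grads)
```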
Lastly, thanks for providing such a clean and readable implementation :)
Thank you for your comment.
First, you are right about "_variable_on_gpu"; it is a bug I introduced while migrating the code for the public version.
Second, about the learning rates: I'm not sure how to set a different learning rate for each layer, so if possible, could you tell me how? As far as I remember, though, the paper does not use the Adam optimizer; Adam adapts the learning rate for each weight.
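To make that last point concrete: with plain SGD every weight moves by the same lr * grad, while Adam rescales each weight's step by its own gradient history. A minimal single-scalar sketch of both updates (simplified, not the optimizer in this repository):

```python
import math

def sgd_update(w, grad, lr):
    # Plain SGD: the step is always lr * grad.
    return w - lr * grad

def adam_update(w, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps per-weight running moments (m, v), so the effective
    # step size adapts to each weight's gradient history.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

state = {"t": 0, "m": 0.0, "v": 0.0}
w_adam = adam_update(1.0, 0.5, state, lr=0.0001)
w_sgd = sgd_update(1.0, 0.5, lr=0.0001)
```

Note that on the first step Adam's update is close to lr regardless of the gradient's magnitude, while SGD's is lr * grad; this per-weight normalization is why a single global rate under Adam behaves differently from the paper's hand-tuned per-layer SGD rates.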
Thanks.