Thank you for sharing this great work!
Q1: Where should the pair-wise distillation loss be applied: only at the end of the encoder (e.g., the 1/16 · HW feature map of ResNet), or at every scale of the encoder (1/16, 1/8, 1/4, ...)? (A rough sketch of the loss I have in mind follows the questions below.)
Q2: Does pair-wise distillation still work when the teacher's and the student's encoders have different downsample rates (e.g., the student downsamples the input by 1/8 while the teacher downsamples by 1/16), or when they have different decoder structures?
Q3: Can this method be used to distill from VNL into an architecture like FastDepth (which differs from the VNL student in its decoder), given that the VNL student may have a heavy decoder?
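To make Q1/Q2 concrete, here is a minimal sketch of the pair-wise similarity distillation loss as I understand it (not the authors' code; the `feat_s`/`feat_t` names and the bilinear alignment step are my own assumptions). Because the loss compares N × N pixel-affinity matrices, differing channel counts between teacher and student are not a problem, and a resize step can bridge different downsample rates:

```python
import torch
import torch.nn.functional as F

def pairwise_distillation_loss(feat_s, feat_t):
    """Sketch of a pair-wise (affinity) distillation loss.

    feat_s: student feature map, shape (B, C_s, H_s, W_s)
    feat_t: teacher feature map, shape (B, C_t, H_t, W_t)
    Channel counts may differ; the affinity matrices are N x N,
    so only the spatial resolutions need to be aligned.
    """
    # Align spatial resolution if the downsample rates differ (Q2).
    if feat_s.shape[2:] != feat_t.shape[2:]:
        feat_s = F.interpolate(feat_s, size=feat_t.shape[2:],
                               mode="bilinear", align_corners=False)

    def affinity(feat):
        b, c, h, w = feat.shape
        feat = feat.view(b, c, h * w)                 # (B, C, N)
        feat = F.normalize(feat, p=2, dim=1)          # unit-norm pixel vectors
        return torch.bmm(feat.transpose(1, 2), feat)  # (B, N, N) cosine sims

    # Squared L2 distance between teacher and student affinity matrices.
    return F.mse_loss(affinity(feat_s), affinity(feat_t))
```

Note that the affinity matrix has N² = (H·W)² entries, so applying this at shallower scales (1/8, 1/4) gets memory-heavy quickly; subsampling pixel pairs or pooling to a fixed grid would be a workaround.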
I'm also confused about the distillation losses for the other two tasks, especially the pixel-wise loss.
The pixel-wise loss in the paper is defined for the segmentation task as a KL divergence, which is obviously not suitable for the depth task.
I really wonder how the pixel-wise loss is implemented, even though the author explains that it doesn't work for the depth task.
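For reference, my understanding of the segmentation-style pixel-wise loss is a per-pixel KL divergence between the class distributions of teacher and student. A sketch under that assumption (the temperature `T` is my own addition, not something confirmed by the paper):

```python
import torch.nn.functional as F

def pixelwise_kd_loss(logits_s, logits_t, T=1.0):
    """Per-pixel KL divergence between teacher and student class
    distributions, as used for segmentation distillation.

    logits_s, logits_t: (B, num_classes, H, W). This requires a
    categorical distribution per pixel, which is why it does not
    transfer directly to a single-channel regression output like depth.
    """
    p_t = F.softmax(logits_t / T, dim=1)          # teacher soft targets
    log_p_s = F.log_softmax(logits_s / T, dim=1)  # student log-probs
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)  # (B, H, W)
    return kl.mean() * (T * T)  # rescale keeps gradient magnitude comparable
```

For depth, a per-pixel regression loss (e.g., L1 on log-depth between student and teacher predictions) would seem the natural analogue, though whether that actually helps is exactly what the author's comment puts in doubt.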