Question about contrastive distillation loss #17

SkrighYZ · 2024-01-22T16:59:40Z

Hi,

I have a few questions about the SimCLR code.

Line 21 in b5b0929

logits = torch.einsum("if, jf -> ij", p, z) / temperature

It seems that the predicted features (p) are not included among the negatives, which differs from what the paper suggests (Appendix B). I understand that you swap p and z here (for a symmetric loss?)

cassle/cassle/distillers/contrastive.py

Lines 65 to 68 in b5b0929

    
    distill_loss = (
        simclr_distill_loss_func(p1, p2, frozen_z1, frozen_z2, self.distill_temperature)
        + simclr_distill_loss_func(frozen_z1, frozen_z2, p1, p2, self.distill_temperature)
    ) / 2

but there are still no comparisons between different samples in p.
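
To make the point concrete, here is a minimal sketch with random stand-in tensors. The batch size, feature dimension, and temperature are hypothetical, and `p` and `z` are not the real model outputs. It only illustrates that swapping the arguments of the quoted einsum transposes the p-z similarity matrix, so neither call produces p-p similarities.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
b, d = 4, 8          # hypothetical batch size and feature dimension
temperature = 0.2    # hypothetical value, not taken from the repo

# Random stand-ins for the concatenated two-view features (assumption).
p = F.normalize(torch.randn(2 * b, d), dim=-1)  # predicted features
z = F.normalize(torch.randn(2 * b, d), dim=-1)  # frozen features

# Same einsum as the quoted line: row i compares p[i] with every z[j].
logits_pz = torch.einsum("if, jf -> ij", p, z) / temperature

# Swapping the roles of p and z compares z[i] with every p[j],
# which is just the transpose of the matrix above.
logits_zp = torch.einsum("if, jf -> ij", z, p) / temperature

# Similarities between different samples in p, which neither call computes.
logits_pp = torch.einsum("if, jf -> ij", p, p) / temperature

print(logits_pz.shape, torch.allclose(logits_zp, logits_pz.T, atol=1e-6))
```

Whether p-p terms such as `logits_pp` should enter the denominator is exactly the open question here; the sketch does not claim either reading is the intended one.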

In the paper, the distillation loss is applied to the two views independently. Based on the code above, does this mean we should use the two views jointly to reproduce the results?

cassle/cassle/losses/simclr.py

Lines 30 to 33 in b5b0929

    
    logit_mask = torch.ones_like(pos_mask, device=device)
    logit_mask.fill_diagonal_(True)
    logit_mask[:, b:].fill_diagonal_(True)
    logit_mask[b:, :].fill_diagonal_(True)

The four lines of code here seem to make logit_mask an all-ones matrix. In my understanding we should assign the diagonals to False. Am I missing something?
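
A small check of this observation, assuming `pos_mask` is a boolean tensor of shape `(2 * b, 2 * b)` (the shape, dtype, and batch size below are assumptions made for illustration). It runs the four quoted lines as they are, then a variant named `alt_mask` (a hypothetical name) that fills the same diagonals with `False`, to show how the two masks differ.

```python
import torch

b = 3  # hypothetical batch size
# Assumed shape and dtype for pos_mask; not taken from the repo.
pos_mask = torch.zeros((2 * b, 2 * b), dtype=torch.bool)

# The four quoted lines, unchanged apart from dropping the device argument.
logit_mask = torch.ones_like(pos_mask)
logit_mask.fill_diagonal_(True)
logit_mask[:, b:].fill_diagonal_(True)
logit_mask[b:, :].fill_diagonal_(True)

# Filling an all-True tensor with True changes nothing.
print(logit_mask.all().item())

# Variant with False: clears the main diagonal and the two
# off-diagonals that pair sample i with sample i + b.
alt_mask = torch.ones_like(pos_mask)
alt_mask.fill_diagonal_(False)
alt_mask[:, b:].fill_diagonal_(False)
alt_mask[b:, :].fill_diagonal_(False)
print(alt_mask.int())
```

With the `False` variant, each row keeps `2 * b - 2` entries. Which of the two masks the loss is meant to use is the question being asked, so this is only a way to inspect the difference.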

TIA
