Thank you for your wonderful paper. While reading the rank_answer function in blip_vqa.py, I noticed that it differs from the one in ALBEF.
In BLIP, the score is the log-likelihood of generating the entire answer sequence:
```python
# -loss is each candidate's sequence log-likelihood
log_probs_sum = -output.loss
log_probs_sum = log_probs_sum.view(num_ques, k)

# pick the candidate with the highest log-likelihood for each question
max_topk_ids = log_probs_sum.argmax(dim=1)
max_ids = topk_ids[max_topk_ids >= 0, max_topk_ids]  # advanced indexing over rows
```
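For context, here is a minimal runnable sketch of that scoring with toy tensors (num_ques, k, seq_len, vocab, and the random inputs are illustrative placeholders, not the repo's actual shapes); it assumes, as the code suggests, that output.loss is the per-candidate summed token negative log-likelihood:

```python
import torch
import torch.nn.functional as F

# Toy placeholders: 2 questions, 3 candidate answers each, 4 answer tokens.
num_ques, k, seq_len, vocab = 2, 3, 4, 10
logits = torch.randn(num_ques * k, seq_len, vocab)          # decoder outputs
targets = torch.randint(0, vocab, (num_ques * k, seq_len))  # answer tokens

# With reduction='none', cross-entropy gives per-token NLL; summing over the
# sequence yields a per-candidate loss, standing in for output.loss above.
token_nll = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
loss = token_nll.sum(1)                       # stand-in for output.loss

log_probs_sum = (-loss).view(num_ques, k)     # sequence log-likelihoods
max_topk_ids = log_probs_sum.argmax(dim=1)    # best candidate per question
```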
But in ALBEF, the log-likelihood of the first answer token (predicted from the start token) is additionally included:
```python
answer_loss = output.loss
answer_loss = answer_loss.view(input_ids.size(0), -1)

# topk_probs: first token probability
topk_probs = topk_probs.view(-1, 1)
log_probs = torch.cat([topk_probs.log(), -answer_loss], dim=1)

# re-calculate log probabilities for the answer sequences using chain rule
log_probs_sum = log_probs.sum(1)
log_probs_sum = log_probs_sum.view(num_ques, k)

topk_probs = F.softmax(log_probs_sum, dim=-1)
# get top-k after re-ranking
topk_probs, rerank_id = topk_probs.topk(k, dim=1)
topk_ids = torch.gather(topk_ids, 1, rerank_id)
```
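To make the difference concrete, this continues the toy sketch above. ALBEF prepends the first-token log-probability before summing, so that (per its own "chain rule" comment) the score covers the whole sequence; topk_probs here is a random placeholder for the probabilities saved when the top-k first tokens were selected:

```python
# Placeholder for the first-token probability of each candidate.
topk_probs = torch.rand(num_ques * k, 1)

# Mirror the quoted arithmetic: concatenate the first-token log-prob with
# the (negated) per-candidate decoder loss, then sum along dim 1.
answer_nll = token_nll.sum(1, keepdim=True)                     # (N, 1)
log_probs = torch.cat([topk_probs.log(), -answer_nll], dim=1)   # (N, 2)
log_probs_sum = log_probs.sum(1).view(num_ques, k)              # final scores
```

So the only arithmetic difference between the two snippets is the extra topk_probs.log() term in the concatenation.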
Could you explain the reason for this difference?