
The rank_answer function in BLIP is different from that in ALBEF #208

Open

Description

@littleFlyDance

Thank you for your wonderful paper. While reading the rank_answer function in blip_vqa.py, I noticed that it differs from the one in ALBEF.
In BLIP, the score is simply the log-likelihood of generating the entire answer sequence:

```python
log_probs_sum = -output.loss
log_probs_sum = log_probs_sum.view(num_ques, k)

max_topk_ids = log_probs_sum.argmax(dim=1)
max_ids = topk_ids[max_topk_ids >= 0, max_topk_ids]
```
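
If I read this correctly, with reduction='none' the decoder returns one loss per candidate, so -output.loss is the summed log-probability of that candidate's answer tokens. Here is a toy sketch of how I understand the score (the helper below is only my own illustration, not code from the repo; the names and padding convention are assumptions):

```python
import torch
import torch.nn.functional as F

def candidate_score(logits, target_ids, pad_token_id=0):
    """Toy sketch: summed log-probability of one candidate answer.

    logits:     (seq_len, vocab_size) decoder outputs aligned with the answer tokens
    target_ids: (seq_len,) answer token ids
    This is only my reading of what -output.loss holds per candidate,
    not the actual BLIP implementation.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)
    mask = (target_ids != pad_token_id).float()  # ignore padded positions
    return (token_log_probs * mask).sum()
```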

But in ALBEF, the log-likelihood of predicting the second token from the [CLS] token is added as well:

```python
answer_loss = output.loss
answer_loss = answer_loss.view(input_ids.size(0), -1)

# topk_prob: first token probability
topk_probs = topk_probs.view(-1, 1)
log_probs = torch.cat([topk_probs.log(), -answer_loss], dim=1)

# re-calculate log probabilities for the answer sequences using chain rule
log_probs_sum = log_probs.sum(1)
log_probs_sum = log_probs_sum.view(num_ques, k)

topk_probs = F.softmax(log_probs_sum, dim=-1)
# get top-k after re-ranking
topk_probs, rerank_id = topk_probs.topk(k, dim=1)
topk_ids = torch.gather(topk_ids, 1, rerank_id)
```
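
So, as I understand it, ALBEF rebuilds the full sequence log-likelihood with the chain rule: the first answer token is scored from the [CLS]/start position, and the remaining tokens come from -answer_loss. A toy numeric sketch of what I mean (all tensors below are made up, only to show how the two parts combine):

```python
import torch
import torch.nn.functional as F

num_ques, k, rest_len = 2, 3, 4  # made-up sizes for illustration

# log p(a_1 | question) for each of the num_ques * k candidates
first_token_log_prob = torch.log(torch.rand(num_ques * k, 1))
# -per-token loss for a_2 .. a_T (what -answer_loss holds, in my reading)
rest_token_log_prob = -torch.rand(num_ques * k, rest_len)

# chain rule: log p(a_1..a_T | q) = log p(a_1 | q) + sum_{t>=2} log p(a_t | a_<t, q)
log_probs = torch.cat([first_token_log_prob, rest_token_log_prob], dim=1)
log_probs_sum = log_probs.sum(1).view(num_ques, k)

# re-rank the k candidates of each question by the combined score
topk_probs, rerank_id = F.softmax(log_probs_sum, dim=-1).topk(k, dim=1)
```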

Could you tell me the reason for this difference?
