Thank you for your wonderful paper. While reading the rank_answer function in blip_vqa.py, I noticed that it differs from the one in ALBEF.
In BLIP, the score is the log-likelihood of generating the entire answer sequence:
```python
# -loss is each candidate's sequence log-likelihood
log_probs_sum = -output.loss
log_probs_sum = log_probs_sum.view(num_ques, k)

# pick the candidate with the highest log-likelihood for each question
max_topk_ids = log_probs_sum.argmax(dim=1)
max_ids = topk_ids[max_topk_ids >= 0, max_topk_ids]  # advanced indexing over rows
```
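For context, here is a minimal runnable sketch of that scoring with toy tensors (num_ques, k, seq_len, vocab, and the random inputs are illustrative placeholders, not the repo's actual shapes); it assumes, as the code suggests, that output.loss is the per-candidate summed token negative log-likelihood:

```python
import torch
import torch.nn.functional as F

# Toy placeholders: 2 questions, 3 candidate answers each, 4 answer tokens.
num_ques, k, seq_len, vocab = 2, 3, 4, 10
logits = torch.randn(num_ques * k, seq_len, vocab)          # decoder outputs
targets = torch.randint(0, vocab, (num_ques * k, seq_len))  # answer tokens

# With reduction='none', cross-entropy gives per-token NLL; summing over the
# sequence yields a per-candidate loss, standing in for output.loss above.
token_nll = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
loss = token_nll.sum(1)                       # stand-in for output.loss

log_probs_sum = (-loss).view(num_ques, k)     # sequence log-likelihoods
max_topk_ids = log_probs_sum.argmax(dim=1)    # best candidate per question
```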
But in ALBEF, the log-likelihood of the first answer token (predicted from the start token) is additionally included:
```python
answer_loss = output.loss
answer_loss = answer_loss.view(input_ids.size(0), -1)

# topk_probs: first token probability
topk_probs = topk_probs.view(-1, 1)
log_probs = torch.cat([topk_probs.log(), -answer_loss], dim=1)

# re-calculate log probabilities for the answer sequences using chain rule
log_probs_sum = log_probs.sum(1)
log_probs_sum = log_probs_sum.view(num_ques, k)

topk_probs = F.softmax(log_probs_sum, dim=-1)
# get top-k after re-ranking
topk_probs, rerank_id = topk_probs.topk(k, dim=1)
topk_ids = torch.gather(topk_ids, 1, rerank_id)
```
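To make the difference concrete, this continues the toy sketch above. ALBEF prepends the first-token log-probability before summing, so that (per its own "chain rule" comment) the score covers the whole sequence; topk_probs here is a random placeholder for the probabilities saved when the top-k first tokens were selected:

```python
# Placeholder for the first-token probability of each candidate.
topk_probs = torch.rand(num_ques * k, 1)

# Mirror the quoted arithmetic: concatenate the first-token log-prob with
# the (negated) per-candidate decoder loss, then sum along dim 1.
answer_nll = token_nll.sum(1, keepdim=True)                     # (N, 1)
log_probs = torch.cat([topk_probs.log(), -answer_nll], dim=1)   # (N, 2)
log_probs_sum = log_probs.sum(1).view(num_ques, k)              # final scores
```

So the only arithmetic difference between the two snippets is the extra topk_probs.log() term in the concatenation.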
Could you explain the reason for this difference?