Dear authors of HumanSD,
First of all, I would like to thank you for sharing your work.
I have read both the paper and the code, and there are some parts I cannot understand, so I would like to ask a few questions here.
About the heatmap loss: Eq. 6 of the paper says that $Wa$ is a weight that gives higher priority to the loss in areas that are highly correlated with the input condition.
From Figure 2, my understanding is that the heatmap is applied as a simple multiplication, i.e. as a simple mask.
However, after checking the code, it seems that the obtained heatmap is not used directly as a simple mask. After the heatmap is obtained, it is passed to the VAE encoder, as shown here:
HumanSD/ldm/models/diffusion/ddpm.py
Line 2011 in c5db29d
After that, the obtained embedding is used to mask the loss here:
HumanSD/ldm/models/diffusion/ddpm.py
Line 2026 in c5db29d
My questions are the following:
1. Why is it necessary to pass the obtained heatmap to the VAE encoder?
2. Why do you need the 1+ in the following line?
loss_simple = torch.mul(self.get_loss(model_output, target, mean=False),(1+self.pose_loss_weight*back_to_embed_pose_add_weight)).mean([1, 2, 3])
3. About obtaining the heatmap, as shown in this part of the code:
HumanSD/ldm/models/diffusion/ddpm.py
Line 1998 in c5db29d
The way the heatmap is calculated gives pixels whose value is greater than the threshold a value of zero, and a non-zero value otherwise. I thought the usual way is to assign 1 to pixels whose value is greater than the threshold. Why is it done the other way around here?
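To make the thresholding question concrete, here is a minimal sketch of the two conventions I am comparing. It is plain Python, not the actual code in ddpm.py; the function name `threshold_mask`, the sample values, and the threshold of 0.1 are all made up for illustration.

```python
def threshold_mask(values, threshold, invert=False):
    """Binarize values against a threshold.

    invert=False: 1.0 where value > threshold, else 0.0 (what I expected).
    invert=True:  0.0 where value > threshold, else 1.0 (how I read the code).
    """
    mask = [1.0 if v > threshold else 0.0 for v in values]
    if invert:
        mask = [1.0 - m for m in mask]
    return mask


# Hypothetical per-pixel values; only the last two exceed the threshold.
sample = [0.05, 0.40, 0.90]

expected_mask = threshold_mask(sample, threshold=0.1)
as_read_mask = threshold_mask(sample, threshold=0.1, invert=True)

print(expected_mask)  # pixels above the threshold are selected
print(as_read_mask)   # pixels above the threshold are zeroed
```

If my reading is right, the second mask selects exactly the complement of the region I expected to be emphasized.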
I would really appreciate it if you could guide me to understand your work more correctly.
Thank you very much.
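P.S. To make the question about the 1+ term concrete, below is a minimal sketch of how I currently read the weighting. It uses plain Python lists instead of tensors and assumes the weight is a binary mask, which is a simplification because in the repository the weight comes out of the VAE encoder. The name `weighted_loss`, the toy loss values, and the weight of 2.0 are hypothetical.

```python
def weighted_loss(per_element_loss, heat_weight, pose_loss_weight, add_one=True):
    """Mean of the per-element loss scaled by a per-element factor.

    add_one=True:  factor = 1 + pose_loss_weight * heat_weight
    add_one=False: factor = pose_loss_weight * heat_weight
    """
    total = 0.0
    for loss, m in zip(per_element_loss, heat_weight):
        factor = pose_loss_weight * m
        if add_one:
            factor += 1.0
        total += loss * factor
    return total / len(per_element_loss)


# Toy example: four positions, the last two lie on the weighted region.
per_element_loss = [0.5, 0.5, 0.5, 0.5]
heat_weight = [0.0, 0.0, 1.0, 1.0]

with_one = weighted_loss(per_element_loss, heat_weight, pose_loss_weight=2.0)
without_one = weighted_loss(
    per_element_loss, heat_weight, pose_loss_weight=2.0, add_one=False
)

print(with_one)     # zero-weight positions still contribute with factor 1
print(without_one)  # zero-weight positions drop out of the loss
```

In this toy case, the positions with zero weight keep a factor of 1 when the 1+ is present, and they are removed from the loss entirely when it is absent. Is keeping the ordinary loss everywhere the intended reason for the 1+?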