Thanks for the excellent work and the well-written code! However, there seems to be a minor error in class DenseAtt(nn.Module), which is used in the attention-based aggregation part.
If we set "--use-att 1 --local-agg 1", meaning the algorithm uses the attention mechanism to update the node embeddings, class DenseAtt(nn.Module) is used.
However, something seems wrong at line 26 here.
According to formula (8) in Sec. 4.3 of the original paper, the attention weights should be computed with the softmax function, but the code here seems to use the sigmoid function and then multiply by the adjacency matrix.
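For comparison, a rough sketch of the two behaviours (hypothetical tensor names, not the repository's code; it assumes `scores` is the n x n output of the attention MLP and `adj` is a dense 0/1 adjacency matrix with self-loops):

```python
import torch
import torch.nn.functional as F

n = 4
scores = torch.randn(n, n)              # raw pairwise attention scores from the MLP
adj = torch.eye(n)                      # dense 0/1 adjacency with self-loops
adj[0, 1] = adj[1, 0] = 1.0

# What line 26 appears to compute: an element-wise sigmoid gate, masked by the adjacency.
att_sigmoid = torch.sigmoid(scores) * adj      # rows do not sum to 1

# What formula (8) describes: a softmax restricted to each node's neighbours.
masked = scores.masked_fill(adj == 0, float('-inf'))
att_softmax = F.softmax(masked, dim=1)         # each row sums to 1 over the neighbours
```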
Thank you for bringing this up. I just found this issue now while going through the code. Was hoping there might be a reason for this.
Also, since we are working with local aggregation, the sigmoid (which should be a softmax) would need to be computed over only the neighbours of each node. Non-neighbours can be masked with large negative values before the softmax, which drives their weights to zero.
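A minimal sketch of what the forward pass could look like with that fix (a hypothetical module, not the repository's actual code; it assumes the attention logits come from a linear layer over concatenated node-feature pairs, as in DenseAtt, and that `adj` is a dense 0/1 matrix where every node has at least one neighbour):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSoftmaxAtt(nn.Module):
    """Attention weights computed with a softmax over each node's neighbours only."""

    def __init__(self, in_features):
        super().__init__()
        # maps a concatenated pair of node features to a single attention logit
        self.linear = nn.Linear(2 * in_features, 1, bias=True)

    def forward(self, x, adj):
        n = x.size(0)
        # build all pairwise concatenations: (n, n, 2 * in_features)
        x_left = x.unsqueeze(1).expand(-1, n, -1)
        x_right = x.unsqueeze(0).expand(n, -1, -1)
        scores = self.linear(torch.cat([x_left, x_right], dim=-1)).squeeze(-1)
        # mask non-neighbours with a large negative value so the softmax sends them to ~0
        scores = scores.masked_fill(adj == 0, -1e9)
        return F.softmax(scores, dim=1)

# tiny usage example
x = torch.randn(5, 8)
adj = torch.eye(5)                   # self-loops so every row has at least one neighbour
adj[0, 1] = adj[1, 0] = 1.0
att = MaskedSoftmaxAtt(8)(x, adj)
print(att.sum(dim=1))                # every row sums to 1
```

Using a finite large negative value (e.g. -1e9) instead of -inf also avoids NaN rows in the unlikely case that a node has no neighbours at all, since a softmax over a row of all -inf is undefined.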
I find that with "--use-att 1 --local-agg 1" the results are far lower than those reported in the original paper, and training takes much longer. But with "--use-att 1 --local-agg 0" the results match the paper. Why does this happen?