CS224W - Bag of Tricks for Node Classification with GNN - Non interactive GAT #9832
Add `interactive_attn` parameter to `GATConv` and `GATv2Conv`.
Part of #9831, this allows "non-interactive" attention as described in "Bag of Tricks for Node Classification with Graph Neural Networks", where target node features are not used for computing attention coefficients.
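To make the distinction concrete, here is a minimal pure-Python sketch of the two modes. It is not the code in this PR: it assumes the GAT (v1) scoring form, where the score of edge j -> i is LeakyReLU(a_src · h_j + a_dst · h_i), and it models non-interactive attention by simply dropping the target term. The helper `attention_coefficients` and the vectors `a_src` and `a_dst` are illustrative names, and the graph is assumed to have no duplicate edges.

```python
import math


def leaky_relu(x, negative_slope=0.2):
    """LeakyReLU on a scalar."""
    return x if x > 0 else negative_slope * x


def attention_coefficients(h, edges, a_src, a_dst, interactive_attn=True):
    """Return {(j, i): alpha} for edges j -> i, softmax-normalized per target i.

    h:     list of already-projected node feature vectors (hypothetical input).
    edges: list of (source j, target i) pairs, assumed to be unique.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    # Unnormalized scores. The target term is only added in interactive mode.
    scores = {}
    for j, i in edges:
        e = dot(a_src, h[j])
        if interactive_attn:
            e += dot(a_dst, h[i])
        scores[(j, i)] = leaky_relu(e)

    # Softmax over the incoming edges of each target node.
    alphas = {}
    for target in {i for _, i in edges}:
        incoming = [k for k in scores if k[1] == target]
        m = max(scores[k] for k in incoming)  # subtract max for stability
        exps = {k: math.exp(scores[k] - m) for k in incoming}
        z = sum(exps.values())
        for k in incoming:
            alphas[k] = exps[k] / z
    return alphas
```

Under these assumptions, two target nodes with the same set of sources receive identical coefficients when `interactive_attn=False`, whatever their own features are. With `interactive_attn=True` their coefficients can differ, because the target term shifts the scores across the LeakyReLU kink.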
Details
`GATConv` by passing `(x, None)` to forward, however this method `GATv2Conv`
`kwargs` `models.GAT`
`in_channels` can be used with non-bipartite graphs. E.g. users might want a separate set of features when a node is a source vs. target.
`s` and `t` in the notation as `i` and `j` are respectively the target and source nodes with the default `flow`. This is also more consistent with `GATConv`'s `lin_src` and `lin_dst`. Similarly, according to the message passing documentation, `flow`, though the papers use the opposite notation.
`GATConv.forward` by reusing the tuple `in_channels` path when `interactive_attn=False` and `in_channels` is an integer, but this could be confusing.
Benchmarks
The paper does not provide any metrics or performance references.
`benchmarks/citation` reveals non-interactive is faster by ~2-16% on Colab's T4s with typically no effect on performance. It may be slightly worse on Cora v1+no_random_splits and v2+random_splits but slightly better on PubMed v1+random_splits.
Deltas below are expressed in the direction `interactive - non_interactive`. For Arxiv we used the default hyperparameters from `benchmarks/citation/gat.py` with a batch norm inserted, and for v2 the same as v1, though these are surely suboptimal settings.
Full metrics: [interactive] [non-interactive]