- Apply different weights to different nodes in the neighborhood without prior knowledge of the graph's structure
- Spectral vs non-spectral approaches to graph convolutions
- Spectral filters are learned in the eigenbasis of the graph Laplacian, so they depend on the specific graph structure: a filter trained on one graph does not transfer to another (sketched below)
- Non-spectral filters apply convolution directly on the graph, operating on spatially close neighbors
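
To make the structure-dependence concrete, here is a minimal NumPy sketch (the adjacency matrix `A` and the function name are illustrative assumptions): a spectral filter is defined on the eigenbasis of one particular graph's normalized Laplacian, so a different graph yields a different basis.

```python
import numpy as np

def laplacian_eigenbasis(A: np.ndarray):
    """Eigenbasis of the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.eigh(L)

# 3-node path graph; a spectral filter g acts as U @ g(eigvals) @ U.T @ x,
# so the learned filter is tied to this specific graph's eigenbasis U.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
eigvals, U = laplacian_eigenbasis(A)
```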
- Sampling a fixed-size neighborhood (weighted for attention, self-link for self-attention) and aggregating

Attention
- input: a set of node features h = {h_1, h_2, ..., h_N}, each h_i a vector of F features
- output: a new set of node features h' = {h'_1, h'_2, ..., h'_N}, each h'_i a vector of F' features (possibly of a different dimension)
- At least one learnable linear transformation is required to obtain higher-level features; a shared weight matrix W is applied to every node (Wh_i)
- e_ij = a(Wh_i, Wh_j), where e_ij is the importance of node j's features to node i and a is a shared attention mechanism
- Masked attention: we only compute e_ij for nodes j in i's neighborhood N_i
- || is concatenation: the attention mechanism computes e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), and the e_ij are normalized into coefficients α_ij with a softmax over N_i
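
Putting the pieces above together, a minimal NumPy sketch of a single attention head (function and variable names are assumptions; the 0.2 LeakyReLU slope follows the GAT paper):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):  # negative slope 0.2, as in the GAT paper
    return np.where(x > 0, x, alpha * x)

def gat_head(h, adj, W, a):
    """One attention head.
    h:   (N, F)  input node features
    adj: (N, N)  adjacency with self-links (1 if j is in N_i, else 0)
    W:   (F, F') shared linear transformation
    a:   (2F',)  attention vector applied to [Wh_i || Wh_j]
    """
    Wh = h @ W                                   # shared linear transform: (N, F')
    # a^T [Wh_i || Wh_j] splits into a_1^T Wh_i + a_2^T Wh_j
    src = Wh @ a[: W.shape[1]]                   # contribution of node i, (N,)
    dst = Wh @ a[W.shape[1]:]                    # contribution of node j, (N,)
    e = leaky_relu(src[:, None] + dst[None, :])  # e_ij for all pairs, (N, N)
    # masked attention: keep only j in N_i, then normalize with softmax
    e = np.where(adj > 0, e, -np.inf)
    alpha_ij = np.exp(e - e.max(axis=1, keepdims=True))
    alpha_ij = alpha_ij / alpha_ij.sum(axis=1, keepdims=True)
    return alpha_ij @ Wh                         # weighted neighbor aggregation, (N, F')
```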
- To stabilize the learning process of self-attention, the mechanism is extended to multi-head attention: K independent heads are computed and their outputs concatenated (or averaged on the final layer); see the sketch at the end of this list
- Highly efficient: attention can be computed in parallel across edges; a single head costs O(|V|FF' + |E|F'), where F is the number of input features per node and F' the number of output features
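
A sketch of the multi-head extension, reusing the hypothetical `gat_head` from the sketch above: K heads are run independently (hence parallelizable) and concatenated, or averaged on the final layer. The dense (N, N) formulation trades the O(|E|F') edge-parallel computation for readability.

```python
import numpy as np

def multi_head_gat(h, adj, Ws, a_s, average=False):
    """K independent heads; concatenate (hidden layers) or average (output layer)."""
    outs = [gat_head(h, adj, W, a) for W, a in zip(Ws, a_s)]
    return np.mean(outs, axis=0) if average else np.concatenate(outs, axis=1)

# Example: N=4 nodes, F=5 input features, F'=8 per head, K=3 heads
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 5))
adj = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)  # path graph + self-links
Ws = [rng.normal(size=(5, 8)) for _ in range(3)]
a_s = [rng.normal(size=16) for _ in range(3)]
out = multi_head_gat(h, adj, Ws, a_s)  # shape (4, 24): 3 heads * 8 features concatenated
```

A nonlinearity (e.g. ELU) would normally follow each hidden layer's output; it is omitted here for brevity.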