Graph Attention Network


Key ideas

  • Apply different weights to different nodes in the neighborhood without prior knowledge of the graph's structure
  • Spectral vs non-spectral approaches to graph convolutions
    • Spectral filters learn on the Laplacian eigenbasis, so they depend on the graph structure
    • Non-spectral filters apply convolution directly on the graph, operating on spatially close neighbors
      • Sampling a fixed-size neighborhood (weighted for attention, with a self-link for self-attention) and aggregating it (sketched below)
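
A minimal numpy sketch (not from the original notes) of this non-spectral idea: sample a fixed-size set of neighbors for each node, include a self-link, and aggregate their features. The function name and the fixed mean aggregator are illustrative assumptions; GAT replaces the fixed mean with learned attention weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate_sampled_neighbors(features, adjacency, sample_size=3):
    """features: (N, F) node features; adjacency: dict node -> list of neighbor ids."""
    out = np.zeros_like(features)
    for i in range(features.shape[0]):
        candidates = adjacency[i] + [i]            # self-link so the node also sees itself
        sampled = rng.choice(candidates, size=sample_size, replace=True)
        out[i] = features[sampled].mean(axis=0)    # fixed mean aggregation; GAT learns these weights instead
    return out

# toy graph: 4 nodes, 2 features each
feats = rng.normal(size=(4, 2))
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
print(aggregate_sampled_neighbors(feats, adj))
```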

Attention Layer

  • input: a set of node features $h = \{\vec{h}_1, \vec{h}_2, \dots, \vec{h}_N\}$, each $\vec{h}_i$ a vector of $F$ features
  • output: a new set of node features $h' = \{\vec{h}'_1, \dots, \vec{h}'_N\}$, possibly of a different dimension $F'$
  • At least one learnable linear transformation is required to obtain higher-level features; a shared weight matrix $\mathbf{W} \in \mathbb{R}^{F' \times F}$ is applied to every node.
  • $e_{ij} = a(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j)$
  • $\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \dfrac{\exp\!\big(\mathrm{LeakyReLU}\big(\vec{a}^{\,T}[\mathbf{W}\vec{h}_i \,\Vert\, \mathbf{W}\vec{h}_j]\big)\big)}{\sum_{k \in \mathcal{N}_i} \exp\!\big(\mathrm{LeakyReLU}\big(\vec{a}^{\,T}[\mathbf{W}\vec{h}_i \,\Vert\, \mathbf{W}\vec{h}_k]\big)\big)}$
  • where $e_{ij}$ is the importance of node $j$'s features to node $i$
    • it is only computed for nodes $j$ in $i$'s neighborhood $\mathcal{N}_i$
  • $\Vert$ denotes concatenation
  • To stabilize the self-attention learning process, the mechanism is extended to multi-head attention: $K$ independent heads are computed and their outputs combined (see the sketch at the end of this section)

$\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\!\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} \mathbf{W}^{k} \vec{h}_j\Big)$ (concatenation across the $K$ heads)

$\vec{h}'_i = \sigma\!\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} \mathbf{W}^{k} \vec{h}_j\Big)$ (averaging, used in the final prediction layer)

  • Highly efficient: computing attention can be parallelized across edges, with overall complexity $O(|V| F F' + |E| F')$, where $F$ is the number of input features per node and $F'$ the number of output features.
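
Putting the equations above together, here is a minimal numpy sketch of one GAT layer (a single attention head plus the multi-head combination). Function names, shapes, and the ReLU output nonlinearity are assumptions for illustration; the paper uses LeakyReLU (slope 0.2) inside the attention and typically ELU as the layer nonlinearity.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax_rows(e):
    e = e - e.max(axis=1, keepdims=True)
    ex = np.exp(e)
    return ex / ex.sum(axis=1, keepdims=True)

def gat_head(h, adj, W, a):
    """One attention head: returns (N, F') updated features.
    h: (N, F) inputs, adj: (N, N) adjacency with self-loops,
    W: (F, F') shared weights, a: (2F',) attention vector."""
    Wh = h @ W                                                   # (N, F')
    Fp = Wh.shape[1]
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]), computed for all pairs at once
    e = leaky_relu((Wh @ a[:Fp])[:, None] + (Wh @ a[Fp:])[None, :])
    e = np.where(adj > 0, e, -np.inf)                            # keep only j in N_i
    alpha = softmax_rows(e)                                      # alpha_ij
    return alpha @ Wh                                            # sum_j alpha_ij * W h_j

def gat_layer(h, adj, Ws, As, final=False):
    """Multi-head layer: concatenate head outputs, or average them in the final layer."""
    heads = [gat_head(h, adj, W, a) for W, a in zip(Ws, As)]
    out = np.mean(heads, axis=0) if final else np.concatenate(heads, axis=1)
    return np.maximum(out, 0)                                    # sigma = ReLU here; the paper uses ELU

# toy usage: 4 nodes, F=3 input features, F'=2, K=2 heads
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
adj = np.eye(4) + np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
Ws = [rng.normal(size=(3, 2)) for _ in range(2)]
As = [rng.normal(size=(4,)) for _ in range(2)]
print(gat_layer(h, adj, Ws, As).shape)                           # (4, 4) = (N, K * F')
```

Masking the non-edges with $-\infty$ before the row-wise softmax is what restricts each softmax to the node's neighborhood; the self-loops in `adj` correspond to the self-attention link mentioned above.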