# Experiments on using Kolmogorov-Arnold Networks (KAN) on Graph Learning

This repository contains some quick experimental results comparing the performance of MLPs, GNNs (GCN), KANs, and KAN+GNNs on several graph-learning benchmark datasets (specifically, node classification).

## TL;DR (for now)
- **Using KANs or KAN+GNNs usually introduces a lot of model parameters.** This makes me quite skeptical about using KANs or KAN+GNNs instead of MLPs or GNNs. **(Perhaps we need a more effective way to merge KANs with GNNs.)**
- Make the model (especially the KAN part) as light as possible.
- KAN+GNN generally performs well on homophilic datasets, but struggles on heterophilic datasets (performing even worse than GCNs).
- KANs shine more on heterophilic datasets.
- Learning rate is the most important hyperparameter for KANs and KAN+GNNs.


## KAN and KAN+GNN (with reference to the original repo)
To build KAN and KAN+GNN, I have used the [Efficient-KAN](https://github.com/Blealtan/efficient-kan) implementation for all KAN and KAN+GNN experiments. For KAN+GNN, I have combined Efficient-KAN with [GraphKAN](https://github.com/WillHua127/GraphKAN-Graph-Kolmogorov-Arnold-Networks), which defines each KAN+GNN layer as (KAN $\rightarrow$ `torch.sparse.spmm` with the adjacency matrix). All detailed settings are left at their defaults unless mentioned explicitly. The utility functions, including the data splits, are also from [GraphKAN](https://github.com/WillHua127/GraphKAN-Graph-Kolmogorov-Arnold-Networks). (I do not claim any ownership of the Efficient-KAN and GraphKAN code.)
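
For reference, here is a minimal sketch of how such a layer can be wired up. This is not the exact GraphKAN code: it assumes an Efficient-KAN-style `KANLinear(in_features, out_features)` module and a pre-normalized sparse adjacency matrix, and uses `torch.sparse.mm` for the sparse multiply.

```python
import torch
import torch.nn as nn
from efficient_kan import KANLinear  # assumed import path for the Efficient-KAN package


class KANGNNLayer(nn.Module):
    """One KAN+GNN layer: a KAN transform followed by sparse neighborhood aggregation."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # The KAN plays the role that the weight matrix plays in a GCN layer.
        self.kan = KANLinear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.kan(x)                 # per-node feature transform via KAN
        return torch.sparse.mm(adj, h)  # message passing with the sparse adjacency matrix
```

Stacking several of these layers, optionally after an initial MLP projection of the input features (the `Proj` option in the hyperparameter section below), gives the KAN+GNN variants evaluated here.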

## Datasets
The following datasets are used in the experiments:
- `Cora`
- `Citeseer`
- `Pubmed`
- `Cornell`
- `Texas`
- `Wisconsin`

Note that `Cora`, `Citeseer`, and `Pubmed` are homophilic, while `Cornell`, `Texas`, and `Wisconsin` are heterophilic datasets.
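
For readers unfamiliar with the distinction, the edge homophily ratio (the fraction of edges whose endpoints share a label) is a common way to quantify this. Below is a small, self-contained helper, not part of the experiments, that computes it from a COO `edge_index` and the node labels:

```python
import torch


def edge_homophily(edge_index: torch.Tensor, y: torch.Tensor) -> float:
    """Fraction of edges that connect two nodes with the same label.

    edge_index: LongTensor of shape [2, num_edges] (COO format, as in PyTorch Geometric).
    y:          LongTensor of node labels, shape [num_nodes].
    """
    src, dst = edge_index
    return (y[src] == y[dst]).float().mean().item()


# Homophilic datasets such as Cora score high on this ratio,
# while heterophilic ones such as Texas or Wisconsin score low.
```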

## Hyperparameter tuning
The following hyperparameters are tuned for each model; a sketch of the search loop is given at the end of this section. For all cases except GNNs, the maximum number of epochs is set to 1000. For KAN and KAN+GNN, I have also considered the option of projecting the input features to the hidden dimension as a first step (`Proj`).
### MLP
- Hidden dim: [16, 32, 64]
- Num. layers: [1, 2, 3]
- Learning rate: [0.01, 0.001, 0.0001]
### KAN
- Hidden dim: [16, 32, 64]
- Num. layers: [1, 2]
- Project with MLP to hidden dim as the first step (`Proj`): [True, False]
- Learning rate: [0.1, 0.01, 0.001, 0.0001]
### GNN
- Architecture: GCN
- Hidden dim: [16, 32, 64]
- Num. layers: [1, 2, 3]
- Learning rate: [0.1, 0.01, 0.001, 0.0001]
### KAN+GNN
- Hidden dim: [16, 32, 64]
- Num. KAN layers in each KAN+GNN layer: [1, 2]
- Num. message-passing (`spmm`) layers in each KAN+GNN layer: [1, 2, 3]
- Project with MLP to hidden dim as the first step (`Proj`): [True, False]
- Learning rate: [0.1, 0.01, 0.001, 0.0001]
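
As promised above, here is a minimal sketch of the search loop over these grids, using the KAN+GNN grid as an example. `build_kan_gnn` and `train_and_eval` are hypothetical placeholders for the actual model constructor and training routine; model selection is done on validation accuracy.

```python
from itertools import product

# Grids taken from the KAN+GNN list above.
hidden_dims = [16, 32, 64]
kan_layers = [1, 2]
spmm_layers = [1, 2, 3]
proj_options = [True, False]
learning_rates = [0.1, 0.01, 0.001, 0.0001]

best = {"val_acc": 0.0}
for hidden, n_kan, n_spmm, proj, lr in product(
    hidden_dims, kan_layers, spmm_layers, proj_options, learning_rates
):
    # Placeholders: build the model and train for up to 1000 epochs.
    model = build_kan_gnn(hidden_dim=hidden, kan_layers=n_kan,
                          spmm_layers=n_spmm, proj=proj)
    val_acc, test_acc, best_epoch = train_and_eval(model, lr=lr, max_epochs=1000)
    if val_acc > best["val_acc"]:
        best = dict(val_acc=val_acc, test_acc=test_acc, best_epoch=best_epoch,
                    hidden=hidden, kan=n_kan, spmm=n_spmm, proj=proj, lr=lr)
```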

## Result 1: Best performers

Results after hyperparameter tuning for different datasets.

- KAN+GNN generally performs well on homophilic datasets, but struggles on heterophilic datasets (performing even worse than GCNs).
- KANs shine more on heterophilic datasets.
- Using KANs or KAN+GNNs usually introduces a lot of model parameters, which makes me quite skeptical about using them instead of MLPs or GNNs.
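
The "Number of parameters" column is presumably the standard count of trainable parameters, which can be reproduced for any of these models with the snippet below (the small MLP here is purely illustrative, using Cora-sized inputs with 1433 features and 7 classes):

```python
import torch.nn as nn

# Illustrative model only: a one-hidden-layer MLP for Cora-sized inputs.
model = nn.Sequential(nn.Linear(1433, 16), nn.ReLU(), nn.Linear(16, 7))

# Count of trainable parameters (presumably what the tables report).
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{num_params:,} trainable parameters")
```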

### Cora
| Model | Validation accuracy | Test accuracy | Number of parameters | Best epoch | Hidden dim | Num. layers | Learning rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MLP | 0.712177 | 0.737274 | 10,038 | 2 | 16 | 1 | 0.1 |
| KAN | 0.804428 | 0.760263 | 921,600 (`Proj`=`False`) | 84 | 64 | 2 | 0.001 |
| GCN | 0.889299 | 0.866995 | 95,936 | 18 | 64 | 2 | 0.1 |
| KAN+GNN | **0.907749** | **0.875205** | 458,560 (`Proj`=`False`) | 105 | 32 | 1 (KAN) / 1 (spmm) | 0.1 |

### Citeseer
| Model | Validation accuracy | Test accuracy | Number of parameters | Best epoch | Hidden dim | Num. layers | Learning rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MLP | 0.760902 | 0.723056 | 22,224 | 3 | 16 | 1 | 0.1 |
| KAN | 0.801504 | 0.757162 | 593,440 (`Proj`=`False`) | 65 | 16 | 2 | 0.01 |
| GCN | **0.831579** | **0.815825** | 119,584 | 38 | 32 | 2 | 0.01 |
| KAN+GNN | **0.831579** | 0.809004 | 458,560 (`Proj`=`False`) | 104 | 64 | 1 (KAN) / 1 (spmm) | 0.1 |

### Pubmed
| Model | Validation accuracy | Test accuracy | Number of parameters | Best epoch | Hidden dim | Num. layers | Learning rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MLP | 0.890439 | 0.885932 | 36,675 | 80 | 64 | 3 | 0.001 |
| KAN | 0.884098 | 0.881115 | 80,480 (`Proj`=`False`) | 319 | 16 | 2 | 0.01 |
| GCN | 0.887649 | 0.864639 | 8,560 | 191 | 16 | 3 | 0.1 |
| KAN+GNN | **0.906416** | **0.905703** | 80,480 (`Proj`=`False`) | 330 | 16 | 1 (KAN) / 2 (spmm) | 0.01 |


### Cornell
| Model | Validation accuracy | Test accuracy | Number of parameters | Best epoch | Hidden dim | Num. layers | Learning rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MLP | 0.918919 | **0.914894** | 27,381 | 37 | 16 | 2 | 0.001 |
| KAN | **0.972973** | 0.829787 | 1,093,120 (`Proj`=`False`) | 46 | 64 | 2 | 0.001 |
| GCN | 0.810811 | 0.723404 | 27,536 | 5 | 16 | 2 | 0.1 |
| KAN+GNN | 0.891892 | 0.617021 | 275,840 (`Proj`=`False`) | 78 | 16 | 1 (KAN) / 3 (spmm) | 0.001 |



### Wisconsin
| Model | Validation accuracy | Test accuracy | Number of parameters | Best epoch | Hidden dim | Num. layers | Learning rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MLP | **0.98** | **0.9125** | 109,509 | 4 | 64 | 2 | 0.1 |
| KAN | **0.98** | **0.9125** | 546,560 (`Proj`=`False`) | 39 | 32 | 2 | 0.01 |
| GCN | 0.84 | 0.6125 | 55,584 | 3 | 32 | 2 | 0.1 |
| KAN+GNN | 0.82 | 0.65 | 32,368 (`Proj`=`True`) | 148 | 16 | 2 | 0.001 |



### Texas
| Model | Validation accuracy | Test accuracy | Number of parameters | Best epoch | Hidden dim | Num. layers | Learning rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MLP | 0.972973 | **0.852459** | 54,757 | 48 | 32 | 2 | 0.01 |
| KAN | **1.0** | 0.704918 | 1,093,120 (`Proj`=`False`) | 23 | 64 | 2 | 0.01 |
| GCN | 0.918919 | 0.754098 | 55,584 | 25 | 32 | 2 | 0.0001 |
| KAN+GNN | 0.918919 | 0.737705 | 74,976 (`Proj`=`True`) | 1 | 32 | 2 | 0.1 |


## Result 2 (SHAP analysis): Rules of thumb for hyperparameter settings

For this, I fit an XGBoost model to predict the test performance of each model from its hyperparameters, and then used SHAP values to estimate the 'importance' of each hyperparameter (a minimal sketch of this procedure is given at the end of this section). Some trends are:

### KAN+GNN
- Learning rate is the hyperparameter to tune if you want the most bang for the buck.
- The number of KAN layers per KAN+GNN layer is more important than the number of message-passing layers. In general, make the model as light as possible.

Figure: SHAP analysis for Cora on KAN + GNN
![SHAP summary plot for KAN+GNN on Cora](images/Cora_KANGNN_SHAP.png "Cora_KANGNN_SHAP")

### KAN
- Similar to KAN+GNN, learning rate is the most important hyperparameter.
- Also similar to KAN+GNN, make the KAN as light as possible.

Figure: SHAP analysis for Citeseer on KAN
![SHAP summary plot for KAN on Citeseer](images/Citeseer_KAN_SHAP.png "Citeseer_KAN_SHAP")
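
For completeness, here is a minimal sketch of the SHAP procedure described at the top of this section. It assumes the tuning runs have been collected into a CSV with one row per run; the file and column names are hypothetical.

```python
import pandas as pd
import shap
import xgboost as xgb

# Hypothetical tuning log: one row per run, hyperparameters plus the resulting test accuracy.
df = pd.read_csv("kan_gnn_tuning_results.csv")
X = df[["hidden_dim", "kan_layers", "spmm_layers", "proj", "lr"]].astype(float)
y = df["test_acc"]

# Small XGBoost regressor that predicts test accuracy from the hyperparameters.
model = xgb.XGBRegressor(n_estimators=200, max_depth=3)
model.fit(X, y)

# SHAP values measure how strongly each hyperparameter drives the predicted accuracy.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)  # produces plots like the ones shown above
```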

## Result 3: Test performance vs. Number of parameters

I have also plotted the test performance vs. the number of parameters for all cases during hyperparameter tuning. Some notes on the figure:

- I have used a log scale for the x-axis (number of parameters) to make the plot more readable.
- During tuning, several configurations may end up with the same number of parameters. In such cases, I have highlighted the best performer with the most opaque color (see the plotting sketch after this list).
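
Below is a minimal matplotlib sketch of how such a plot can be produced, again from a hypothetical per-run tuning log with `model`, `num_params`, and `test_acc` columns:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical tuning log for one dataset: one row per run.
df = pd.read_csv("cora_tuning_results.csv")

fig, ax = plt.subplots()
for model_name, runs in df.groupby("model"):
    # All runs for this model, drawn semi-transparent.
    sc = ax.scatter(runs["num_params"], runs["test_acc"], alpha=0.3, label=model_name)
    # Re-draw the best run fully opaque, in the same color.
    best = runs.loc[runs["test_acc"].idxmax()]
    ax.scatter(best["num_params"], best["test_acc"],
               color=sc.get_facecolor()[0], alpha=1.0)

ax.set_xscale("log")  # parameter counts span several orders of magnitude
ax.set_xlabel("Number of parameters (log scale)")
ax.set_ylabel("Test accuracy")
ax.legend()
plt.show()
```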

Here are some observations:

- In general, it is very easy to build a heavy model using KANs or KAN+GNNs.
- For homophilic datasets, adding GNNs to the mix helps, although the exact performance depends on the specific dataset.
- For heterophilic datasets, the non-GNN models (MLP, KAN) usually perform better, often by a large margin.

Figure: Test performance vs. Number of parameters for Cora
![Test performance vs. number of parameters for Cora](images/Cora_Param.png "Cora_Param")

Figure: Test performance vs. Number of parameters for Wisconsin
![Test performance vs. number of parameters for Wisconsin](images/Wisconsin_Param.png "Wisconsin_Param")

## Note

This is an ongoing investigation, and some results may change in the future. Thanks to all the authors of the Efficient-KAN and GraphKAN repositories for their awesome work!