Skip to content

radinhamidi/A-Robust-Neural-Approach-for-Searching-over-Incomplete-Graphs

Repository files navigation

Robust Neural Approach for Searching over Incomplete Graphs

Model Architecture

The task of searching over large keyword graphs aims to identify a subgraph where the nodes collectively cover the input query keywords. Although finding an exact solution to this problem is NP-hard, we address it by proposing a novel graph neural network representation learning technique specifically tailored for graphs with missing information. We propose a novel keyword graph representation learning method that incorporates complementary aspects of graphs: global, local, adjusted, and feature semantics. Considering these multiple aspects, our approach remains robust and resilient to missing information. We adopt and fine-tune a transformer-based model to aggregate the various features of a graph to generate rich representations, recognizing the pivotal role of keywords in this task. We show through experiments on real-world data that our method outperforms the state-of-the-art approaches and is particularly robust in the face of missing values, underscoring its ability to effectively handle incomplete graphs.

Requirements

These are the most important libraries used to implement the methods. You need to install them using the pip or conda command before running the codes:

Numpy
Math
Pytorch 1.9.x +
Transformers
Torch Geometric
Tqdm

Please take note that some miscellaneous libraries may be needed to run the codes.

Dataset

We use CiteSeer, Amazon Video, Amazon Toy and DBLP datasets for evaluation. Datasets, along its preprocessed version, are available in dataset directory.

Run The Code

After preparing the input data, the proposed method and baselines can be run together. You can then simply run the following command in the terminal:

 `python <train.py>`

* Take note that parameters are initiated in the args.py file. You can customize your configuration there.

After running each model, the average performance for 20 runs will be saved and the results will be shown at the end when all methods are evaluated.

Results

Here are the sample results that you will get by running the code for Hit@100 without missing edges:

Datasets $r_{w}$ 30% Missing Keywords 50% Missing Keywords 70% Missing Keywords
$n_{q}$ 3 5 7 9 3 5 7 9 3 5 7 9
CiteSeer GCN 15.72 16.05 16.33 13.59 12.56 14.01 20.18 19.86 12.79 16.57 15.46 15.12
GAT 19.02 17.06 22.7 22.46 15.51 22.53 21.53 18.62 16.82 14.9 19.66 23.04
ChebConv 17.74 20.24 23.04 13.98 15.59 23.2 20.63 20.38 16.92 22.7 18.99 23.24
GraphSAGE 0.74 0.59 1.23 1.16 3.39 2.36 2.53 1.87 10.45 7.14 8.57 7.74
BLINK+SAT 8.81 7.23 5.65 4.42 9.14 5.97 5.47 4.88 9.82 8.34 8.75 5.49
PCA 11.2 8.42 7.05 6.05 9.87 7.33 6.8 5.66 8.4 6.83 6.07 5.46
Conv-PCA 11.56 11.11 9.08 10.32 8.14 10.05 10.76 9.25 12.69 11.06 10.83 7.3
KS-PCA 25.7 30.06 35.01 36.38 26.08 27.56 30.84 33.82 26.08 28.52 32.08 36
KS-GNN 29.52 36.37 37.92 39 30.07 36.41 38.85 39 27.08 37.48 37.66 36.02
Proposed Method 34.84 43.3 45.05 46.49 34.67 42.68 44.95 46.94 34.79 43.85 45.73 47.03
Video GCN 13.66 16.06 19.56 25.28 16.14 19.46 19.99 23.85 6.49 6.79 11.31 12.06
GAT 10.47 13.4 9.15 11.35 11.29 12.16 14.5 19.12 8.84 12.26 13.46 13.84
ChebConv 4.45 4.3 4.57 8.49 8.19 12.91 8.4 7.05 5.77 5.96 9.44 13.92
GraphSAGE 0.49 0.30 0.06 0.03 0.44 0.14 0.00 0.05 0.34 0.35 0.21 0.16
BLINK+SAT 10.21 9.86 10.99 14.87 8.55 6.92 8.63 5.82 1.18 1.15 4.38 3.35
PCA 1.54 0.91 0.55 0.61 1.71 0.72 0.71 0.57 1.66 0.95 0.66 0.55
Conv-PCA 1.81 2.46 1.58 2.38 2.43 1.49 1.66 2.54 2.42 2.37 2.80 3.37
KS-PCA 10.19 12.23 16.15 21.37 11.26 15.62 19.51 23.87 10.66 15.57 19.17 25.36
KS-GNN 21.43 23.36 22.92 26.79 22.54 22.57 30.41 33.41 21.01 16.48 22.01 28.47
Proposed Method 23.33 30.13 34.89 38.71 22.76 29.56 34.27 38.11 21.6 27.71 32.73 36.18
Toy GCN 18.97 21.27 19.13 21.44 18.47 20.26 19.07 22.34 18.03 19.40 18.91 20.73
GAT 20.84 22.41 22.6 19.65 21.39 27.43 29.82 31.09 19.16 20.87 21.36 21.80
ChebConv 15.5 18.08 22.63 12.0 14.94 21.01 24.27 19.85 16.3 17.48 19.97 12.35
GraphSAGE 0.74 0.88 5.11 4.21 4.24 12.49 6.50 6.10 1.45 4.39 6.71 1.09
BLINK+SAT 6.47 8.17 8.12 10.54 6.79 9.59 10.99 11.93 6.44 8.56 11.55 8.15
PCA 1.15 0.68 0.63 0.51 1.01 0.69 0.51 0.44 0.67 0.47 0.44 0.30
Conv-PCA 21.34 21.99 23.76 25.40 19.17 19.72 20.21 25.22 16.99 21.61 23.95 24.90
KS-PCA 27.23 27.73 31.58 33.79 25.78 28.94 31.04 32.82 18.35 22.03 25.50 26.25
KS-GNN 28.56 29.85 29.55 34.28 24.65 29.16 31.27 33.25 21.78 27.41 25.55 30.17
Proposed Method 31.82 40.5 43.87 45.31 32.19 41.41 43.49 45.18 24.09 36.31 40.75 41.92
DBLP GCN 2.75 5.27 3.05 8.04 7.27 2.44 7.17 7.6 5.86 3.85 9.12 11.31
GAT 4.41 4.77 10.09 2.35 5.93 5.22 10.74 3.68 5.26 12.97 8.72 9.63
ChebConv 4.56 9.41 5.68 8.96 6.4 4.17 8.5 8.68 7.54 11.03 8.63 17.36
GraphSAGE 0.42 0.49 0.36 0.82 0.29 0.43 0.53 0.05 0.07 0.01 0.01 0.00
BLINK+SAT 3.26 6.09 4.01 6.65 3.49 1.66 5.42 3.95 4.25 2.42 3.12 4.24
PCA 3.78 2.57 2.25 2.38 3.06 2.55 2.21 2.15 2.97 2.51 2.02 1.88
Conv-PCA 9.00 13.24 13.29 16.87 5.93 6.56 7.64 10.62 7.00 10.52 10.03 15.68
KS-PCA 15.28 21.41 25.61 31.64 14.98 20.73 23.21 31.72 12.49 19.49 21.23 28.63
KS-GNN 16.21 24.94 29.55 33.51 16.52 22.73 26.85 30.69 15.57 24.15 27.12 29.06
Proposed Method 23.11 32.29 39.38 40.54 24.01 33.46 38.01 42.13 23.33 32.46 39.16 40.86

Here are the results that you will get by running the code for Hit@100 with 30% missing edges:

Datasets $r_{w}$ 30% Missing Keywords 50% Missing Keywords 70% Missing Keywords
$n_{q}$ 3 5 7 9 3 5 7 9 3 5 7 9
CiteSeer GCN 19.00 17.86 21.19 22.44 13.7 15.18 15.82 11.46 11.27 19.34 22.43 21.33
GAT 10.58 8.88 9.9 12.44 12.2 18.52 14.36 21.24 12.96 15.98 14.83 12.41
ChebConv 18.33 28.27 21.06 17.54 17.89 16.54 28.33 26.67 18.99 22.42 29.54 28.77
GraphSAGE 1.25 1.03 0.75 1.03 3.26 2.45 3.34 1.62 9.44 7.86 7.59 5.45
BLINK+SAT 9.19 8.14 5.68 3.71 8.26 6.79 6.04 8.57 7.88 8.33 7.69 8.3
PCA 11.2 8.42 7.05 6.05 9.87 7.33 6.8 5.66 8.4 6.83 6.07 5.46
Conv-PCA 11.6 10.97 9.05 10.56 9.77 10.61 12.73 9.31 12.92 11.83 12.25 8.45
KS-PCA 26.08 30.06 32.26 34.65 23.8 26.45 29.11 31.81 26.06 26.57 31.42 34.53
KS-GNN 28.96 35.75 39.44 39.4 26.48 34.92 36.31 37.3 24.2 27.84 28.19 33.41
Proposed Method 32.79 41.72 42.45 43.63 33.28 40.74 41.59 42.48 33.17 41.72 41.72 43.16
Video GCN 2.83 2.95 4.5 4.69 1.62 2.8 4.14 2.94 2.19 2.19 3.01 3.04
GAT 3.01 2.13 3.01 2.52 2.51 3.99 3.76 3.84 4.27 3.0 4.31 3.64
ChebConv 3.35 4.08 4.85 4.76 2.65 3.03 4.46 4.54 1.96 2.55 5.67 2.9
GraphSAGE 0.09 0.11 0.02 0.00 0.26 0.05 0.00 0.04 1.65 1.65 2.22 1.29
BLINK+SAT 1.67 1.85 2.48 1.44 0.08 0.99 4.96 2.97 2.19 1.77 0.78 1.21
PCA 1.54 0.91 0.55 0.61 1.71 0.72 0.71 0.57 1.66 0.95 0.66 0.55
Conv-PCA 1.05 1.59 0.83 2.01 1.43 0.81 0.87 1.23 1.25 1.25 1.31 1.38
KS-PCA 3.82 4.13 4.88 7.13 3.96 4.52 5.33 6.11 3.64 4.58 5.17 6.55
KS-GNN 8.08 8.34 12.88 11.82 6.84 7.68 4.18 11.12 6.37 10.31 13.92 10.07
Proposed Method 15.31 18.74 21.8 24.95 14.74 18.98 22.09 25.03 13.87 17.17 20.02 22.39
Toy GCN 2.64 2.37 1.78 2.99 2.45 2.55 2.04 2.93 2.18 2.21 2.11 2.30
GAT 2.22 1.45 2.77 0.92 1.01 0.87 1.11 0.64 1.16 0.93 0.74 0.88
ChebConv 6.93 6.15 11.69 7.82 7.25 8.25 9.03 8.91 3.09 3.83 4.46 5.17
GraphSAGE 0.04 0.02 0.01 0.00 0.28 0.00 0.01 0.02 0.02 0.18 0.29 0.23
BLINK+SAT 3.69 3.03 4.85 5.66 2.56 1.34 1.86 5.46 1.76 1.39 4.73 4.45
PCA 1.15 0.68 0.63 0.51 1.01 0.69 0.51 0.44 0.67 0.47 0.44 0.30
Conv-PCA 11.64 10.46 11.23 12.29 9.31 9.00 9.08 11.61 6.74 8.87 8.99 9.56
KS-PCA 13.80 13.03 13.55 15.67 11.35 11.03 11.09 13.02 6.84 8.37 9.50 10.43
KS-GNN 13.82 13.28 14.38 15.22 12.51 12.54 12.68 12.59 9.41 8.99 12.83 11.15
Proposed Method 22.72 28.62 30.26 31.13 22.09 28.57 30.19 31.85 17.32 22.96 23.49 24.03
DBLP GCN 4.85 6.71 5.89 3.62 2.91 10.47 9.53 8.5 2.21 2.79 1.41 4.56
GAT 3.27 3.74 7.37 7.11 3.71 4.49 5.55 8.04 5.63 7.73 8.34 7.63
ChebConv 5.51 8.51 12.55 4.24 7.66 12.24 11.94 13.09 6.3 6.44 10.21 14.2
GraphSAGE 0.51 0.20 0.42 0.34 0.67 0.25 0.40 0.90 0.40 0.05 0.04 0.02
BLINK+SAT 4.78 3.45 6.74 4.58 3.94 3.66 5.05 3.97 2.96 2.75 2.83 7.29
PCA 3.78 2.57 2.25 2.38 3.06 2.55 2.21 2.15 2.97 2.51 2.02 1.88
Conv-PCA 8.35 11.8 12.51 14.51 5.25 6.77 7.03 9.91 6.46 9.59 9.47 14.33
KS-PCA 15.05 19.99 22.61 28.57 13.68 19.19 21.59 28.96 11.67 17.371 18.38 24.51
KS-GNN 15.91 20.49 25.09 29.04 15.79 19.86 22.15 29.71 12.07 17.18 19.23 24.19
Proposed Method 22.29 29.88 35.22 37.36 19.22 25.86 29.46 30.90 19.79 26.36 29.65 31.81

Ablation Study

To study the impact of each component of our approach on its overall performance. The results of the study are as follows:

Datasets $r_{e}$ 50% Missing Edges
$r_{w}$ 50% Missing Keywords
$n_{q}$ 3 5 7 9
CiteSeer $\mathcal{AS}$ 25.48 30.74 33.24 34.17
$\mathcal{GS}$ 25.27 30.41 32.95 33.83
$\mathcal{LS}$ 25.35 30.05 33.79 34.01
$\mathcal{GS} + \mathcal{AS}$ 25.95 31.65 34.11 34.75
$\mathcal{LS} + \mathcal{AS}$ 26.21 32.71 34.17 35.32
$\mathcal{GS} + \mathcal{LS}$ 26.19 32.80 34.94 35.36
Proposed Method (All) 27.3 34.57 35.48 36.04
Video $\mathcal{AS}$ 11.21 14.37 16.97 18.77
$\mathcal{GS}$ 11.43 14.89 17.08 19.58
$\mathcal{LS}$ 11.84 14.90 17.10 19.71
$\mathcal{GS} + \mathcal{AS}$ 12.31 15.35 18.56 20.19
$\mathcal{LS} + \mathcal{AS}$ 12.49 15.58 18.63 20.34
$\mathcal{GS} + \mathcal{LS}$ 12.09 15.12 18.07 20.11
Proposed Method (All) 13.71 16.63 19.09 21.45
Toy $\mathcal{AS}$ 17.39 23.42 24.74 25.19
$\mathcal{GS}$ 17.60 23.90 24.83 25.67
$\mathcal{LS}$ 17.95 23.98 24.89 25.79
$\mathcal{GS} + \mathcal{AS}$ 18.07 24.07 25.41 26.14
$\mathcal{LS} + \mathcal{AS}$ 18.31 24.44 25.39 26.85
$\mathcal{GS} + \mathcal{LS}$ 18.60 24.82 25.58 26.92
Proposed Method (All) 19.9 25.54 27.03 28.34
DBLP $\mathcal{AS}$ 16.06 23.20 27.59 28.33
$\mathcal{GS}$ 16.47 23.72 27.62 28.40
$\mathcal{LS}$ 16.19 23.93 27.48 28.67
$\mathcal{GS} + \mathcal{AS}$ 18.38 24.37 28.69 29.14
$\mathcal{LS} + \mathcal{AS}$ 17.84 24.02 28.13 29.01
$\mathcal{GS} + \mathcal{LS}$ 18.29 24.49 28.72 29.21
Proposed Method (All) 18.86 25.02 29.33 30.65

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages