Robust Neural Approach for Searching over Incomplete Graphs

The task of searching over large keyword graphs aims to identify a subgraph where the nodes collectively cover the input query keywords. Although finding an exact solution to this problem is NP-hard, we address it by proposing a novel graph neural network representation learning technique specifically tailored for graphs with missing information. We propose a novel keyword graph representation learning method that incorporates complementary aspects of graphs: global, local, adjusted, and feature semantics. Considering these multiple aspects, our approach remains robust and resilient to missing information. We adopt and fine-tune a transformer-based model to aggregate the various features of a graph to generate rich representations, recognizing the pivotal role of keywords in this task. We show through experiments on real-world data that our method outperforms the state-of-the-art approaches and is particularly robust in the face of missing values, underscoring its ability to effectively handle incomplete graphs.

Requirements

These are the most important libraries used to implement the methods. You need to install them using the pip or conda command before running the codes:

Numpy
Math
Pytorch 1.9.x +
Transformers
Torch Geometric
Tqdm

Please take note that some miscellaneous libraries may be needed to run the codes.

Dataset

We use CiteSeer, Amazon Video, Amazon Toy and DBLP datasets for evaluation. Datasets, along its preprocessed version, are available in dataset directory.

Run The Code

After preparing the input data, the proposed method and baselines can be run together. You can then simply run the following command in the terminal:

 `python <train.py>`

* Take note that parameters are initiated in the args.py file. You can customize your configuration there.

After running each model, the average performance for 20 runs will be saved and the results will be shown at the end when all methods are evaluated.

Results

Here are the sample results that you will get by running the code for Hit@100 without missing edges:

Datasets	$r_{w}$	30% Missing Keywords				50% Missing Keywords				70% Missing Keywords
Datasets	$n_{q}$	3	5	7	9	3	5	7	9	3	5	7	9
CiteSeer	GCN	15.72	16.05	16.33	13.59	12.56	14.01	20.18	19.86	12.79	16.57	15.46	15.12
	GAT	19.02	17.06	22.7	22.46	15.51	22.53	21.53	18.62	16.82	14.9	19.66	23.04
	ChebConv	17.74	20.24	23.04	13.98	15.59	23.2	20.63	20.38	16.92	22.7	18.99	23.24
	GraphSAGE	0.74	0.59	1.23	1.16	3.39	2.36	2.53	1.87	10.45	7.14	8.57	7.74
	BLINK+SAT	8.81	7.23	5.65	4.42	9.14	5.97	5.47	4.88	9.82	8.34	8.75	5.49
	PCA	11.2	8.42	7.05	6.05	9.87	7.33	6.8	5.66	8.4	6.83	6.07	5.46
	Conv-PCA	11.56	11.11	9.08	10.32	8.14	10.05	10.76	9.25	12.69	11.06	10.83	7.3
	KS-PCA	25.7	30.06	35.01	36.38	26.08	27.56	30.84	33.82	26.08	28.52	32.08	36
	KS-GNN	29.52	36.37	37.92	39	30.07	36.41	38.85	39	27.08	37.48	37.66	36.02
	Proposed Method	34.84	43.3	45.05	46.49	34.67	42.68	44.95	46.94	34.79	43.85	45.73	47.03
Video	GCN	13.66	16.06	19.56	25.28	16.14	19.46	19.99	23.85	6.49	6.79	11.31	12.06
	GAT	10.47	13.4	9.15	11.35	11.29	12.16	14.5	19.12	8.84	12.26	13.46	13.84
	ChebConv	4.45	4.3	4.57	8.49	8.19	12.91	8.4	7.05	5.77	5.96	9.44	13.92
	GraphSAGE	0.49	0.30	0.06	0.03	0.44	0.14	0.00	0.05	0.34	0.35	0.21	0.16
	BLINK+SAT	10.21	9.86	10.99	14.87	8.55	6.92	8.63	5.82	1.18	1.15	4.38	3.35
	PCA	1.54	0.91	0.55	0.61	1.71	0.72	0.71	0.57	1.66	0.95	0.66	0.55
	Conv-PCA	1.81	2.46	1.58	2.38	2.43	1.49	1.66	2.54	2.42	2.37	2.80	3.37
	KS-PCA	10.19	12.23	16.15	21.37	11.26	15.62	19.51	23.87	10.66	15.57	19.17	25.36
	KS-GNN	21.43	23.36	22.92	26.79	22.54	22.57	30.41	33.41	21.01	16.48	22.01	28.47
	Proposed Method	23.33	30.13	34.89	38.71	22.76	29.56	34.27	38.11	21.6	27.71	32.73	36.18
Toy	GCN	18.97	21.27	19.13	21.44	18.47	20.26	19.07	22.34	18.03	19.40	18.91	20.73
	GAT	20.84	22.41	22.6	19.65	21.39	27.43	29.82	31.09	19.16	20.87	21.36	21.80
	ChebConv	15.5	18.08	22.63	12.0	14.94	21.01	24.27	19.85	16.3	17.48	19.97	12.35
	GraphSAGE	0.74	0.88	5.11	4.21	4.24	12.49	6.50	6.10	1.45	4.39	6.71	1.09
	BLINK+SAT	6.47	8.17	8.12	10.54	6.79	9.59	10.99	11.93	6.44	8.56	11.55	8.15
	PCA	1.15	0.68	0.63	0.51	1.01	0.69	0.51	0.44	0.67	0.47	0.44	0.30
	Conv-PCA	21.34	21.99	23.76	25.40	19.17	19.72	20.21	25.22	16.99	21.61	23.95	24.90
	KS-PCA	27.23	27.73	31.58	33.79	25.78	28.94	31.04	32.82	18.35	22.03	25.50	26.25
	KS-GNN	28.56	29.85	29.55	34.28	24.65	29.16	31.27	33.25	21.78	27.41	25.55	30.17
	Proposed Method	31.82	40.5	43.87	45.31	32.19	41.41	43.49	45.18	24.09	36.31	40.75	41.92
DBLP	GCN	2.75	5.27	3.05	8.04	7.27	2.44	7.17	7.6	5.86	3.85	9.12	11.31
	GAT	4.41	4.77	10.09	2.35	5.93	5.22	10.74	3.68	5.26	12.97	8.72	9.63
	ChebConv	4.56	9.41	5.68	8.96	6.4	4.17	8.5	8.68	7.54	11.03	8.63	17.36
	GraphSAGE	0.42	0.49	0.36	0.82	0.29	0.43	0.53	0.05	0.07	0.01	0.01	0.00
	BLINK+SAT	3.26	6.09	4.01	6.65	3.49	1.66	5.42	3.95	4.25	2.42	3.12	4.24
	PCA	3.78	2.57	2.25	2.38	3.06	2.55	2.21	2.15	2.97	2.51	2.02	1.88
	Conv-PCA	9.00	13.24	13.29	16.87	5.93	6.56	7.64	10.62	7.00	10.52	10.03	15.68
	KS-PCA	15.28	21.41	25.61	31.64	14.98	20.73	23.21	31.72	12.49	19.49	21.23	28.63
	KS-GNN	16.21	24.94	29.55	33.51	16.52	22.73	26.85	30.69	15.57	24.15	27.12	29.06
	Proposed Method	23.11	32.29	39.38	40.54	24.01	33.46	38.01	42.13	23.33	32.46	39.16	40.86

Here are the results that you will get by running the code for Hit@100 with 30% missing edges:

Datasets	$r_{w}$	30% Missing Keywords				50% Missing Keywords				70% Missing Keywords
Datasets	$n_{q}$	3	5	7	9	3	5	7	9	3	5	7	9
CiteSeer	GCN	19.00	17.86	21.19	22.44	13.7	15.18	15.82	11.46	11.27	19.34	22.43	21.33
	GAT	10.58	8.88	9.9	12.44	12.2	18.52	14.36	21.24	12.96	15.98	14.83	12.41
	ChebConv	18.33	28.27	21.06	17.54	17.89	16.54	28.33	26.67	18.99	22.42	29.54	28.77
	GraphSAGE	1.25	1.03	0.75	1.03	3.26	2.45	3.34	1.62	9.44	7.86	7.59	5.45
	BLINK+SAT	9.19	8.14	5.68	3.71	8.26	6.79	6.04	8.57	7.88	8.33	7.69	8.3
	PCA	11.2	8.42	7.05	6.05	9.87	7.33	6.8	5.66	8.4	6.83	6.07	5.46
	Conv-PCA	11.6	10.97	9.05	10.56	9.77	10.61	12.73	9.31	12.92	11.83	12.25	8.45
	KS-PCA	26.08	30.06	32.26	34.65	23.8	26.45	29.11	31.81	26.06	26.57	31.42	34.53
	KS-GNN	28.96	35.75	39.44	39.4	26.48	34.92	36.31	37.3	24.2	27.84	28.19	33.41
	Proposed Method	32.79	41.72	42.45	43.63	33.28	40.74	41.59	42.48	33.17	41.72	41.72	43.16
Video	GCN	2.83	2.95	4.5	4.69	1.62	2.8	4.14	2.94	2.19	2.19	3.01	3.04
	GAT	3.01	2.13	3.01	2.52	2.51	3.99	3.76	3.84	4.27	3.0	4.31	3.64
	ChebConv	3.35	4.08	4.85	4.76	2.65	3.03	4.46	4.54	1.96	2.55	5.67	2.9
	GraphSAGE	0.09	0.11	0.02	0.00	0.26	0.05	0.00	0.04	1.65	1.65	2.22	1.29
	BLINK+SAT	1.67	1.85	2.48	1.44	0.08	0.99	4.96	2.97	2.19	1.77	0.78	1.21
	PCA	1.54	0.91	0.55	0.61	1.71	0.72	0.71	0.57	1.66	0.95	0.66	0.55
	Conv-PCA	1.05	1.59	0.83	2.01	1.43	0.81	0.87	1.23	1.25	1.25	1.31	1.38
	KS-PCA	3.82	4.13	4.88	7.13	3.96	4.52	5.33	6.11	3.64	4.58	5.17	6.55
	KS-GNN	8.08	8.34	12.88	11.82	6.84	7.68	4.18	11.12	6.37	10.31	13.92	10.07
	Proposed Method	15.31	18.74	21.8	24.95	14.74	18.98	22.09	25.03	13.87	17.17	20.02	22.39
Toy	GCN	2.64	2.37	1.78	2.99	2.45	2.55	2.04	2.93	2.18	2.21	2.11	2.30
	GAT	2.22	1.45	2.77	0.92	1.01	0.87	1.11	0.64	1.16	0.93	0.74	0.88
	ChebConv	6.93	6.15	11.69	7.82	7.25	8.25	9.03	8.91	3.09	3.83	4.46	5.17
	GraphSAGE	0.04	0.02	0.01	0.00	0.28	0.00	0.01	0.02	0.02	0.18	0.29	0.23
	BLINK+SAT	3.69	3.03	4.85	5.66	2.56	1.34	1.86	5.46	1.76	1.39	4.73	4.45
	PCA	1.15	0.68	0.63	0.51	1.01	0.69	0.51	0.44	0.67	0.47	0.44	0.30
	Conv-PCA	11.64	10.46	11.23	12.29	9.31	9.00	9.08	11.61	6.74	8.87	8.99	9.56
	KS-PCA	13.80	13.03	13.55	15.67	11.35	11.03	11.09	13.02	6.84	8.37	9.50	10.43
	KS-GNN	13.82	13.28	14.38	15.22	12.51	12.54	12.68	12.59	9.41	8.99	12.83	11.15
	Proposed Method	22.72	28.62	30.26	31.13	22.09	28.57	30.19	31.85	17.32	22.96	23.49	24.03
DBLP	GCN	4.85	6.71	5.89	3.62	2.91	10.47	9.53	8.5	2.21	2.79	1.41	4.56
	GAT	3.27	3.74	7.37	7.11	3.71	4.49	5.55	8.04	5.63	7.73	8.34	7.63
	ChebConv	5.51	8.51	12.55	4.24	7.66	12.24	11.94	13.09	6.3	6.44	10.21	14.2
	GraphSAGE	0.51	0.20	0.42	0.34	0.67	0.25	0.40	0.90	0.40	0.05	0.04	0.02
	BLINK+SAT	4.78	3.45	6.74	4.58	3.94	3.66	5.05	3.97	2.96	2.75	2.83	7.29
	PCA	3.78	2.57	2.25	2.38	3.06	2.55	2.21	2.15	2.97	2.51	2.02	1.88
	Conv-PCA	8.35	11.8	12.51	14.51	5.25	6.77	7.03	9.91	6.46	9.59	9.47	14.33
	KS-PCA	15.05	19.99	22.61	28.57	13.68	19.19	21.59	28.96	11.67	17.371	18.38	24.51
	KS-GNN	15.91	20.49	25.09	29.04	15.79	19.86	22.15	29.71	12.07	17.18	19.23	24.19
	Proposed Method	22.29	29.88	35.22	37.36	19.22	25.86	29.46	30.90	19.79	26.36	29.65	31.81

Ablation Study

To study the impact of each component of our approach on its overall performance. The results of the study are as follows:

Datasets	$r_{e}$	50% Missing Edges
	$r_{w}$	50% Missing Keywords
	$n_{q}$	3	5	7	9
CiteSeer	$\mathcal{AS}$	25.48	30.74	33.24	34.17
	$\mathcal{GS}$	25.27	30.41	32.95	33.83
	$\mathcal{LS}$	25.35	30.05	33.79	34.01
	$\mathcal{GS} + \mathcal{AS}$	25.95	31.65	34.11	34.75
	$\mathcal{LS} + \mathcal{AS}$	26.21	32.71	34.17	35.32
	$\mathcal{GS} + \mathcal{LS}$	26.19	32.80	34.94	35.36
	Proposed Method (All)	27.3	34.57	35.48	36.04
Video	$\mathcal{AS}$	11.21	14.37	16.97	18.77
	$\mathcal{GS}$	11.43	14.89	17.08	19.58
	$\mathcal{LS}$	11.84	14.90	17.10	19.71
	$\mathcal{GS} + \mathcal{AS}$	12.31	15.35	18.56	20.19
	$\mathcal{LS} + \mathcal{AS}$	12.49	15.58	18.63	20.34
	$\mathcal{GS} + \mathcal{LS}$	12.09	15.12	18.07	20.11
	Proposed Method (All)	13.71	16.63	19.09	21.45
Toy	$\mathcal{AS}$	17.39	23.42	24.74	25.19
	$\mathcal{GS}$	17.60	23.90	24.83	25.67
	$\mathcal{LS}$	17.95	23.98	24.89	25.79
	$\mathcal{GS} + \mathcal{AS}$	18.07	24.07	25.41	26.14
	$\mathcal{LS} + \mathcal{AS}$	18.31	24.44	25.39	26.85
	$\mathcal{GS} + \mathcal{LS}$	18.60	24.82	25.58	26.92
	Proposed Method (All)	19.9	25.54	27.03	28.34
DBLP	$\mathcal{AS}$	16.06	23.20	27.59	28.33
	$\mathcal{GS}$	16.47	23.72	27.62	28.40
	$\mathcal{LS}$	16.19	23.93	27.48	28.67
	$\mathcal{GS} + \mathcal{AS}$	18.38	24.37	28.69	29.14
	$\mathcal{LS} + \mathcal{AS}$	17.84	24.02	28.13	29.01
	$\mathcal{GS} + \mathcal{LS}$	18.29	24.49	28.72	29.21
	Proposed Method (All)	18.86	25.02	29.33	30.65

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
base_class		base_class
data		data
dblp		dblp
figs		figs
result		result
saved_model		saved_model
DatasetLoader.py		DatasetLoader.py
MethodBertComp.py		MethodBertComp.py
MethodGraphBert.py		MethodGraphBert.py
README.md		README.md
Trainer.py		Trainer.py
args.py		args.py
models.py		models.py
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Robust Neural Approach for Searching over Incomplete Graphs

Requirements

Dataset

Run The Code

Results

Ablation Study

About

Releases

Packages

Languages

radinhamidi/A-Robust-Neural-Approach-for-Searching-over-Incomplete-Graphs

Folders and files

Latest commit

History

Repository files navigation

Robust Neural Approach for Searching over Incomplete Graphs

Requirements

Dataset

Run The Code

Results

Ablation Study

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages