This repository contains the code implementation of LLM4DyG as described in the paper: LLM4DyG: Can Large Language Models Solve Spatial-Temporal Problems on Dynamic Graphs? (KDD 2024).
Figure 1. The pipeline of LLM4DyG.
Figure 2. Designed tasks in LLM4DyG.
In an era marked by the increasing adoption of Large Language Models (LLMs) for various tasks, there is a growing focus on exploring LLMs' capabilities in handling web data, particularly graph data. Dynamic graphs, which capture temporal network evolution patterns, are ubiquitous in real-world web data. Evaluating LLMs' competence in understanding spatial-temporal information on dynamic graphs is essential for their adoption in web applications, yet this question remains unexplored in the literature.
In this paper, we bridge this gap by evaluating, for the first time to the best of our knowledge, LLMs' spatial-temporal understanding abilities on dynamic graphs. Specifically, we propose the LLM4DyG benchmark, which includes nine specially designed tasks that evaluate LLMs' capabilities along both the temporal and spatial dimensions.
Then, we conduct extensive experiments to analyze the impacts of different data generators, data statistics, prompting techniques, and LLMs on the model performance.
The pipeline is shown in Figure 1 and the designed tasks are shown in Figure 2.
We have tested our code with the following requirements:
- Python == 3.9
- Pytorch == 2.2.1+cu122
- fschat == 0.2.36
Follow the steps below to create a virtual environment and install the required packages.
Clone the repository:
git clone git@github.com:wondergo2017/llm4dyg.git
cd llm4dyg
Create a virtual environment:
conda create --name llm4dyg python=3.9 -y
conda activate llm4dyg
Install the PyTorch and fschat dependencies. If running LLMs locally, also install transformers==4.37.0.
Install this repo as a library:
pip install -e .
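After installation, a quick check of the installed versions can catch mismatches early. The following is a minimal sketch, not part of this repo: the helper names `installed_versions` and `report` are hypothetical, and the distribution names (`torch` for PyTorch, `fschat`, `transformers`) are assumed to be the ones pip uses.

```python
from importlib import metadata

# Versions taken from the requirements above. The distribution names are
# assumptions: "torch" for PyTorch; "transformers" matters only for local LLMs.
EXPECTED = {"torch": "2.2.1", "fschat": "0.2.36", "transformers": "4.37.0"}


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(expected=EXPECTED):
    """Build one human-readable status line per expected package."""
    lines = []
    for name, version in installed_versions(expected).items():
        wanted = expected[name]
        if version is None:
            status = "missing"
        elif version.startswith(wanted):
            # Prefix match so that local build tags such as "+cu122" still pass.
            status = "ok"
        else:
            status = "differs"
        lines.append(f"{name}: wanted {wanted}, found {version} ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A "differs" line does not necessarily mean the code will fail; it only flags a version other than the tested one.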
The files in this repo are organized as follows:
\llm4dyg
    \utils
        \task                   # query/answer generation for each task
        prompt.py               # prompter constructor
        data.py                 # data generator
        api.py                  # api for querying LLMs
        ...
    runner.py                   # main running script to manage data, etc.
    private.py                  # openai key (if querying gpt).
\paper                          # paper framework, etc.
\scripts\
    \example                    # example running scripts
        config.py               # arguments
        start_server.py         # running local llm servers
        run_one_instance.py     # example to query one problem instance
        run_one_task.py         # example to evaluate on task
        run_tasks.py            # example to evaluate on all tasks
We provide three example scripts to run the code.
- run_one_instance.py, example to query one problem instance
- run_one_task.py, example to evaluate on one task
- run_tasks.py, example to evaluate on all tasks
If you want to evaluate LLMs locally, run start_server.py to host the LLM.
To host the LLM locally
python start_server.py --model codellama2-13b -t run --device 0
The --model argument can be the name of a model on Hugging Face.
To stop the server
python start_server.py --model codellama2-13b -t clear --device 0
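When scripting experiments, it can help to make sure the server is always stopped, even if an evaluation crashes. The sketch below wraps the two commands above in a context manager. It is not part of this repo: `server_command` and `local_server` are hypothetical helper names, only the flags shown above are used, and it assumes that `-t run` returns once the server has been launched (if it blocks instead, start it in a separate process).

```python
import subprocess
import sys
from contextlib import contextmanager


def server_command(model, action, device=0):
    """Build the argv list for start_server.py; action is "run" or "clear"."""
    if action not in ("run", "clear"):
        raise ValueError(f"unknown action: {action!r}")
    return [sys.executable, "start_server.py",
            "--model", model, "-t", action, "--device", str(device)]


@contextmanager
def local_server(model, device=0, runner=subprocess.run):
    """Start the local LLM server, and always issue the clear command on exit.

    Assumption: the "run" command returns after launching the server.
    The runner argument exists so the helper can be tested without a GPU.
    """
    runner(server_command(model, "run", device), check=True)
    try:
        yield
    finally:
        runner(server_command(model, "clear", device), check=True)
```

Typical use would be `with local_server("codellama2-13b"): ...`, executed from the directory that contains start_server.py.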
cd scripts/example
python run_one_instance.py --task when_link --T 5 --N 10
See config.py for details of the arguments.
The key part of the script is:
# generate data
dygen = DyGraphGenERCon() # data generator
obj_task = load_task(task, args) # task
dygprompt = DyGraphPrompt(obj_task, args = args) # prompt generator
info = dygen.sample_dynamic_graph(T = T, N = N , p = p, seed = seed) # generate data
qa = obj_task.generate_qa(info) # generate qa
prompt_qa = dygprompt.generate_prompt_qa(**qa) # generate prompt qa
print('#'*10,'prompt_qa:\n', prompt_qa)
# generate response
model = args.model
prompt = prompt_qa['prompt']
answer = send_prompt(model, prompt, temperature = args.temperature, max_tokens = args.max_tokens)
print('#'*10,'answer:\n', answer)
# score
metric = obj_task.evaluate(qa, answer["content"])
print('#'*10,'metric:\n', metric)
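To make the generate, prompt, and score steps concrete without a model or the library, here is a self-contained toy version of the same flow. It is a simplified sketch under stated assumptions, not the implementation in data.py, prompt.py, or the task classes: the Erdős–Rényi-style sampling, the prompt wording, and the exact-match scoring are all illustrative choices, and the function names are hypothetical.

```python
import random
import re


def sample_dynamic_graph(T, N, p, seed):
    """Sample timestamped edges (u, v, t): at each of T steps, every node
    pair u < v among N nodes is linked independently with probability p."""
    rng = random.Random(seed)
    edges = []
    for t in range(T):
        for u in range(N):
            for v in range(u + 1, N):
                if rng.random() < p:
                    edges.append((u, v, t))
    return edges


def when_link_qa(edges, seed):
    """Pick one linked pair; return a prompt and the sorted times they link."""
    if not edges:
        raise ValueError("the sampled graph has no edges")
    rng = random.Random(seed)
    u, v, _ = rng.choice(edges)
    times = sorted({t for a, b, t in edges if (a, b) == (u, v)})
    context = ", ".join(f"({a}, {b}, {t})" for a, b, t in edges)
    # Prompt wording is an assumption made for illustration only.
    prompt = (f"Edges are given as (u, v, t), meaning u and v are linked at "
              f"time t: {context}. When are node {u} and node {v} linked? "
              f"Answer with a list of times.")
    return {"prompt": prompt, "answer": times}


def evaluate(qa, response):
    """Return 1 if the last bracketed list in the response matches exactly."""
    lists = re.findall(r"\[([^\]]*)\]", response)
    if not lists:
        return 0
    predicted = sorted({int(x) for x in re.findall(r"\d+", lists[-1])})
    return int(predicted == qa["answer"])


if __name__ == "__main__":
    edges = sample_dynamic_graph(T=5, N=10, p=0.3, seed=0)
    qa = when_link_qa(edges, seed=0)
    print(qa["prompt"])
    print("score for a perfect reply:", evaluate(qa, str(qa["answer"])))
```

Fixing the seed makes each problem instance reproducible, which is what allows queries and answers to be stored once and re-scored later.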
run_one_task.py saves the queries and answers for each problem instance of a given task to files.
cd scripts/example
See config.py for details of the arguments.
to generate data
python run_one_task.py --task when_link --T 5 --N 10 -t gen --num_seed 100
to check data
python run_one_task.py --task when_link --T 5 --N 10 -t check
to run model
python run_one_task.py --task when_link --T 5 --N 10 -t run --model codellama2-13b
to evaluate model
python run_one_task.py --task when_link --T 5 --N 10 -t eval --model codellama2-13b
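The four stages above are meant to be run in order, and only some of them take `--num_seed` or `--model`. A small sketch that assembles the same command lines programmatically is shown below. `task_stage_commands` is a hypothetical helper, not part of this repo, and it uses only the flags that appear in the commands above.

```python
import shlex
import sys


def task_stage_commands(task, T, N, model, num_seed=100):
    """Return argv lists for the gen, check, run, and eval stages, in order."""
    base = [sys.executable, "run_one_task.py",
            "--task", task, "--T", str(T), "--N", str(N)]
    return [
        base + ["-t", "gen", "--num_seed", str(num_seed)],  # generate data
        base + ["-t", "check"],                             # check data
        base + ["-t", "run", "--model", model],             # query the model
        base + ["-t", "eval", "--model", model],            # score the answers
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in task_stage_commands("when_link", T=5, N=10, model="codellama2-13b"):
        print(shlex.join(cmd))
```

To execute the stages rather than print them, each list can be passed to `subprocess.run(cmd, check=True, cwd="scripts/example")`, so that a failing stage stops the sequence.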
run_tasks.py saves the queries and answers for each problem instance of all tasks to files. Its usage is similar to that of run_one_task.py.
to generate data
python run_tasks.py --T 5 --N 10 -t gen
to run model
python run_tasks.py --T 5 --N 10 -t run --model codellama2-13b
to show results
python run_tasks.py --T 5 --N 10 -t eval --model codellama2-13b
If you want to change how the data, tasks, or prompts are constructed, subclass the Runner class in runner.py or the other related classes.
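The general pattern is to override only the step you want to change and inherit the rest. Since the methods of Runner are not listed in this README, the sketch below uses a stand-in base class: `RunnerSketch`, `make_prompt`, and `run` are hypothetical names chosen for illustration, and the real class in runner.py may be organized differently.

```python
class RunnerSketch:
    """Hypothetical stand-in for Runner; the real class may differ."""

    def make_prompt(self, question):
        # Hypothetical hook: turn a task question into the text sent to the LLM.
        return question

    def run(self, questions):
        # Shared logic that subclasses reuse unchanged.
        return [self.make_prompt(q) for q in questions]


class StepByStepRunner(RunnerSketch):
    """Changes prompt construction only; everything else is inherited."""

    def make_prompt(self, question):
        return super().make_prompt(question) + " Think step by step."


if __name__ == "__main__":
    print(StepByStepRunner().run(["When are node 0 and node 1 linked?"]))
```

Check runner.py for the actual method names before adapting this pattern.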
The APIs for querying LLMs are built on FastChat. We sincerely thank its authors for their contributions to the research community.
If you find our repo or paper useful, please star the repo or cite our paper:
@inproceedings{zhang2023LLM4DyG,
title={LLM4DyG: Can Large Language Models Solve Spatial-Temporal Problems on Dynamic Graphs?},
author={Zhang, Zeyang and Wang, Xin and Zhang, Ziwei and Li, Haoyang and Qin, Yijian and Zhu, Wenwu},
booktitle={Conference on Knowledge Discovery and Data Mining (ACM SIGKDD)},
year={2024}
}
If you have any questions, please feel free to contact us ([email protected] or [email protected]).