📖 Paper at Arxiv · 🤗 GraphAgent Model · 🤗 Graph Tokenizer Model · 🤗 GraphAgent Datasets
- Release inference code
- Release model checkpoints
- Release training and evaluation datasets
- Release training code
Real-world data is represented in both structured (e.g., graph connections) and unstructured (e.g., textual, visual information) formats, encompassing complex relationships that include explicit links (such as social connections and user behaviors) and implicit interdependencies among semantic entities, often illustrated through knowledge graphs. In this work, we propose GraphAgent, an automated agent pipeline that addresses both explicit graph dependencies and implicit graph-enhanced semantic inter-dependencies, aligning with practical data scenarios for predictive tasks (e.g., node classification) and generative tasks (e.g., text generation). GraphAgent comprises three key components: (i) a Graph Generator Agent that builds knowledge graphs to reflect complex semantic dependencies; (ii) a Task Planning Agent that interprets diverse user queries and formulates corresponding tasks through agentic self-planning; and (iii) a Task Execution Agent that efficiently executes planned tasks while automating tool matching and invocation in response to user queries. These agents collaborate seamlessly, integrating language models with graph language models to uncover intricate relational information and data semantic dependencies. Through extensive experiments on graph-related predictive and text generative tasks across diverse datasets, we demonstrate the effectiveness of GraphAgent in various settings.
```bash
# Clone the repository
git clone https://github.com/yourusername/GraphAgent.git
cd GraphAgent
# Create a conda environment
conda create -n graphagent python=3.11
conda activate graphagent
# Install requirements for GraphAgent inference
pip install -r GraphAgent-inference/requirements.txt
```
We provide the following pre-trained checkpoints on 🤗 Hugging Face:

- `GraphAgent/GraphAgent-8B`: the graph action model of GraphAgent, a multimodal Llama 3 that can take graph tokens as input.
- `GraphAgent/GraphTokenizer`: a multimodal graph-text tokenizer that encodes graphs into continuous tokens.
- `sentence-transformers/all-mpnet-base-v2`: the sentence transformer used for text graph embedding.

You can download these checkpoints to a local directory and update the corresponding paths in `GraphAgent-inference/run.sh`; otherwise, the program downloads them automatically.
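For a manual download, a minimal sketch follows. It assumes the Hugging Face CLI from `huggingface_hub` is installed (`pip install -U huggingface_hub`); the function name `download_checkpoints`, the `DRY_RUN` switch and the `./checkpoints` layout are hypothetical choices for illustration, not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fetch the three checkpoints listed above into one
# local directory using the Hugging Face CLI.
download_checkpoints() {
  local dest="${1:-./checkpoints}"   # target directory (assumed layout)
  local repo target
  for repo in GraphAgent/GraphAgent-8B \
              GraphAgent/GraphTokenizer \
              sentence-transformers/all-mpnet-base-v2; do
    target="$dest/${repo#*/}"        # e.g. ./checkpoints/GraphAgent-8B
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Preview mode: print the command instead of downloading.
      echo "huggingface-cli download $repo --local-dir $target"
    else
      huggingface-cli download "$repo" --local-dir "$target"
    fi
  done
}
```

Run `DRY_RUN=1 download_checkpoints ./checkpoints` to preview the commands, then `download_checkpoints ./checkpoints` to download. You still need to point the model paths in `GraphAgent-inference/run.sh` at these directories; check that script for the exact variable names.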
GraphAgent uses API-based LLM calls for task planning and graph generation. The default planner is `deepseek`, which is set in `GraphAgent-inference/run.sh`. Fill in the API key that corresponds to your planner:

```bash
export OPENAI_API_KEY=""
```
Then launch the interactive demo:

```bash
bash GraphAgent-inference/run.sh
```

The program prompts for input:

```
>>> Please enter a user instruction or file path (or type 'exit' to quit):
```

Enter an instruction directly, or a file path. For example, to use `GraphAgent-inference/demo/use_cases/teach_me_accelerate.txt`:

```
>>> Please enter a user instruction or file path (or type 'exit' to quit): GraphAgent-inference/demo/use_cases/teach_me_accelerate.txt
```

You can then follow, step by step, how GraphAgent works through your task.
For more detailed and diverse examples of what GraphAgent can do, see the `GraphAgent-inference/demo/use_cases` directory.
Statistics of the training and evaluation datasets:

Dataset | IMDB | ACM | Arxiv-Papers | ICLR-Peer Reviews | Related Work Generation | GovReport Summarization |
---|---|---|---|---|---|---|
Task Type | Predictive | Predictive | Predictive | Predictive | Generative | Generative |
Sub-Task | Node Classification | Node Classification | Paper Classification | Paper Judgement Prediction | Text Generation | Text Summarization |
Pre-defined Graph? | ✓ | ✓ | × | × | × | × |
#Train Samples | 2,400 | - | 5,175 | 3,141 | 4,155 | - |
#Eval Samples | - | 1,000 | 500 | 500 | 500 | 304 |
#Tokens | 10M | 0.8M | 30M | 45M | 93M | 2M |
#Pre-defined Graph Nodes | 11,616 | 10,942 | - | - | - | - |
SKG Source | People Entities | Paper | Paper | Paper, Reviews | Multiple Papers | Documents |
#SKG Nodes | 57,120 | 20,388 | 153,555 | 161,592 | 875,921 | 15,621 |
The training code and procedures will be released in future updates. Stay tuned!
Node classification results on IMDB:

Metric | Trained on | SAGE | GAT | HAN | HGT | HetGNN | HiGPT | GraphAgent | Imprv. |
---|---|---|---|---|---|---|---|---|---|
Micro-F1 (%) | IMDB-1 | 32.93±4.18 | 35.67±0.53 | 34.07±1.11 | 32.40±0.14 | 37.43±4.34 | 45.40±0.89 | 51.21±1.32 | 12.8% |
Micro-F1 (%) | IMDB-40 | 31.73±0.05 | 23.93±1.44 | 26.97±1.94 | 35.60±0.99 | 31.80±0.16 | 50.50±0.77 | 74.98±1.24 | 48.5% |
Macro-F1 (%) | IMDB-1 | 26.47±2.69 | 29.08±1.31 | 22.50±4.16 | 16.31±0.05 | 31.39±4.68 | 41.77±1.24 | 46.82±1.43 | 12.1% |
Macro-F1 (%) | IMDB-40 | 31.17±0.17 | 21.41±0.71 | 23.13±1.32 | 27.49±1.22 | 31.44±0.17 | 45.85±0.89 | 74.98±1.12 | 63.5% |
AUC (%) | IMDB-1 | 49.34±2.47 | 52.48±0.38 | 51.28±0.86 | 50.00±0.00 | 53.18±2.95 | 59.69±0.82 | 64.10±1.25 | 7.4% |
AUC (%) | IMDB-40 | 48.67±0.13 | 43.20±1.08 | 45.45±1.46 | 51.48±0.43 | 48.72±0.06 | 63.60±0.51 | 80.90±1.01 | 27.2% |
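The Imprv. values are consistent with the relative gain of GraphAgent over HiGPT, the strongest baseline in every row: Imprv. = (GraphAgent − HiGPT) / HiGPT × 100. This reading is inferred from the numbers rather than stated by the authors; the awk sketch below recomputes the column from the mean values in the table.

```bash
#!/usr/bin/env bash
# Recompute "Imprv." as the relative gain of GraphAgent over HiGPT:
#   Imprv. = (GraphAgent - HiGPT) / HiGPT * 100
# Input columns: metric, training setting, HiGPT mean, GraphAgent mean.
imprv=$(awk '{ printf "%s %s %.1f%%\n", $1, $2, ($4 - $3) / $3 * 100 }' <<'EOF'
Micro-F1 IMDB-1 45.40 51.21
Micro-F1 IMDB-40 50.50 74.98
Macro-F1 IMDB-1 41.77 46.82
Macro-F1 IMDB-40 45.85 74.98
AUC IMDB-1 59.69 64.10
AUC IMDB-40 63.60 80.90
EOF
)
printf '%s\n' "$imprv"
```

Each printed value (12.8%, 48.5%, 12.1%, 63.5%, 7.4%, 27.2%) matches the corresponding Imprv. entry; the standard deviations are not used.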
Results on Arxiv-Papers (paper classification) and ICLR-Peer Reviews (paper judgement prediction). Mi-F1 and Ma-F1 denote Micro-F1 and Macro-F1, * marks the best result in each column, and 236B→21B gives the total→activated parameters of the mixture-of-experts Deepseek-Chat-V2.

Method | Model Size | Arxiv-Papers Mi-F1 | Arxiv-Papers Ma-F1 | Arxiv-Papers AUC | ICLR-Peer Reviews Mi-F1 | ICLR-Peer Reviews Ma-F1 | ICLR-Peer Reviews AUC |
---|---|---|---|---|---|---|---|
Open-sourced LLMs | | | | | | | |
Llama3-8b | 8B | 0.514 | 0.289 | 0.527 | 0.402 | 0.394 | 0.502 |
Mistral-Nemo | 12B | 0.510 | 0.292 | 0.615 | 0.272 | 0.246 | 0.380 |
Llama3-70b | 70B | 0.630 | 0.330 | 0.635 | 0.434 | 0.421 | 0.551 |
Qwen2-72b | 72B | 0.632 | 0.472 | 0.700 | 0.344 | 0.277 | 0.509 |
API-based Commercial LLMs | | | | | | | |
Deepseek-Chat-V2 | 236B→21B | 0.746 | 0.580 | 0.757 | 0.362 | 0.312 | 0.516 |
GPT4o-mini | - | 0.592 | 0.343 | 0.634 | 0.692* | 0.592 | 0.591 |
Gemini-1.5-Flash | - | 0.748 | 0.504 | 0.714 | 0.684 | 0.487 | 0.533 |
Finetuned LLMs | | | | | | | |
Llama3-8b Finetuned | 8B | 0.794 | 0.593 | 0.736 | 0.620 | 0.554 | 0.553 |
GraphRAG Implementations | | | | | | | |
Llama3-8b + GraphRAG | 8B | 0.516 | 0.288 | 0.601 | 0.430 | 0.427 | 0.517 |
Llama3-70b + GraphRAG | 70B | 0.603 | 0.324 | 0.623 | 0.308 | 0.296 | 0.401 |
GraphAgent-Task Expert | 8B | 0.820 | 0.620 | 0.768 | 0.686 | 0.620* | 0.615* |
GraphAgent-General | 8B | 0.840* | 0.621* | 0.769* | 0.667 | 0.604 | 0.607 |
GraphAgent-Zero-Shot | 8B | 0.739 | 0.512 | 0.701 | 0.538 | 0.531 | 0.563 |
Perplexity (PPL) of generated text (lower is better; * marks the best result in each column):

Method | Model Size | PPL-Llama3-70b Mean | PPL-Llama3-70b Max | PPL-Qwen2-72b Mean | PPL-Qwen2-72b Max |
---|---|---|---|---|---|
Open-sourced LLMs | | | | | |
Llama3-8b | 8B | 7.016 | 13.061 | 7.491 | 12.787 |
Mistral-Nemo | 12B | 7.367 | 15.967 | 6.872 | 12.065 |
Llama3-70b | 70B | 6.168 | 14.436 | 5.877 | 12.897 |
Qwen2-72b | 72B | 6.043 | 11.675 | 5.325 | 11.302 |
API-based Commercial LLMs | | | | | |
Deepseek-Chat-V2 | 236B→21B | 5.632 | 13.483 | 5.144 | 10.337 |
GPT4o-mini | - | 7.277 | 15.480 | 6.818 | 13.267 |
Gemini-1.5-Flash | - | 5.188 | 10.399 | 5.377 | 10.779 |
Finetuned LLMs | | | | | |
Llama3-8b Finetuned | 8B | 7.682 | 19.452 | 7.629 | 18.757 |
GraphRAG Implementations | | | | | |
Llama3-8b + GraphRAG | 8B | 7.098 | 18.092 | 6.539 | 14.722 |
Llama3-70b + GraphRAG | 70B | 6.590 | 14.827 | 6.135 | 14.163 |
GraphAgent-Task Expert | 8B | 3.805 | 10.316 | 4.069 | 11.685 |
GraphAgent-General | 8B | 3.618* | 8.000* | 3.867* | 8.775* |
If you find this repository useful, please cite our paper:
```bibtex
@article{graphagent,
  title={GraphAgent: Agentic Graph Language Assistant},
  author={Yuhao Yang and Jiabin Tang and Lianghao Xia and Xingchen Zou and Yuxuan Liang and Chao Huang},
  year={2024},
  journal={arXiv preprint arXiv:2412.17029},
}
```