HKUDS/GraphAgent

GraphAgent: Agentic Graph Language Assistant

📖 Paper at Arxiv · 🤗 GraphAgent Model · 🤗 Graph Tokenizer Model · 🤗 GraphAgent Datasets

📋 To-Do List

  • Release inference code
  • Release model checkpoints
  • Release training and evaluation datasets
  • Release training code

🌟 Overview

Real-world data comes in both structured formats (e.g., graph connections) and unstructured formats (e.g., textual and visual information). It encodes complex relationships: explicit links, such as social connections and user behaviors, and implicit interdependencies among semantic entities, which are often represented as knowledge graphs. In this work, we propose GraphAgent, an automated agent pipeline that handles both explicit graph dependencies and implicit, graph-enhanced semantic interdependencies. It covers practical predictive tasks (e.g., node classification) as well as generative tasks (e.g., text generation). GraphAgent comprises three key components:

  • Graph Generator Agent: builds knowledge graphs that capture complex semantic dependencies.
  • Task Planning Agent: interprets diverse user queries and formulates the corresponding tasks through agentic self-planning.
  • Task Execution Agent: executes the planned tasks, automatically matching and invoking tools in response to user queries.

These agents work together, combining language models with graph language models to uncover intricate relational information and semantic dependencies in the data. Extensive experiments on graph-related predictive and text generative tasks across diverse datasets demonstrate the effectiveness of GraphAgent in various settings.
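To make the division of labor between the three agents concrete, here is a toy, self-contained sketch of how they could hand work to each other. Every class, task kind, and tool name below is hypothetical and does not come from the GraphAgent codebase; the keyword-based planner and the pipe-separated triple parser merely stand in for the LLM-driven steps described above.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of the three-agent flow; all names are illustrative
# and do not come from the GraphAgent codebase.

Triple = tuple[str, str, str]  # (head, relation, tail)


@dataclass
class Task:
    kind: str   # e.g. "node_classification" or "text_generation"
    query: str  # the original user query


class GraphGeneratorAgent:
    """Stand-in for knowledge-graph construction: parses 'head|relation|tail' lines."""

    def build(self, text: str) -> list[Triple]:
        triples: list[Triple] = []
        for line in text.splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3:  # skip lines that are not triples
                triples.append((parts[0], parts[1], parts[2]))
        return triples


class TaskPlanningAgent:
    """Stand-in for LLM-based planning: a keyword rule picks the task kind."""

    def plan(self, query: str) -> Task:
        if "classify" in query.lower():
            return Task("node_classification", query)
        return Task("text_generation", query)


@dataclass
class TaskExecutionAgent:
    """Matches a planned task to a registered tool and invokes it."""

    tools: dict[str, Callable[[Task, list[Triple]], str]] = field(default_factory=dict)

    def execute(self, task: Task, graph: list[Triple]) -> str:
        tool = self.tools.get(task.kind)
        if tool is None:
            raise ValueError(f"no tool registered for {task.kind!r}")
        return tool(task, graph)


def run_pipeline(query: str, context: str, executor: TaskExecutionAgent) -> str:
    graph = GraphGeneratorAgent().build(context)  # (i) build the graph
    task = TaskPlanningAgent().plan(query)        # (ii) plan the task
    return executor.execute(task, graph)          # (iii) match and run a tool


executor = TaskExecutionAgent(tools={
    "node_classification": lambda task, g: f"classified using {len(g)} edges",
    "text_generation": lambda task, g: f"generated text from {len(g)} edges",
})
print(run_pipeline("Classify this paper", "paper_a|cites|paper_b\npaper_b|cites|paper_c", executor))
# classified using 2 edges
```

Keeping tool matching in a registry keyed by task kind means a new tool can be added without touching the planner.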

🚀 Getting Started

Invoking GraphAgent (Inference)

Installation

```bash
# Clone the repository
git clone https://github.com/HKUDS/GraphAgent.git
cd GraphAgent

# Create a conda environment
conda create -n graphagent python=3.11
conda activate graphagent

# Install requirements for GraphAgent inference
pip install -r GraphAgent-inference/requirements.txt
```

Get Pre-trained Models

GraphAgent relies on the following pre-trained checkpoints hosted on 🤗 Hugging Face:

  • GraphAgent/GraphAgent-8B: the graph action model of GraphAgent, a multimodal Llama 3 model that accepts graph tokens as input.
  • GraphAgent/GraphTokenizer: a multimodal graph-text tokenizer that encodes graphs into continuous tokens.
  • sentence-transformers/all-mpnet-base-v2: the sentence transformer used to embed the text in graphs.

You can download these checkpoints to a local directory and set their paths in GraphAgent-inference/run.sh; otherwise, the program downloads them automatically.

Set the Planner and API Token

GraphAgent uses API-based LLM calls for task planning and graph generation. The default planner is DeepSeek, as configured in GraphAgent-inference/run.sh. Set the API key for your planner:

```bash
export OPENAI_API_KEY=""
```

The key must correspond to the planner you use (for the default configuration, a DeepSeek API key).
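Note that the export line above, copied verbatim, sets an empty key. As a hedged sketch, a small pre-flight check can catch a missing or empty key before you launch run.sh. The helper `require_api_key` is hypothetical and not part of GraphAgent; it only reads the `OPENAI_API_KEY` variable named above.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: return the planner API key or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        # An unset variable and `export OPENAI_API_KEY=""` are both rejected here.
        raise RuntimeError(f"{name} is empty or unset; export the key for your planner first")
    return key
```

Calling `require_api_key()` with no arguments checks the live process environment; passing a dict makes it easy to test.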

Inference Examples

Run the launcher script:

```bash
bash GraphAgent-inference/run.sh
```

When prompted, enter an instruction directly or the path to a file that contains one. For example, to use the demo file GraphAgent-inference/demo/use_cases/teach_me_accelerate.txt:

```
>>> Please enter a user instruction or file path (or type 'exit' to quit): GraphAgent-inference/demo/use_cases/teach_me_accelerate.txt
```

You can then take a close look at how GraphAgent works to accomplish your task.

For more detailed and diverse examples of what GraphAgent can do, see the GraphAgent-inference/demo/use_cases directory.
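The prompt accepts either a free-form instruction or a file path. The sketch below shows one plausible way to tell the two apart; `resolve_user_input` is a hypothetical name, and this is an assumption about the behavior, not the code behind run.sh.

```python
from pathlib import Path
from typing import Optional


def resolve_user_input(raw: str) -> Optional[str]:
    """Hypothetical: map a prompt entry to an instruction, or None to quit."""
    text = raw.strip()
    if text == "exit":
        return None
    try:
        is_file = Path(text).is_file()
    except (OSError, ValueError):
        # e.g. an instruction too long to be a valid path on this OS
        is_file = False
    if is_file:
        # A file path such as .../use_cases/teach_me_accelerate.txt:
        # the instruction is the file's contents.
        return Path(text).read_text(encoding="utf-8").strip()
    # Anything else is treated as the instruction itself.
    return text
```

Checking for an existing file before falling back to plain text means an ordinary sentence is never mistaken for a path.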

GraphAgent Dataset (Coming Soon!)

| | IMDB | ACM | Arxiv-Papers | ICLR-Peer Reviews | Related Work Generation | GovReport Summarization |
|---|---|---|---|---|---|---|
| Task Type | Predictive | Predictive | Predictive | Predictive | Generative | Generative |
| Sub-Task | NC | NC | Paper Classification | Paper Judgement Prediction | Text Generation | Text Summarization |
| Pre-defined Graph? | ✓ | ✓ | × | × | × | × |
| #Train Samples | 2,400 | - | 5,175 | 3,141 | 4,155 | - |
| #Eval Samples | - | 1,000 | 500 | 500 | 500 | 304 |
| #Tokens | 10M | 0.8M | 30M | 45M | 93M | 2M |
| #Pre-defined Graph Nodes | 11,616 | 10,942 | - | - | - | - |
| SKG Source | People Entities | Paper | Paper | Paper, Reviews | Multiple Papers | Documents |
| #SKG Nodes | 57,120 | 20,388 | 153,555 | 161,592 | 875,921 | 15,621 |

NC: node classification.

Training GraphAgent with Your Own Data (Coming Soon!)

The training code and procedures will be released in future updates. Stay tuned!

📊 Benchmarks

Zero-shot classification on ACM-1000

| Metric | Trained on | SAGE | GAT | HAN | HGT | HetGNN | HiGPT | GraphAgent | Imprv. |
|---|---|---|---|---|---|---|---|---|---|
| Micro-F1 (%) | IMDB-1 | 32.93±4.18 | 35.67±0.53 | 34.07±1.11 | 32.40±0.14 | 37.43±4.34 | 45.40±0.89 | 51.21±1.32 | 12.8% |
| Micro-F1 (%) | IMDB-40 | 31.73±0.05 | 23.93±1.44 | 26.97±1.94 | 35.60±0.99 | 31.80±0.16 | 50.50±0.77 | 74.98±1.24 | 48.5% |
| Macro-F1 (%) | IMDB-1 | 26.47±2.69 | 29.08±1.31 | 22.50±4.16 | 16.31±0.05 | 31.39±4.68 | 41.77±1.24 | 46.82±1.43 | 12.1% |
| Macro-F1 (%) | IMDB-40 | 31.17±0.17 | 21.41±0.71 | 23.13±1.32 | 27.49±1.22 | 31.44±0.17 | 45.85±0.89 | 74.98±1.12 | 63.5% |
| AUC (%) | IMDB-1 | 49.34±2.47 | 52.48±0.38 | 51.28±0.86 | 50.00±0.00 | 53.18±2.95 | 59.69±0.82 | 64.10±1.25 | 7.4% |
| AUC (%) | IMDB-40 | 48.67±0.13 | 43.20±1.08 | 45.45±1.46 | 51.48±0.43 | 48.72±0.06 | 63.60±0.51 | 80.90±1.01 | 27.2% |

Imprv. is the relative improvement of GraphAgent over the strongest baseline (HiGPT in every row).
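The Imprv. column is consistent with the relative gain of GraphAgent over the strongest baseline, (GraphAgent - best) / best, computed on the mean scores; this formula reproduces all six reported values. A minimal sketch using the Micro-F1 (IMDB-1) row, ignoring the ± standard deviations; the function name is our own:

```python
# Micro-F1 (%) means for the "Trained on IMDB-1" row; the ± std values are ignored.
baselines = {
    "SAGE": 32.93, "GAT": 35.67, "HAN": 34.07,
    "HGT": 32.40, "HetGNN": 37.43, "HiGPT": 45.40,
}
graphagent = 51.21


def relative_improvement(ours: float, scores: dict[str, float]) -> tuple[str, float]:
    """Return the strongest baseline and the relative gain over it, in percent."""
    best_name = max(scores, key=scores.__getitem__)
    best = scores[best_name]
    return best_name, (ours - best) / best * 100


name, pct = relative_improvement(graphagent, baselines)
print(f"{pct:.1f}% over {name}")  # 12.8% over HiGPT
```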

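Complex graph predictive tasks on Arxiv-Papers and ICLR-Peer Reviews

| Method | Model Size | Arxiv-Papers Mi-F1 | Arxiv-Papers Ma-F1 | Arxiv-Papers AUC | ICLR-Peer Reviews Mi-F1 | ICLR-Peer Reviews Ma-F1 | ICLR-Peer Reviews AUC |
|---|---|---|---|---|---|---|---|
| *Open-sourced LLMs* | | | | | | | |
| Llama3-8b | 8B | 0.514 | 0.289 | 0.527 | 0.402 | 0.394 | 0.502 |
| Mistral-Nemo | 12B | 0.510 | 0.292 | 0.615 | 0.272 | 0.246 | 0.380 |
| Llama3-70b | 70B | 0.630 | 0.330 | 0.635 | 0.434 | 0.421 | 0.551 |
| Qwen2-72b | 72B | 0.632 | 0.472 | 0.700 | 0.344 | 0.277 | 0.509 |
| *API-based Commercial LLMs* | | | | | | | |
| Deepseek-Chat-V2 | 236B→21B | 0.746 | 0.580 | 0.757 | 0.362 | 0.312 | 0.516 |
| GPT4o-mini | - | 0.592 | 0.343 | 0.634 | 0.692* | 0.592 | 0.591 |
| Gemini-1.5-Flash | - | 0.748 | 0.504 | 0.714 | 0.684 | 0.487 | 0.533 |
| *Finetuned LLMs* | | | | | | | |
| Llama3-8b Finetuned | 8B | 0.794 | 0.593 | 0.736 | 0.620 | 0.554 | 0.553 |
| *GraphRAG Implementations* | | | | | | | |
| Llama3-8b + GraphRAG | 8B | 0.516 | 0.288 | 0.601 | 0.430 | 0.427 | 0.517 |
| Llama3-70b + GraphRAG | 70B | 0.603 | 0.324 | 0.623 | 0.308 | 0.296 | 0.401 |
| *GraphAgent* | | | | | | | |
| GraphAgent-Task Expert | 8B | 0.820 | 0.620 | 0.768 | 0.686 | 0.620* | 0.615* |
| GraphAgent-General | 8B | 0.840* | 0.621* | 0.769* | 0.667 | 0.604 | 0.607 |
| GraphAgent-Zero-Shot | 8B | 0.739 | 0.512 | 0.701 | 0.538 | 0.531 | 0.563 |

Values marked with * are the best in each column. For Deepseek-Chat-V2, 236B→21B denotes total vs. activated parameters of its mixture-of-experts architecture.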
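The table reports micro- and macro-averaged F1 (Mi-F1, Ma-F1). As a reminder of how they differ, here is a minimal pure-Python sketch of the standard single-label definitions; it is not the project's evaluation script, AUC is not covered, and the accept/reject labels are made-up toy data.

```python
from collections import Counter


def micro_macro_f1(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Micro- and macro-averaged F1 for single-label classification."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # missed the true class t

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool counts over all classes (equals accuracy for single-label data).
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    labels = set(y_true) | set(y_pred)
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy imbalanced data: always predicting the majority class.
y_true = ["accept"] * 8 + ["reject"] * 2
y_pred = ["accept"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"Mi-F1={micro:.3f} Ma-F1={macro:.3f}")  # Mi-F1=0.800 Ma-F1=0.444
```

The gap in this toy case illustrates why Ma-F1 sits well below Mi-F1 whenever a model neglects minority classes.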

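Content generation on ACL-EMNLP related-work instructions

| Method | Model Size | PPL-Llama3-70b Mean | PPL-Llama3-70b Max | PPL-Qwen2-72b Mean | PPL-Qwen2-72b Max |
|---|---|---|---|---|---|
| *Open-sourced LLMs* | | | | | |
| Llama3-8b | 8B | 7.016 | 13.061 | 7.491 | 12.787 |
| Mistral-Nemo | 12B | 7.367 | 15.967 | 6.872 | 12.065 |
| Llama3-70b | 70B | 6.168 | 14.436 | 5.877 | 12.897 |
| Qwen2-72b | 72B | 6.043 | 11.675 | 5.325 | 11.302 |
| *API-based Commercial LLMs* | | | | | |
| Deepseek-Chat-V2 | 236B→21B | 5.632 | 13.483 | 5.144 | 10.337 |
| GPT4o-mini | - | 7.277 | 15.480 | 6.818 | 13.267 |
| Gemini-1.5-Flash | - | 5.188 | 10.399 | 5.377 | 10.779 |
| *Finetuned LLMs* | | | | | |
| Llama3-8b Finetuned | 8B | 7.682 | 19.452 | 7.629 | 18.757 |
| *GraphRAG Implementations* | | | | | |
| Llama3-8b + GraphRAG | 8B | 7.098 | 18.092 | 6.539 | 14.722 |
| Llama3-70b + GraphRAG | 70B | 6.590 | 14.827 | 6.135 | 14.163 |
| *GraphAgent* | | | | | |
| GraphAgent-Task Expert | 8B | 3.805 | 10.316 | 4.069 | 11.685 |
| GraphAgent-General | 8B | 3.618* | 8.000* | 3.867* | 8.775* |

Lower perplexity (PPL) is better; values marked with * are the best in each column.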
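The README does not spell out how PPL is computed. Assuming the standard definition, perplexity is the exponential of the mean per-token negative log-likelihood that an evaluator model (here presumably Llama3-70b or Qwen2-72b) assigns to a generated text, and Mean/Max aggregate per-sample perplexities. A minimal sketch under those assumptions, with made-up log-probabilities in place of real evaluator outputs:

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) over tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


def summarize(samples: list[list[float]]) -> tuple[float, float]:
    """Mean and max of per-sample perplexities (assumed aggregation)."""
    if not samples:
        raise ValueError("need at least one sample")
    ppls = [perplexity(s) for s in samples]
    return sum(ppls) / len(ppls), max(ppls)


# Toy log-probs for two generated texts: every token gets p=0.5 or p=0.25.
samples = [[math.log(0.5)] * 4, [math.log(0.25)] * 4]
mean_ppl, max_ppl = summarize(samples)
print(round(mean_ppl, 3), round(max_ppl, 3))  # 3.0 4.0
```

A uniform per-token probability p gives a perplexity of exactly 1/p, which makes the toy values easy to check by hand.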

📝 Citation

If you find this repository useful, please cite our paper:

```bibtex
@article{graphagent,
      title={GraphAgent: Agentic Graph Language Assistant},
      author={Yuhao Yang and Jiabin Tang and Lianghao Xia and Xingchen Zou and Yuxuan Liang and Chao Huang},
      year={2024},
      journal={arXiv preprint arXiv:2412.17029},
}
```
