A specialized GraphQL agent toolkit for LLM interactions with SubQuery SDK-generated APIs, featuring natural language query capabilities and OpenAI-compatible API endpoints.
This toolkit provides LLM agents with the ability to interact with any GraphQL API built with SubQuery SDK through natural language, automatically understanding schemas, validating queries, and executing complex GraphQL operations.
- Natural Language Interface: Ask questions about blockchain data in plain English
- Automatic Schema Understanding: Agents learn PostGraphile v4 patterns and SubQuery entity schemas
- Query Generation & Validation: Converts natural language to valid GraphQL queries with built-in validation
- OpenAI-Compatible API: FastAPI server with streaming and non-streaming endpoints
- SubQuery SDK Optimized: Works with any project built using SubQuery SDK (Ethereum, Polkadot, Cosmos, etc.)
Traditional GraphQL agents face a fundamental challenge: schema size exceeds LLM context limits. Most GraphQL APIs have introspection schemas that are tens of thousands of tokens, making them:
- Too large for most commercial LLMs (exceeding context windows)
- Too expensive for cost-effective query generation
- Too noisy for reliable query construction (low signal-to-noise ratio)
Instead of using raw GraphQL introspection schemas, we developed a compressed, high-density schema representation:
- Compact Format: 100x smaller than full introspection schemas
- Domain-Specific: Contains project-specific entities and relationships
- High Information Density: Only essential types, relationships, and patterns
- Rule-Based: Combined with PostGraphile v4 patterns for query construction
Traditional Approach:
├── Full GraphQL Introspection: ~50,000+ tokens
├── Context Window Usage: 80-95%
└── Result: Often fails or generates invalid queries
Our Approach:
├── Entity Schema: ~500-1,000 tokens
├── PostGraphile Rules: ~200-300 tokens
├── Context Window Usage: 5-10%
└── Result: Reliable, cost-effective query generation
- Entity Schema Teaching: LLM learns project's domain model from compressed schema
- Pattern Recognition: PostGraphile v4 rules guide query structure
- Intelligent Construction: Agent builds queries using learned patterns
- Validation: Real-time schema validation ensures correctness
- 💰 Cost Effective: 10-20x lower token usage than traditional approaches
- 🎯 Higher Accuracy: Domain-specific knowledge reduces errors
- ⚡ Faster Responses: Smaller context means faster processing
- 🔄 Scalable: Works consistently across different LLM models
# Traditional approach (fails with large schemas)
raw_schema = introspect_graphql_schema() # 50k+ tokens
context = f"Schema: {raw_schema}\nQuestion: {user_query}" # Exceeds limits
# Our approach (works reliably)
entity_schema = load_project_entities() # 500 tokens
rules = get_postgraphile_patterns() # 300 tokens
context = f"Entities: {entity_schema}\nRules: {rules}\nQuestion: {user_query}"
- SubQuery SDK Optimized: Specifically designed for APIs built with SubQuery SDK
- PostGraphile v4: Leverages PostGraphile v4 patterns that SubQuery SDK generates
- Entity-Focused: Works best with well-defined blockchain entity relationships
The same philosophy can be applied to other GraphQL ecosystems:
- Hasura: Could use Hasura-specific schema compression + rules
- Apollo Federation: Could compress federated schemas with service patterns
- Custom GraphQL: Could extract domain models + API patterns
- Other ORMs: Could adapt for Prisma, TypeORM, or other ORM-generated schemas
SubQuery SDK Agent (Current)
├── Entity Schema: Project-specific domain models
├── Rules: PostGraphile v4 patterns
└── Scope: Any SubQuery SDK-generated API
Generic GraphQL Agent (Future)
├── Schema Compression: Auto-extract domain models
├── Pattern Recognition: Detect API patterns automatically
├── Multi-Domain: Support multiple GraphQL styles
└── Scope: Any GraphQL API
This approach represents a paradigm shift in GraphQL agent design:
- From: "Give LLM everything and hope it works"
- To: "Give LLM exactly what it needs to succeed"
The result is a more reliable, cost-effective, and performant GraphQL agent that can actually be deployed in production environments.
- GraphQLSource - Connection wrapper for GraphQL endpoints with entity schema support
- GraphQLToolkit - LangChain-compatible toolkit providing all GraphQL tools
- GraphQL Agent Tools - Individual tools for specific GraphQL operations
- FastAPI Server - OpenAI-compatible API with streaming support
- `graphql_schema_info` - Get raw entity schema with PostGraphile v4 rules
- `graphql_type_detail` - Get detailed type information (fallback tool)
- `graphql_query_validator` - Validate GraphQL query syntax against schema
- `graphql_execute` - Execute GraphQL queries and return results
- Python 3.12+
- OpenAI API Key (for LLM capabilities)
- Dependencies:
# Install dependencies
uv sync
# Set environment variables
export OPENAI_API_KEY="your-openai-api-key-here"
export LLM_MODEL="gpt-4o" # Recommended: gpt-4o or stronger models
export PORT="8000" # Optional, defaults to 8000
Run the agent interactively:
cd examples
python working_example.py
Start the OpenAI-compatible API server:
cd examples
python server.py
The server will start on http://localhost:8000 with the following endpoints:
- `POST /v1/chat/completions` - OpenAI-compatible chat completions
- `GET /v1/models` - List available models
- `GET /health` - Health check
from graphql_agent import create_graphql_toolkit
from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent, AgentExecutor
# Load entity schema (learn more: https://subquery.network/doc/indexer/build/graphql.html)
# Note: This example uses SubQuery Network's schema - replace with your own project's schema
with open("examples/schema.graphql", 'r') as f:
    entity_schema = f.read()
# Create toolkit
# Note: This example uses SubQuery Network's API - replace with your own project's endpoint
endpoint = "https://index-api.onfinality.io/sq/subquery/subquery-mainnet"
toolkit = create_graphql_toolkit(endpoint, entity_schema)
# Create agent
llm = ChatOpenAI(model="gpt-4o", temperature=0) # Use gpt-4o or stronger for best results
agent = create_react_agent(llm, toolkit.get_tools(), prompt_template)
executor = AgentExecutor(agent=agent, tools=toolkit.get_tools())
# Query with natural language
result = executor.invoke({
    "input": "Show me the top 3 indexers with their project information"
})
# Non-streaming request
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Show me 5 indexers and their rewards"}],
"stream": false
}'
# Streaming request
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "What projects are available?"}],
"stream": true
}'
Note: These examples are from the SubQuery Network demo. For your own project, the queries would be specific to your indexed blockchain data.
The example agent can handle queries like:
- "Show me the first 5 indexers and their IDs"
- "What projects are available? Show me their owners"
- "List all indexers with their project information"
- "What are my staking rewards for wallet 0x123...?"
- "Show me rewards for the last era"
- "Find delegations for a specific indexer"
- "Which indexers have the highest rewards?"
- "Show me project performance metrics"
- "List top performing indexers by era"
- "What types of data can I query?"
- "Show me available project information"
- "What reward data is tracked?"
The agent understands PostGraphile v4 patterns automatically:
- Single: `entityName(id: ID!)` → Full entity object
- Collection: `entityNames(first: Int, filter: EntityFilter)` → Connection with pagination
filter: {
  fieldName: { equalTo: "value" }
  amount: { greaterThan: 100 }
  status: { in: ["active", "pending"] }
}
orderBy: [FIELD_NAME_ASC, CREATED_AT_DESC]
{
  entities(first: 10, after: "cursor") {
    nodes { id, field }
    pageInfo { hasNextPage, endCursor }
  }
}
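Combining these patterns, a full collection query can be assembled programmatically. Below is a minimal Python sketch of that assembly; the `indexers` and `totalStake` names are hypothetical illustrations, not part of the toolkit's API:

```python
# Sketch: assemble a PostGraphile v4 style collection query combining
# pagination, filtering, and ordering. Entity/field names are hypothetical.
def build_collection_query(entity: str, fields: list[str], first: int = 10,
                           filter_expr: str = "", order_by: str = "") -> str:
    args = [f"first: {first}"]
    if filter_expr:
        args.append(f"filter: {{ {filter_expr} }}")
    if order_by:
        args.append(f"orderBy: [{order_by}]")
    field_list = ", ".join(fields)
    return (
        f"{{ {entity}({', '.join(args)}) "
        f"{{ nodes {{ {field_list} }} "
        f"pageInfo {{ hasNextPage, endCursor }} }} }}"
    )

query = build_collection_query(
    "indexers", ["id", "totalStake"],
    first=5,
    filter_expr='active: { equalTo: true }',
    order_by="TOTAL_STAKE_DESC",
)
print(query)
```

The resulting string follows the connection shape shown above: a `nodes` list plus `pageInfo` for cursor pagination.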
The agent follows this intelligent workflow:
- Relevance Check: Determines if the question relates to SubQuery Network data
- Schema Analysis: Loads entity schema and PostGraphile rules (once per session)
- Query Construction: Builds GraphQL queries using PostGraphile patterns
- Validation: Validates queries against the live GraphQL schema
- Execution: Executes validated queries to get real data
- Summarization: Provides user-friendly responses based on actual results
For questions unrelated to SubQuery Network (e.g., "How to cook pasta?"), the agent politely declines without using any tools:
"I'm specialized in SubQuery Network data queries. I can help you with indexers, projects, staking rewards, and network statistics, but I cannot assist with cooking. Please ask me about SubQuery Network data instead."
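The relevance gate in front of the tool pipeline can be sketched as a simple guard. This is an illustrative outline only: the keyword check stands in for the LLM's judgment, and the real agent makes this decision via its prompt, not keyword matching:

```python
# Illustrative sketch of the workflow above with stubbed steps.
# The keyword check is a crude stand-in for the LLM's relevance judgment.
DOMAIN_KEYWORDS = {"indexer", "project", "reward", "staking", "era", "delegation"}

def is_relevant(question: str) -> bool:
    return any(word in question.lower() for word in DOMAIN_KEYWORDS)

def answer(question: str) -> str:
    if not is_relevant(question):
        # Decline politely without invoking any tools.
        return ("I'm specialized in SubQuery Network data queries. "
                "Please ask me about SubQuery Network data instead.")
    # Schema analysis, query construction, validation, execution,
    # and summarization would follow here.
    return "running tool pipeline..."

print(answer("How to cook pasta?"))
print(answer("Show me indexer rewards"))
```

The key property the sketch demonstrates: out-of-domain questions short-circuit before any tool is touched.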
- Purpose: Get raw entity schema with PostGraphile v4 guidance
- Input: None
- Output: Complete entity schema with query construction rules
- Usage: Called once per session to understand data structure
- Purpose: Get specific type definitions (fallback when validation fails)
- Input: `type_name` (string)
- Output: Type definition with minimal token usage (depth=0)
- Usage: Only used when validation fails and more type info is needed
- Purpose: Validate GraphQL query syntax and schema compatibility
- Input: `query` (string) - plain text, formatting is auto-cleaned
- Output: Validation result with detailed error messages
- Usage: Always called before query execution
- Purpose: Execute validated GraphQL queries
- Input: `query` (string), optional `variables` (dict)
- Output: Query results or execution errors
- Usage: Called after successful validation to get actual data
# Required
export OPENAI_API_KEY="your-openai-api-key"
# Optional
export LLM_MODEL="gpt-4o" # Default model
export PORT="8000" # Server port
from graphql_agent import create_graphql_toolkit
# With custom headers
headers = {
    "Authorization": "Bearer your-token",
    "X-API-Key": "your-api-key"
}
toolkit = create_graphql_toolkit(
    endpoint="https://your-graphql-endpoint.com/graphql",
    entity_schema=schema_content,
    headers=headers
)
The toolkit automatically caches GraphQL schemas for performance:
from graphql_agent.base import GraphQLSource
source = GraphQLSource(
    endpoint="https://api.example.com/graphql",
    entity_schema=schema_content,
    schema_cache_ttl=3600  # Cache for 1 hour
)
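The effect of `schema_cache_ttl` can be illustrated with a minimal TTL cache. This is a standalone sketch of the semantics, not the toolkit's internal implementation:

```python
import time

# Minimal TTL cache sketch illustrating schema_cache_ttl semantics:
# entries are served until the TTL elapses, then dropped so the next
# access triggers a fresh fetch (e.g. a new schema introspection).
class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: caller must refetch
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())

cache = TTLCache(ttl_seconds=3600)
cache.set("schema", "...introspection result...")
assert cache.get("schema") is not None  # served from cache within the TTL
```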
Important: The example prompts are specifically tailored for the SubQuery Network example to help the LLM accurately determine its capabilities. You should customize the prompt for your specific project:
# Customize this prompt for your project's domain
prompt_template = """You are a GraphQL assistant specialized in [YOUR PROJECT] data queries. You can help users find information about:
- [List your project's main entities and use cases]
- [Specific data types your project indexes]
- [Key relationships and metrics available]
Available tools: {tools}
Tool names: {tool_names}
IMPORTANT: Before using any tools, evaluate if the user's question relates to [YOUR PROJECT] data.
IF NOT RELATED to [YOUR PROJECT] (general questions, other projects, personal advice, etc.):
- DO NOT use any tools
- Politely decline with: "I'm specialized in [YOUR PROJECT] data queries. I can help you with [list key capabilities], but I cannot assist with [their topic]. Please ask me about [YOUR PROJECT] data instead."
IF RELATED to [YOUR PROJECT] data:
[Rest of workflow remains the same]
"""
Why Domain-Specific Prompts Matter:
- Better Boundary Recognition: LLM can accurately determine when it should/shouldn't help
- Improved User Experience: Clear communication about capabilities and limitations
- Reduced Hallucination: LLM won't attempt to answer questions outside its domain
- Professional Responses: Consistent, helpful decline messages for out-of-scope requests
Example Customizations:
- DeFi Project: "specialized in DeFi protocol data... trading volumes, liquidity pools, yield farming..."
- NFT Marketplace: "specialized in NFT marketplace data... collections, sales, floor prices..."
- Gaming Project: "specialized in blockchain gaming data... players, items, achievements..."
subql-graphql-agent/
├── graphql_agent/ # Core toolkit package
│ ├── __init__.py # Package exports
│ ├── base.py # GraphQLSource and GraphQLToolkit
│ ├── tools.py # Individual GraphQL tools
│ └── graphql.py # Schema processing utilities
├── examples/ # Usage examples
│ ├── working_example.py # Interactive agent demo
│ ├── server.py # OpenAI-compatible API server
│ └── schema.graphql # SubQuery entity schema
└── pyproject.toml # Dependencies and configuration
- `python-dotenv>=1.0.0` - Environment variable loading
- `fastapi>=0.109.0` - Web framework for API server
- `uvicorn>=0.27.0` - ASGI server
- `pydantic>=2.6.0` - Data validation
- `httpx>=0.27.0` - HTTP client
- `aiohttp>=3.9.0` - Async HTTP requests
- `graphql-core>=3.2.0` - GraphQL query parsing and validation
- `langchain>=0.1.0` - Agent framework
- `langchain-core>=0.1.0` - Core components
- `langchain-openai>=0.1.0` - OpenAI integration
- `pytest>=8.4.1` - Testing framework
Run the test suite:
pytest tests/ -v
The project uses Ruff for linting and formatting:
# Lint
ruff check .
# Format
ruff format .
The toolkit includes comprehensive error handling:
- GraphQL endpoint connectivity problems
- Timeout handling for long-running queries
- Automatic retry for transient failures
- Invalid GraphQL syntax detection
- Schema validation with detailed error messages
- Field existence verification
- Iteration limits with intelligent fallback
- Time limits with partial result extraction
- Graceful handling of incomplete responses
- Always use pagination (`first: N`) for collection queries
- Limit nested relationship depth to avoid expensive queries
- Use specific field selection rather than querying all fields
- Consider using `offset` for simple pagination scenarios
- GraphQL schema introspection results are cached (1 hour TTL)
- Entity schema is loaded once per toolkit instance
- No query result caching (always fresh data)
- Connection pooling for HTTP requests
- Automatic cleanup of resources
- Memory-efficient schema processing
| Feature | SubQL GraphQL Agent | Generic GraphQL Tools | SQL Agents |
|---|---|---|---|
| Domain Specialization | ✅ SubQuery SDK | ❌ Generic | ❌ Database only |
| Natural Language | ✅ Full support | | ✅ SQL focused |
| Schema Understanding | ✅ PostGraphile + Entity | | ✅ Table schemas |
| Query Validation | ✅ Pre-execution | | ✅ SQL validation |
| Relationship Handling | ✅ @derivedFrom aware | ❌ Manual | ✅ Foreign keys |
| API Compatibility | ✅ OpenAI compatible | ❌ Custom only | ❌ Database specific |
This project is licensed under the same terms as the parent project.
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run linting and tests
- Submit a pull request
Run the comprehensive test suite:
pytest tests/test_graphql_agent.py -v
Test coverage includes:
- ✅ Toolkit creation and configuration
- ✅ Schema info tool functionality
- ✅ Query validation with enhanced schema checking
- ✅ Query execution and error handling
- ✅ Complete workflow testing
Test the GraphQL tools directly:
import asyncio
from graphql_agent import create_graphql_toolkit
async def test_tools():
    """Test GraphQL tools directly."""
    endpoint = "https://index-api.onfinality.io/sq/subquery/subquery-mainnet"

    # Load entity schema (learn more: https://subquery.network/doc/indexer/build/graphql.html)
    with open("examples/schema.graphql", 'r') as f:
        entity_schema = f.read()

    # Create toolkit
    toolkit = create_graphql_toolkit(endpoint, entity_schema)
    tools = toolkit.get_tools()

    print(f"Available tools: {len(tools)}")
    for tool in tools:
        print(f"- {tool.name}: {tool.description}")

    # Test schema info
    schema_tool = tools[0]
    result = await schema_tool._arun()
    print(f"\nSchema info: {result[:200]}...")

asyncio.run(test_tools())
Test the GraphQL endpoint directly:
curl -X POST https://index-api.onfinality.io/sq/subquery/subquery-mainnet \
-H "Content-Type: application/json" \
-d '{"query": "{ indexers(first: 1) { nodes { id } } }"}'
# Error: No module named 'langchain_openai'
uv add langchain-openai
# Error: No module named 'graphql'
uv add graphql-core
# Error: "Invalid API key"
export OPENAI_API_KEY="sk-your-actual-key"
# Verify API key works
python -c "from langchain_openai import ChatOpenAI; print(ChatOpenAI().invoke('Hello'))"
# Error: "GraphQL query failed"
# Check internet connection and endpoint
curl -I https://index-api.onfinality.io/sq/subquery/subquery-mainnet
Problem: Agent validation passes but execution doesn't happen
Solution: Updated prompts now emphasize that validation is NOT the final answer

Problem: Agent tries to use invalid "skip" action
Solution: Fixed prompt format to go directly to Final Answer for non-relevant queries

Problem: Agent reaches iteration limit
Solution: Prompt now includes mandatory execution step after validation
# Error: "attempted relative import with no known parent package"
# Make sure to run from correct directory
cd examples
python working_example.py
Enable verbose logging to see agent reasoning:
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Shows tool selections and reasoning
    max_iterations=10,
    return_intermediate_steps=True
)
- Schema Caching: Schemas are automatically cached for 1 hour
- Query Optimization: Use pagination and specific field selection
- Model Selection: gpt-4o or stronger models recommended for best performance (gpt-4o-mini works but may have limitations)
- Rate Limiting: Monitor OpenAI API usage to avoid limits
# Production environment variables
export OPENAI_API_KEY="your-production-key"
export LLM_MODEL="gpt-4o"
export PORT="8000"
# Optional: Custom endpoint and headers
export GRAPHQL_ENDPOINT="https://your-custom-endpoint.com/graphql"
export GRAPHQL_HEADERS='{"Authorization": "Bearer token"}'
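Since `GRAPHQL_HEADERS` holds a JSON object, it needs to be decoded before being passed to the toolkit. A sketch of reading these optional overrides (whether your entry point wires them through to `create_graphql_toolkit` is configuration-specific):

```python
import json
import os

# Sketch: decode the optional endpoint/header overrides from this section.
# GRAPHQL_HEADERS is a JSON object, so it must be parsed, not used raw.
endpoint = os.environ.get(
    "GRAPHQL_ENDPOINT",
    "https://index-api.onfinality.io/sq/subquery/subquery-mainnet",
)
headers = json.loads(os.environ.get("GRAPHQL_HEADERS", "{}"))

# These would then be forwarded, e.g.
# create_graphql_toolkit(endpoint, entity_schema, headers=headers)
print(endpoint, headers)
```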
- Input Validation: All user inputs are validated before processing
- Query Sanitization: GraphQL queries are validated against schema
- Rate Limiting: Implement API rate limits for production use
- Error Handling: Sensitive information is not exposed in error messages
Key metrics to monitor:
- Query success/failure rates
- Average response times
- OpenAI API usage and costs
- GraphQL endpoint health
- Agent reasoning quality
- Horizontal Scaling: Multiple server instances with load balancing
- Caching Strategy: Redis for schema and query result caching
- Connection Pooling: Efficient HTTP connection management
- Resource Limits: Memory and CPU limits for agent execution
This project demonstrates several key technical achievements:
- ✅ Entity Schema Integration: Combines PostGraphile patterns with custom entity definitions
- ✅ Intelligent Query Construction: Automatically generates optimal GraphQL queries
- ✅ Schema Validation: Pre-execution validation prevents runtime errors
- ✅ Domain Specialization: Focused on SubQuery Network terminology and concepts
- ✅ Context Awareness: Understands relationships between indexers, projects, and rewards
- ✅ Error Recovery: Graceful handling of invalid queries with helpful suggestions
- ✅ OpenAI Compatibility: Standard API format for easy integration
- ✅ Streaming Support: Real-time response streaming for better UX
- ✅ Comprehensive Error Handling: Robust error detection and user feedback
- ✅ Easy Integration: Simple toolkit creation with minimal setup
- ✅ Flexible Usage: Both interactive and API modes supported
- ✅ Extensive Documentation: Complete examples and troubleshooting guides
- Conversation Memory: Multi-turn conversation support
- Query Optimization: Automatic performance optimization
- Custom Validators: Domain-specific validation rules
- Enhanced Caching: Intelligent query result caching
- Multi-language Support: Support for additional natural languages
- Visual Query Builder: Web-based query construction interface
- Analytics Dashboard: Query performance and usage analytics
- Plugin Architecture: Extensible tool system for custom domains
For issues and questions:
- Documentation: Check this README and example code in `examples/`
- Troubleshooting: Review the troubleshooting section above
- Testing: Run the test suite to verify installation
- Issues: Open a GitHub issue with detailed information about your use case
Include this information when reporting issues:
- Python version and OS
- Error messages and stack traces
- Steps to reproduce the problem
- Expected vs actual behavior
Built for SubQuery Network - Specialized GraphQL agent toolkit for blockchain indexing and staking data.