A comprehensive Node.js-based monitoring system for the XDC Network. This application provides real-time monitoring of blockchain infrastructure with a focus on RPC endpoint monitoring, port monitoring, block propagation, alerting, and visualization.
- RPC URL Monitoring: Mainnet and Testnet endpoint monitoring, downtime detection, latency measurement, peer count analysis
- Multi-RPC Monitoring: Monitor multiple endpoints simultaneously, compare response times, adaptive monitoring frequency
- Advanced Connection Point Checks: HTTP/HTTPS port checks, WebSocket port checks, subscription testing, batch processing
- Intelligent Endpoint Management:
  - Priority-based recovery detection for faster reaction to endpoint issues
  - Dynamic frequency adjustment based on endpoint health
  - Peer count monitoring with dynamic baselines and anomaly detection
  - Multi-method verification with primary and fallback strategies
- Block Propagation Monitoring: Block time tracking, slow block detection
- Transaction Monitoring: Automated transaction testing, smart contract deployment testing
- Consensus Monitoring: Masternode performance tracking, epoch transitions, validator penalties
- Alert System: Dashboard alerts, Telegram notifications, webhook notifications
- Metrics Collection: InfluxDB time-series database, Grafana dashboards
The XDC Monitor has been optimized with a modular, maintainable architecture:
- Shared Constants: Configuration values are centralized in the `common/constants` directory
- Enhanced Queue System: Resilient job processing with retry, timeout, and prioritization
- Time-Series Data Management: Efficient time window data structures for metrics
- Modular Services: Clean separation of concerns with specialized service modules
- Consensus Monitoring: Specialized monitors for miners, epochs, and rewards
- Batched Processing: Transaction processing uses parallel batching for higher throughput
- Priority-based Queue: Critical operations (like mainnet block processing) get priority
- Efficient Memory Usage: Time-window data structures automatically clean up old data
- Smart Error Handling: Automatic retry with exponential backoff for transient failures (see the sketch after this list)
- Code Optimization: Helper methods reduce duplication and improve maintainability
- DRY Principle: Don't Repeat Yourself approach for alert classification and formatting
- Sliding Window Data: Memory-efficient approach for tracking recent state without database overhead
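As an illustration of the retry-with-exponential-backoff pattern mentioned above, here is a minimal TypeScript sketch; it is a generic helper, not the project's actual `EnhancedQueue` implementation:

```typescript
// Retry an async operation with exponential backoff: wait 1s, 2s, 4s, ...
// between attempts, giving up after maxAttempts.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // transient-failure budget exhausted
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```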
- Framework: NestJS for enterprise-grade dependency injection and modular architecture
- Time Series DB: InfluxDB for efficient storage and querying of time-series metrics
- Visualization: Grafana dashboards for real-time monitoring and alerting
- Container Support: Docker and Docker Compose for easy deployment and scaling
The XDC Monitor includes a comprehensive alert system to notify you of important network events.
The system monitors the following conditions:
- **Average Block Time** (see the detection sketch after this list)
  - Alerts when the average block time over the last 100 blocks exceeds 2.5 seconds
  - Severity: Warning
  - Component: blockchain
  - Threshold: 2.5 seconds
- **Transaction Errors**
  - Alerts when more than 3 failed transactions are detected across all blocks in a 5-minute period
  - Severity: Warning
  - Component: transactions
  - Threshold: 3 failed transactions in 5 minutes
- **High Transaction Volume**
  - Alerts when more than 2000 transactions are processed within a 5-minute period
  - Severity: Info
  - Component: transactions
  - Threshold: 2000 transactions per 5 minutes
- **RPC Response Time**
  - Alerts when an RPC endpoint takes more than 30 seconds to respond
  - Severity: Critical
  - Component: rpc
  - Threshold: 30 seconds (30,000 ms)
- **Transaction Test Failures**
  - Alerts when test transactions (normal or contract deployment) consistently fail
  - Severity: Warning
  - Component: transactions
  - Threshold: 3 consecutive failures
- **Test Wallet Balance**
  - Alerts when the test wallet balance falls below the required minimum (0.01 XDC)
  - Severity: Warning
  - Component: wallet
  - Threshold: 0.01 XDC
- **Penalty List Size**
  - Alerts when the validator penalty list exceeds a configured threshold
  - Severity: Warning
  - Component: consensus
  - Threshold: 20 validators
- **Frequently Penalized Nodes**
  - Alerts when validators appear in the penalty list too frequently
  - Severity: Warning
  - Component: consensus
  - Threshold: Penalized in 70% or more of recent epochs
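To make the first condition concrete, here is a minimal TypeScript sketch of how such a check can be computed; the class and method names are hypothetical, not the project's actual implementation:

```typescript
// Hypothetical sketch of the "Average Block Time" check: keep the last 100
// inter-block times and raise a warning when their average exceeds 2.5 s.
const WINDOW = 100;
const THRESHOLD_SECONDS = 2.5;

class BlockTimeWatcher {
  private gaps: number[] = [];    // seconds between consecutive blocks
  private lastTimestamp?: number; // unix seconds of the previous block

  onBlock(timestamp: number): void {
    if (this.lastTimestamp !== undefined) {
      this.gaps.push(timestamp - this.lastTimestamp);
      if (this.gaps.length > WINDOW) this.gaps.shift(); // keep the last 100
    }
    this.lastTimestamp = timestamp;

    if (this.gaps.length === WINDOW) {
      const avg = this.gaps.reduce((a, b) => a + b, 0) / WINDOW;
      if (avg > THRESHOLD_SECONDS) {
        // The real system routes this through the alert manager
        // (severity: warning, component: blockchain).
        console.warn(`Average block time ${avg.toFixed(2)}s exceeds ${THRESHOLD_SECONDS}s`);
      }
    }
  }
}
```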
Alerts are delivered through multiple channels:
- Grafana UI: Dashboard alerts appear in the Grafana UI (controlled by `ENABLE_DASHBOARD_ALERTS`)
- Telegram: Alerts sent to a configured Telegram chat (controlled by `ENABLE_CHAT_NOTIFICATIONS`)
- Webhook: Alerts sent to an external service via webhook (controlled by `ENABLE_CHAT_NOTIFICATIONS`; also requires `NOTIFICATION_WEBHOOK_URL`)
- Server Logs: All alerts are logged in the server's logs
Configure alerts in your `.env` file:
```bash
# Enable/disable alert channels
ENABLE_DASHBOARD_ALERTS=true
ENABLE_CHAT_NOTIFICATIONS=true

# Telegram configuration
TELEGRAM_BOT_TOKEN="your-telegram-bot-token-here"
TELEGRAM_CHAT_ID="your-telegram-chat-id-here"
TELEGRAM_MAINNET_TOPIC_ID="topic-id-for-mainnet-alerts"
TELEGRAM_TESTNET_TOPIC_ID="topic-id-for-testnet-alerts"

# Webhook configuration
NOTIFICATION_WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"
```
The system uses these environment variables to control alert behavior:
- `ENABLE_DASHBOARD_ALERTS`: Controls Grafana dashboard alerts
- `ENABLE_CHAT_NOTIFICATIONS`: Controls external notifications (Telegram and webhook)
- `TELEGRAM_BOT_TOKEN` & `TELEGRAM_CHAT_ID`: Required for Telegram notifications
- `NOTIFICATION_WEBHOOK_URL`: URL to send webhook alerts to (for Slack, Discord, etc.)
You can test the alert system using these API endpoints:
```bash
# Test all alerts at once
curl http://your-server:3000/api/testing/trigger-all-alerts

# Test specific alert types
curl http://your-server:3000/api/testing/trigger-alert/block-time
curl http://your-server:3000/api/testing/trigger-alert/tx-errors
curl http://your-server:3000/api/testing/trigger-alert/tx-volume
curl http://your-server:3000/api/testing/trigger-alert/rpc-time
```
This project uses GitHub Actions for continuous integration and deployment:
The CI workflow consists of two jobs:

- **Validate**: Builds and tests the application
  - Runs on pushes to the `main` and `staging` branches
  - Runs on pull requests to the `main` and `staging` branches
  - Checks the code, builds the application, and verifies the Docker image

- **Publish**: Publishes Docker images
  - Triggered only on pushes to `main` and `staging`
  - Publishes to GitHub Container Registry with appropriate tags
The staging deployment workflow:
- Triggered when pull requests are merged to the `staging` branch
- Deploys the application to the staging server via SSH
- Sets up the environment with proper configuration
- Restarts services using Docker Compose
The published Docker images can be pulled from GitHub Container Registry:
```bash
# Pull the latest main branch image
docker pull ghcr.io/[organization]/xdc-monitor:main

# Pull a specific commit
docker pull ghcr.io/[organization]/xdc-monitor:sha-abcdef
```
- Node.js 16.x or higher
- npm or yarn package manager
- Access to XDC Network RPC endpoints
- Docker and Docker Compose (for full stack deployment)
- **Clone the repository:**

  ```bash
  git clone https://github.com/yourusername/xdc-monitor.git
  cd xdc-monitor
  ```

- **Install dependencies:**

  ```bash
  npm install
  ```

- **Configure the application** by creating a `.env` file (see the Configuration section)

- **Build the application:**

  ```bash
  npm run build
  ```
The project uses environment variables for configuration. Create a `.env` file in the project root with the following variables:
```bash
# General configuration
BLOCKS_TO_SCAN=10
SCAN_INTERVAL=15

# Monitoring features
ENABLE_RPC_MONITORING=true
ENABLE_PORT_MONITORING=true
ENABLE_BLOCK_MONITORING=true
ENABLE_TRANSACTION_MONITORING=true
ENABLE_CONSENSUS_MONITORING=true
BLOCK_TIME_THRESHOLD=3.0

# Consensus monitoring configuration
CONSENSUS_MONITORING_CHAIN_IDS=50,51
CONSENSUS_SCAN_INTERVAL=15000

# Alert configuration
ENABLE_DASHBOARD_ALERTS=true
ENABLE_CHAT_NOTIFICATIONS=true
NOTIFICATION_WEBHOOK_URL=

# Telegram notification configuration
TELEGRAM_BOT_TOKEN="your-telegram-bot-token-here"
TELEGRAM_CHAT_ID="your-telegram-chat-id-here"
TELEGRAM_MAINNET_TOPIC_ID="topic-id-for-mainnet-alerts"
TELEGRAM_TESTNET_TOPIC_ID="topic-id-for-testnet-alerts"

# Logging configuration
LOG_LEVEL=info

# InfluxDB Configuration
INFLUXDB_URL=http://localhost:8086
INFLUXDB_TOKEN=your-influxdb-token
INFLUXDB_ORG=xdc
INFLUXDB_BUCKET=xdc_metrics
INFLUXDB_ADMIN_USER=admin
INFLUXDB_ADMIN_PASSWORD=secure-password

# Grafana Admin Credentials
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=secure-password

# Transaction monitoring configuration
ENABLE_TRANSACTION_MONITORING=true
MAINNET_TEST_PRIVATE_KEY=your-test-wallet-private-key-for-mainnet
TESTNET_TEST_PRIVATE_KEY=your-test-wallet-private-key-for-testnet
TEST_RECEIVER_ADDRESS_50=0xReceiverAddressForMainnet
TEST_RECEIVER_ADDRESS_51=0xReceiverAddressForTestnet
```
The `NOTIFICATION_WEBHOOK_URL` configuration allows you to send alert notifications to external services. You can use any webhook-compatible service:
- **Slack Incoming Webhook**:
  - Create a webhook URL in your Slack workspace (Apps → Create app → Incoming Webhooks)
  - Example: `https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXX`

- **Discord Webhook**:
  - Create a webhook URL in your Discord server (Channel settings → Integrations → Webhooks)
  - Example: `https://discord.com/api/webhooks/000000000000000000/XXXX`

- **Microsoft Teams Webhook**:
  - Create a webhook in your Teams channel (... menu → Connectors → Incoming Webhook)

- **Custom Webhook Endpoint**:
  - Any HTTP endpoint that accepts JSON POST requests with alert data
When configured, the system will POST JSON data containing alert information to this URL whenever monitoring conditions trigger an alert.
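The payload schema is not documented in this section, so the following TypeScript sketch only illustrates the general idea; every field name here is an assumption, not the actual format:

```typescript
// Hypothetical shape of an alert payload; the real field names may differ.
interface AlertPayload {
  title: string;     // e.g. "Slow block time detected"
  message: string;   // human-readable details
  severity: 'error' | 'warning' | 'info';
  component: string; // e.g. "blockchain", "rpc", "transactions"
  chainId?: number;  // 50 = Mainnet, 51 = Testnet, absent for general alerts
  timestamp: string; // ISO 8601
}

// Minimal receiver sketch (Node 18+, using the built-in http module).
import { createServer } from 'node:http';

createServer((req, res) => {
  let body = '';
  req.on('data', chunk => (body += chunk));
  req.on('end', () => {
    const alert = JSON.parse(body) as AlertPayload;
    console.log(`[${alert.severity}] ${alert.title}: ${alert.message}`);
    res.writeHead(204).end();
  });
}).listen(8080);
```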
Run the compiled application directly:

```bash
npm run start:prod
```

Or start the full stack with Docker Compose:

```bash
docker-compose up -d
```
This will start all services:
- XDC Monitor (API and monitoring)
- InfluxDB (metrics storage)
- Grafana (visualization)
```bash
# Run only InfluxDB
docker-compose up -d influxdb

# Run only Grafana
docker-compose up -d grafana
```
For convenience, this project includes a helper script to manage various deployment scenarios:
```bash
# Make the script executable (first time only)
chmod +x run.sh

# Show available commands
./run.sh help

# Start the complete stack
./run.sh up

# View logs
./run.sh logs

# Clear InfluxDB data
./run.sh clear-influxdb

# Rebuild containers (after code changes)
./run.sh rebuild

# Clean up all containers, volumes, and networks (fixes Docker issues)
./run.sh clean
```
- Block Status: `/api/monitoring/block-status` - Current block monitoring information
- Block Comparison: `/api/monitoring/block-comparison` - Comparison of block heights across RPCs
- RPC Status: `/api/monitoring/rpc-status` - Status of all RPC endpoints
- WebSocket Status: `/api/monitoring/websocket-status` - Status of WebSocket connections
- Transaction Status: `/api/monitoring/transaction-status` - Status of transaction monitoring
- Overall Status: `/api/monitoring/status` - Combined status of all monitoring systems
- Notifications Test: `/api/notifications/test` - Test the notification system
- Telegram Webhook: `/api/notifications/telegram` - Endpoint for Grafana to send alerts
- Trigger Manual Alert: `/api/testing/trigger-manual-alert?type=error&title=Title&message=Message` - Directly trigger an alert
- Simulate Slow Block Time: `/api/testing/simulate-slow-blocktime?seconds=4` - Simulate a slow block time
- Simulate RPC Down: `/api/testing/simulate-rpc-down?endpoint=URL` - Simulate an RPC endpoint being down
- Simulate RPC Latency: `/api/testing/simulate-rpc-latency?endpoint=URL&latency=500` - Simulate high RPC latency
- Run Transaction Test: `/api/testing/run-transaction-test?chainId=50&type=normal` - Manually trigger a transaction test
- Test Telegram Topics: `/api/testing/test-telegram-topics` - Test sending alerts to different Telegram topics (Mainnet/Testnet/General)
- Generate Weekly Report: `/api/testing/generate-weekly-report?startDays=7&endDays=0` - Generate a detailed weekly alert report as JSON
- Get Weekly Report Message: `/api/testing/weekly-report-message?startDays=7&endDays=0` - Get the formatted message that would be sent to Telegram
- Send Weekly Report: `/api/testing/send-weekly-report?startDays=7&endDays=0` - Generate and send a weekly report to all configured channels
The application stores the following metrics in InfluxDB (a write sketch follows the list):

- `block_height` - Current block height, tagged with `chainId` and `endpoint`
- `transaction_count` - Transaction counts by status, tagged with `status` and `chainId`
- `transactions_per_block` - Transactions per block, tagged with `status`, `block_number`, and `chainId`
- `rpc_latency` - Response time of RPC endpoints in ms, tagged with `endpoint` and `chainId`
- `rpc_status` - Status of RPC endpoints (1=up, 0=down), tagged with `endpoint` and `chainId`
- `websocket_status` - Status of WebSocket endpoints (1=up, 0=down), tagged with `endpoint` and `chainId`
- `explorer_status` - Status of explorer endpoints (1=up, 0=down), tagged with `endpoint` and `chainId`
- `faucet_status` - Status of faucet endpoints (1=up, 0=down), tagged with `endpoint` and `chainId`
- `block_time` - Time between blocks in seconds, tagged with `chainId`
- `transaction_monitor` - Transaction test results, tagged with `type`, `chainId`, and `rpc`
- `transaction_monitor_confirmation_time` - Transaction confirmation time in ms, tagged with `type`, `chainId`, and `rpc`
- `wallet_balance` - Test wallet balances, tagged with `chainId`, with a field for sufficient balance
- `validator_summary` - Summary metrics for validators, tagged with `chainId`
- `validator_nodes` - Count of masternodes, standby nodes, and penalty nodes
- `consensus_missed_rounds` - Tracks missed mining rounds with detailed information
- `consensus_timeout_periods` - Records timeout periods between blocks with duration and miners skipped
- `consensus_miner_performance` - Complete mining performance data by validator
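For context, writing one of these points with the official `@influxdata/influxdb-client` package looks roughly like this; the measurement and tag names match the list above, while the field name `value` and the surrounding code are assumptions:

```typescript
import { InfluxDB, Point } from '@influxdata/influxdb-client';

// Connection details come from the INFLUXDB_* environment variables.
const influx = new InfluxDB({
  url: process.env.INFLUXDB_URL ?? 'http://localhost:8086',
  token: process.env.INFLUXDB_TOKEN,
});
const writeApi = influx.getWriteApi(
  process.env.INFLUXDB_ORG ?? 'xdc',
  process.env.INFLUXDB_BUCKET ?? 'xdc_metrics',
);

// Record a block height sample for Mainnet, tagged as described above.
// The endpoint URL is illustrative.
writeApi.writePoint(
  new Point('block_height')
    .tag('chainId', '50')
    .tag('endpoint', 'https://rpc.xinfin.network')
    .intField('value', 12345678),
);

// Flush buffered points and close the connection.
await writeApi.close();
```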
The system includes comprehensive transaction monitoring capabilities:
- Automated Testing: Regularly runs test transactions on all active RPC endpoints
- Test Types: Includes both normal value transfers and smart contract deployments
- Multi-Chain Support: Tests both Mainnet (chainId 50) and Testnet (chainId 51)
- Wallet Management: Continuously monitors test wallet balances
- Performance Metrics: Tracks transaction confirmation times and success rates
To use transaction monitoring, you need:
- Test wallets with private keys specified in the configuration
- Sufficient balance in each test wallet (minimum 0.01 XDC)
- Receiver addresses for test transactions
Test transactions are executed every 5 minutes by default, with metrics being recorded in InfluxDB and visualized in Grafana.
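Conceptually, each test iteration resembles the following ethers.js (v6) sketch; it reuses the env variable names from the Configuration section, but the function itself is illustrative rather than the monitor's actual code:

```typescript
import { ethers } from 'ethers';

// Send a tiny value transfer on Mainnet (chainId 50) and time its confirmation.
// Illustrative only: the real monitor also tests contract deployments and
// records these numbers as transaction_monitor* metrics in InfluxDB.
async function runTestTransaction(rpcUrl: string): Promise<void> {
  const provider = new ethers.JsonRpcProvider(rpcUrl);
  const wallet = new ethers.Wallet(process.env.MAINNET_TEST_PRIVATE_KEY!, provider);

  const started = Date.now();
  const tx = await wallet.sendTransaction({
    to: process.env.TEST_RECEIVER_ADDRESS_50,
    value: ethers.parseEther('0.0001'), // well below the 0.01 XDC balance floor
  });
  const receipt = await tx.wait(); // wait for 1 confirmation

  console.log(
    `tx ${receipt?.hash} status=${receipt?.status} ` +
      `confirmation=${Date.now() - started}ms`,
  );
}
```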
The application features a comprehensive alert and reporting system for monitoring blockchain health.
- Multi-level Severity: Alerts are categorized as `error`, `warning`, or `info`
- Network-specific Alerting: Alerts can be associated with specific chains (Mainnet or Testnet)
- Component Attribution: Alerts include the source component that triggered them
- Multi-channel Delivery: Supports sending alerts to Telegram, webhooks, and the dashboard
- Intelligent Throttling: Prevents alert floods by limiting the frequency of similar alerts
- Smart Alert Classification: Automatically determines network association through chainId and content pattern matching
- Topic-based Routing (see the sketch after this list):
  - Alerts for Mainnet (chainId=50) route to a dedicated Mainnet topic
  - Alerts for Testnet (chainId=51) route to a dedicated Testnet topic
  - General alerts go to the main conversation thread
- Formatted Messages: Clear, well-formatted messages with emoji indicators and detailed information
- HTML Formatting: Uses HTML formatting with monospace tables and Unicode box-drawing characters for bordered tables
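Topic-based routing maps onto the Telegram Bot API's `message_thread_id` parameter for forum chats. A minimal sketch, in which only the `sendMessage` call is standard Bot API and the routing helper is hypothetical:

```typescript
// Pick the forum topic for an alert based on its chainId, falling back to the
// main thread for general alerts. Topic IDs come from the TELEGRAM_*_TOPIC_ID
// environment variables described above.
function topicIdFor(chainId?: number): number | undefined {
  if (chainId === 50) return Number(process.env.TELEGRAM_MAINNET_TOPIC_ID);
  if (chainId === 51) return Number(process.env.TELEGRAM_TESTNET_TOPIC_ID);
  return undefined; // general alerts: no message_thread_id
}

async function sendTelegramAlert(html: string, chainId?: number): Promise<void> {
  const token = process.env.TELEGRAM_BOT_TOKEN;
  await fetch(`https://api.telegram.org/bot${token}/sendMessage`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      chat_id: process.env.TELEGRAM_CHAT_ID,
      message_thread_id: topicIdFor(chainId), // undefined is dropped by JSON.stringify
      parse_mode: 'HTML', // enables <b>, <code>, <pre> formatting
      text: html,
    }),
  });
}
```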
The system automatically generates weekly alert reports that provide insights into system health:
- **Comprehensive Statistics** (see the aggregation sketch after this list):
  - Total alert counts by severity (error/warning/info)
  - Breakdown by network (Mainnet/Testnet/Other)
  - Component-specific analytics
  - Most frequent alert types

- **Manual Report Generation**:
  - Generate reports for custom date ranges
  - Get formatted messages for communication channels
  - Trigger immediate report delivery to configured channels

- **Report Archiving**:
  - The system retains the last 4 weeks of reports
  - Data is stored in both memory and InfluxDB for reliability
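The severity and network breakdowns are plain group-and-count aggregations; a minimal sketch with hypothetical types and helper names:

```typescript
type Severity = 'error' | 'warning' | 'info';
type Network = 'Mainnet' | 'Testnet' | 'Other';

interface StoredAlert {
  severity: Severity;
  network: Network;
  component: string;
}

// Count alerts per network and severity for the weekly report tables.
function summarize(alerts: StoredAlert[]): Record<Network, Record<Severity, number>> {
  const empty = (): Record<Severity, number> => ({ error: 0, warning: 0, info: 0 });
  const summary: Record<Network, Record<Severity, number>> = {
    Mainnet: empty(),
    Testnet: empty(),
    Other: empty(),
  };
  for (const a of alerts) summary[a.network][a.severity]++;
  return summary;
}
```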
The system uses a robust approach to classify alerts by network:
- **Primary Classification**: Uses the `chainId` field when available (chainId=50 for Mainnet, chainId=51 for Testnet)

- **Pattern-Based Classification**: For legacy alerts or those without a chainId, analyzes the alert title and message content for patterns:
  - Mainnet indicators: "mainnet", "chain 50", "chainId 50"
  - Testnet indicators: "testnet", "chain 51", "chainId 51"

- **Fallback Category**: Alerts that can't be classified as either Mainnet or Testnet are categorized as "Other"

This approach ensures accurate network classification for all alerts, regardless of how they were created; a condensed sketch follows.
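Condensed into code, the three-step decision order looks roughly like this (the function name and regexes are illustrative, derived from the patterns listed above):

```typescript
type Network = 'Mainnet' | 'Testnet' | 'Other';

function classifyNetwork(alert: { chainId?: number; title: string; message: string }): Network {
  // 1. Primary: trust an explicit chainId when present.
  if (alert.chainId === 50) return 'Mainnet';
  if (alert.chainId === 51) return 'Testnet';

  // 2. Pattern-based: fall back to content matching for legacy alerts.
  const text = `${alert.title} ${alert.message}`.toLowerCase();
  if (/mainnet|chain 50|chainid 50/.test(text)) return 'Mainnet';
  if (/testnet|chain 51|chainid 51/.test(text)) return 'Testnet';

  // 3. Fallback: anything else is reported under "Other".
  return 'Other';
}
```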
Weekly reports are displayed using a modular, optimized structure:
- Network-Specific Sections: Dedicated sections for Mainnet, Testnet, and Other alerts
- Severity Tables: Clear breakdown of errors, warnings, and info alerts per network
- Component Tables: Details of affected components with alert counts by severity
- Bordered Tables: All tables use Unicode box-drawing characters for clear visual structure
- Most Frequent Alerts: Summary of the most common alert types across all networks
The system monitors for various alert conditions:
- Block Time Alerts: Warnings when block time exceeds thresholds
- Transaction Error Alerts: Notifications of high transaction error rates
- RPC Response Time Alerts: Alerts for slow or non-responsive RPC endpoints
- High Transaction Volume Alerts: Notifications of unusual transaction activity
- Consensus Alerts: Notifications about consensus issues like missed rounds and validator penalties
You can test the alert system using the testing endpoints:
- Manually trigger individual alerts with `/api/testing/trigger-manual-alert`
- Test all alert types at once with `/api/testing/trigger-all-alerts`
- Test specific alert categories with `/api/testing/trigger-alert/{type}`
- Test network-specific routing with `/api/testing/test-telegram-topics`
- Generate and view weekly reports with `/api/testing/generate-weekly-report`
- Get formatted report messages with `/api/testing/weekly-report-message`
- Send weekly reports to all channels with `/api/testing/send-weekly-report`
The project uses InfluxDB for storing metrics and Grafana for visualization. The integration is configured automatically when you start the Docker containers.
This project uses a special approach to manage Grafana configurations:
- The actual Grafana data is stored in `grafana_data/` (ignored by Git)
- Version-controlled configurations are stored in `grafana_config/`
- Two helper commands synchronize between these directories:
```bash
# Export your current Grafana configurations to the version-controlled directory
./run.sh grafana-export

# Import the version-controlled configurations to your local Grafana
./run.sh grafana-import
```
This project uses several pieces of sensitive information that should never be committed to Git repositories:
- Telegram Bot Token: Used for alerting notifications
- Telegram Chat ID: Identifies where alerts are sent
- API Keys and Tokens: Any other authentication tokens
- Database Credentials: If using external databases
- Private Keys: Never commit wallet private keys to the repository
- Always use `.env` files for sensitive information
- Never commit the actual `.env` file to Git
- Provide an `.env.example` file with dummy values as a template
- For configs that might contain sensitive data (like alerting configs), use template files
- Name template files with the `.example` suffix (e.g., `alertmanager.example.yaml`)
- Ensure your `.gitignore` excludes real config files but includes the examples
- Clone the repository
- Copy the example files:

  ```bash
  cp .env.example .env
  cp grafana_data/provisioning/alerting/alertmanager.example.yaml grafana_data/provisioning/alerting/alertmanager.yaml
  cp grafana_data/provisioning/alerting/rules.example.yaml grafana_data/provisioning/alerting/rules.yaml
  ```

- Fill in your actual credentials in the `.env` file
- Do NOT commit your changes to the configuration files
Regularly rotate credentials, especially if:
- Someone leaves the development team
- You suspect a credential has been compromised
- It has been a long time since the last rotation
To use the CI/CD workflows, you need to set up these secrets in your GitHub repository:
- `STAGING_SSH_KEY`: Private SSH key for connecting to the staging server
- `STAGING_HOST`: Hostname or IP address of the staging server
- `STAGING_USER`: Username for the SSH connection to the staging server
- `STAGING_DEPLOY_PATH`: (Optional) Path where the application should be deployed
- `STAGING_INFLUXDB_TOKEN`: InfluxDB authentication token
- `STAGING_INFLUXDB_ORG`: InfluxDB organization name
- `STAGING_INFLUXDB_BUCKET`: InfluxDB bucket name
- `STAGING_INFLUXDB_ADMIN_USER`: InfluxDB admin username
- `STAGING_INFLUXDB_ADMIN_PASSWORD`: InfluxDB admin password
- `STAGING_TELEGRAM_BOT_TOKEN`: Telegram bot token for notifications
- `STAGING_TELEGRAM_CHAT_ID`: Telegram chat ID for notifications
- `STAGING_GRAFANA_ADMIN_USER`: Grafana admin username
- `STAGING_GRAFANA_ADMIN_PASSWORD`: Grafana admin password
The project follows a clean, modular architecture:
```
src/
├── common/                        # Shared code across the entire application
│   ├── constants/                 # Configuration constants and defaults
│   │   ├── config.ts              # Core configuration constants
│   │   ├── endpoints.ts           # Network endpoints definitions
│   │   └── monitoring.ts          # Monitoring thresholds and settings
│   └── utils/                     # Utility classes and helper functions
├── types/                         # TypeScript type definitions
│   ├── blockchain/                # Blockchain data structures
│   ├── monitoring/                # Monitoring configuration interfaces
│   └── rpc/                       # RPC endpoints and configuration
├── config/                        # Configuration module and service
│   ├── config.module.ts           # Configuration module definition
│   └── config.service.ts          # Service for accessing configuration
├── blockchain/                    # Blockchain interaction services
├── monitoring/                    # Core monitoring services
│   ├── alerts.service.ts          # Alert configuration and delivery
│   ├── blocks.monitor.ts          # Block monitoring implementation
│   ├── rpc.monitor.ts             # RPC endpoint monitoring
│   ├── transaction.monitor.ts     # Transaction monitoring implementation
│   ├── consensus/                 # Consensus monitoring services
│   │   ├── consensus.monitor.ts   # Main consensus orchestration service
│   │   ├── miner/                 # Masternode mining monitoring
│   │   ├── epoch/                 # Epoch and penalty tracking
│   │   └── reward/                # Reward distribution monitoring
│   ├── monitoring.controller.ts   # API endpoints for monitoring data
│   ├── notification.controller.ts # Notification endpoints
│   └── testing.controller.ts      # Testing endpoints
└── metrics/                       # Metrics collection and reporting
```
- **Environment Variables**: Defined in the `.env` file, with examples in `.env.example`

- **Config Constants**: Centralized in `src/common/constants/config.ts`
  - `ENV_VARS`: Mapping of all environment variable names
  - `FEATURE_FLAGS`: Feature toggles for different parts of the system
  - `DEFAULTS`: Default values when environment variables are missing
  - `ALERTS`: Alert thresholds and configuration

- **Configuration Service**: Implemented in `src/config/config.service.ts` (see the sketch after this list)
  - Loads configuration from environment variables
  - Provides typed access with validation
  - Handles defaults and fallbacks

- **Interfaces**: Structured type definitions
  - `MonitoringConfig`: Configuration for all monitoring components
  - `AlertNotificationConfig`: Configuration for notification channels
  - `InfluxDbConfig`: Configuration for InfluxDB metrics storage
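To illustrate the typed-access-with-defaults idea, a minimal NestJS-style sketch; the getters shown are examples, not the real service's API:

```typescript
import { Injectable } from '@nestjs/common';

@Injectable()
export class ConfigService {
  // Typed getter with a fallback default, mirroring the DEFAULTS pattern.
  get scanInterval(): number {
    const raw = process.env.SCAN_INTERVAL;
    const parsed = raw !== undefined ? Number(raw) : NaN;
    return Number.isFinite(parsed) ? parsed : 15; // default: 15 seconds
  }

  // Feature flag with an explicit boolean parse.
  get enableConsensusMonitoring(): boolean {
    return process.env.ENABLE_CONSENSUS_MONITORING === 'true';
  }
}
```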
The application includes several powerful utilities:
- **EnhancedQueue**: Reliable processing of blocks and transactions
  - Priorities for critical tasks
  - Automatic retry of failed operations
  - Concurrency control and timeout handling

- **TimeWindowData**: Efficient time-series data management (see the sketch after this list)
  - Automatic cleanup of outdated points
  - Statistical functions (min, max, average)
  - Memory-efficient storage

- **AlertManager**: Centralized alert management
  - Multiple delivery channels (Telegram, webhook, dashboard)
  - Alert throttling to prevent notification storms
  - Severity-based prioritization
  - Network classification for targeted routing

- **Modular Helpers**: Optimized code structure with reusable components
  - Smart network detection for Mainnet/Testnet classification
  - Standardized table formatting for consistent display
  - Component aggregation for detailed reporting
  - Runtime optimization through code reuse

- **ConsensusMonitor**: Orchestration for consensus monitoring
  - Coordinated initialization of component monitors
  - Centralized validator data management
  - Complete round-trip monitoring for consensus violations
  - Sliding window approach for memory-efficient state tracking
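For the TimeWindowData utility mentioned above, a compact sketch of what such a structure can look like; the interface shown is an assumption based on the description, not the utility's actual API:

```typescript
// Keeps only the points that fall inside a rolling time window and answers
// simple statistical queries over them.
class TimeWindowData {
  private points: { t: number; v: number }[] = [];

  constructor(private readonly windowMs: number) {}

  add(value: number, now: number = Date.now()): void {
    this.points.push({ t: now, v: value });
    this.prune(now);
  }

  average(now: number = Date.now()): number {
    this.prune(now);
    if (this.points.length === 0) return NaN;
    return this.points.reduce((sum, p) => sum + p.v, 0) / this.points.length;
  }

  // Drop points older than the window; this is the "automatic cleanup".
  private prune(now: number): void {
    const cutoff = now - this.windowMs;
    while (this.points.length > 0 && this.points[0].t < cutoff) {
      this.points.shift();
    }
  }
}

// Example: track failed transactions over a 5-minute window.
const failedTxWindow = new TimeWindowData(5 * 60 * 1000);
failedTxWindow.add(1);
```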
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.