Superstream Kafka Analyzer

Interactive CLI for analyzing Kafka health and configuration according to best practices and industry standards.

Made with ❤️ by the Superstream Team

📚 Table of Contents

Features
Prerequisites
Installation
Quick Start
Configuration File Examples
Email Collection
Output Formats
Health Checks
Required Permissions
Security & Privacy
Validation Process
Output Structure
Development
Testing
Configuration Reference
Troubleshooting
License
Contributing
Support

🚀 Features

Interactive CLI Interface - User-friendly prompts for configuration
Configuration File Support - Load settings from JSON config files
Multi-Layer Validation - Comprehensive connection and security testing
Security Protocol Support - PLAINTEXT, SSL/TLS, SASL, and OIDC authentication
Multiple Output Formats - JSON, CSV, HTML, and TXT reports
Real-time Progress - Visual feedback during analysis
Error Handling - Detailed troubleshooting information
Cross-platform - Works on Windows, macOS, and Linux

📋 Prerequisites

Node.js 16.0.0 or higher
Access to a Kafka cluster

🛠️ Installation

No installation required! Run directly with npx (see Quick Start), or install globally:

npm install -g superstream-kafka-analyzer

🎯 Quick Start

Interactive Mode

# Interactive mode (recommended for first-time users)
npx superstream-kafka-analyzer

Configuration File Mode

# Using a configuration file
npx superstream-kafka-analyzer --config config.json

Configuration File Examples

Available Examples: The full list is under the ./config-examples/ folder:

Basic Configuration - Simple localhost setup
SASL Authentication - Generic SASL setup
Apache Kafka - Apache Kafka with SASL
Apache Kafka (Plaintext) - Apache Kafka without authentication
Apache Kafka (SCRAM) - Apache Kafka with SCRAM authentication
AWS MSK (IAM) - AWS MSK with IAM authentication
AWS MSK (SCRAM) - AWS MSK with SCRAM authentication
Confluent Cloud - Confluent Cloud setup
Confluent Platform - Confluent Platform setup
Aiven Kafka - Aiven Kafka setup
Redpanda - Redpanda setup
OIDC Authentication - OpenID Connect authentication
Azure AD OAuth - Azure Active Directory
Keycloak OAuth - Keycloak OIDC
Okta OAuth - Okta OIDC
Auth0 OIDC - Auth0 authentication
Generic OAuth - Generic OAuth provider
With Timestamp - Include timestamp in filenames
Without Timestamp - No timestamp in filenames

Basic Configuration (config.example.json):

{
  "kafka": {
    "bootstrap_servers": "localhost:9092",
    "clientId": "superstream-analyzer",
    "vendor": "apache",
    "useSasl": false
  },
  "file": {
    "outputDir": "./kafka-analysis",
    "formats": ["html"],
    "includeMetadata": true,
    "includeTimestamp": true
  },
  "email": "[email protected]"
}

SASL Authentication (config.example.sasl.json):

{
  "kafka": {
    "bootstrap_servers": ["kafka1.example.com:9092", "kafka2.example.com:9092", "kafka3.example.com:9092"],
    "clientId": "superstream-analyzer",
    "vendor": "apache",
    "useSasl": true,
    "sasl": {
      "mechanism": "PLAIN",
      "username": "your-username",
      "password": "your-password"
    }
  },
  "file": {
    "outputDir": "./kafka-analysis",
    "formats": ["html"],
    "includeMetadata": true,
    "includeTimestamp": true
  },
  "email": "[email protected]"
}

AWS MSK with SCRAM (config.example.aws-msk.json):

{
  "kafka": {
    "bootstrap_servers": ["b-1.your-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9096"],
    "clientId": "superstream-analyzer",
    "vendor": "aws-msk",
    "useSasl": true,
    "sasl": {
      "mechanism": "SCRAM-SHA-512",
      "username": "your-msk-username",
      "password": "your-msk-password"
    }
  },
  "file": {
    "outputDir": "./kafka-analysis",
    "formats": ["html"],
    "includeMetadata": true,
    "includeTimestamp": true
  },
  "email": "[email protected]"
}

AWS MSK with IAM (config.example.aws-msk-iam.json):

{
  "kafka": {
    "bootstrap_servers": ["b-1.your-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9198"],
    "clientId": "superstream-analyzer",
    "vendor": "aws-msk",
    "useSasl": true,
    "sasl": {
      "mechanism": "oauthbearer"
    }
  },
  "file": {
    "outputDir": "./kafka-analysis",
    "formats": ["html"],
    "includeMetadata": true,
    "includeTimestamp": true
  },
  "email": "[email protected]"
}

Confluent Cloud (config.example.confluent-cloud.json):

{
  "kafka": {
    "bootstrap_servers": ["pkc-xxxxx.region.cloud:9092"],
    "clientId": "superstream-analyzer",
    "vendor": "confluent-cloud",
    "useSasl": true,
    "sasl": {
      "mechanism": "PLAIN",
      "username": "your-api-key",
      "password": "your-api-secret"
    }
  },
  "file": {
    "outputDir": "./kafka-analysis",
    "formats": ["html"],
    "includeMetadata": true,
    "includeTimestamp": true
  },
  "email": "[email protected]"
}

Aiven Kafka (config.example.aiven-kafka.json):

{
  "kafka": {
    "brokers": ["kafka-xxxxx-aiven-kafka.aivencloud.com:12345"],
    "clientId": "superstream-analyzer",
    "vendor": "aiven",
    "useSasl": true,
    "sasl": {
      "mechanism": "SCRAM-SHA-256",
      "username": "avnadmin",
      "password": "YOUR_AVNADMIN_PASSWORD"
    },
    "ssl": {
      "ca": "./path/to/ca.pem"
    }
  },
  "file": {
    "outputDir": "./kafka-analysis",
    "formats": ["json", "csv", "html", "txt"],
    "includeMetadata": true,
    "includeTimestamp": true
  },
  "email": "[email protected]"
}

Aiven Kafka with OAuth (oauthbearer):

{
  "kafka": {
    "bootstrap_servers": ["your-aiven-cluster.aivencloud.com:12345"],
    "clientId": "superstream-analyzer",
    "vendor": "aiven",
    "useSasl": true,
    "sasl": {
      "mechanism": "oauthbearer",
      "clientId": "your-client-id",
      "clientSecret": "your-client-secret",
      "host": "https://my-oauth-server.com",
      "path": "/oauth/token"
    }
  },
  "file": {
    "outputDir": "./kafka-analysis",
    "formats": ["html"],
    "includeMetadata": true,
    "includeTimestamp": true
  },
  "email": "[email protected]"
}
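With the OAuth example above, the analyzer first has to obtain an access token from the configured `host` + `path`, and only then can it authenticate over SASL/OAUTHBEARER. The sketch below shows what a standard OAuth 2.0 client-credentials token request (RFC 6749, section 4.4) built from those fields could look like. It only builds the request and does not send it. `buildTokenRequest` is a hypothetical helper, not part of the analyzer. Sending the client credentials in the form body rather than via HTTP Basic auth is also an assumption.

```javascript
// Hypothetical helper: builds (but does not send) an OAuth 2.0
// client-credentials token request from the "sasl" block of a config file.
function buildTokenRequest(sasl) {
  for (const field of ["clientId", "clientSecret", "host", "path"]) {
    if (!sasl[field]) throw new Error(`sasl.${field} is required for oauthbearer`);
  }
  // new URL() joins host and path and rejects malformed hosts.
  const url = new URL(sasl.path, sasl.host).toString();
  // RFC 6749 section 4.4: grant_type=client_credentials in a form-encoded body.
  // Putting client_id/client_secret in the body (not HTTP Basic) is an assumption.
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: sasl.clientId,
    client_secret: sasl.clientSecret,
  });
  if (sasl.scope) body.set("scope", sasl.scope);
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  };
}

const req = buildTokenRequest({
  clientId: "your-client-id",
  clientSecret: "your-client-secret",
  host: "https://my-oauth-server.com",
  path: "/oauth/token",
});
console.log(req.url);  // https://my-oauth-server.com/oauth/token
console.log(req.body); // grant_type=client_credentials&client_id=your-client-id&client_secret=your-client-secret
```

A real client would POST this request (for example with `fetch`, which is global in Node.js 18+). It would then hand the returned `access_token` to the Kafka client's OAUTHBEARER token provider.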

🔧 Command Line Options

Option	Description	Default
`--config <path>`	Path to configuration file	-

🔐 Security Protocols

PLAINTEXT (No Security)

# Default for local development
npx superstream-kafka-analyzer
# Configure bootstrap servers as: localhost:9092

SASL Authentication

# With SASL credentials
npx superstream-kafka-analyzer
# Configure SASL mechanism and credentials when prompted

OIDC Authentication (OpenID Connect)

The analyzer supports OIDC authentication with any OIDC-compliant identity provider, including Azure AD, Keycloak, Okta, and Auth0.

# With OIDC authentication
npx superstream-kafka-analyzer --config config-oidc.json

Key Features:

Auto-discovery: Automatically discovers OIDC endpoints using well-known discovery documents
Token validation: Optional JWT token validation using JWKS
Multiple grant types: Support for client_credentials, password, and authorization_code flows
Token caching: Automatic token caching to reduce authentication overhead
Vendor-specific presets: Built-in configurations for popular providers
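One way to implement the token caching described above is a small cache that reuses a token until shortly before it expires. This is a minimal sketch. It assumes the token endpoint returns the standard `access_token` and `expires_in` (seconds) fields. `createTokenCache`, the injected `fetchToken` function, and the 30-second refresh margin are all hypothetical, not the analyzer's actual implementation.

```javascript
// Hypothetical token cache: calls fetchToken() only when no token is cached
// or the cached one is within `marginMs` of expiring.
function createTokenCache(fetchToken, { marginMs = 30_000, now = Date.now } = {}) {
  let cached = null;   // { value, expiresAt } in milliseconds
  let inFlight = null; // shared promise so concurrent callers trigger one request

  return async function getToken() {
    if (cached && now() < cached.expiresAt - marginMs) return cached.value;
    if (!inFlight) {
      inFlight = (async () => {
        try {
          const res = await fetchToken(); // expected shape: { access_token, expires_in }
          cached = {
            value: res.access_token,
            expiresAt: now() + res.expires_in * 1000,
          };
          return cached.value;
        } finally {
          inFlight = null; // allow a fresh request next time, even after a failure
        }
      })();
    }
    return inFlight;
  };
}
```

The shared `inFlight` promise matters when several connections authenticate at once. Without it, each connection would hit the identity provider separately.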

Quick Example:

{
  "kafka": {
    "brokers": ["kafka.example.com:9093"],
    "vendor": "oidc",
    "useSasl": true,
    "sasl": {
      "mechanism": "oauthbearer",
      "discoveryUrl": "https://auth.example.com/.well-known/openid-configuration",
      "clientId": "your-client-id",
      "clientSecret": "your-client-secret",
      "scope": "openid kafka:read",
      "grantType": "client_credentials"
    }
  }
}
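Auto-discovery relies on the OpenID Connect Discovery document served at a URL like the `discoveryUrl` above. This sketch assumes the document has already been fetched and parsed. It extracts the standard `token_endpoint` and `jwks_uri` fields and checks that the configured `grantType` appears in `grant_types_supported`. `resolveOidcEndpoints` is a hypothetical helper, not the analyzer's API.

```javascript
// Hypothetical helper: extract the endpoints needed for OAUTHBEARER from a
// parsed OIDC discovery document (OpenID Connect Discovery 1.0).
function resolveOidcEndpoints(discoveryDoc, grantType = "client_credentials") {
  const { issuer, token_endpoint: tokenEndpoint, jwks_uri: jwksUri } = discoveryDoc;
  if (!issuer || !tokenEndpoint) {
    throw new Error("discovery document is missing issuer or token_endpoint");
  }
  // Per the spec, an omitted grant_types_supported defaults to
  // ["authorization_code", "implicit"].
  const supported = discoveryDoc.grant_types_supported ?? ["authorization_code", "implicit"];
  if (!supported.includes(grantType)) {
    throw new Error(`grant type "${grantType}" is not advertised by ${issuer}`);
  }
  // jwks_uri is only needed when the optional JWT validation is enabled.
  return { issuer, tokenEndpoint, jwksUri: jwksUri ?? null };
}

// Usage with an abbreviated, made-up discovery document:
const endpoints = resolveOidcEndpoints({
  issuer: "https://auth.example.com",
  token_endpoint: "https://auth.example.com/oauth2/token",
  jwks_uri: "https://auth.example.com/.well-known/jwks.json",
  grant_types_supported: ["client_credentials", "authorization_code"],
});
console.log(endpoints.tokenEndpoint); // https://auth.example.com/oauth2/token
```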

📚 For detailed OIDC setup instructions, see:

OIDC Authentication Guide - Complete setup guide with examples for all major providers
Configuration Examples - Vendor-specific configuration templates

📊 Analysis Report

The tool generates comprehensive reports including:

Cluster Information

ZooKeeper details
Broker information (host, port, rack)
Analysis timestamp

Topic Analysis

Total topics and partitions
User vs internal topics
Replication factor distribution
Topic configurations
Error detection
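The topic statistics above can be derived from plain cluster metadata. This is a minimal sketch. It assumes topic metadata shaped as `{ name, partitions: [{ replicas: [...] }] }`, which resembles what Kafka admin clients return, though the exact shape here is an assumption. It also treats names starting with `__` as internal topics, which is a heuristic. `summarizeTopics` is hypothetical.

```javascript
// Hypothetical summary of topic metadata: totals, user vs internal topics,
// and how many topics use each replication factor.
function summarizeTopics(topics) {
  const summary = {
    totalTopics: topics.length,
    totalPartitions: 0,
    userTopics: 0,
    internalTopics: 0,
    replicationFactors: {}, // e.g. { "3": 12 } means 12 topics use RF 3
  };
  for (const topic of topics) {
    summary.totalPartitions += topic.partitions.length;
    // Heuristic: Kafka's own topics (__consumer_offsets, __transaction_state)
    // use a double-underscore prefix.
    if (topic.name.startsWith("__")) summary.internalTopics += 1;
    else summary.userTopics += 1;
    // Treat the first partition's replica count as the topic's replication factor.
    const rf = topic.partitions.length > 0 ? topic.partitions[0].replicas.length : 0;
    summary.replicationFactors[rf] = (summary.replicationFactors[rf] ?? 0) + 1;
  }
  return summary;
}

const summary = summarizeTopics([
  { name: "orders", partitions: [{ replicas: [1, 2, 3] }, { replicas: [2, 3, 1] }] },
  { name: "__consumer_offsets", partitions: [{ replicas: [1, 2, 3] }] },
  { name: "scratch", partitions: [{ replicas: [1] }] },
]);
console.log(summary);
```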

Output Formats

JSON Format

Complete structured data including all cluster and topic information.

📄 View Example JSON Report

CSV Format

Tabular data for easy analysis in spreadsheet applications.

HTML Format

Formatted report with responsive design and styling.

📄 View Example HTML Report

TXT Format

Simple text summary for quick review.

📄 View Example TXT Report

🔍 Health Checks

The tool performs comprehensive health checks on your Kafka cluster to identify potential issues and provide recommendations:

AWS MSK Health Checks

Replication Factor vs Broker Count: Ensures topics don't have replication factor > broker count
Topic Partition Distribution: Checks for balanced partition distribution across topics
Consumer Group Health: Identifies consumer groups with no active members
Internal Topics Health: Verifies system topics are healthy
Under-Replicated Partitions: Checks if topics have fewer in-sync replicas than configured
Min In-Sync Replicas Configuration: Checks if topics have min.insync.replicas > replication factor
AWS MSK Specific Health: Checks MSK system topics (__amazon_msk*, __consumer_offsets)
Rack Awareness: Verifies rack awareness configuration for better availability
Replica Distribution: Ensures replicas are evenly distributed across brokers
Metrics Configuration: Checks Open Monitoring (port 11001) accessibility
Logging Configuration: Verifies LoggingInfo configuration via AWS SDK
Authentication Configuration: Detects if unauthenticated access is enabled (security risk)
Quotas Configuration: Checks if Kafka quotas are configured and being used
Payload Compression: Checks if payload compression is enabled on user topics
Infinite Retention Policy: Checks if any topics have infinite retention policy enabled

Confluent Cloud Health Checks

Replication Factor vs Broker Count: Ensures topics don't have replication factor > broker count
Topic Partition Distribution: Checks for balanced partition distribution across topics
Consumer Group Health: Identifies consumer groups with no active members
Internal Topics Health: Verifies system topics are healthy
Under-Replicated Partitions: Checks if topics have fewer in-sync replicas than configured
Rack Awareness: Checks rack awareness configuration for better availability
Replica Distribution: Ensures replicas are evenly distributed across brokers
Metrics Configuration: Verifies metrics accessibility
Logging Configuration: Confirms built-in logging availability
Authentication Configuration: Detects if unauthenticated access is enabled (security risk)
Quotas Configuration: Checks if Kafka quotas are configured and being used
Payload Compression: Checks if payload compression is enabled on user topics
Infinite Retention Policy: Checks if any topics have infinite retention policy enabled

Aiven Kafka Health Checks

Replication Factor vs Broker Count: Ensures topics don't have replication factor > broker count
Topic Partition Distribution: Checks for balanced partition distribution across topics
Consumer Group Health: Identifies consumer groups with no active members
Internal Topics Health: Verifies system topics are healthy
Under-Replicated Partitions: Checks if topics have fewer in-sync replicas than configured
Min In-Sync Replicas Configuration: Checks if topics have min.insync.replicas > replication factor
Rack Awareness: Checks rack awareness configuration for better availability
Replica Distribution: Ensures replicas are evenly distributed across brokers
Metrics Configuration: Verifies metrics accessibility
Logging Configuration: Confirms built-in logging availability
Authentication Configuration: Detects if unauthenticated access is enabled (security risk)
Quotas Configuration: Checks if Kafka quotas are configured and being used
Payload Compression: Checks if payload compression is enabled on user topics
Infinite Retention Policy: Checks if any topics have infinite retention policy enabled

Generic Kafka Health Checks

Replication Factor vs Broker Count: Ensures topics don't have replication factor > broker count
Topic Partition Distribution: Checks for balanced partition distribution across topics
Consumer Group Health: Identifies consumer groups with no active members
Internal Topics Health: Verifies system topics are healthy
Under-Replicated Partitions: Checks if topics have fewer in-sync replicas than configured
Min In-Sync Replicas Configuration: Checks if topics have min.insync.replicas > replication factor
Rack Awareness: Checks rack awareness configuration for better availability
Replica Distribution: Ensures replicas are evenly distributed across brokers
Metrics Configuration: Verifies JMX metrics configuration
Logging Configuration: Checks log4j configuration
Authentication Configuration: Detects if unauthenticated access is enabled (security risk)
Quotas Configuration: Checks if Kafka quotas are configured and being used
Payload Compression: Checks if payload compression is enabled on user topics
Infinite Retention Policy: Checks if any topics have infinite retention policy enabled
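Several of the checks above reduce to simple comparisons over topic metadata and topic configs. The sketch below makes several assumptions: the input shape, the function name `checkTopic`, and which status each finding receives are all hypothetical. Only the Kafka config names (`min.insync.replicas`, `retention.ms`, `compression.type`) come from Kafka itself, along with the meaning of `retention.ms = -1` (no time-based deletion).

```javascript
// Hypothetical per-topic checks. `configs` holds topic config values as strings,
// the way Kafka's DescribeConfigs API reports them.
function checkTopic(topic, brokerCount) {
  const findings = [];
  const configs = topic.configs ?? {};
  const rf = topic.partitions[0]?.replicas.length ?? 0;
  const minIsr = Number(configs["min.insync.replicas"] ?? 1); // Kafka default is 1

  if (rf > brokerCount) {
    findings.push({ check: "replication-factor", status: "failed",
      message: `RF ${rf} exceeds broker count ${brokerCount}` });
  }
  if (minIsr > rf) {
    findings.push({ check: "min-insync-replicas", status: "failed",
      message: `min.insync.replicas=${minIsr} > RF ${rf}; acks=all writes cannot succeed` });
  }
  // A partition is under-replicated when its ISR is smaller than its replica set.
  const urp = topic.partitions.filter((p) => p.isr.length < p.replicas.length).length;
  if (urp > 0) {
    findings.push({ check: "under-replicated", status: "warning",
      message: `${urp} under-replicated partition(s)` });
  }
  // retention.ms = -1 disables time-based deletion.
  if (configs["retention.ms"] === "-1") {
    findings.push({ check: "infinite-retention", status: "warning",
      message: "retention.ms is -1 (infinite)" });
  }
  if (configs["compression.type"] === "uncompressed") {
    findings.push({ check: "compression", status: "warning",
      message: "topic stores messages uncompressed" });
  }
  return findings.length > 0 ? findings : [{ check: "all", status: "pass" }];
}

const findings = checkTopic({
  name: "orders",
  partitions: [{ replicas: [1, 2, 3], isr: [1, 2] }],
  configs: { "min.insync.replicas": "2", "retention.ms": "-1" },
}, 3);
console.log(findings.map((f) => f.status)); // [ 'warning', 'warning' ]
```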

Health Check Status

✅ Pass: Configuration is healthy and optimal
⚠️ Warning: Configuration could be improved for better performance/security
❌ Failed: Critical issue that should be addressed
ℹ️ Info: Informational message with recommendations

🔍 Validation Process

The tool performs comprehensive validation in multiple phases:

Phase 1: Input Format Validation

Broker URL format validation
File system permissions
Output directory creation

Phase 2: Network Connectivity Testing

DNS resolution verification
TCP connection testing
Kafka cluster connectivity

Phase 3: Security Protocol Testing

SASL authentication verification
SSL/TLS certificate validation
Credential testing

Phase 4: Complete Setup Validation

End-to-end connection testing
File system write permissions
Output format generation testing
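Phase 1's broker URL check can be as simple as splitting each `host:port` entry and range-checking the port. This is a minimal sketch. `parseBootstrapServers` is a hypothetical helper. It accepts both the comma-separated string form and the array form used in the configuration examples, and it also handles bracketed IPv6 literals.

```javascript
// Hypothetical Phase 1 check: normalize bootstrap servers and validate host:port.
function parseBootstrapServers(value) {
  const entries = Array.isArray(value) ? value : String(value).split(",");
  return entries.map((raw) => {
    const entry = String(raw).trim();
    // Accept "host:port" or "[ipv6]:port"; reject schemes like "http://".
    const match = /^(\[[0-9a-fA-F:.]+\]|[A-Za-z0-9.-]+):(\d{1,5})$/.exec(entry);
    if (!match) throw new Error(`Invalid broker address "${entry}" (expected host:port)`);
    const port = Number(match[2]);
    if (port < 1 || port > 65535) throw new Error(`Port out of range in "${entry}"`);
    return { host: match[1].replace(/^\[|\]$/g, ""), port }; // strip IPv6 brackets
  });
}

console.log(parseBootstrapServers("kafka1.example.com:9092, kafka2.example.com:9092"));
```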

📁 Output Structure

kafka-analysis/
├── analysis-2024-01-15-14-30-25.json
├── analysis-2024-01-15-14-30-25.csv
├── analysis-2024-01-15-14-30-25.html
└── analysis-2024-01-15-14-30-25.txt
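With `includeTimestamp` enabled, the file names above follow an `analysis-YYYY-MM-DD-HH-mm-ss.<format>` pattern. The sketch below reproduces that pattern from a `Date`. Several details are assumptions rather than things taken from the analyzer's source: the helper name `reportFileName`, the use of local time, and the timestamp-free fallback name `analysis.<format>`.

```javascript
// Hypothetical reconstruction of the report file naming shown above.
function reportFileName(format, { includeTimestamp = true, date = new Date() } = {}) {
  if (!includeTimestamp) return `analysis.${format}`; // assumed fallback name
  const pad = (n) => String(n).padStart(2, "0");
  const stamp = [
    date.getFullYear(), pad(date.getMonth() + 1), pad(date.getDate()),
    pad(date.getHours()), pad(date.getMinutes()), pad(date.getSeconds()),
  ].join("-");
  return `analysis-${stamp}.${format}`;
}

const when = new Date(2024, 0, 15, 14, 30, 25); // local time; months are 0-based
console.log(["json", "csv", "html", "txt"].map((f) => reportFileName(f, { date: when })));
```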

🛠️ Development

Project Structure

superstream-analyzer/
├── bin/
│   └── index.js          # CLI entry point
├── src/
│   ├── cli.js            # Main CLI logic
│   ├── kafka-client.js   # Kafka connection and analysis
│   ├── file-service.js   # File output handling
│   ├── validators.js     # Validation framework
│   └── utils.js          # Utility functions
├── config.example.json   # Basic configuration example
├── config.example.sasl.json # SASL configuration example
└── package.json

Local Development

# Clone and install dependencies
git clone <repository>
cd superstream-analyzer
npm install

# Run in development mode
npm run dev

# Test with local Kafka
npm run test:local

🧪 Testing

Manual Testing

# Test with local Kafka cluster
npx . --config config.example.json

# Test with SASL authentication
npx . --config config.example.sasl.json

Validation Testing

The tool includes comprehensive validation that will:

Test network connectivity
Verify authentication credentials
Validate file system permissions
Generate sample outputs

📝 Configuration Reference

Kafka Configuration

Field	Type	Required	Description
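`bootstrap_servers`	string or array	Yes	Kafka bootstrap servers, either as a comma-separated string or as an array of host:port strings
`clientId`	string	Yes	Client identifier for Kafka connection
`vendor`	string	No	Kafka vendor (apache, aws-msk, confluent-cloud, aiven, oidc, etc.)
`useSasl`	boolean	No	Enable SASL authentication
`sasl.mechanism`	string	No*	SASL mechanism (PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, oauthbearer)
`sasl.username`	string	No*	SASL username
`sasl.password`	string	No*	SASL password
`ssl.ca`	string	No	Path to a CA certificate file (PEM)

*sasl.mechanism is required if useSasl is true; sasl.username and sasl.password are required for the PLAIN and SCRAM mechanisms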

File Configuration

Field	Type	Required	Description
`outputDir`	string	Yes	Directory for output files
`formats`	array	Yes	Array of output formats (json, csv, html, txt)
`includeMetadata`	boolean	No	Include metadata in output files
`includeTimestamp`	boolean	No	Include a timestamp in output filenames

Email Configuration

Field	Type	Required	Description
`email`	string	No	Email address for generating report files. If not provided, no file output will be generated
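The tables above translate directly into a pre-flight check. This minimal sketch validates a parsed config object against those rules. `validateConfig` is hypothetical, and it assumes the `kafka` and `file` top-level keys used in the examples. It returns a list of problems instead of throwing, so that every issue can be reported at once.

```javascript
// Hypothetical pre-flight validation mirroring the Configuration Reference tables.
const FORMATS = ["json", "csv", "html", "txt"];
const PASSWORD_MECHANISMS = ["PLAIN", "SCRAM-SHA-256", "SCRAM-SHA-512"];

function validateConfig(config) {
  const problems = [];
  const kafka = config.kafka ?? {};
  const file = config.file ?? {};

  // Some examples use "brokers" instead of "bootstrap_servers"; accepting both is an assumption.
  const servers = kafka.bootstrap_servers ?? kafka.brokers;
  const hasServers = Array.isArray(servers)
    ? servers.length > 0
    : typeof servers === "string" && servers.trim() !== "";
  if (!hasServers) problems.push("kafka.bootstrap_servers is required");
  if (typeof kafka.clientId !== "string" || kafka.clientId === "") {
    problems.push("kafka.clientId is required");
  }
  if (kafka.useSasl) {
    const mechanism = kafka.sasl?.mechanism;
    if (!mechanism) {
      problems.push("kafka.sasl.mechanism is required when useSasl is true");
    } else if (PASSWORD_MECHANISMS.includes(String(mechanism).toUpperCase())) {
      // Only PLAIN and SCRAM need a username/password; oauthbearer does not.
      if (!kafka.sasl.username) problems.push(`kafka.sasl.username is required for ${mechanism}`);
      if (!kafka.sasl.password) problems.push(`kafka.sasl.password is required for ${mechanism}`);
    }
  }
  if (typeof file.outputDir !== "string" || file.outputDir === "") {
    problems.push("file.outputDir is required");
  }
  if (!Array.isArray(file.formats) || file.formats.length === 0) {
    problems.push("file.formats must be a non-empty array");
  } else {
    for (const f of file.formats) {
      if (!FORMATS.includes(f)) problems.push(`unsupported format "${f}"`);
    }
  }
  return problems;
}

const problems = validateConfig({
  kafka: { bootstrap_servers: "localhost:9092", clientId: "superstream-analyzer",
           useSasl: true, sasl: { mechanism: "SCRAM-SHA-512" } },
  file: { outputDir: "./kafka-analysis", formats: ["html", "pdf"] },
});
console.log(problems); // missing SCRAM username/password, unsupported "pdf"
```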

🚨 Troubleshooting

Common Issues

Connection Timeout

Verify broker URLs are correct
Check network connectivity
Ensure firewall allows connections

Authentication Failed

Verify SASL credentials
Check SASL mechanism compatibility
Ensure user has proper permissions

File System Errors

Check write permissions for output directory
Ensure sufficient disk space
Verify directory exists and is writable

Validation Errors

Review detailed error logs
Check all configuration parameters
Verify Kafka cluster is accessible

Getting Help

Run with verbose logging to see detailed error information
Check the validation logs for specific failure points
Verify your configuration file format matches the examples
Ensure your Kafka cluster is running and accessible

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📞 Support

For issues and questions:

Check the troubleshooting section
Review validation logs for specific errors
Ensure configuration matches the examples provided
Email us: [email protected]

✅ Health/Configuration Checks

Superstream Kafka Analyzer performs a comprehensive set of health checks on your Kafka cluster to help you identify issues and optimize your setup:

Replication Factor vs Broker Count: Ensures topics do not have a replication factor greater than the number of brokers.
Topic Partition Distribution: Checks for balanced partition distribution across topics.
Consumer Group Health: Identifies consumer groups with no active members.
Internal Topics Health: Verifies system topics are healthy.
Under-Replicated Partitions: Checks if topics have fewer in-sync replicas than configured.
Min In-Sync Replicas Configuration: Checks if topics have min.insync.replicas greater than replication factor.
Vendor-Specific Checks: For AWS MSK, Confluent, Aiven, and Apache Kafka, checks for system topics and platform-specific best practices.
Rack Awareness: Verifies rack awareness configuration for better availability.
Replica Distribution: Ensures replicas are evenly distributed across brokers.
Metrics Configuration: Checks if monitoring/metrics are properly configured.
Logging Configuration: Verifies logging configuration for your Kafka deployment.
Authentication Configuration: Detects if unauthenticated access is enabled (security risk).
Quotas Configuration: Checks if Kafka quotas are configured and being used.
Payload Compression: Checks if payload compression is enabled on user topics.
Infinite Retention Policy: Checks if any topics have infinite retention policy enabled.

Each check provides a clear status (✅ Pass, ⚠️ Warning, ❌ Failed, ℹ️ Info) and actionable recommendations.
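Replica distribution and rack awareness can both be read from partition assignments. This minimal sketch assumes a list of brokers with optional `rack` values and partitions whose `replicas` arrays hold broker IDs. The helper name `replicaBalance` and the max/min imbalance ratio are illustrative choices, not the analyzer's documented thresholds.

```javascript
// Hypothetical helper: count replicas per broker and flag partitions whose
// replicas all sit in a single rack.
function replicaBalance(brokers, partitions) {
  const perBroker = new Map(brokers.map((b) => [b.id, 0]));
  const rackOf = new Map(brokers.map((b) => [b.id, b.rack ?? null]));
  let singleRackPartitions = 0;

  for (const p of partitions) {
    for (const id of p.replicas) perBroker.set(id, (perBroker.get(id) ?? 0) + 1);
    // Brokers without a rack all map to null, so they count as one "rack".
    const racks = new Set(p.replicas.map((id) => rackOf.get(id)));
    if (p.replicas.length > 1 && racks.size === 1) singleRackPartitions += 1;
  }

  const counts = [...perBroker.values()];
  const max = Math.max(...counts);
  const min = Math.min(...counts);
  return {
    replicasPerBroker: Object.fromEntries(perBroker),
    // Busiest broker vs least busy broker; 1 means perfectly even.
    imbalanceRatio: min === 0 ? Infinity : max / min,
    rackAware: brokers.every((b) => b.rack != null),
    singleRackPartitions,
  };
}

const result = replicaBalance(
  [{ id: 1, rack: "a" }, { id: 2, rack: "b" }, { id: 3, rack: "a" }],
  [{ replicas: [1, 2] }, { replicas: [2, 3] }, { replicas: [3, 1] }]
);
console.log(result);
```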

🔒 Security & Privacy

No Data Shared: All analysis and health checks are performed locally on your machine. No Kafka data, credentials, or cluster information is ever sent to any external server.
Local-Only: The tool does not transmit, store, or share your Kafka messages, topic data, or configuration outside your environment.
Optional Analytics: Anonymous usage analytics (such as error events and feature usage) never include sensitive Kafka data. You can disable analytics by setting the environment variable SUPERSTREAM_ANALYTICS=false.
Email Collection: We collect email addresses to help the Superstream team understand the types of companies using our tool. This insight will guide us in shaping a commercial version that meets real needs. While we're deeply committed to supporting the community, even basic marketing insights are essential for us to justify the time and resources required to sustain and grow this project. Your email address will never be shared, and we don’t believe in cold emails or unsolicited marketing. We only reach out if you’ve clearly opted in or asked.

Your security and privacy are our top priority. Everything runs locally and securely by default.

🔑 Required Permissions

To perform all health checks, your user/service account must have the following permissions for each vendor:

AWS MSK

AWS IAM Permissions:
- kafka:DescribeCluster
- kafka:DescribeConfiguration
- kafka:ListClusters
- kafka:ListNodes
- (Optional for advanced checks) kafka:ListConfigurations, kafka:ListKafkaVersions
Kafka Permissions:
- Describe and List on all topics and consumer groups
- DescribeConfigs on brokers and topics
- Read/Consume on topics (required for consumer group health and producer compression checks)

Confluent Cloud

API Key/Secret Permissions:
- CloudClusterAdmin or equivalent role
- Describe and List on all topics and consumer groups
- DescribeConfigs on brokers and topics
- Read/Consume on topics (required for consumer group health and producer compression checks)

Aiven Kafka

Service Account/User Permissions:
- Describe and List on all topics and consumer groups
- DescribeConfigs on brokers and topics
- Read/Consume on topics (required for consumer group health and producer compression checks)

Apache Kafka / Confluent Platform / Redpanda

Kafka User Permissions:
- Describe and List on all topics and consumer groups
- DescribeConfigs on brokers and topics
- Read/Consume on topics (required for consumer group health and producer compression checks)

Note:

Some checks (like logging, quotas, and metrics) require admin-level access to the Kafka Admin API or cloud provider API.

For AWS MSK, you must also have valid AWS credentials configured in your environment.

If you only have limited permissions, some health checks may be skipped or show warnings.
