1. Overview

Pulsar perf is Apache Pulsar's built-in load generation and performance testing tool. It offers fairly comprehensive options for simulating and testing various aspects of a Pulsar workload, such as producers, consumers, and readers.

When a Pulsar perf command runs, it prints the end-to-end performance metrics (throughput and latency) directly to the command line, as in the sample below. When the execution finishes, it generates a single HdrHistogram result file.

...
09:49:44.420 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  28907.7  msg/s ---     22.1 Mbit/s --- failure      0.0 msg/s --- Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
09:49:54.502 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  42617.5  msg/s ---     32.5 Mbit/s --- failure      0.0 msg/s --- Latency: mean:  84.185 ms - med:  79.618 - 95pct: 124.642 - 99pct: 154.895 - 99.9pct: 240.600 - 99.99pct: 241.294 - Max: 257.830
09:50:04.566 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  27091.7  msg/s ---     20.7 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 125.973 ms - med:  79.933 - 95pct: 295.433 - 99pct: 966.467 - 99.9pct: 1558.535 - 99.99pct: 1833.951 - Max: 2000.367

...

🔷 Properly Capture Pulsar Perf Execution Metrics

The way that Pulsar perf captures the end-to-end performance metrics is, in my opinion, rather primitive. It doesn't allow the metrics to be exported to an external file (e.g. a CSV file) or integrated with a monitoring and dashboard stack like Prometheus and Grafana.

The main objective of this repo is to provide a wrapper utility around Pulsar perf so that the end-to-end performance metrics can be properly captured:

  • The throughput and latency metrics will be saved in a CSV file.
  • If the listening host and port of a remote Prometheus Graphite exporter (e.g. <host_ip>:9109) is provided, the metrics will also be sent to that host and port in the Graphite plaintext protocol format. This allows the Pulsar perf execution metrics to be integrated into Prometheus and Grafana.
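As an illustration of the first bullet, the following is a minimal sketch of how one console line could be turned into a CSV row. It is not the code used by pperf_bench.py. The regular expression is modelled only on the sample PerformanceProducer lines shown earlier, and the column names (msg_rate, p99, and so on) are hypothetical.

```python
import csv
import io
import re
from typing import Dict, List, Optional

# Pattern modelled on the sample PerformanceProducer lines in this README.
# Consumer output or other pulsar-perf versions may need a different pattern.
LINE_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s.*?Throughput produced:\s+"
    r"(?P<msg_rate>[\d.]+)\s+msg/s\s+---\s+"
    r"(?P<mbit_rate>[\d.]+)\s+Mbit/s\s+---\s+"
    r"failure\s+(?P<fail_rate>[\d.]+)\s+msg/s\s+---\s+"
    r"Latency:\s+mean:\s+(?P<mean>[\d.]+)\s+ms\s+-\s+"
    r"med:\s+(?P<med>[\d.]+)\s+-\s+"
    r"95pct:\s+(?P<p95>[\d.]+)\s+-\s+"
    r"99pct:\s+(?P<p99>[\d.]+)\s+-\s+"
    r"99\.9pct:\s+(?P<p999>[\d.]+)\s+-\s+"
    r"99\.99pct:\s+(?P<p9999>[\d.]+)\s+-\s+"
    r"Max:\s+(?P<max>[\d.]+)"
)

# Hypothetical CSV column names (not taken from the utility).
FIELDS = ["time", "msg_rate", "mbit_rate", "fail_rate",
          "mean", "med", "p95", "p99", "p999", "p9999", "max"]


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Return one metrics row for a throughput line, or None for other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    row: Dict[str, object] = {"time": m.group("time")}
    for name in FIELDS[1:]:
        row[name] = float(m.group(name))
    return row


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Render parsed rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Lines that do not match (startup messages, warnings) return None and can simply be skipped by the caller.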

🔷 Fine Tune Pulsar Cluster Settings

Another limitation of the Pulsar perf utility is that it cannot fine-tune some key cluster parameters that are critical to overall performance. Some of these key parameters are:

  • Whether or not a topic is partitioned.
  • How the bookie persistence behavior is configured, e.g.
  • Whether or not message deduplication is enabled.

The ability to fine-tune these cluster-specific settings automatically is often crucial in performance benchmark testing. Pulsar perf, however, doesn't have this capability out of the box.

The secondary objective of this repo is to extend Pulsar perf's capability in this area so that it can serve as a repeatable benchmark testing platform.

2. Benchmark Utility Description

NOTE: The utility in this repo, pperf_bench.py, is Python based and requires Python 3.7+ (the tests in this repo were run with Python 3.8.6).

The utility takes several command-line arguments, as listed below:

usage: pperf_bench.py [-h] [-f [CONFIG]] [-d [DURATION]] [-t TOPIC]
                      [-g [PROM_GRAPHITE]]

optional arguments:
  -h, --help            show this help message and exit
  -f [CONFIG], --config [CONFIG]
                        benchmark configuration file (default: "ppfb.yaml"
                        file under the same directory).
  -d [DURATION], --duration [DURATION]
                        benchmark execution duration (format:
                        <integer_value>[h|m|s], default: 10m).
  -t TOPIC, --topic TOPIC
                        pulsar topic name (format:
                        "<tenant>/<namespace>/<topic>").
  -g [PROM_GRAPHITE], --prom_graphite [PROM_GRAPHITE]
                        Prometheus graphite exporter host and port (format:
                        <host_ip>:9109

Among these arguments, the Pulsar topic name is mandatory.

NOTE: when specifying the topic name, please do NOT include "persistent://" (or "non-persistent://") prefix as you would normally do for a Pulsar topic. Instead, the information (persistent or non-persistent) is provided in the configuration file.
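The two argument formats described above can be sketched as follows. This is an illustration, not the utility's own code: parse_duration and full_topic_name are hypothetical helper names, and the sketch assumes that the duration unit suffix is required and that the configured topic type is either "persistent" or "non-persistent".

```python
import re

# Seconds per supported unit suffix.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert '<integer_value>[h|m|s]' (e.g. '10m') to seconds.

    Assumption: the unit suffix is mandatory.
    """
    m = re.fullmatch(r"(\d+)([hms])", text.strip())
    if m is None:
        raise ValueError(f"invalid duration: {text!r}")
    return int(m.group(1)) * _UNIT_SECONDS[m.group(2)]


def full_topic_name(topic: str, topic_type: str = "persistent") -> str:
    """Build the full Pulsar topic name from '<tenant>/<namespace>/<topic>'.

    The persistence prefix comes from the configuration, not from the
    command-line argument, so a name that already has a prefix is rejected.
    """
    if "://" in topic:
        raise ValueError("topic must not include a '...://' prefix")
    if topic_type not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic type: {topic_type!r}")
    parts = topic.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("topic must be '<tenant>/<namespace>/<topic>'")
    return f"{topic_type}://{topic}"
```

For example, `full_topic_name("public/default/t1", "non-persistent")` yields `non-persistent://public/default/t1`.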

2.1. Configuration File

By default, the utility reads its configuration from a file named ppfb.yaml in the same directory. At the moment, the configuration items in this file are grouped under 5 major categories:

  • pfb-general: General configuration items related to one benchmark test, such as: 1) whether the topic is persistent or non-persistent, 2) whether the topic is partitioned, 3) the Pulsar perf workload simulation type (producer or consumer).

  • pfb-persistence: Configuration items that are specific to Pulsar persistence (bookie), such as: 1) ensemble size, 2) write/ack quorum, 3) whether or not message deduplication is enabled.

  • pulsar-perf-common: Pulsar-perf related configuration items that are common to all client simulation types, such as: 1) message processing (producing/consuming) rate, 2) maximum connections per single broker, 3) message encryption key file.

  • pulsar-perf-producer: Pulsar-perf configuration items that are specific to a Producer, such as: 1) number of producers, 2) message size, 3) message payload file.

  • pulsar-perf-consumer: Pulsar-perf configuration items that are specific to a Consumer, such as: 1) number of consumers, 2) receiver queue size, 3) subscription type (e.g. Exclusive, Shared, ...).

2.1.1. Limitation

  1. At the moment, this utility ONLY supports 2 "pulsar-perf" cli commands: produce and consume. Support for other commands (e.g. read, websocket-producer, managed-ledger) is planned for the future.

  2. When configuring Pulsar perf related settings (under categories: pulsar-perf-*), the utility can take any valid "pulsar-perf [produce|consume]" cli command line option. BUT, the long form of the option MUST be used. The short-form notation is NOT recognized and will cause execution failure.

    For example, in "pulsar-perf" cli, you can use either "-r" or "--rate" to specify the message processing rate. But in this utility, it has to be specified as "--rate".
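A minimal sketch of how the long-form rule could be enforced when assembling the command line is shown below. The function name build_perf_command and the idea of passing options as a dictionary are assumptions for illustration; the utility may read and validate its configuration differently.

```python
from typing import Dict, List, Optional


def build_perf_command(subcommand: str,
                       options: Dict[str, Optional[object]],
                       topic: str) -> List[str]:
    """Assemble a pulsar-perf argument list from long-form options.

    `options` maps an option name (e.g. "--rate") to its value, or to None
    for a flag that takes no value. Hypothetical helper, not the utility's API.
    """
    if subcommand not in ("produce", "consume"):
        raise ValueError(f"unsupported pulsar-perf command: {subcommand!r}")
    cmd = ["pulsar-perf", subcommand]
    for name, value in options.items():
        # Short-form options such as "-r" are rejected up front.
        if not name.startswith("--"):
            raise ValueError(f"long-form option required, got {name!r}")
        cmd.append(name)
        if value is not None:
            cmd.append(str(value))
    cmd.append(topic)
    return cmd
```

The resulting list can be passed to `subprocess.Popen` without going through a shell, which avoids quoting problems with option values.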

2.2. Change Pulsar Topic and Message Persistence Behavior

This utility is able to fine-tune Pulsar topic and cluster behaviors through the following configuration items:

| Configuration Category | Configuration Item | Description |
| --- | --- | --- |
| pfb-general | topic_type | Whether the testing topic is persistent (default) or non-persistent |
| pfb-general | partitioned_topic | Whether the testing topic is partitioned or non-partitioned (default) |
| pfb-general | num_partitions | The number of partitions for a partitioned topic |
| pfb-persistence | enable | Whether or not to enable customized, persistence-related settings |
| pfb-persistence | ensembleSize | Ensemble size for a ledger |
| pfb-persistence | writeQuorum | Write quorum for a ledger |
| pfb-persistence | ackQuorum | Ack quorum for a ledger |
| pfb-persistence | deduplicationEnabled | Whether to enable or disable message deduplication |
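To make the effect of these items concrete, the sketch below maps them to pulsar-admin commands. This is one plausible mapping, not necessarily how pperf_bench.py applies the settings (it could, for instance, use the admin REST API). The function name admin_commands is hypothetical, and the sketch assumes that persistence and deduplication are set at the namespace level and that deduplicationEnabled is only applied when enable is true.

```python
from typing import Dict, List


def admin_commands(topic: str,
                   general: Dict[str, object],
                   persistence: Dict[str, object]) -> List[List[str]]:
    """Translate the configuration items into pulsar-admin argument lists.

    `topic` is '<tenant>/<namespace>/<topic>' without a persistence prefix.
    """
    tenant, namespace, _ = topic.split("/", 2)
    ns = f"{tenant}/{namespace}"
    full_topic = f"{general.get('topic_type', 'persistent')}://{topic}"
    cmds: List[List[str]] = []

    if general.get("partitioned_topic", False):
        cmds.append(["pulsar-admin", "topics", "create-partitioned-topic",
                     full_topic,
                     "--partitions", str(general["num_partitions"])])

    if persistence.get("enable", False):
        cmds.append(["pulsar-admin", "namespaces", "set-persistence", ns,
                     "--bookkeeper-ensemble", str(persistence["ensembleSize"]),
                     "--bookkeeper-write-quorum", str(persistence["writeQuorum"]),
                     "--bookkeeper-ack-quorum", str(persistence["ackQuorum"]),
                     # 0 leaves the mark-delete rate unthrottled.
                     "--ml-mark-delete-max-rate", "0"])
        dedup_flag = ("--enable" if persistence.get("deduplicationEnabled", False)
                      else "--disable")
        cmds.append(["pulsar-admin", "namespaces", "set-deduplication",
                     ns, dedup_flag])
    return cmds
```

Note that BookKeeper requires ensemble size >= write quorum >= ack quorum, so a configuration that violates this ordering will be rejected by the cluster.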

2.3. Execution Output

When the utility runs, it creates a log file (under the logs sub-directory) that captures the execution details, and it generates several metrics files (under the metrics sub-directory) with the following naming convention:

| Sub-folder/File Name | Description |
| --- | --- |
| logs/pperf_bench_<execution_date_time>.log | Main log file |
| metrics/pperf_bench_<execution_date_time>_metrics.raw.csv | Raw metrics in tabular CSV format |
| metrics/pperf_bench_<execution_date_time>_metrics.graphite.csv | (Optional) Metrics in Prometheus Graphite Exporter oriented format |
| metrics/pperf_bench_<execution_date_time>.hgrm | (Optional) The original HdrHistogram file generated by the pulsar-perf cli |

2.3.1. Metrics Integration with Prometheus and Grafana

The command line argument "-g" (or "--prom_graphite") of this utility is optional. When provided, it specifies the listening host and port of a running Prometheus Graphite Exporter (PGE), and the utility also sends the metrics to the PGE over the network.

The PGE is a data source for a Prometheus server to scrape (pull). Add a scrape job configuration section like the following to the Prometheus server configuration so that it can pull the metrics from the PGE:
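For reference, the Graphite plaintext protocol is one line per sample in the form `<metric.path> <value> <unix_timestamp>`, terminated by a newline. The sketch below shows how such lines could be formatted and pushed to the PGE over TCP. The helper names and the metric path used in the usage note are hypothetical, not taken from pperf_bench.py.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float,
                  timestamp: Optional[float] = None) -> str:
    """Format one sample in Graphite plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_to_graphite(host: str, port: int, lines: Iterable[str],
                     timeout: float = 5.0) -> None:
    """Send already formatted lines to a Graphite-compatible listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A call such as `send_to_graphite("<host_ip>", 9109, [graphite_line("pperf.producer.msg_rate", 42617.5)])` would push a single throughput sample, with the host replaced by the real PGE address.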

scrape_configs:
  - job_name: graphite
    scrape_interval: 5s
    static_configs:
    - targets:
      - <PGE_HOST_IP>:9108

The following screenshot shows an example of displaying the benchmark execution metrics on a Grafana dashboard, where the metrics are exposed to a Prometheus server via the PGE.