Note: Best consumed chilled, with my blog discussing this experiment in detail.
A Terraform-automated series of independent experiments empirically measuring throughput and latency of an Event Hub-triggered Azure Function on a Consumption Plan for different combinations of:
- number of partitions in the Event Hub
- `maxBatchSize` setting in `host.json`
- `prefetchCount` setting in `host.json`
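For orientation, the sketch below shows where these two settings sit in a Functions v2 `host.json` (Event Hubs extension); the values are illustrative only, not recommendations:

```bash
# Illustrative values only - this is the shape of the host.json the experiment varies
cat << 'EOF' > host.json
{
  "version": "2.0",
  "extensions": {
    "eventHubs": {
      "batchCheckpointFrequency": 10,
      "eventProcessorOptions": {
        "maxBatchSize": 256,
        "prefetchCount": 512
      }
    }
  }
}
EOF
```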
- Owner access to an Azure subscription
- Linux box with Terraform and Azure CLI in $PATH
- `git clone https://github.com/iizotov/azure-functions-scaleout && cd azure-functions-scaleout`
- Create a Service Principal for Terraform by following these instructions
- (Option A) create `./terraform/credentials.auto.tfvars` with the SP credentials as below:

  ```bash
  cat << 'EOF' > ./terraform/credentials.auto.tfvars
  "subscription_id" = "<REPLACE_ME>"
  "client_id" = "<REPLACE_ME>"
  "client_secret" = "<REPLACE_ME>"
  "tenant_id" = "<REPLACE_ME>"
  "region" = "West US 2"
  EOF
  ```
- (Option B) create environment variables with the SP credentials:

  ```bash
  export TF_VAR_subscription_id="<REPLACE_ME>"
  export TF_VAR_client_id="<REPLACE_ME>"
  export TF_VAR_client_secret="<REPLACE_ME>"
  export TF_VAR_tenant_id="<REPLACE_ME>"
  export TF_VAR_region="West US 2"
  ```
Note 1: Terraform will be calling `az` from time to time, hence Azure credentials cannot be passed via the more familiar `ARM_*` environment variables method.

Note 2: by default, all Azure services will be provisioned in West US 2. To choose a different region, modify the `region` or `TF_VAR_region` setting accordingly, ensuring the chosen region can run both Application Insights and Log Analytics. At the time of writing, `West US 2` and `Southeast Asia` were the two good candidates.
- Edit `./terraform/run.sh` to adjust the settings below. By default, 60 iterations will be performed covering every combination of array values (60 = 1x3x5x4x1x1), executed in parallel in batches of 12, 15 minutes per batch.

  ```bash
  ## Adjustable Parameters
  ITERATION_SLEEP=15m               # Duration for each iteration
  MAX_PARALLEL_EXPERIMENTS=12       # Parallel iterations to run
  LANGUAGES=( node )                # Azure Function consumer lang
  PARTITIONS=( 4 8 32 )             # Event Hub Partition Count
  BATCH_SIZES=( 1 16 64 256 512 )   # maxBatchSize values
  PREFETCH_SIZES=( 0 128 512 2048 ) # prefetchCount values
  CHECKPOINT_SIZES=( 10 )           # batchCheckpointFrequency values
  THROUGHPUT_UNITS=( 20 )           # Event Hub Throughput Units
  ```
- Run:

  ```bash
  nohup /bin/bash ./terraform/run.sh > output.out 2>&1 &
  ```

The experiment will iterate through every possible combination of the following arrays' values as defined in `run.sh`:

- `$LANGUAGES`
- `$PARTITIONS`
- `$BATCH_SIZES`
- `$PREFETCH_SIZES`
- `$CHECKPOINT_SIZES`
- `$THROUGHPUT_UNITS`
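For intuition, a simplified sketch of how such a sweep can be driven is below, assuming a hypothetical `run_single_iteration` helper; the actual `run.sh` additionally handles naming, Terraform deployment and teardown for each iteration:

```bash
#!/bin/bash
# Simplified sketch only: the real run.sh also provisions and destroys the
# Terraform resources for each iteration and tags it with an experiment id.
LANGUAGES=( node )
PARTITIONS=( 4 8 32 )
BATCH_SIZES=( 1 16 64 256 512 )
PREFETCH_SIZES=( 0 128 512 2048 )
CHECKPOINT_SIZES=( 10 )
THROUGHPUT_UNITS=( 20 )
MAX_PARALLEL_EXPERIMENTS=12
ITERATION_SLEEP=15m

run_single_iteration() {        # hypothetical stub: the real script would
  echo "iteration: $*"          # terraform-apply an exp-... resource group,
  sleep "$ITERATION_SLEEP"      # run the load test, then destroy the group
}

running=0
for language in "${LANGUAGES[@]}"; do
  for partitions in "${PARTITIONS[@]}"; do
    for tu in "${THROUGHPUT_UNITS[@]}"; do
      for batch in "${BATCH_SIZES[@]}"; do
        for prefetch in "${PREFETCH_SIZES[@]}"; do
          for checkpoint in "${CHECKPOINT_SIZES[@]}"; do
            run_single_iteration "$language" "$partitions" "$tu" "$batch" "$prefetch" "$checkpoint" &
            (( ++running ))
            if (( running == MAX_PARALLEL_EXPERIMENTS )); then
              wait              # block until the current batch finishes
              running=0
            fi
          done
        done
      done
    done
  done
done
wait                            # wait for the final partial batch
```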
A new resource group `rg-telemetry-<suffix>` is created with these services and will be shared by all iterations:
- Azure Application Insights for Azure Functions metrics
- Log Analytics Workspace for Event Hub metrics
- Consumption Plan Azure Function: deployment helper with the following application settings:

  ```
  WEBSITE_RUN_FROM_PACKAGE = "https://github.com/iizotov/azure-functions-scaleout/releases/download/latest/deploymenthelper.zip"
  NODEJS_TEMPLATE_URL = "https://github.com/iizotov/azure-functions-scaleout/releases/download/latest/nodejs-template.zip"
  DOTNET_TEMPLATE_URL = "https://github.com/iizotov/azure-functions-scaleout/releases/download/latest/dotnet-template.zip"
  ```
The purpose of this helper is to dynamically generate a zipdeploy package with the correct `host.json` settings for every iteration of the experiment.
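As a rough illustration of that generation step (not the helper's actual implementation), assuming `jq`, `zip` and `unzip` are available and that the template zip carries a `host.json` at its root:

```bash
# Illustrative sketch only: fetch a language template, patch its host.json with
# the iteration's values, and repackage the result; the real logic lives inside
# deploymenthelper.zip.
curl -sL -o template.zip \
  "https://github.com/iizotov/azure-functions-scaleout/releases/download/latest/nodejs-template.zip"
mkdir -p package && unzip -qo template.zip -d package

jq --argjson batch 256 --argjson prefetch 512 --argjson freq 10 '
  .extensions.eventHubs.batchCheckpointFrequency = $freq
  | .extensions.eventHubs.eventProcessorOptions.maxBatchSize = $batch
  | .extensions.eventHubs.eventProcessorOptions.prefetchCount = $prefetch
' package/host.json > host.json.tmp && mv host.json.tmp package/host.json

# deployment.zip is now ready to be pushed to the experiment's Function App (e.g. via zipdeploy)
(cd package && zip -qr ../deployment.zip .)
```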
Iterations are deployed in parallel, in batches of `$MAX_PARALLEL_EXPERIMENTS`. Each iteration:
- ...creates a new resource group `exp-<experiment_id>-<language>-<partition_count>-<TUs>-<batch_size>-<prefetch_count>-<checkpoint_freq>-<suffix>`
- ...provisions a Standard Event Hub Namespace with `$THROUGHPUT_UNITS` TUs and a single Event Hub with `$PARTITIONS` partitions, and enables monitoring via the provisioned Log Analytics Workspace
- ...provisions a Consumption Plan Azure Function v2 consumer bound to that Event Hub with `host.json` configured accordingly. The function consumes messages from the Event Hub and stores latency as a custom metric in the provisioned Application Insights instance
- ...provisions a load generator as an [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/) instance running 5 container images of `iizotov/azure-sb-loadgenerator-dotnetcore:latest`, flooding the Event Hub with messages and saturating the ingress
- ...sleeps for `$ITERATION_SLEEP` and tears down the `exp-...` resource group
The process is repeated until there are no more iterations left.
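As an aside, a single load-generator container could be launched by hand roughly as follows; the resource group name and the connection-string variable name are hypothetical, and the experiment itself provisions the 5-container group through Terraform:

```bash
# Hypothetical sketch: one load-generator container started manually.
# The variable name used to pass the Event Hub connection string is assumed.
az container create \
  --resource-group "exp-example-rg" \
  --name "loadgen-0" \
  --image iizotov/azure-sb-loadgenerator-dotnetcore:latest \
  --restart-policy Always \
  --secure-environment-variables "CONNECTION_STRING=<event-hub-connection-string>"
```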
To join the collected Event Hub metrics in the Log Analytics Workspace to the Azure Function custom metrics in Application Insights, the following Kusto query can be used:
```
// Application Insights instance, grab all custom metrics
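// note: replace "ai-33rl7yxp" with the name of the Application Insights instance provisioned in your rg-telemetry-<suffix> resource group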
let ai_raw_metrics = app("ai-33rl7yxp").customMetrics;
// derive instance counts from "cloud_RoleInstance" metric in AI, summarize by iteration and P1M
let ai_P1M_instance_count=ai_raw_metrics
| where name in ("batchSize")
| summarize
Value=round(dcount(cloud_RoleInstance))
by TimeStamp=bin(timestamp, 1m), Experiment=tolower(tostring(customDimensions.experiment)), MetricName="InstanceCount";
// summarize batchSize and batchAverageLatency custom metrics in AI by iteration and P1M
let ai_P1M_metrics = ai_raw_metrics
| where name in ("batchSize", "batchAverageLatency")
| project MetricName=name, TimeStamp=timestamp, Value=value, Experiment=tolower(tostring(customDimensions.experiment))
| summarize
Value=avg(Value)
by bin(TimeStamp, 1m), Experiment, MetricName;
// extract OutgoingMessages and IncomingMessages EH metrics, summarize by iteration and P1M
let eh_P1M_metrics = AzureMetrics
| where ResourceProvider == "MICROSOFT.EVENTHUB"
| where TimeGrain == "PT1M"
| where MetricName in ("OutgoingMessages", "IncomingMessages")
| project Value=Total, Count, Maximum, Minimum, TimeStamp=TimeGenerated, MetricName, Experiment=tolower(Resource)
| summarize
Value=avg(Value)
by bin(TimeStamp, 1m), Experiment, MetricName;
// now that every metric is a P1M metric, union them all
ai_P1M_metrics
| union eh_P1M_metrics
| union ai_P1M_instance_count
```
This can be analysed directly or exported to Power BI for more interactive slicing and dicing.
The code included in this sample is not intended to be a set of best practices for building scalable, enterprise-grade applications; that is beyond the scope of this educational experiment.
- My blog interpreting the results here
- Results published in Power BI
- Documentation on `host.json` parameters
- Jeff Hollan's blog on Azure Functions