Fix Helm Chart Prometheus Adapter Metrics #53

paulliwog · 2024-12-16T18:21:22Z

Proposed changes

The x100 multiplier in the existing metrics is not compatible with the scaling ratio from 0.0 to 1.0 in the chart. I'm removing the multiplier to make it consistent with the comments in the values file for scaling.auto.engine.metrics.requestCapacityRatio

This could be a breaking change for any users that have changed the ratio to use a scale from 0 to 100 rather than changing the custom metrics.

Types of changes

What types of changes does your code introduce to the Deepgram self-hosted resources?
Put an x in the boxes that apply

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update or tests (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

I have read the CONTRIBUTING doc
I have tested my changes in my local self-hosted environment
- Please describe your testing setup and methodology here
I have added necessary documentation (if appropriate)

Further comments

bd-g · 2024-12-16T18:43:28Z

@paulliwog You mentioned you tested the above - do you have screenshots or text output of the prometheus output and/or the HPA with the scaling metrics off by two orders of magnitude?

FWIW a quick read suggest this is correct, and we'll verify on our end, but it's helpful to document in the PR the evidence of the above changes being needed! 😄

paulliwog · 2024-12-16T21:12:27Z

Here are screenshots of the custom metric and it's components:

And here is the HPA using a scaling factor of 0.5, you can see that the values match the current load show in the custom metrics above

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app.kubernetes.io/instance: deepgram-self-hosted
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: deepgram-self-hosted
    app.kubernetes.io/version: release-241121
    helm.sh/chart: deepgram-self-hosted-0.8.0
  name: deepgram-engine-hpa
  namespace: dg-self-hosted
spec:
  behavior:
    scaleDown:
      policies:
        - periodSeconds: 60
          type: Pods
          value: 1
        - periodSeconds: 60
          type: Percent
          value: 25
      selectPolicy: Max
    scaleUp:
      policies:
        - periodSeconds: 15
          type: Pods
          value: 4
        - periodSeconds: 15
          type: Percent
          value: 100
      selectPolicy: Max
      stabilizationWindowSeconds: 0
  maxReplicas: 20
  metrics:
    - external:
        metric:
          name: engine_requests_active_to_max_ratio
        target:
          type: Value
          value: 500m
      type: External
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deepgram-engine
status:
  conditions:
    - lastTransitionTime: '2024-12-10T04:28:37Z'
      message: recommended size matches current size
      reason: ReadyForNewScale
      status: 'True'
      type: AbleToScale
    - lastTransitionTime: '2024-12-14T19:13:19Z'
      message: >-
        the HPA was able to successfully calculate a replica count from external
        metric engine_requests_active_to_max_ratio(nil)
      reason: ValidMetricFound
      status: 'True'
      type: ScalingActive
    - lastTransitionTime: '2024-12-14T21:37:54Z'
      message: the desired replica count is less than the minimum replica count
      reason: TooFewReplicas
      status: 'True'
      type: ScalingLimited
  currentMetrics:
    - external:
        current:
          value: 3m
        metric:
          name: engine_requests_active_to_max_ratio
      type: External
  currentReplicas: 2
  desiredReplicas: 2
  lastScaleTime: '2024-12-14T21:39:54Z'

We identified this issue after identifying that only a few concurrent requests would cause the engine to scale up rather than ~100 requests that we were expecting to trigger an increase in scale. I don't have the previous HPA, but it would be 300m in this case with a single concurrent request.

bd-g · 2024-12-16T21:35:32Z

Great. That all tracks. Could you add a line to CHANGELOG.md describing your changes? Nest it under ## Unreleased, with a subheader of ## Changed.

That will also pass the PR checks.

bd-g · 2024-12-18T00:09:56Z

Looks great! Thanks so much. I'll put out a new version with a minor version bump for you to ingest.

bd-g · 2024-12-19T03:33:49Z

@paulliwog Released in 0.8.1

Fix Helm Chart Prometheus Adapter Metrics

ac76b8b

paulliwog requested a review from a team as a code owner December 16, 2024 18:21

Update CHANGELOG.md

65752cb

bd-g approved these changes Dec 18, 2024

View reviewed changes

bd-g merged commit 9aec42d into deepgram:main Dec 18, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Helm Chart Prometheus Adapter Metrics #53

Fix Helm Chart Prometheus Adapter Metrics #53

paulliwog commented Dec 16, 2024

bd-g commented Dec 16, 2024

paulliwog commented Dec 16, 2024

bd-g commented Dec 16, 2024

bd-g commented Dec 18, 2024

bd-g commented Dec 19, 2024

Fix Helm Chart Prometheus Adapter Metrics #53

Fix Helm Chart Prometheus Adapter Metrics #53

Conversation

paulliwog commented Dec 16, 2024

Proposed changes

Types of changes

Checklist

Further comments

bd-g commented Dec 16, 2024

paulliwog commented Dec 16, 2024

bd-g commented Dec 16, 2024

bd-g commented Dec 18, 2024

bd-g commented Dec 19, 2024