Add SpiceDB community integration #2538

Open · wants to merge 61 commits into master

Conversation

@tstirrat15 commented Nov 15, 2024

What does this PR do?

This adds an OpenMetrics-based agent check for SpiceDB, an open-source authorization database maintained by Authzed: https://authzed.com/spicedb

Motivation

Datadog doesn't easily support scraping OpenMetrics endpoints without the use of a third-party agent. This provides first-party support and will make life easier for Authzed customers.

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo
  • If this PR includes a log pipeline, please add a description describing the remappers and processors.

Additional Notes

I used the CockroachDB integration from integrations-core as my template. I think I ported the right parts of it over in the right way, but this is my first time developing an integration and would appreciate any feedback on obvious tripping points.
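
For context, the check follows roughly this shape — a simplified sketch based on the CockroachDB pattern from integrations-core; the metric names below are illustrative, not the actual SpiceDB mapping:

```python
# Simplified sketch of an OpenMetricsBaseCheckV2 subclass following the
# CockroachDB pattern; the metric map here is illustrative only.
from datadog_checks.base import OpenMetricsBaseCheckV2

METRIC_MAP = {
    # raw OpenMetrics name -> Datadog name (the __NAMESPACE__ prefix is added automatically)
    'spicedb_grpc_server_handled': 'grpc.server.handled',
    'spicedb_dispatch_duration_seconds': 'dispatch.duration',
}


class SpiceDBCheck(OpenMetricsBaseCheckV2):
    __NAMESPACE__ = 'spicedb'
    DEFAULT_METRIC_LIMIT = 0

    def get_default_config(self):
        # Merged into each instance's configuration unless the user overrides it
        return {'metrics': [METRIC_MAP]}
```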

@tstirrat15 (Author)

I'm looking for prior art around getting OpenMetrics-style metrics into Datadog using the agent. There don't seem to be good examples in integrations-extras - there are a few integrations that appear to treat _bucket metrics as gauge metrics, which doesn't seem correct.

There's https://github.com/DataDog/integrations-core/blob/master/cockroachdb/metadata.csv which indicates that the buckets are exposed as counts for Cockroachdb, but that's only going to be helpful if they're somehow implicitly converted to distribution metrics on the Datadog side. Is that how it works?

My coworker and I were struggling today with the OTel exporter - the custom metrics that were coming across for histograms were showing up as straight counts, even though the OTel Datadog exporter is supposed to treat histogram metrics as distribution metrics by default.

Is there something I'm missing here? I'm also happy to be pointed at another resource if there's documentation somewhere.

@urseberry (Contributor)

> I've updated the README and dashboard JSON. Can I get some help with the questions I've asked?

Thank you for the updates, @tstirrat15. I will review them. Unfortunately, I don't know the answers to your questions, as I am a technical writer and not an engineer. An engineer will review your PR also and answer your questions.

@tstirrat15 (Author)

Done

@tstirrat15 (Author)

@urseberry are the labels correct for further review?

@steveny91 (Contributor) commented Dec 12, 2024

> The validation is complaining that distribution isn't a valid metric type. The integration assets reference links to metric types which shows distribution is a metric type. What are the valid metric types and where do I find them?

@tstirrat15 For OpenMetrics, all metrics are collected as gauges or counts:

| Prometheus/OpenMetrics type | Datadog metric type |
| --- | --- |
| Gauge | Gauge |
| Counter | Count |
| Histogram _bucket, _count, _sum | Count |
| Summary _count, _sum | Count |
| Summary _quantile | Gauge |
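
For reference, this mapping reflects how a single OpenMetrics histogram is exposed as several plain series. A hypothetical latency histogram looks roughly like this in the exposition format (bucket values are cumulative counts):

```
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.1"} 240
request_duration_seconds_bucket{le="0.5"} 310
request_duration_seconds_bucket{le="+Inf"} 325
request_duration_seconds_sum 41.7
request_duration_seconds_count 325
```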

@steveny91 (Contributor)

> There's https://github.com/DataDog/integrations-core/blob/master/cockroachdb/metadata.csv which indicates that the buckets are exposed as counts for Cockroachdb, but that's only going to be helpful if they're somehow implicitly converted to distribution metrics on the Datadog side. Is that how it works?

I guess it would be nice to get a better grasp on the need for the distribution metrics? The aggregation is calculated in our backend, and I don't have the knowledge off the top of my head to explain how that's done (it's owned by another team), other than that on the agent integration side we do indeed have the ability to submit metrics to the distribution intake.

But the reason we collect them as counters is due to the way OpenMetrics exposes them: as counts of observations/events within a certain bucket. For example, a value of 10 for upper_bound:1 could mean that there were 10 requests under 1ms in response time. In my experience, most customers benefit from just doing _sum/_count to get, for example, the average latency across the whole histogram and tracking that over time.
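
(As a concrete illustration - the metric name here is hypothetical - that average over the whole histogram would be a graph query along the lines of:)

```
sum:spicedb.request.duration.sum{*} / sum:spicedb.request.duration.count{*}
```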

I do have to admit that we currently don't have the best visualization capability either for displaying a histogram in the form of a bell-shaped curve.

@steveny91 (Contributor)

> I also can't get the integration working locally so that I can develop a dashboard against it - I opened an issue at #2553 and haven't received a response.

I'll take a look at this today if not tomorrow.

@tstirrat15 (Author)

> I guess it would be nice to get a better grasp on the need for the distribution metrics?

Our desire is to be able to export a dashboard that works similarly to the one we developed internally in Grafana, where OpenMetrics histograms for latency metrics are displayed as p50/p95/p99 timeseries. From my understanding, that sort of display is really only possible with distributions in Datadog, and aggregating those into a single rate/max/sum isn't going to work for our customers.
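
(For example - the metric name is hypothetical - once a histogram is ingested as a distribution, my understanding is that the percentile timeseries we want would be queries along these lines:)

```
p50:spicedb.dispatch.duration{*}
p95:spicedb.dispatch.duration{*}
p99:spicedb.dispatch.duration{*}
```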

I'm able to do it using the openmetrics integration directly, as in this repo: https://github.com/authzed/examples/tree/main/observability/simple-datadog

In this example, is the agent submitting counts to the API and the API is doing the distribution aggregation, or is it that the agent itself is aggregating the histogram data and then submitting it as distribution data?

My sense is that the agent is doing something to disaggregate, transform, and reaggregate a histogram metric into a distribution metric. That's what I want in this integration; it works in the OpenMetrics integration, and this integration is a special case of an OpenMetrics integration.

Is there something I'm missing?

@steveny91 (Contributor)

@tstirrat15

The link you provided does, I think, indeed go through the agent:
https://github.com/authzed/examples/blob/main/observability/simple-datadog/conf.d/openmetrics.d/conf.yaml

Your understanding is correct: we do submit some form of a sketch to our backend for it to aggregate. But I'm not really sure what calculation is used or how it's aggregated, so I can't confidently go into detail on that aspect.

There's also a UI portion after metric ingestion that needs to be done according to this guide:
https://docs.datadoghq.com/metrics/distributions/

So all in all:

  • The histogram_buckets_as_distributions: true setting does indeed submit sketches to our backend for recalculation and aggregation (see the sketch of a config after this list)
  • Once histograms are ingested as distributions, you would indeed be able to calculate the different percentiles via the UI
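
A rough sketch of what that setting looks like in an instance config (the endpoint and file path below are assumptions, not necessarily SpiceDB's defaults):

```yaml
# conf.d/spicedb.d/conf.yaml -- sketch; the endpoint is an assumption
instances:
  - openmetrics_endpoint: http://localhost:9090/metrics
    histogram_buckets_as_distributions: true
```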

@tstirrat15 (Author)

@steveny91 the part I'm getting at is that in the linked configuration for an Openmetrics check, a metric type of histogram works just fine. The integration I'm working on extends that Openmetrics check. Why can't I submit histogram metrics through my integration?

Like suppose I submit all of these metrics piecewise as counters and gauges. How can a user then turn those into distributions? It's not going to be sufficient for them to take a counter and display it as a rate, because I've lost the distribution information, and that's not useful for latency metrics.

@steveny91 (Contributor)

> The part I'm getting at is that in the linked configuration for an Openmetrics check, a metric type of histogram works just fine. The integration I'm working on extends that Openmetrics check. Why can't I submit histogram metrics through my integration?

I apologize if I gave the impression that you can't. You most definitely can if you extend the OpenMetrics base class. All I wanted to point out was that since it uses the OpenMetrics check, it goes through the agent, which is what you're doing, and that is fine. After that you'd have to do some configuration in the UI at app.datadoghq.com to calculate the percentiles and different aggregations.

When you say "a metric type of histogram works just fine", can I conclude that the value and submission type are as you expect, and that you can derive the necessary information using that exact linked config?

I'm just trying to narrow down the problem here, if it's more along the lines of:

  • The metric values are not accurate after they are submitted as distributions

or

  • I'm having trouble submitting these as distributions in the current PR.

We had a few back-and-forths and things might have gotten muddled. I want to see exactly where we stand so I can better assist here. I can definitely help with the integration work and get things aligned and passing CI so this can be merged. But if it's more of a discussion about metric values and behavior, then I'd have to find someone who better understands, and can explain, the nuances of distributions and the calculations that happen in the backend.

Thanks and I do apologize for maybe derailing this a bit.

@tstirrat15 (Author) commented Dec 18, 2024

@steveny91 it's the latter - the validator complains that distributions and histograms aren't valid metric types in metadata.csv, and the unit tests don't pick up the histogram metrics even though the integration tests handle them just fine.

Is this an issue with the harness/validation or an issue with what I'm trying to do?

Labels: editorial review (Waiting on a more in-depth review from a docs team editor)