Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC: Prometheus metrics module #816

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kdomanski
Copy link
Contributor

@kdomanski kdomanski commented Feb 19, 2024

This is a POC to address #504 . It is meant to spawn feedback, but is still neither complete nor satisfactory to me.

  • For now the only notable metrics that are exported are statuses of any heaters or temperature sensors available
  • No apikey/auth support My bad. Moonraker already handles that in the server.
  • Experience in instrumenting software tells me, that it would be better to instrument printer objects directly in Klippy and then only add Moonraker-specific metrics from here. Need to think about this.
  • I'm not sure whether this is race-free. Could a status update callback land after klippy shutdown/disconnect callback?

This adds a module, which will expose a Prometheus scrape
target at /server/prometheus/metrics

Also registers a simple Info metric with version info and UUID
@kdomanski
Copy link
Contributor Author

Screenshot 2024-02-19 095435
Screenshot 2024-02-19 095452

Instead, we make some assumptions:
- Response will be in the "normal" format and not openmetrics
- No filtering by name will be done
- gzip won't be used (also sparing the CPU cycles in return for bandwidth)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't be difficult to provide the WebRequest access to the request headers if needed. This hasn't been required to date so I haven't done so.

As far as gzip compression, a reverse proxy could handle that if its desired. I don't think it needs to be in Moonraker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as gzip compression, a reverse proxy could handle that if its desired. I don't think it needs to be in Moonraker.
You are absolutely right. These notes basically address all the parts of prometheus client's default handler that I stripped in order to get this integrated.

)
self.server.register_event_handler(
"server:klippy_disconnected", self._clear_metrics
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moonraker doesn't send a klippy_disconnected event. It does register a notify_klippy_disconnected websocket notification that piggybacks off of the server:klippy_disconnect event.

@Arksine
Copy link
Owner

Arksine commented Feb 19, 2024

Thanks, I have a few questions/comments based on your initial feedback.

No apikey/auth support

Is this a limitation in Prometheus? Does it support any type of authentication?

Experience in instrumenting software tells me, that it would be better to instrument printer objects directly in Klippy and then only add Moonraker-specific metrics from here. Need to think about this.

I'm not sure Kevin would be enthusiastic about merging instrumentation directly. Its effectively repackaging the data, sending instrumented objects to Moonraker would duplicate the serialization/deserialization effort, presumably with additional overhead. Part of the motivation for Klipper's API server is to farm this kind of work to applications outside of Klippy.

I'm not sure whether this is race-free. Could a status update callback land after klippy shutdown/disconnect callback?

Status updates can still be received after shutdown, but not after a disconnect.

@kdomanski
Copy link
Contributor Author

No apikey/auth support

Is this a limitation in Prometheus? Does it support any type of authentication?

No, this is my mistake. I forgot that Moonraker api keys work simply as bearer tokens. That works just fine with Prom.

Experience in instrumenting software tells me, that it would be better to instrument printer objects directly in Klippy and then only add Moonraker-specific metrics from here. Need to think about this.

I'm not sure Kevin would be enthusiastic about merging instrumentation directly. Its effectively repackaging the data, sending instrumented objects to Moonraker would duplicate the serialization/deserialization effort, presumably with additional overhead. Part of the motivation for Klipper's API server is to farm this kind of work to applications outside of Klippy.

And this is exactly why I think we'll be pondering this for a while. I understand and share the desire to minimize any extraneous stuff in Klippy itself. But I just don't think this particular thing can be farmed out effectively.

In Klippy, there's access to the actual objects in question, type information, introspection, and the like. It should be possible to do most of the instrumentation from the Prometheus module itself (with a little bit of introspection) and not litter every other module manually with instrumentation code.

But doing it in Moonraker means inferring all that information which Klippy already has, based on an implicit API (status updates), which doesn't have a clear schema. Any change in Klippy must be followed by a change in Moonraker and a bespoke converter from status JSON to metrics for each component type. That's overhead too. And unlike status updates, which are strictly scheduled, the Prometheus metrics are only serialized during scrape.
In the end, I'd rather not argue about performance and/or overhead without having actual measurements thereof.

I'm not sure whether this is race-free. Could a status update callback land after klippy shutdown/disconnect callback?

Status updates can still be received after shutdown, but not after a disconnect.

I failed to articulate my question clearly. What I mean is: if a status updates comes right before a disconnect, will their handlers still execute in order every time, or is there a risk that after clearing the metrics (on disconnect, on shutdown, etc.) a wayward update handler still runs and writes a value from the old session?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants