Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: limit scrape_duration_seconds visualization to CHT instance #73

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

IanMinash
Copy link

Limited the values of the Scrape Duration graph to only come from the currently selected CHT instance($cht_instance). This prevents non-related VMs from showing up in a dashboard thats only focused on CHT monitoring.

@jkuester jkuester self-requested a review July 17, 2023 17:06
Copy link
Collaborator

@jkuester jkuester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @IanMinash! I agree that it would be helpful to limit the scrape metrics to the current CHT instance in context. Unfortunately, your change would hide not only the metrics for other CHT instances, but also the scrape metrics from the rest of the monitoring stack (exporters, Grafana, etc) and, most importantly, would hide Postgres scrape metrics (for anyone who has connected Watchdog to their Postgres instance).

I have suggested a change that would keep the other non-CHT metrics in view while not displaying metrics for the out-of-context CHT instances. Additionally, it will filter the postgres metrics to only show the one associated with the current CHT instance in context.

Can you try these changes and see if they are acceptable? Thanks!

@@ -2052,7 +2052,7 @@
"uid": "PBFA97CFB590B2093"
},
"editorMode": "code",
"expr": "scrape_duration_seconds",
"expr": "scrape_duration_seconds{instance=\"$cht_instance\"}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"expr": "scrape_duration_seconds{instance=\"$cht_instance\"}",
"expr": "scrape_duration_seconds unless (scrape_duration_seconds{job=\"cht\", instance!=\"$cht_instance\"} or scrape_duration_seconds{job=\"postgres\", cht_instance!=\"$cht_instance\"})",

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Hi @jkuester, in our case, the monitoring stack is used beyond CHT and the changes suggested still include targets unrelated to CHT. It is also not quite clear to me why I should be interested in scrape duration for grafana for example, whilst I'm interested in processes/metrics that directly relate to CHT.

In my opinion, we should be explicit with the jobs we intend to target then exclude out-of-context instances though I'm still keen on understanding your POV.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm okay, I think I see where you are coming from here. This whole Monitoring Stack row was originally intended to be just an at-a-glance health-check for the Watchdog processes themselves (so Grafana, Prometheus, and the exporters). That being said, I understand the value in being able to easily check the scrape history for the CHT instance in context. It is almost like we are technically trying to achieve different goals:

  1. See overall health of Prometheus by seeing the historical trends for the for the scrape times of all targets. This lets you easily identify outliers or understand when a target is not getting scraped as expected.
  2. See the historical scrape performance for the particular CHT instance in context (informing, among other things, the freshness of the metrics for that instance).

Here is my proposal: lets have separate panels to address both of there goals! In this PR, we can leave the panel in the Monitoring Stack row as it was and instead add a new panel on the CHT Admin Details dashboard (maybe up at the top outside of all the existing rows?) that just shows the scrape data for the CHT instance in context (along with the Postgres instance for that instance if it exists). Then, I have logged #78 which can be addressed later to move the whole Monitoring Stack row out of the CHT Admin Overview dashboard.

Let me know what you think!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the Monitoring Stack row should be moved out of the CHT Admin dashboard.

@jkuester jkuester changed the title Limit scrape_duration_seconds visualization to CHT instance feat: limit scrape_duration_seconds visualization to CHT instance Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants