Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitoring #62

Merged
merged 34 commits into from
Feb 27, 2025
Merged

Monitoring #62

merged 34 commits into from
Feb 27, 2025

Conversation

intarga
Copy link
Member

@intarga intarga commented Feb 5, 2025

TODO list:

This PR includes:

  • Metrics instrumentation of ingestion
  • node_exporter and postgres_exporter deployment
  • Improvements and bugfixes to related parts of our ansible playbooks
  • Tracing (logging) instrumentation of ingestion

Related changes outside this repo (see PRs linked in TODO list):

Things to note:

  • Unfortunately the Grafana instance IT maintains is read-only, meaning we can't easily edit or experiment with our dashboard. Making changes to the dashboard requires:

    • Exporting the dashboard JSON
    • Setting up your own local Grafana
    • Importing the dashboard
    • Making your changes
    • Exporting again
    • Opening a PR on the departments repo replacing the dashboard file with your newly exported JSON
    • Waiting 2 minutes for their yaml lint pipeline to run (which doesn't even check the things that should be checked in the dashboard)
    • Merging the PR
    • Waiting 15-20 minutes for the deployment to go live

    In my opinion this is an unacceptable edit loop for a visualisation dashboard. I'll leave some feedback for the people at IT who maintain it, but if they aren't receptive, we may eventually want to set up our own.

  • I wanted to include metrics/dashboards using pg_stat_statements which would give us more detailed insights into query performance. Unfortunately the stat_statements collector in latest version of postgres_exporter is broken on Postgres 17. There's already a fix for it in their main branch, but as we don't know when the next release will be, I suggest we ignore this for now, and come back to it when the new version is live. The configuration is already done on our end, so it would just require running a playbook to update postgres_exporter, and making visualisations with the new metrics.

  • I was originally intending to turn on ingestion to test and refine the dashboard (Manuel turned it off in relation to his migration work), but now especially as we don't have stat_statements yet, I think this can wait and be bundled into my upcoming benchmarking work, as that will stress the dashboard anyway.

@intarga intarga force-pushed the monitoring branch 4 times, most recently from 3f2df96 to 8c8dce7 Compare February 12, 2025 12:58
@intarga intarga force-pushed the monitoring branch 2 times, most recently from f9cc4ec to a02184b Compare February 18, 2025 16:07
The previous version had a footgun that rules removed from the ansible
vars would not be removed by ansible from ostack. This version also
allows our vars to match the structure expected by the ostack collection
These need to both always run because github doesn't allow you to put
conditions on requiring check passes for branch protection
@intarga intarga marked this pull request as ready for review February 20, 2025 13:48
@Lun4m
Copy link
Collaborator

Lun4m commented Feb 21, 2025

  • Waiting 15-20 minutes for the deployment to go live

In my opinion this is an unacceptable edit loop for a visualisation dashboard. I'll leave some feedback for the people at IT who maintain it, but if they aren't receptive, we may eventually want to set up our own.

20 minutes for changing some graphs sounds rough, it's probably faster to do it by hand with the GUI.

I wanted to include metrics/dashboards using pg_stat_statements which would give us more detailed insights into query performance. Unfortunately the stat_statements collector in latest version of postgres_exporter is broken on Postgres 17. There's already a fix for it in their main branch, but as we don't know when the next release will be, I suggest we ignore this for now, and come back to it when the new version is live. The configuration is already done on our end, so it would just require running a playbook to update postgres_exporter, and making visualisations with the new metrics.

It seems like they are merging it soon (today even 🎉 )

I was originally intending to turn on ingestion to test and refine the dashboard (Manuel turned it off in relation to his migration work), but now especially as we don't have stat_statements yet, I think this can wait and be bundled into my upcoming benchmarking work, as that will stress the dashboard anyway.

That makes sense, plus we probably also need to merge the other two open draft PRs before that.

Copy link
Collaborator

@Lun4m Lun4m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, it's so satisfying to see that the dashboard is running!

@Lun4m Lun4m linked an issue Feb 21, 2025 that may be closed by this pull request
@intarga intarga merged commit 2a74653 into trunk Feb 27, 2025
3 checks passed
@intarga intarga deleted the monitoring branch February 27, 2025 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Monitoring
3 participants