Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make work with multiple slots #163

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

make work with multiple slots #163

wants to merge 2 commits into from

Conversation

plockaby
Copy link
Contributor

Sorry to do another pull request on this, especially since this one is slightly more disruptive. I just discovered a number of problems while trying to monitor more of our stuff. I also discovered that my original pull request didn't quite solve the problem that I was trying to solve. This corrects a few problems:

  1. The query itself was computing values backwards and generating negative differences in all cases. Also, the coalesce that I submitted yesterday was too simplistic and ended up returning incorrect values in some cases. This query will return correct values that I've tested against multiple clusters that I currently run including against a cluster that has multiple active and inactive replication slots.

  2. Line 5536/5550, there is no value in $db->{slot_name} ever. It is never assigned. I'm fairly certain that what is intended is to record the maximum backlog on the database across all slots so that's what it does now.

  3. Line 5541/5560, when no slots match the exclusion rules it should still be OK and not UNKNOWN.

  4. In general, $max should contain the maximum number of bytes that a slot is causing the database to retain. In this original code $max was also being used to store the results of the check (were slots skipped? set to -2 I guess?). Because the query was generating negative differences the check on line 5530 would never pass because $r->{delta} would never be greater than -1 (or -2) because it was always negative. So basically this check would never fail and definitely didn't work when multiple slots existed.

The tests for this (in 02_replication_slots.t) continue to pass with these changes but I'm not 100% sure that they were good or exhaustive tests to begin with. I don't have a good example in the existing tests of how to test replication between multiple servers in the tests so if you have pointers on where to start writing those tests then I'd be happy to write some.

df7cb added a commit to df7cb/check_postgres that referenced this pull request Jun 24, 2020
@df7cb
Copy link
Collaborator

df7cb commented Jun 24, 2020

Hi @plockaby, I pushed a partly fix that was independently discovered by a coworker here.

There are other merge conflicts, could you try to resolve/rebase these? Thanks!

@plockaby
Copy link
Contributor Author

It has been more than a year since I submitted this pull request. I no longer use this code (or work at the position where we were using this code) so I'm not able to test it that my merges actually continue to fix the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants