-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: run post test health checks on system #141430
roachtest: run post test health checks on system #141430
Conversation
8f5fb57
to
261de61
Compare
This changes the post test assertions to always run the health checks on the KV pod. Previously it would run these health checks on the default virtual cluster, which may be a SQL pod. Fixes: cockroachdb#140774
Before, the defaultHTTPClient would grab the lock only when adding cookies. However, we also want to protect the check of a sessionID from concurrent access as well. Previously, it was possible that concurrent calls would all see the sessionID as not set and all attempt to authenticate with the cluster.
261de61
to
f33b7c5
Compare
@@ -134,6 +134,9 @@ func (r *RoachtestHTTPClient) addCookies(ctx context.Context, cookieUrl string) | |||
if !r.cluster.IsSecure() { | |||
return nil | |||
} | |||
|
|||
r.mu.Lock() | |||
defer r.mu.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find! Because of this data race, there was a high bias to return n1
; now, all other nodes are equally likely to be picked (in postTestAssertions
).
TFTR! bors r=srosenberg |
141430: roachtest: run post test health checks on system r=srosenberg a=DarrylWong This changes the post test assertions to always run the health checks on the system tenant (KV pod). Previously it would run these health checks on the default virtual cluster, which may be a SQL pod. We run consistency checks on the system tenant so we care about the liveness of the system tenant here. Fixes: #140774 ----- After this change, running `decomission/mixed-versions`: ``` test-post-assertions: 2025/02/13 16:23:02 cluster.go:2577: running cmd `./cockroach auth-session lo...` on nodes [:1]; details in run_162302.356113000_n1_cockroach-authsessio.log test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1502: n1: https://127.0.0.1:29001/health?ready=1 status=503 body={ "error": "node is shutting down", "code": 14, "message": "node is shutting down", "details": [ ] } test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1509: n2: https://127.0.0.1:29005/health?ready=1 status=200 ok test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1509: n3: https://127.0.0.1:29003/health?ready=1 status=200 ok test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1509: n4: https://127.0.0.1:29007/health?ready=1 status=200 ok test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1525: running validation checks on node 2 (<10m) ``` We can see it correctly identifies node 1 as shutting down and runs the validation checks on node 2. It also only attempts one `auth-session login`. Co-authored-by: DarrylWong <[email protected]>
Build failed: |
unrelated flake bors retry |
141430: roachtest: run post test health checks on system r=srosenberg a=DarrylWong This changes the post test assertions to always run the health checks on the system tenant (KV pod). Previously it would run these health checks on the default virtual cluster, which may be a SQL pod. We run consistency checks on the system tenant so we care about the liveness of the system tenant here. Fixes: #140774 ----- After this change, running `decomission/mixed-versions`: ``` test-post-assertions: 2025/02/13 16:23:02 cluster.go:2577: running cmd `./cockroach auth-session lo...` on nodes [:1]; details in run_162302.356113000_n1_cockroach-authsessio.log test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1502: n1: https://127.0.0.1:29001/health?ready=1 status=503 body={ "error": "node is shutting down", "code": 14, "message": "node is shutting down", "details": [ ] } test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1509: n2: https://127.0.0.1:29005/health?ready=1 status=200 ok test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1509: n3: https://127.0.0.1:29003/health?ready=1 status=200 ok test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1509: n4: https://127.0.0.1:29007/health?ready=1 status=200 ok test-post-assertions: 2025/02/13 16:23:17 test_runner.go:1525: running validation checks on node 2 (<10m) ``` We can see it correctly identifies node 1 as shutting down and runs the validation checks on node 2. It also only attempts one `auth-session login`. 141442: *: add testutils as disallowed import to top-level packages r=rafiss a=rafiss See the individual commits to make this easier to review. This PR is a step towards completing #81375, with the ultimate goal of no longer depending on test packages in our production binary. More work is needed to avoid depending on any of the subpackages inside of `testutils`. informs #81375 Co-authored-by: DarrylWong <[email protected]> Co-authored-by: Rafi Shamim <[email protected]>
Build failed (retrying...): |
Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches. Issue #140774: branch-release-24.1, branch-release-24.3, branch-release-25.1. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 98dcef2 to blathers/backport-release-24.1-141430: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.1.x failed. See errors above. error creating merge commit from 98dcef2 to blathers/backport-release-24.3-141430: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.3.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This changes the post test assertions to always run the health checks on the system tenant (KV pod). Previously it would run these health checks on the default virtual cluster, which may be a SQL pod. We run consistency checks on the system tenant so we care about the liveness of the system tenant here.
Fixes: #140774
After this change, running
decomission/mixed-versions
:We can see it correctly identifies node 1 as shutting down and runs the validation checks on node 2. It also only attempts one
auth-session login
.