Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-picks for 2.10.25-RC.1 #6345

Merged
merged 41 commits into from
Jan 9, 2025
Merged

Cherry-picks for 2.10.25-RC.1 #6345

merged 41 commits into from
Jan 9, 2025

Conversation

wallyqs
Copy link
Member

@wallyqs wallyqs commented Jan 8, 2025

Dependencies

Leafnode

JetStream

Tests

@wallyqs wallyqs requested a review from a team as a code owner January 8, 2025 17:59
@neilalexander
Copy link
Member

Missing #6282 and #6284.

Also need to add #6287, it changes no functionality but will be necessary for resolving future merge conflicts in the filestore.

@neilalexander
Copy link
Member

Also should probably wait on maybe #6341 and definitely #6344 too.

MauriceVanVeen and others added 22 commits January 8, 2025 10:34
…ons on hub

If the hub has a user with subscribe permissions on a literal subject
that the leaf is trying to create a queue subscription on, the interest
may not be propagated.

The issue was caused by the fact that we were checking the permissions
on the key (that includes subject and queue name) instead of the subject
itself.

Resolves #6281

Signed-off-by: Ivan Kozlovic <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.28.0 to 0.29.0.
- [Commits](golang/sys@v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [golang.org/x/time](https://github.com/golang/time) from 0.8.0 to 0.9.0.
- [Commits](golang/time@v0.8.0...v0.9.0)

---
updated-dependencies:
- dependency-name: golang.org/x/time
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
None of these are ever mutated after the group is created, therefore we
will create unnecessary lock contention on the group lock by trying to
take it.

Signed-off-by: Neil Twigg <[email protected]>
Otherwise consumer infos could hold the JS lock while waiting to take
the Raft lock for a group.

Signed-off-by: Neil Twigg <[email protected]>
…shot/restore advisories

Otherwise these advisories can end up very large and take more CPU time to encode.

Signed-off-by: Neil Twigg <[email protected]>
@wallyqs wallyqs force-pushed the downstream/v2.10.25 branch from 240fce9 to 14db100 Compare January 8, 2025 18:34
@wallyqs
Copy link
Member Author

wallyqs commented Jan 8, 2025

@neilalexander #6282 and #6284 use some built-ins from v2.11 like unmarshalRequest etc... so will leave them out for now, but including the latest two

Since `deleteNotActive` already runs in its own goroutine via `time.AfterFunc`,
we just create more work for the scheduler by having extra goroutines running
on top of that.

Signed-off-by: Neil Twigg <[email protected]>
@wallyqs
Copy link
Member Author

wallyqs commented Jan 8, 2025

Updated and included #6341 + #6344

MauriceVanVeen and others added 8 commits January 8, 2025 12:58
Signed-off-by: Maurice van Veen <[email protected]>
Need to capture `rg.node` while the JetStream lock is held,
otherwise stopping an asset could race and it could be set
back to `nil`.

Signed-off-by: Neil Twigg <[email protected]>
Similar fix to #4533, but
extending to some other clustered tests, as well as one super cluster
test. Should prevent these tests from failing with `JetStream not
enabled for account`.

Signed-off-by: Maurice van Veen <[email protected]>
Could take some time for `o.processStreamSignal` to be called and up
`o.npc`. Without would sometimes fail with `jetstream_test.go:24773:
require int64 equal, but got: 0 != 1`.

Signed-off-by: Maurice van Veen <[email protected]>
`TestJetStreamClusterRedeliverBackoffs` would sometimes fail, even
though it would be very close to the intended backoff:
```
jetstream_cluster_2_test.go:4216: Timing is off for 1, expected ~100ms, but got 99.846642ms
jetstream_cluster_2_test.go:4216: Timing is off for 0, expected ~25ms, but got 50.767748ms
jetstream_cluster_2_test.go:4216: Timing is off for 4, expected ~250ms, but got 249.95211ms
```

Allowing for a bit of leeway on both ends.

Signed-off-by: Maurice van Veen <[email protected]>
…nge (#6332)

Calling `require_NoError` in a `checkFor` immediately fails the test and
doesn't allow for retries. Also up the max timeout when doing stepdowns,
as that might sometimes take longer as well.

Signed-off-by: Maurice van Veen <[email protected]>
Ack state could take some time to propagate, since the sent acks are not
synchronous.

Signed-off-by: Maurice van Veen <[email protected]>
derekcollison and others added 7 commits January 9, 2025 10:38
Should solve this data race, where `rn.updateLeader(noLeader)` was
called without holding the lock.
```
==================
WARNING: DATA RACE
Write at 0x00c0011d6da8 by goroutine 238071:
  github.com/nats-io/nats-server/v2/server.(*raft).updateLeader()
      /home/travis/build/nats-io/nats-server/server/raft.go:3212 +0x1fa
  github.com/nats-io/nats-server/v2/server.TestJetStreamClusterDesyncAfterErrorDuringCatchup.func2()
      /home/travis/build/nats-io/nats-server/server/jetstream_cluster_4_test.go:3970 +0x1f2
  github.com/nats-io/nats-server/v2/server.TestJetStreamClusterDesyncAfterErrorDuringCatchup.func3()
      /home/travis/build/nats-io/nats-server/server/jetstream_cluster_4_test.go:4046 +0xc56
  testing.tRunner()
      /home/travis/sdk/go1.23.3/src/testing/testing.go:1690 +0x226
  testing.(*T).Run.gowrap1()
      /home/travis/sdk/go1.23.3/src/testing/testing.go:1743 +0x44
Previous read at 0x00c0011d6da8 by goroutine 238374:
  github.com/nats-io/nats-server/v2/server.(*raft).processAppendEntry()
      /home/travis/build/nats-io/nats-server/server/raft.go:3351 +0x124c
  github.com/nats-io/nats-server/v2/server.(*raft).processAppendEntries()
      /home/travis/build/nats-io/nats-server/server/raft.go:2029 +0x1f2
  github.com/nats-io/nats-server/v2/server.(*raft).runAsFollower()
      /home/travis/build/nats-io/nats-server/server/raft.go:2044 +0x446
  github.com/nats-io/nats-server/v2/server.(*raft).run()
      /home/travis/build/nats-io/nats-server/server/raft.go:1906 +0x557
  github.com/nats-io/nats-server/v2/server.(*raft).run-fm()
      <autogenerated>:1 +0x33
  github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1()
      /home/travis/build/nats-io/nats-server/server/server.go:3885 +0x59
```
Signed-off-by: Maurice van Veen <[email protected]>
Deponds on #6345

Includes the following de-flakes:
- #6329
- #6330
- #6331
- #6332
- #6334

And this data race fix:
- #6150

Signed-off-by: Maurice van Veen <[email protected]>
This ensures that we stop these goroutines more quickly when either the
server or the JetStream system shuts down.

This also avoids a race in `TestJetStreamClusterGhostEphemeralsAfterRestart`,
as the unit test was reverting a modified constant before these goroutines
would have finished in some cases.

Signed-off-by: Neil Twigg <[email protected]>
@wallyqs wallyqs merged commit 339d1b3 into release/v2.10.25 Jan 9, 2025
5 checks passed
@wallyqs wallyqs deleted the downstream/v2.10.25 branch January 9, 2025 20:33
@wallyqs wallyqs restored the downstream/v2.10.25 branch January 9, 2025 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants