Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix KV create race after delete/purge #1301

Merged
merged 1 commit into from
Aug 15, 2024

Conversation

fnichol
Copy link
Contributor

@fnichol fnichol commented Aug 14, 2024

This change fixes an issue when using a JetStream K/V store where a user is creating, deleting, and re-creating keys. If the last entry for a key is a Operation::Delete or Operation::Purge, the initial self.update() returns an error, causing the second part of the method to be exercised.

Prior to this change, if the entry was deleted or purged a kv.put() call is used which ignores the revision of that last entry. A single writer to the K/V store would succeed (as no other writers would write first) so no problem. However, if 2 writers attempt to create a key, then a second writer could call the kv.put() before the first writer calls kv.put(). This means that both writers get an Ok(revision) and can assume that they won the creation of the key.

When using a "distributed lock" pattern (that is many writers race to create a key and the first successful writer wins), this above scenario results in potentially more than one writer who believes they have uniquely acquired the distributed lock.

This change replaces the kv.put() call to a kv.update() call and provides the revision from the deleted/purged entry to ensure that no other writer has beaten the caller to this update. This change closes the race period between concurrent writers to between the first update and the second update call with some optimistic write concurrency to detect another writer.

It appears as though this strategy is in effect in the Go client code kv.Create implementation.

Copy link
Member

@Jarema Jarema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey!

Thank you for catching that.
This looks good. Just one request for naming consistency change.

async-nats/src/jetstream/kv/mod.rs Outdated Show resolved Hide resolved
fnichol added a commit to systeminit/si that referenced this pull request Aug 14, 2024
This change updates the Git branch to an `si/stable` branch so that any
upstream changes can be made to the `js-kv-recreate-race` branch without
impacting this codebase directly.

References: nats-io/nats.rs#1301

Signed-off-by: Fletcher Nichol <[email protected]>
async-nats/src/jetstream/kv/mod.rs Outdated Show resolved Hide resolved
This change fixes an issue when using a JetStream K/V store where a user
is creating, deleting, and re-creating keys. If the last entry for a key
is a `Operation::Delete` or `Operation::Purge`, the initial
`self.update()` returns an error, causing the second part of the method
to be exercised.

Prior to this change, if the entry was deleted or purged a `kv.put()`
call is used which ignores the revision of that last entry. A single
writer to the K/V store would succeed (as no other writers would write
first) so no problem. However, if 2 writers attempt to create a key,
then a second writer *could* call the `kv.put()` before the first writer
calls `kv.put()`. This means that *both* writers get an `Ok(revision)`
and can assume that they won the creation of the key.

When using a "distributed lock" pattern (that is many writers race to
create a key and the first successful writer wins), this above scenario
results in potentially more than one writer who believes they have
uniquely acquired the distributed lock.

This change replaces the `kv.put()` call to a `kv.update()` call and
provides the `revision` from the deleted/purged entry to ensure that no
other writer has beaten the caller to this update. This change closes
the race period between concurrent writers to between the first update
and the second update call with some optimistic write concurrency to
detect another writer.

It appears as though this strategy is in effect in the Go client code
[kv.Create] implementation.

[kv.Create]: https://github.com/nats-io/nats.go/blob/278f9f188bca4d7bdee283a0e98ab66b82530c60/jetstream/kv.go#L944-L963

Co-authored-by: John Keiser <[email protected]>
Signed-off-by: Fletcher Nichol <[email protected]>
Signed-off-by: Fletcher Nichol <[email protected]>
Copy link
Member

@Jarema Jarema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Thank you for your contribution!

@Jarema Jarema merged commit 17e5d65 into nats-io:main Aug 15, 2024
12 checks passed
fnichol added a commit to systeminit/si that referenced this pull request Sep 12, 2024
Notable is that this release contains our upstream KV create race fix
and some lower level data types to work with direct getting messages.

https://github.com/nats-io/nats.rs/releases/tag/async-nats%2Fv0.36.0

References: nats-io/nats.rs#1301

Signed-off-by: Fletcher Nichol <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants