Support for persistent locks/semaphores #84
Comments
Why not just have your cluster wait on a node? You can observe a node that hasn't been created yet, so just have your cluster check for the existence of that node. Locks were intended to lock around updating a record or running a job, or to ensure that you could have a single-writer setup but with two writers: one active, one on standby.
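A minimal sketch of that wait-on-a-node approach, assuming the zk gem's `register`/`exists?` API and a made-up marker path (the path and the single-watcher loop are illustrative, not anything this project prescribes):

```ruby
require 'zk'

ZK_HOSTS = 'localhost:2181'            # assumption: local ZooKeeper ensemble
MARKER   = '/cluster/restart_marker'   # assumption: node another member creates while busy

zk    = ZK.new(ZK_HOSTS)
queue = Queue.new

# Deliver every event on the marker node (created, deleted, changed) to a queue.
sub = zk.register(MARKER) { |event| queue << event }

# exists? with :watch => true arms a one-shot watch; the loop re-arms it until
# the node is gone, at which point it is safe to proceed.
while zk.exists?(MARKER, :watch => true)
  queue.pop
end

sub.unsubscribe
zk.close!
```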
Either I didn't explain my requirements well enough, or I'm not understanding what you're suggesting. Each of my clustered nodes is trying to obtain the lock so it can do something locally (restart a service). If the service restarts successfully, or if the restart fails wildly, that lock is removed (because it's ephemeral), and the other nodes in the cluster proceed with their get-lock-or-block-then-do-stuff loop. I feel like I'm missing something...
To clarify, WHEN the service fails to start (for whatever reason), I want the node to continue to hold the lock to prevent other nodes from proceeding (maybe the service we're restarting is configured incorrectly, and sequencing through each of them will bring the entire load-balanced solution down).
I think what Jonathan is saying is to not use the actual locking class, and instead create and watch the znode yourself.
I think I understand. But that still won't work for me, for a few reasons:
Finally, once my process completes on any given node, the process running the zk client exits. From the ZooKeeper cluster's perspective, I cannot tell whether the restart was successful or not.
Well, I think I've worked around this...as suggested, I'm not using the built-in lock/semaphore objects...I'm simply:
Me having to implement this workflow just to get persistent locks, however, seems like a common case. I still think the idea of expanding the "lock" construct (semaphores too) to support persistent nodes would be a great feature. Anybody else?
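For anyone landing here later, a rough sketch of that kind of hand-rolled persistent lock with the zk gem (the lock path, the hostname-as-owner convention, and the `restart_service!` helper are assumptions for illustration, not the commenter's actual code):

```ruby
require 'zk'
require 'socket'

LOCK_PATH = '/restart_lock'   # assumption: one well-known lock node for the cluster

# Hypothetical stand-in for the real work; raising keeps the lock held.
def restart_service!
  system('systemctl restart my-service') or raise 'restart failed'
end

zk = ZK.new('localhost:2181')

begin
  # A non-ephemeral node survives client crashes and restarts, so the "lock"
  # stays held until someone explicitly deletes it.
  zk.create(LOCK_PATH, Socket.gethostname, :mode => :persistent)
rescue ZK::Exceptions::NodeExists
  owner, _stat = zk.get(LOCK_PATH)
  if owner == Socket.gethostname
    puts "resuming: this host already holds #{LOCK_PATH}"
  else
    abort "lock held by #{owner}; not proceeding"
  end
end

restart_service!

# Release only on success; on failure (or a crash) the node stays behind and
# the other members keep waiting until a human intervenes.
zk.delete(LOCK_PATH)
zk.close!
```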
If you have a node that is persistent, how does a client know which node is theirs after its process restarts? It may be worth trying to separate these into two distinct concepts.
I'd like to do this same thing too, for restarts across clusters like Cassandra, Kafka, even Zookeeper itself. I only want one machine to be down for restart/reboot at any given time.
I was thinking of including the machine's IP (it's unique and static, in my case) in the ZK node name. That way the client would always be able to tell if it has an existing lock node or not, and can delete the right one.
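Something along those lines, assuming the zk gem and one persistent sequential child per host with the IP embedded in its name (the paths and the private-IP lookup are illustrative):

```ruby
require 'zk'
require 'socket'

PARENT = '/rolling_restart'   # assumption: a well-known parent znode
my_ip  = Socket.ip_address_list.detect(&:ipv4_private?).ip_address  # assumes one private IPv4

zk = ZK.new('localhost:2181')
zk.mkdir_p(PARENT)

# Reuse a node left over from a previous (possibly crashed) run, identified by
# this host's IP in its name; otherwise create a fresh persistent sequential node.
mine = zk.children(PARENT).find { |c| c.include?(my_ip) } ||
       File.basename(zk.create("#{PARENT}/lock-#{my_ip}-", '', :mode => :persistent_sequential))

# Lowest sequence number holds the lock; everyone else waits their turn.
holder = zk.children(PARENT).min_by { |c| c[/\d+\z/].to_i }
puts(mine == holder ? 'this host holds the lock' : "waiting behind #{holder}")
```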
I thought about this as a way to use ephemeral nodes instead of non-ephemeral nodes. Unfortunately, it relies on knowing how many nodes should be up at any given time. I'm not sure if that's always a straightforward thing to determine, and it may remove the advantage of decentralization that ZK affords. You'd probably end up having to create non-ephemeral nodes to record all the machines which are "supposed" to be up, and make sure that you create & delete those at the right time. If you accidentally don't create one, then bad things can happen like rebooting too many machines at once. If you accidentally don't delete one, the end result is the same as if the lock wasn't freed, so you need intervention anyway. Using non-ephemeral nodes makes it somewhat more likely to encounter locks that are stuck. But in that situation I need a human to intervene anyway, so I just need to create the right tooling. I'm going to try to build my functionality on top of this library: hopefully it won't require too many big changes.
Use case:
My code performs a cluster-level operation that MUST be successful before other members of the cluster are allowed to begin (think "restarting a web service" on a node in a load-balanced cluster).
"MUST" here includes when the program dies (either due to its own exception or due to a system exception).
I'd like to:
1. Obtain the lock (or block until it becomes available).
2. Perform the operation.
3. Release the lock.
If/When the operation (step 2) dies, the lock should persist (so no other members of the cluster perform the operation). Additionally, I'd like to be able to restart the program/application and resume with the same lock.
Currently, locks/semaphores are created with :ephemeral_sequential, which means the lock is automatically removed if/when the operation dies.
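To make that distinction concrete, a small sketch of the two znode modes (the paths are made up, and this only shows the modes themselves, not a full lock implementation):

```ruby
require 'zk'

zk = ZK.new('localhost:2181')
zk.mkdir_p('/demo_lock')

# What the built-in locker does today: the znode disappears with the session,
# so a crash mid-operation silently releases the lock.
eph = zk.create('/demo_lock/lock-', '', :mode => :ephemeral_sequential)

# What this issue asks for: the same sequential naming, but the znode outlives
# the session and stays put until it is explicitly deleted.
per = zk.create('/demo_lock/lock-', '', :mode => :persistent_sequential)

puts "ephemeral:  #{eph}"
puts "persistent: #{per}"

zk.close!   # the ephemeral node vanishes with the session; the persistent one remains
```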