0.7 Feature planning #83
Comments
Here is my take on the options for 1: Compared to ZeroRPC and NSQ, Python IPC looks like an inferior option for our purposes, and I think we can probably rule it out. The other two packages try to accomplish almost exactly what we need, and have many advantages over IPC. I don't see many advantages to using Python IPC, at least for what we are trying to do (though that may be because I don't know enough about it).
From my brief time spent reading about ZeroRPC and NSQ, it looks like NSQ may be the best choice. ZeroRPC is definitely a more minimalist approach, slightly more than a socket wrapper + broker, which is nice in that it adds quite a lot of flexibility, but it leaves a lot of stuff up to us to implement. NSQ appears to be a more holistic approach towards a scalable pubsub message delivery system. I particularly like the effort towards stronger guarantees regarding message delivery. It also seems to be thoroughly thought through and developed, and the fact that it's the second iteration of the software (a redesigned simplequeue) gives me a surprising amount of confidence. Probably the biggest advantage of NSQ is that it has already solved quite a few problems we would end up working around with ZeroMQ/RPC or by rolling our own.
That being said, NSQ is not going to fit our purposes as exactly as something we could develop on ZeroMQ/RPC, and (without looking into it more) I'm not sure how/if its service discovery would interact with consul. Additionally, ZeroRPC is not a standalone daemon - just a library, which simplifies integration/monitoring/management quite a bit.
A lot of research and discussion has occurred between updates on this ticket, but I think the conclusions are roughly as follows:
If this is all good, then the next questions are:
Thoughts on this mess?
From 0.7 milestone:
Big choices due for discussion:
Overall requirements/goals of wiring:
Candidates for 1:
ZeroRPC (a Python API layer on top of ZeroMQ) is a very nice piece of software, but its more advanced features are sparsely documented. The codebase is very lightly commented, so developing with it in depth means reading a lot of its source. ZeroMQ is obviously quite a bit better documented, but we would likely be writing a lot more code by hand, possibly creating a long-term maintenance burden.
I was initially thinking we would use Python IPC, but I'm not sure it's the right move since it locks us in to Python, along with making our application a process monitor. Certain components would be much better suited to Go, and if we use a language-agnostic layer it will be easy to make the change. In addition, it's likely that supervisord or circus are much better process managers than we could write.
I'm still looking into NSQ, but it honestly looks pretty great. It's a distributed message brokering system (written in Go) that handles a lot of the orchestration bits for us, and has robust Python bindings. It basically already has a distributed service discovery system built in and handles wiring together subscribers and publishers based on topic. Its documentation is a bit sparse in explaining how it works, but pretty good on how to use it. The daemons also expose HTTP endpoints, which is very helpful.
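To make "wiring together subscribers and publishers based on topic" concrete, here is a minimal in-process sketch of the idea. It is not NSQ's API or its Python bindings; `TopicBroker`, `subscribe`, and `publish` are hypothetical names, and a real broker would add networking, discovery, and delivery guarantees on top of this.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[bytes], None]


class TopicBroker:
    """Toy in-process stand-in for a topic-based broker (hypothetical, not NSQ)."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to receive every message published on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, body: bytes) -> int:
        """Deliver body to every subscriber of topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(body)
        return len(handlers)


# Usage: two stratum servers (plain lists here) both receive a published job.
broker = TopicBroker()
stratum_a, stratum_b = [], []
broker.subscribe("jobs", stratum_a.append)
broker.subscribe("jobs", stratum_b.append)
broker.publish("jobs", b'{"job_id": "1"}')
```

The publisher never names its receivers, only the topic, which is what would let jobmanagers and stratum servers be added or restarted independently.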
Considerations for 2:
Jobmanager <-> Stratum Server
I think this is the obvious first one to do. Since we're likely to have many stratum server processes, and we want to be able to add jobmanagers and upgrade their switching semantics easily, this seems the most logical place to split first.
Stratum Server <-> Socket Connection
This is something I've wanted for a while, allowing us to swap out client logic without disconnecting users. This would allow a lot more agility in development since the whole release/rollback cycle becomes so much less painful. Basically, some simple frontend handles receiving a connection and parsing out JSON messages, and then passes each one to a backend without looking at the contents. Backends can be restarted and load balanced easily.
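A rough sketch of the frontend's framing step, assuming messages arrive as newline-delimited JSON (an assumption about the wire format, not something stated above). `LineFramer`, `relay`, and `max_frame` are hypothetical names; the point is that frames are split out and forwarded as opaque bytes, never decoded.

```python
from typing import Callable, Iterable, List


class LineFramer:
    """Split a byte stream into newline-delimited frames without parsing them."""

    def __init__(self, max_frame: int = 64 * 1024) -> None:
        self._buffer = bytearray()
        self._max_frame = max_frame  # guard against clients that never send "\n"

    def feed(self, data: bytes) -> List[bytes]:
        """Add received bytes; return any complete frames, keeping the remainder."""
        self._buffer.extend(data)
        # split() always yields at least one element; the last is the partial tail.
        *frames, rest = bytes(self._buffer).split(b"\n")
        if len(rest) > self._max_frame:
            raise ValueError("frame exceeds max_frame without a newline")
        self._buffer = bytearray(rest)
        return [frame for frame in frames if frame.strip()]


def relay(chunks: Iterable[bytes], framer: LineFramer,
          send_to_backend: Callable[[bytes], None]) -> None:
    """Forward each complete frame to a backend as raw bytes."""
    for chunk in chunks:
        for frame in framer.feed(chunk):
            send_to_backend(frame)


# Usage: a message split across two socket reads is reassembled before forwarding.
forwarded: List[bytes] = []
relay([b'{"id": 1}\n{"id"', b': 2}\n'], LineFramer(), forwarded.append)
```

Because the frontend only holds the socket and a small buffer, a backend can be restarted between frames without the client noticing.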
Reporters <-> Stratum server
I think this is the lowest value, although it would be nice to have down the road. It would allow batching of shares quite nicely, which will become more of an issue once we have a lot (5-10+) of stratum server processes. However, we're not seeing many issues with share logging volume and I don't see many other advantages.
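As an illustration of what batching shares on the reporter side could look like, here is a minimal sketch. `ShareBatcher`, `max_size`, and the share dictionaries are all hypothetical; a real reporter would likely also flush on a timer so a quiet period doesn't leave shares unreported.

```python
from typing import Any, Callable, List


class ShareBatcher:
    """Collect shares and hand them to a flush callback in groups."""

    def __init__(self, flush: Callable[[List[Any]], None], max_size: int = 100) -> None:
        self._flush = flush
        self._max_size = max_size
        self._pending: List[Any] = []

    def add(self, share: Any) -> None:
        """Queue one share; flush automatically once max_size is reached."""
        self._pending.append(share)
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending as a single batch (no-op when empty)."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._flush(batch)


# Usage: three shares with max_size=2 give one full batch plus one final flush.
batches: List[List[Any]] = []
batcher = ShareBatcher(batches.append, max_size=2)
for n in range(3):
    batcher.add({"worker": "example", "nonce": n})
batcher.flush()
```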
Metrics
All of this will move to statsite (statsd). The whole stat counter thing was neat, but statsite is built for it.
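For reference, a counter sent to a statsd-compatible daemon is just a short text line over UDP. The sketch below assumes the `key:value|type[|@rate]` line format with a trailing newline and a daemon listening on UDP port 8125; the host, port, and metric name are assumptions for illustration, not settings from this project.

```python
import socket


def format_counter(name: str, value: int = 1, sample_rate: float = 1.0) -> bytes:
    """Build one statsd-style counter line, e.g. b"shares.accepted:1|c\\n"."""
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        # The optional @rate flag tells the daemon the counter was sampled.
        line += f"|@{sample_rate}"
    return (line + "\n").encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget a metric line over UDP (host and port are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Usage: count one accepted share (hypothetical metric name).
payload = format_counter("shares.accepted")
```

Since UDP is connectionless, `send_metric(payload)` would not block or fail the stratum server if the metrics daemon is down, which is much of the appeal over in-process stat counters.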
Process Monitoring/Management
At this point I think it's Circus and Consul. Supervisor has very limited extensibility, making certain tasks a really big chore. Circus, while a bit green, has a relatively easy-to-use plugin system that will be a boon. I honestly wish there was something a bit more robust in this area, but there isn't.