0.7 Feature planning #83

Open · icook opened this issue Sep 26, 2014 · 2 comments

@icook (Member) commented Sep 26, 2014

From 0.7 milestone:

Make the move to a service orchestration architecture.

Overall goals:

  • Increase fault tolerance through a more distributed design.
  • Reduce component coupling further.
  • Move away from port-specific settings to per-connection configurable defaults.

Big choices due for discussion:

  1. How to wire together the components.
  2. Which components do we separate? Pros and cons of separation.

Overall requirements/goals of wiring:

  1. Robust multiple PUB/SUB. We're likely going to set up many "jobmanagers" as publishers of new jobs so that one can fail or be upgraded with no downtime, with multiple "stratum managers" subscribing. Conventionally this would require a broker, but there are ways to avoid that SPOF. For example, in ZeroRPC each component runs an internal ZeroMQ router that acts as its own broker, so no component ever depends on the health of a single shared broker. The downside of this approach is added complexity in service discovery: instead of just selecting one of a set of shared brokers, every instance has to configure its own local broker. (A minimal pyzmq sketch follows this list.)
  2. Low latency above all else. We're not moving tons of data, but we want it to be pretty fast.
  3. Some way to store shared state. Almost all of these components require shared state of some sort, and I think Redis is the obvious choice here. I might give a cursory glance at other options.
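
For illustration only, here's a minimal pyzmq sketch of requirement 1, assuming two hypothetical jobmanager endpoints and that each jobmanager publishes two frames, [topic, payload]. Each subscriber connects to every publisher directly, so there's no broker to become a SPOF:

```python
import zmq

# Hypothetical jobmanager publisher endpoints; in practice these would
# come from whatever service discovery mechanism we settle on.
JOBMANAGER_ENDPOINTS = ["tcp://10.0.0.1:5555", "tcp://10.0.0.2:5555"]

ctx = zmq.Context()

# One SUB socket can connect to many PUB sockets; losing one publisher
# only drops that publisher's messages, nothing else.
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"new_job")
for endpoint in JOBMANAGER_ENDPOINTS:
    sub.connect(endpoint)

while True:
    # Assumes publishers send multipart messages of [topic, payload].
    topic, payload = sub.recv_multipart()
    print("job from a jobmanager:", payload)
```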

Candidates for 1:

  • ZeroMQ, either via ZeroRPC or via custom wrapper.
    ZeroRPC (a Python API layer on top of ZeroMQ) is a very nice piece of software, but its more advanced features are sparsely documented. The codebase is very lightly commented, so a lot of code reading is required to develop with it in depth. ZeroMQ itself is quite a bit better documented, but we would likely be writing a lot more code by hand, possibly creating a long-term maintenance burden.
  • Python IPC using multiprocessing and pipes.
    I was initially thinking we would do this, but I'm not sure it's the right move since it locks us into Python and makes our application a process monitor. Certain components would be much better suited to Go, and if we use a language-agnostic layer it will be easy to make that change. In addition, supervisord or Circus are likely much better process managers than we could write.
  • NSQ
    I'm still looking into this, but it honestly looks pretty great. It's a distributed message brokering system (written in Go) that handles a lot of the orchestration bits for us and has robust Python bindings. It basically has a distributed service discovery system built in and handles wiring together subscribers and publishers by topic. Its documentation is a bit sparse on how it works internally, but pretty good on how to use it. The daemons also expose HTTP endpoints, which is very helpful.
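
For a rough sense of what the NSQ route looks like, here's a minimal consumer sketch using the pynsq bindings; the topic/channel names and lookupd address are made up for illustration:

```python
import nsq

def handle_job(message):
    # message.body is the raw payload published by a jobmanager
    print("got job:", message.body)
    return True  # returning True marks the message as finished

# nsqlookupd handles service discovery: the reader polls it to find
# every nsqd currently carrying the "new_jobs" topic.
reader = nsq.Reader(
    message_handler=handle_job,
    lookupd_http_addresses=["http://127.0.0.1:4161"],
    topic="new_jobs",
    channel="stratum",
    lookupd_poll_interval=15,
)

nsq.run()
```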

Considerations for 2:

Jobmanager <-> Stratum Server
I think this is the obvious first one to do. Since we're likely to have many stratum server processes, and we want to be able to add jobmanagers and upgrade their switching semantics easily, this seems the most logical place to split first.

Stratum Server <-> Socket Connection
This is something I've wanted for a while: it would let us swap out client logic without disconnecting users. That would allow a lot more agility in development, since the whole release/rollback cycle becomes much less painful. Basically, a simple frontend handles receiving a connection and parsing out JSON messages, then passes them to a backend without looking at the contents. Backends can be restarted and load balanced easily.
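
To make the idea concrete, here's a very rough sketch (not a proposal for the actual implementation) of a frontend that accepts a connection, splits out newline-delimited JSON lines, and forwards them untouched to a hypothetical backend over a ZeroMQ PUSH socket:

```python
import socket
import zmq

ctx = zmq.Context()
backend = ctx.socket(zmq.PUSH)
backend.connect("tcp://127.0.0.1:6000")  # hypothetical backend address

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 3333))
listener.listen(128)

# Single connection only, to keep the sketch short.
conn, addr = listener.accept()
conn_id = "{}:{}".format(*addr)  # stand-in for a real connection id

buf = b""
while True:
    data = conn.recv(4096)
    if not data:
        break
    buf += data
    # Stratum messages are newline-delimited JSON; forward each complete
    # line without inspecting it, tagged with the connection id, and let
    # the backend do the actual parsing and protocol logic.
    while b"\n" in buf:
        line, buf = buf.split(b"\n", 1)
        backend.send_multipart([conn_id.encode(), line])
```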

Reporters <-> Stratum server
I think this is the lowest value, although it would be nice to have down the road. It would allow batching of shares quite nicely, which will become more of an issue once we have a lot of (5-10+) stratum server processes. However, we're not seeing many issues with share logging volume, and I don't see many other advantages.

Metrics
All of this will move to statsite (a statsd implementation). The whole stat counter thing was neat, but statsite is purpose-built for it.
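
For reference, pushing counters to statsite is the same as talking to any statsd server; a small sketch using the `statsd` Python client (the metric names are made up):

```python
import statsd

# statsite speaks the statsd protocol, so any statsd client works.
stats = statsd.StatsClient("localhost", 8125, prefix="powerpool")

stats.incr("shares.accepted")           # simple counter
stats.incr("shares.rejected")
stats.timing("share.process_ms", 4.2)   # timer, in milliseconds
stats.gauge("clients.connected", 1523)  # point-in-time value
```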

Process Monitoring/Management

At this point I think it's Circus and Consul. Supervisor has very limited extensibility, making certain tasks a really big chore. Circus, while a bit green, has a relatively easy-to-use plugin system that will be a boon. I honestly wish there were something a bit more robust in this area, but there isn't.

@ericecook (Member) commented:

Here is my take on the options for 1:

Compared to ZeroRPC and NSQ, Python IPC looks like an inferior option for our purposes; I think we can probably rule it out. The other two packages try to accomplish almost exactly what we need done, and have so many advantages over IPC it isn't really funny. Also, I don't really see many advantages to using Python IPC - at least for what we are trying to do (though maybe I just don't know enough about it).

From my brief time spent reading about ZeroRPC and NSQ it looks like NSQ may be the best choice.

ZeroRPC is definitely a more minimalist approach, slightly more than a socket wrapper + broker, which is nice in that it adds quite a lot of flexibility - but definitely leaves a lot of stuff up to us to implement.
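
To illustrate how thin that layer is, a ZeroRPC service is basically just an object exposed over a socket; the class, method, and addresses below are placeholders:

```python
import zerorpc

class JobManager(object):
    def get_job(self, currency):
        # whatever job generation logic lives here
        return {"currency": currency, "height": 123456}

# Server side: expose the object on a ZeroMQ endpoint.
server = zerorpc.Server(JobManager())
server.bind("tcp://0.0.0.0:4242")
server.run()

# Client side (in another process):
#   client = zerorpc.Client()
#   client.connect("tcp://127.0.0.1:4242")
#   client.get_job("LTC")
```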

NSQ appears to be a more holistic approach towards a scalable pub/sub message delivery system. I particularly like the effort towards stronger guarantees around message delivery. It also seems to be thoroughly thought through/developed, and the fact that it's the second iteration of the software (a redesigned simplequeue) gives me a surprising amount of confidence. Probably the biggest advantage of NSQ is that it has already solved quite a few problems we would end up working around with ZeroMQ/RPC or rolling our own.

That being said, NSQ is just not going to fit our purposes as exactly as something we build on ZeroMQ/RPC could, and (without looking into it more) I'm not sure how/if its service discovery would interact with Consul. Additionally, ZeroRPC is not a standalone daemon - just a library - which simplifies integration/monitoring/management quite a bit.

@icook (Member, Author) commented Oct 24, 2014

A lot of research and discussion has occurred between updates on this ticket, but I think the conclusions are roughly as follows:

  • ZeroMQ for coordination between components. ZeroRPC is great, but it's not quite what we need, and since ZeroRPC isn't well documented it'll likely be a bit faster to build a stripped-down version of our own.
  • Redis for service discovery, simplifying config/setup/coding, etc. (although not as "cool").
  • Redis as the shared memory backend, e.g. connected client information.
  • HAProxy as our connection handler/socket multiplexer.
  • All service wiring managed in process via Python library that periodically pings central service index.
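
A very rough sketch of what that last point could look like, assuming a Redis-backed service index with one hash per service instance; the key layout and field names are illustrative only:

```python
import redis
import zmq

r = redis.StrictRedis(host="localhost", port=6379)
ctx = zmq.Context()

sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")
connected = set()

def refresh_wiring():
    """Poll the service index and connect to any new jobmanagers.

    Dead services simply stop refreshing their keys, so they fall out
    of the index when their TTL expires.
    """
    for key in r.keys("services:jobmanager:*"):
        info = r.hgetall(key)
        endpoint = info.get(b"endpoint")
        if endpoint and endpoint not in connected:
            sub.connect(endpoint.decode())
            connected.add(endpoint)

# Called periodically (every few seconds) from the service's main loop.
refresh_wiring()
```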

If this is all good, then the next questions are:

  • Exactly which data structures will be stored in Redis, and how (if at all) will they be cached in process? Given the speed of Redis I'm leaning towards not at all, but without numbers it's difficult to say. I think the basics are:
    • Connection information. This will likely be keyed by haproxy connection id.
    • Jobs. These will probably be keyed by a hash of a few data components in the share.
  • What will the data structure for services look like in Redis? In general I'm thinking we'll have a hash (dict) in Redis for each service (created by the service at startup) with various config information, set to expire in perhaps 60 seconds. Then the daemon re-upping the expiry time essentially doubles as a health check (see the sketch after this list).
  • Will ports be dynamic or static? I'm generally in favor of dynamic, since it lightens the configuration burden a bit, i.e. when a service starts it tries to find an open port in some range by attempting to bind until it succeeds.
  • We're using Redis for fundamentally 3 things: shared memory, share logging, and service discovery. Which of these will be separate instances? My gut says to make shared memory and share logging the same Redis instance, but I have some reservations about the scaling ability. Basic numbers give estimated Redis actions/second = authentications per second + shares per second * 5, so pretty much one additional action per share processed, which at current rates is not a big deal at all. Redis is generally known to handle around 30,000 actions per second, which would be about 12x what we're doing right now. This could be cut down a ton by grouping share reporting into one-second batch jobs; then it would be something more like authentications per second + shares per second * 2 + number of powerpool instances (for reporting all shares once a second in a single action). These numbers exclude the frontend's access; however, if we get that large, setting up a replicated Redis client for the frontend would be trivial, and the delay doesn't matter much.
  • With ZeroMQ service wiring, which socket types do we use for various communication types? New job notifications make the most sense as PUB/SUB, but certain actions would likely be best done over an RPC connection. Doing both is probably more headache than it's worth: a service would then have two ports, and service consumers would need to attach to both, etc. It's definitely more robust, but the dev cost and what are likely to be slim performance gains make it seem silly. Given this, I'm in favor of doing RPC all around, with a push-style publish over RPC.
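
For the service hash idea specifically, here's a minimal registration/heartbeat sketch using redis-py; the key layout and fields are just placeholders for discussion:

```python
import time
import redis

r = redis.StrictRedis(host="localhost", port=6379)

SERVICE_KEY = "services:jobmanager:instance1"  # hypothetical key layout

def register():
    # Written once at startup; the hash holds whatever config/metadata
    # other services need to wire themselves up.
    r.hset(SERVICE_KEY, "endpoint", "tcp://10.0.0.1:5555")
    r.hset(SERVICE_KEY, "type", "jobmanager")
    r.expire(SERVICE_KEY, 60)

def heartbeat_loop():
    # Re-upping the TTL doubles as the health check: if this process
    # dies, the key expires and the service drops out of the index.
    while True:
        r.expire(SERVICE_KEY, 60)
        time.sleep(20)

register()
heartbeat_loop()
```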

Thoughts on this mess?
