How to scale slack-bot-server? #4

keremtiryaki · 2016-03-11T16:14:19Z

If I need to run thousands of bots for thousands of teams, can I use slack-bot-server?

dblock · 2016-03-11T16:23:31Z

This has been discussed in dblock/slack-gamebot#81 (slack-gamebot is entirely based on this code). I think you can run low thousands of bots today, but there are two known issues, possibly not that problematic.

It takes time to establish a websocket connection, so startup time becomes noticeable at around 100 bots. Note that http://playplay.io now runs 260 bots and works well, but you cannot instantly re-establish all of these connections on a single node: a restart takes 30 seconds.
RAM becomes a problem. Currently, at 260 bots, we're looking at 250MB of RAM, which doesn't seem like much, but it can grow very quickly if you have very large teams and need the data in a local store (all the data is downloaded on rtm.start).
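To get a rough feel for what those numbers mean at larger scale, here is a back-of-the-envelope sketch. It assumes restart time and RAM grow linearly with the number of bots, which is only an assumption (large teams will skew RAM upward); the `estimate` helper is made up for illustration.

```ruby
# Figures quoted above for a single node.
BOTS = 260
RESTART_SECONDS = 30.0
RAM_MB = 250.0

# Assumption: both costs scale linearly with the number of bots.
def estimate(bot_count)
  {
    restart_seconds: (RESTART_SECONDS / BOTS * bot_count).round,
    ram_mb: (RAM_MB / BOTS * bot_count).round
  }
end

p estimate(1000)
p estimate(5000)
```

Under that assumption, 1000 bots would take roughly two minutes to restart and need close to 1GB of RAM on one node.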

To solve this we need a way to scale the bots horizontally. The easiest approach would be to load-balance them across multiple nodes. That would need to be implemented, but I would start with #3 first.
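One simple way to spread bots across nodes, sketched here only as an illustration and not something the library implements, is to hash each team ID to a node index. `node_for` and the team IDs are hypothetical.

```ruby
require 'zlib'

# Hypothetical helper: pick the index of the node that should run a team's bot.
# CRC32 gives the same result in every process, unlike Ruby's String#hash.
def node_for(team_id, node_count)
  Zlib.crc32(team_id) % node_count
end

team_ids = %w[T0001 T0002 T0003 T0004 T0005 T0006]
assignment = team_ids.group_by { |id| node_for(id, 3) }
assignment.each { |node, ids| puts "node #{node}: #{ids.join(', ')}" }
```

The trade-off with plain modulo hashing is that changing the node count moves most teams to a different node, so every resize means reconnecting many bots.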

benjaminjackson · 2017-07-18T21:25:47Z

I'd add that if you run a web server that accepts multiple connections, it's good to split the web and bot work into separate processes. I have a Procfile that spawns a web process with multiple Unicorn children and one worker process with a single thread for the bots:

web: env WEB_ONLY=1 bundle exec unicorn -p $PORT -c config/unicorn.rb -E $RACK_ENV
worker: env BOT_ONLY=1 bundle exec unicorn -p $PORT -E $RACK_ENV
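As a rough sketch of how the app's boot code might branch on those two variables (the `components_for` helper is hypothetical, and this assumes that only the presence of the variable matters):

```ruby
# Hypothetical helper: decide which components a process should start,
# based on the WEB_ONLY and BOT_ONLY variables set in the Procfile.
def components_for(env)
  return [:web] if env['WEB_ONLY']
  return [:bots] if env['BOT_ONLY']

  # Neither variable set: run everything in one process.
  [:web, :bots]
end

puts components_for(ENV).inspect
```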

dblock · 2017-07-19T00:28:03Z

The problem with this is that a service needs to expose an endpoint for registration. When that happens you need to start a bot instance. I guess it's ok that the WEB_ONLY part starts that bot for the time being, but it's still not ideal.

BenBach · 2017-12-12T12:36:43Z

Hi. I have the exact same issue. Memory consumption on our web dyno keeps growing, and I am trying to extract the bots to a worker. Did anyone find a solution for this issue?

Thank you very much in advance

alexagranov · 2017-12-13T03:31:19Z

So I'm currently testing out a multi-bot approach that overrides SlackRubyBotServer::Service start! and start_from_database! to do the following:

Upon boot, grab Team.active.where(server_id: nil, is_admin: true).where.not(bot_token: nil).limit(ENV['SLACK_MAX_TEAM_COUNT']).lock(true), then walk through the teams, running callbacks and start! for each one (and setting server_id). This ensures each worker starts a set number of distinct teams.
After boot, subscribe to an SQS queue for TeamAdded events. Only one bot worker can dequeue a given event and handle adding the Team.
After boot, subscribe to a Kinesis stream for Service events, such as rebooting (team removed), so that each worker can notify the teams it's handling.
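The claim-on-boot step above can be sketched without a database. This is an in-memory stand-in, not the actual override: `FakeTeam` and `TeamPool` are hypothetical, and a Mutex plays the role of the row lock taken by `lock(true)`.

```ruby
# In-memory stand-in for the teams table.
FakeTeam = Struct.new(:id, :server_id)

class TeamPool
  def initialize(teams)
    @teams = teams
    @mutex = Mutex.new # stands in for the database row lock
  end

  # Claim up to `limit` unassigned teams for `server_id` and return them.
  def claim(server_id, limit)
    @mutex.synchronize do
      claimed = @teams.select { |t| t.server_id.nil? }.first(limit)
      claimed.each { |t| t.server_id = server_id }
      claimed
    end
  end
end

pool = TeamPool.new((1..5).map { |i| FakeTeam.new("T#{i}", nil) })
puts pool.claim('worker-a', 3).map(&:id).inspect
puts pool.claim('worker-b', 3).map(&:id).inspect
```

Because a team is only selected while its server_id is nil, and selecting and marking happen under the same lock, no two workers end up with the same team.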

@dblock I think I'm ready to show you what I have ;-)

dblock · 2017-12-17T04:05:47Z

While that may work, I suspect there's going to be a lot of edge cases. Of course you should show us whatever you have and PR improvements that make it possible/easier into this lib.

Stepping back, I'd like to see an interface in slack-ruby-bot-server that abstracts the whole distribution mechanism away, so that we can plug in SQS or whatever other queue. Load balancing and the like are common problems in distributed systems, already addressed by tools like ZooKeeper, so I think it's best to find something that works out of the box instead of reinventing the wheel.
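To make the idea of a pluggable distribution interface concrete, here is a minimal sketch. None of these names exist in slack-ruby-bot-server; `Distributor` and `InMemoryDistributor` are hypothetical, and an SQS-backed class would be another subclass with the same two methods.

```ruby
# Hypothetical interface that a queue-backed implementation would satisfy.
class Distributor
  def publish(event, payload)
    raise NotImplementedError
  end

  def subscribe(event, &handler)
    raise NotImplementedError
  end
end

# In-process implementation, enough for a single node or for tests.
class InMemoryDistributor < Distributor
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event, &handler)
    @handlers[event] << handler
    self
  end

  def publish(event, payload)
    @handlers[event].each { |handler| handler.call(payload) }
    self
  end
end

started = []
bus = InMemoryDistributor.new
bus.subscribe(:team_added) { |team_id| started << team_id }
bus.publish(:team_added, 'T123')
puts started.inspect
```

The point of the split is that the service code would only ever call `publish` and `subscribe`, so swapping the transport would not touch it.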

BenBach · 2017-12-17T11:43:10Z

@alexagranov Sounds great. I am curious :-)

alexagranov · 2017-12-17T20:23:58Z

@dblock - true enough. I neglected to mention though that the aim of my approach is to segment team-specific traffic to specific bot(s) and not actually to load-balance - keeping it simple at first. I see potential issues with a federated set of bot workers having to coordinate which one gets to update the Slack workspace with a post, for instance. I do think something like zookeeper would be useful once a particular team's size (or SLA) requires multiple bot workers to share the load.

alexagranov · 2017-12-18T04:45:23Z

oh, and there's also the issue of multiple bot workers per team: if each bot worker is using the same bot token, I believe I've seen Slack broadcast the same user input to all connected realtime clients. Could probably stand to redo that experiment though...
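If Slack does deliver the same event to every client connected with the same token, one way to avoid duplicate replies is to let only the first worker that claims an event handle it. This is just a sketch of that idea: `EventClaims` is hypothetical, and a real deployment would need a store shared between processes rather than an in-memory Set.

```ruby
require 'set'

# Hypothetical coordinator: the first worker to claim an event ID handles it.
class EventClaims
  def initialize
    @seen = Set.new
    @mutex = Mutex.new
  end

  # Returns true only for the first caller with a given event ID.
  # Set#add? returns nil when the element is already present.
  def claim?(event_id)
    @mutex.synchronize { !@seen.add?(event_id).nil? }
  end
end

claims = EventClaims.new
%w[worker-a worker-b].each do |worker|
  action = claims.claim?('evt-1') ? 'replies' : 'skips'
  puts "#{worker} #{action}"
end
```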

dblock mentioned this issue Jun 3, 2016

Add new teams on the fly #16

Closed

dblock added you can help question labels Jul 3, 2016

dblock mentioned this issue Jul 17, 2017

Docs on scaling with Unicorn #63

Closed
