How to scale slack-bot-server? #4

Open
keremtiryaki opened this issue Mar 11, 2016 · 9 comments

Comments

@keremtiryaki

If I need to run thousands of bots for thousands of teams, can I use slack-bot-server?

@dblock
Collaborator

dblock commented Mar 11, 2016

This has been discussed in dblock/slack-gamebot#81, which is entirely based on this code. I think you can do low thousands today; however, there are two known issues, possibly not that problematic.

  • It takes time to establish a websocket connection, so startup time becomes noticeable around 100 bots. Note that http://playplay.io now has 260 and works well, but you cannot instantly re-establish all these connections on a single node: it takes 30 seconds to restart.
  • RAM becomes a problem. Currently, at 260 bots, we're looking at 250MB of RAM, which doesn't seem like much, but that can grow very fast if you have very large teams and need the data in a local store (all the data is downloaded on rtm.start).
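To get a rough feel for what those two figures imply at larger scale, here is a back-of-the-envelope sketch. It assumes that both RAM and restart time grow linearly with the number of bots, which is only an assumption (the comment above notes that large teams can make RAM grow much faster). The `estimate` helper is hypothetical and not part of the library.

```ruby
# Figures quoted above for playplay.io at 260 bots.
OBSERVED_BOTS = 260
OBSERVED_RAM_MB = 250.0
OBSERVED_RESTART_SECONDS = 30.0

# Naive linear extrapolation; an assumption, not a measurement.
def estimate(bots)
  {
    ram_mb: (OBSERVED_RAM_MB / OBSERVED_BOTS * bots).round,
    restart_seconds: (OBSERVED_RESTART_SECONDS / OBSERVED_BOTS * bots).round
  }
end

puts estimate(1000).inspect
puts estimate(5000).inspect
```

Under that linear assumption, 1,000 bots would need roughly 1 GB of RAM and about two minutes to reconnect on a single node, which is why a single process stops being comfortable somewhere in the low thousands.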

To solve this we need a way to scale the bots horizontally. The easiest way would be to load-balance them across multiple nodes. That would need to be implemented, but I would start with #3 first.
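One simple way to spread bots across nodes, sketched here purely as an illustration, is to hash each team id and take it modulo the node count, so every node can work out on its own which teams it should start. Nothing like this exists in the library as far as this thread says; `node_for` and `teams_for_node` are hypothetical names, and the team ids are made up.

```ruby
require 'zlib'

# Hypothetical helper: map a team id to one of node_count nodes.
# CRC32 is used because it is stable across processes and restarts.
def node_for(team_id, node_count)
  Zlib.crc32(team_id.to_s) % node_count
end

# Hypothetical helper: the subset of teams a given node should run.
def teams_for_node(team_ids, node_index, node_count)
  team_ids.select { |id| node_for(id, node_count) == node_index }
end

team_ids = %w[T0001 T0002 T0003 T0004 T0005 T0006]
3.times do |node|
  puts "node #{node}: #{teams_for_node(team_ids, node, 3).join(', ')}"
end
```

The trade-off is that changing the node count reassigns most teams, so each resize means reconnecting many bots. A consistent-hashing scheme or an explicit assignment table would reduce that churn.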

@benjaminjackson

I'd add that to run a web server that accepts multiple connections, it's good to split them out into separate processes. I have a Procfile setup that spawns a web proc with multiple Unicorn children and one worker proc with a single thread for the bots:

web: env WEB_ONLY=1 bundle exec unicorn -p $PORT -c config/unicorn.rb -E $RACK_ENV
worker: env BOT_ONLY=1 bundle exec unicorn -p $PORT -E $RACK_ENV
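The Procfile only sets the `WEB_ONLY` and `BOT_ONLY` variables; how the application reads them is not shown in this thread. A minimal sketch of that switch might look like the following, where `components_for` is a hypothetical helper and the `:web` / `:bots` symbols stand in for whatever the app actually boots.

```ruby
# Hypothetical helper: decide which components a process should start,
# based on the WEB_ONLY / BOT_ONLY variables set in the Procfile above.
def components_for(env)
  web_only = env['WEB_ONLY'] == '1'
  bot_only = env['BOT_ONLY'] == '1'
  raise ArgumentError, 'WEB_ONLY and BOT_ONLY are mutually exclusive' if web_only && bot_only

  return [:web] if web_only
  return [:bots] if bot_only

  [:web, :bots] # neither flag set: run everything in one process
end

puts components_for({ 'WEB_ONLY' => '1' }).inspect
puts components_for({ 'BOT_ONLY' => '1' }).inspect
puts components_for({}).inspect
```

In a real process the argument would be `ENV` itself, and the returned list would drive which parts of the app are started.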

@dblock
Collaborator

dblock commented Jul 19, 2017

The problem with this is that a service needs to expose an endpoint for registration. When that happens you need to start a bot instance. I guess it's ok that the WEB_ONLY part starts that bot for the time being, but it's still not ideal.

@BenBach

BenBach commented Dec 12, 2017

Hi. I have the exact same issue. The memory consumption on our web dyno is growing, and I am trying to extract the bots to a worker. Has anyone found a solution for this issue?

Thank you very much in advance

@alexagranov
Collaborator

alexagranov commented Dec 13, 2017

So I'm currently testing out a multi-bot approach that overrides SlackRubyBotServer::Service start! and start_from_database! to do the following:

  • upon boot, grab Team.active.where(server_id: nil, is_admin: true).where.not(bot_token: nil).limit(ENV['SLACK_MAX_TEAM_COUNT']).lock(true), then walk through each team, running callbacks and start! (and setting server_id). This ensures each worker starts a bounded number of distinct teams.
  • after boot, subscribe to an SQS queue for TeamAdded events. Only one bot worker can dequeue and handle adding the Team.
  • after boot, subscribe to a Kinesis stream for Service events, such as rebooting (team removed), so that each worker can notify the teams it's handling.
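The first step above, where each worker claims a bounded set of unowned teams, can be illustrated with an in-memory stand-in. This is only a sketch: `TeamRecord` and `TeamPool` are hypothetical, a `Mutex` plays the role of the database row lock, and the `is_admin` filter from the quoted query is left out for brevity.

```ruby
# In-memory stand-in for the teams table; the real version would be the
# ActiveRecord query quoted above, with a row lock instead of a Mutex.
TeamRecord = Struct.new(:id, :bot_token, :active, :server_id)

class TeamPool
  def initialize(teams)
    @teams = teams
    @lock = Mutex.new
  end

  # Claim up to max_count active, unowned teams that have a bot token,
  # and mark them as owned by server_id so no other worker picks them up.
  def claim(server_id, max_count)
    @lock.synchronize do
      claimed = @teams.select { |t| t.active && t.server_id.nil? && t.bot_token }
                      .first(max_count)
      claimed.each { |t| t.server_id = server_id }
      claimed
    end
  end
end

teams = (1..5).map { |i| TeamRecord.new("T#{i}", "token-#{i}", true, nil) }
pool = TeamPool.new(teams)

puts pool.claim('worker-a', 2).map(&:id).inspect # ["T1", "T2"]
puts pool.claim('worker-b', 2).map(&:id).inspect # ["T3", "T4"]
```

The key property is that selecting and marking happen under the same lock, so two workers booting at the same time never claim the same team.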

@dblock I think I'm ready to show you what I have ;-)

@dblock
Collaborator

dblock commented Dec 17, 2017

While that may work, I suspect there's going to be a lot of edge cases. Of course you should show us whatever you have and PR improvements that make it possible/easier into this lib.

Stepping back, I'd like to see an interface in slack-ruby-bot-server that abstracts the whole distribution mechanism away, so that we can plug in SQS or whatever other queue. Load balancing and the like are common problems in distributed systems, already addressed by tools like ZooKeeper, so I think it's best to find something that works out of the box instead of reinventing the wheel.
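As a rough idea of what such an abstraction could look like, the sketch below has workers depend only on an object that responds to `subscribe` and `publish`, so an SQS-backed adapter could be swapped in later. No such interface exists in slack-ruby-bot-server as far as this thread says; `InMemoryQueue`, `BotWorker`, and the event shape are all hypothetical.

```ruby
# Hypothetical adapter: any object responding to subscribe and publish
# could replace this one (for example, a wrapper around SQS).
class InMemoryQueue
  def initialize
    @subscribers = []
    @next = 0
  end

  def subscribe(&handler)
    @subscribers << handler
  end

  # Deliver each event to exactly one subscriber (round-robin), mimicking
  # a work queue where only one worker dequeues a given message.
  def publish(event)
    return false if @subscribers.empty?

    handler = @subscribers[@next % @subscribers.size]
    @next += 1
    handler.call(event)
    true
  end
end

# Hypothetical worker that only knows about the queue interface.
class BotWorker
  attr_reader :name, :teams

  def initialize(name, queue)
    @name = name
    @teams = []
    queue.subscribe { |event| @teams << event[:team_id] if event[:type] == :team_added }
  end
end

queue = InMemoryQueue.new
workers = %w[a b].map { |n| BotWorker.new(n, queue) }
%w[T1 T2 T3].each { |id| queue.publish({ type: :team_added, team_id: id }) }
workers.each { |w| puts "#{w.name}: #{w.teams.inspect}" }
```

Because the worker never references the queue's implementation, the distribution policy (round-robin here) stays in one place and can be replaced without touching bot code.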

@BenBach

BenBach commented Dec 17, 2017

@alexagranov Sounds great. I am curious :-)

@alexagranov
Collaborator

alexagranov commented Dec 17, 2017

@dblock - true enough. I neglected to mention, though, that the aim of my approach is to segment team-specific traffic to specific bot(s) and not actually to load-balance, keeping it simple at first. I see potential issues with a federated set of bot workers having to coordinate which one gets to update the Slack workspace with a post, for instance. I do think something like zookeeper would be useful once a particular team's size (or SLA) requires multiple bot workers to share the load.

@alexagranov
Collaborator

oh, and there's also the issue of multiple bot workers per team: if each bot worker is using the same bot token, I believe I've seen Slack broadcast the same user input to all connected realtime clients. Could probably stand to redo that experiment though...
