
Sending messages is slow when using groups #83

Open
remusmp opened this issue Mar 1, 2018 · 28 comments

Comments

remusmp commented Mar 1, 2018

It seems that groups slow down sending messages by roughly 100x.
I created three different echo consumers to compare them:

  1. AsyncJsonWebsocketConsumer with groups
  2. AsyncConsumer with groups
  3. AsyncConsumer without groups

I got these timing results:

1.
Elapsed time 0.026857614517211914.
Elapsed time 0.026637554168701172.
Elapsed time 0.03318667411804199.
Elapsed time 0.03062891960144043.
Elapsed time 0.026413679122924805.
Elapsed time 0.028857707977294922.
Elapsed time 0.03323197364807129.
Elapsed time 0.026538610458374023.
Elapsed time 0.036324501037597656.
Elapsed time 0.0322566032409668.
Elapsed time 0.0385587215423584.
Elapsed time 0.02066636085510254.
Elapsed time 0.03473186492919922.
Elapsed time 0.03095531463623047.
Elapsed time 0.022891759872436523.

2.
Elapsed time 0.02672290802001953.
Elapsed time 0.022529125213623047.
Elapsed time 0.042157888412475586.
Elapsed time 0.028131961822509766.
Elapsed time 0.035176753997802734.
Elapsed time 0.03284716606140137.
Elapsed time 0.03023362159729004.
Elapsed time 0.03440999984741211.
Elapsed time 0.029916763305664062.
Elapsed time 0.03461766242980957.

3. 
Elapsed time 0.00019168853759765625.
Elapsed time 9.107589721679688e-05.
Elapsed time 0.00020265579223632812.
Elapsed time 0.0004429817199707031.
Elapsed time 0.0002741813659667969.
Elapsed time 0.00019693374633789062.
Elapsed time 0.00026679039001464844.
Elapsed time 0.0003097057342529297.
Elapsed time 0.00017261505126953125.
Elapsed time 0.0003085136413574219.
Elapsed time 0.00048232078552246094.
Elapsed time 0.0002384185791015625.

Here is the consumer code:

from channels.generic.websocket import AsyncJsonWebsocketConsumer, AsyncConsumer
from time import time
from channels.layers import get_channel_layer
from asgiref.sync import async_to_sync, sync_to_async

class EchoConsumer1(AsyncJsonWebsocketConsumer):

    async def connect(self):
        await self.accept()
        await self.channel_layer.group_add("test", self.channel_name)

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard("test", self.channel_name)

    async def receive_json(self, content):
        start = time()
        await self.channel_layer.group_send(
            "test",
            {
                "type": "chat.message",
                "json": content
            }
        )
        print("Elapsed time {}.".format(time()-start))

    async def chat_message(self, event):
        await self.send_json(event["json"])


class EchoConsumer2(AsyncConsumer):

    async def websocket_connect(self, event):
        await self.send({
            "type": "websocket.accept",
        })
        await self.channel_layer.group_add("test", self.channel_name)

    async def websocket_send(self, event):
        await self.send(event)

    async def websocket_receive(self, event):
        start = time()
        channel_layer = get_channel_layer()
        await channel_layer.group_send(
            "test",
            {
                "type": "websocket.send",
                "text": event["text"],
            }
        )
        print("Elapsed time {}.".format(time()-start))


class EchoConsumer3(AsyncConsumer):

    async def websocket_connect(self, event):
        await self.send({
            "type": "websocket.accept",
        })

    async def websocket_receive(self, event):
        start = time()
        await self.send({
            "type": "websocket.send",
            "text": event["text"],
        })
        print("Elapsed time {}.".format(time()-start))
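As an aside, the consumers above time with `time.time()`, which has coarse resolution for sub-millisecond intervals; `time.perf_counter()` is the standard monotonic clock for interval timing. A minimal, self-contained sketch of the measurement pattern (`fake_send` is a stand-in for the channel-layer call, not the Channels API):

```python
import asyncio
import time

async def timed(coro):
    """Await a coroutine and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = await coro
    return result, time.perf_counter() - start

async def fake_send(payload):
    # Stand-in for channel_layer.group_send / self.send
    await asyncio.sleep(0)
    return payload

async def main():
    return await timed(fake_send({"text": "hello"}))

result, elapsed = asyncio.run(main())
print("Elapsed time {:.6f}.".format(elapsed))
```

Using `perf_counter()` instead of `time()` avoids wall-clock adjustments skewing short measurements.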
@andrewgodwin
Member

I'll try to look into this in the next couple of weeks.

@andrewgodwin andrewgodwin self-assigned this Mar 3, 2018
@agronick
Contributor

Is Redis even involved in the third consumer?

@SHxKM

SHxKM commented Jul 9, 2018

I'm also experiencing very slow behavior when using groups.

@naszta

naszta commented Oct 3, 2018

Any update on this? Or is there a solution that hasn't come up here? (We have this issue, but we don't use the latest versions. I would switch to the latest/working version if I could see that it's worth it.)

@Zach-Edge

any update

@physicsai

Many thanks to @andrewgodwin for all your hard work on this great project! Are there any updates regarding this bug?

I am considering whether to use Django Channels long-term on our site in production (we were hit by the same problem in testing), but I see that this bug has been open for over a year.

@carltongibson
Member

any update

These comments don't add anything. You can see there's no update, otherwise there'd be an update... 🙂

This needs someone experiencing the issue to spend the time to dig into it. What we don't have here is an answer to why it's happening.

Short of that, it'll just have to sit for a while.

@sevdog
Contributor

sevdog commented Apr 13, 2020

I re-tested to check whether this issue was still present (also comparing some differences between versions of channels-redis and channels).

These are my results:

## channels 2.2.0  + channel_redis 2.2.1
consumer, mean, variance, max, min
 v1, 0.003962, 4.2704E-7, 0.011806, 0.003338
 v2, 0.004009, 2.4160E-7, 0.006387, 0.003317
 v3, 0.000079, 1.1771E-9, 0.000275, 0.000055

## channels 2.3.1  + channel_redis 2.3.3
consumer, mean, variance, max, min
 v1, 0.000831, 1.8120E-8, 0.002146, 0.000706
 v2, 0.000826, 3.7920E-8, 0.002566, 0.000683
 v3, 0.000069, 8.9345E-10, 0.000288, 0.000053

## channels 2.4.0  + channel_redis 2.4.2
consumer, mean, variance, max, min
 v1, 0.000814, 6.0222E-9, 0.001563, 0.000715
 v2, 0.000810, 2.2231E-8, 0.002422, 0.000684
 v3, 0.000069, 6.4959E-10, 0.000269, 0.000054

Each test was performed with 200 requests to the same echo consumer.
It seems performance improved after v2.2 (~5x faster), but with just one group performance is still quite bad compared to a simple send (~12x slower).

Some good work was done, but I believe it is bad to slow down by an order of magnitude with a single group.

However, for the sake of completeness, I will try a more complex test with more than one group to see whether there are drawbacks.
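For reference, the mean/variance/max/min rows in tables like the ones above can be reproduced from raw elapsed-time samples with the stdlib `statistics` module (the sample values below are illustrative, not the original data):

```python
import statistics

# Illustrative elapsed-time samples in seconds (not the original measurements)
samples = [0.000831, 0.000826, 0.000706, 0.002146, 0.000790]

mean = statistics.mean(samples)
variance = statistics.pvariance(samples)  # population variance over the samples

print("mean={:.6f}, variance={:.4E}, max={:.6f}, min={:.6f}".format(
    mean, variance, max(samples), min(samples)))
```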

@carltongibson
Member

Hi @sevdog. Thanks. If you can work out where the time is being spent, that should help... 🙂

@sevdog
Contributor

sevdog commented Apr 16, 2020

I ran a test logging times inside the group_send function at three different points:

  • after connection (line 614)
  • after map-channels-to-connection (line ~628)
  • at the end

I calculated the elapsed time since the function was initially called.

These are times stats:

# connection
0.000024, 4.6381E-11, 0.000082, 0.000020
# map channels 
0.000459, 1.2593E-8, 0.001299, 0.000402
# whole function
0.000903, 3.4705E-8, 0.002230, 0.000780

From these data it seems that _map_channel_keys_to_connection takes about the same time as the following for loop.

This was still tested with just 1 group.

I will do some other checks with 2 and 10 groups as soon as I can.

@sevdog
Contributor

sevdog commented Apr 24, 2020

I have to apologize for my previous statement. It seems that what takes about 0.000210 seconds (with a single client connected) is this part:

async with self.connection(self.consistent_hash(group)) as connection:
    # Discard old channels based on group_expiry
    await connection.zremrangebyscore(
        key, min=0, max=int(time.time()) - self.group_expiry
    )
    channel_names = [
        x.decode("utf8") for x in await connection.zrange(key, 0, -1)
    ]

I also performed another test with 3 different clients connecting to my 3 test consumers; these are the times:

## channels 2.2.0  + channel_redis 2.2.1
 v1, 0.002098, 1.0687E-7, 0.003900, 0.001585
 v2, 0.002735, 1.2245E-6, 0.010026, 0.001693
 v3, 0.000074, 3.8980E-10, 0.000122, 0.000052
## channels 2.3.1  + channel_redis 2.3.3
 v1, 0.002400, 1.2855E-6, 0.010583, 0.001343
 v2, 0.002093, 1.0505E-7, 0.002888, 0.001016
 v3, 0.000078, 6.7096E-10, 0.000119, 0.000048
## channels 2.4.0  + channel_redis 2.4.2
 v1, 0.001904, 5.5975E-8, 0.003261, 0.001259
 v2, 0.001906, 5.8758E-8, 0.003391, 0.001102
 v3, 0.000065, 2.4156E-10, 0.000095, 0.000050

I will try to add more automation to my current tests to get a better view of how the situation evolves with more clients (maybe with a graph or something similar).


@sevdog
Contributor

sevdog commented May 16, 2020

I have done some other tests with a growing number of clients connecting to these test consumers (1, 6, 11, 16, 21, 26 and 31), with these results:

[Screenshot: timing results for 1–31 clients, 2020-05-16]

I doubt this shows anything conclusive: I still need a comparison with channels v1, and this should be tested in a multi-host environment. My tests were performed on my machine with Docker, but they should be done in a semi-real environment (server and clients separated).

Also a better way to collect time-related data should be used (instead of print/log).


@sevdog
Contributor

sevdog commented May 20, 2020

I have not tested against the previous major versions (channels 1.x, asgi-redis 1.x), so this is not a full comparison between major versions.

Also, as I stated before, an accurate test requires different hosts for clients and server. This requires more time (and money).

Moreover, my tests were only performed on Python 3.6; other versions should also be tested, because this may be related to asyncio handling.

@alex-ong

alex-ong commented Jun 4, 2020

Is there a way to send directly between consumers in the meantime? It seems like calling

 # add people to a group, then
 await self.channel_layer.group_send("targetgroup", message)

is the current problem, but what about:

 # manually create a group, or know about it beforehand
 for channel_name in group_channels:
     await self.channel_layer.send(channel_name, message)

I'll try to test this soon; however, managing groups ourselves seems like a pain.

EDIT: reading the docs, they say you can maintain the groups yourself. I'll have a look at using channel_layer.send() vs channel_layer.group_send() with 1-3 consumers.
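A minimal sketch of the "maintain the groups yourself" idea suggested in that comment. All names here (GROUPS, group_add, fan_out, stub_send) are hypothetical, not Channels API; the real thing would call channel_layer.send() instead of the stub:

```python
import asyncio

# Hypothetical in-process registry: group name -> set of channel names
GROUPS = {}

def group_add(group, channel):
    GROUPS.setdefault(group, set()).add(channel)

def group_discard(group, channel):
    GROUPS.get(group, set()).discard(channel)

async def fan_out(send, group, message):
    """Fan a message out to every channel in the group via point-to-point send()."""
    channels = GROUPS.get(group, set())
    await asyncio.gather(*(send(ch, message) for ch in channels))
    return len(channels)

# Demo with a stub send() that just records deliveries
delivered = []

async def stub_send(channel, message):
    delivered.append((channel, message["text"]))

group_add("test", "channel.a")
group_add("test", "channel.b")
count = asyncio.run(fan_out(stub_send, "test", {"text": "hi"}))
print("delivered to", count, "channels")
```

Note this registry lives in one process, so unlike channel-layer groups it does not cover consumers on other workers.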

@alex-ong

alex-ong commented Jun 5, 2020

I ran a quick test comparing:
self.channel_layer.send(self.channel_name) vs
self.channel_layer.group_send(a group that only includes myself)

I got 1 ms and 2 ms respectively.
I also noticed that the v3 test doesn't even use Redis.
Running v3 gives me ~0.000005 s (5 thousandths of a ms), like the previous testers got.

What does self.send() do? It seems like it's a websocket send, not a Redis send.

Is the goal to get inter-process communication to be as fast as a websocket send? That makes sense on some level; but which portion — Python pickling, sending to Redis, receiving from Redis, unpickling — is the bottleneck?
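One way to isolate the serialization slice of that pipeline is to time pickling on its own (a rough sketch; the real channel layer adds message framing and Redis round-trips on top of this):

```python
import pickle
import time

# A message shaped like the ones the consumers send (payload size is arbitrary)
message = {"type": "websocket.send", "text": "x" * 1024}

start = time.perf_counter()
for _ in range(1000):
    blob = pickle.dumps(message)
    restored = pickle.loads(blob)
elapsed = time.perf_counter() - start

print("pickle round-trip avg: {:.2e} s".format(elapsed / 1000))
```

If the per-message pickle time is far below the observed group_send latency, the bottleneck is elsewhere (connection handling or Redis round-trips).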

@carltongibson
Member

#223 was related. (A speed-up was reported after switching the Redis connection to Unix sockets.)

@KenWhitesell

Channels appears to create a number of Lua scripts that get executed. Is it worth considering "pre-loading" those scripts into Redis at start-up and then just calling them during processing? I'm wondering whether some of these delays are caused by the scripts being loaded and parsed before being executed. (Unfortunately, I don't know of any way to measure that.)

@carltongibson
Member

Hey @KenWhitesell. 👋 I don't think that's the issue. They're cached on first load. 572aa94 was added to make the most of this.

@sevdog
Contributor

sevdog commented Jan 12, 2021

I just used yappi to do more accurate profiling of this call, using this code.

Using 32 clients (which were processes on an other host) I got the following results:

name                         ncall  tsub      ttot      tavg     
RedisChannelLayer.group_send  1      0.003629  0.005724  0.005724  
RedisChannelLayer.group_send  1      0.004395  0.006690  0.006690 
RedisChannelLayer.group_send  1      0.006053  0.009821  0.009821 
RedisChannelLayer.group_send  1      0.004937  0.007409  0.007409 
RedisChannelLayer.group_send  1      0.004702  0.008454  0.008454 

I then tried to figure out how the internal code of the method could be improved and got interested in this piece of code:

for key in channel_redis_keys:
    await connection.zremrangebyscore(
        key, min=0, max=int(time.time()) - int(self.expiry)
    )

Since these tasks have no dependencies that prevent concurrent execution, this can be improved by running them concurrently with asyncio.gather. So I tried this code:

await asyncio.gather(*[
    connection.zremrangebyscore(
        key, min=0, max=int(time.time()) - int(self.expiry)
    ) for key in channel_redis_keys
])

Which gives the following time results:

name                         ncall  tsub      ttot      tavg     
RedisChannelLayer.group_send  1      0.002697  0.005851  0.005851 
RedisChannelLayer.group_send  1      0.001918  0.003774  0.003774  
RedisChannelLayer.group_send  1      0.004098  0.008801  0.008801 
RedisChannelLayer.group_send  1      0.002748  0.005214  0.005214
RedisChannelLayer.group_send  1      0.003148  0.006863  0.006863 

Within my test this has reduced both overall and internal time.
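The effect of that change is easy to demonstrate in isolation: independent awaits run back-to-back cost the sum of their latencies, while gathered they cost roughly the maximum. Here `asyncio.sleep` stands in for the per-key `zremrangebyscore` round-trips:

```python
import asyncio
import time

async def fake_redis_call(delay=0.02):
    # Stand-in for one zremrangebyscore round-trip to Redis
    await asyncio.sleep(delay)

async def sequential(n):
    # One await per key, back-to-back (the original loop)
    for _ in range(n):
        await fake_redis_call()

async def concurrent(n):
    # All keys at once (the asyncio.gather version)
    await asyncio.gather(*(fake_redis_call() for _ in range(n)))

start = time.perf_counter()
asyncio.run(sequential(10))
t_seq = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent(10))
t_con = time.perf_counter() - start

print(f"sequential={t_seq:.3f}s concurrent={t_con:.3f}s")
```

With 10 simulated round-trips the sequential version takes roughly 10x the single-call latency, the gathered one roughly 1x.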

@carltongibson
Member

Hey sevdog, thanks for the update. Once Django 3.2a1 is out of the way I will be swinging back to Channels, so I will take a look at this. 👍


@andrewgodwin andrewgodwin removed their assignment Apr 8, 2021
@aryaniyaps

@carltongibson
@andrewgodwin
Anything new?
I would like to use channels for a scalable application (expecting 1000+ users in a group). If this cannot be fixed in the near future, should I consider implementing my own channel layer?

Are there any alternatives?
What service should I use for the layer?

@carltongibson
Member

Hi both.

No, this is still pending investigation. I've not had bandwidth as yet. If you'd like to dig in that would be super!

@aryaniyaps

I have a question, @carltongibson:
Here Andrew says that channels should not be used for heavy usage in the first place, and that we need to roll our own solutions for sending messages between multiple nodes.
If this issue is fixed, does that mean we can use channels to send messages to hundreds of nodes per group (assuming group sending becomes faster)?

@carltongibson
Member

@aryan340 without testing there's no real way to say. I would suspect though that at scale you'd want to be looking into more fully featured and battle tested message brokers. There's a lot in this space that's simply out of scope for us here.

@acu192
Collaborator

acu192 commented Oct 24, 2021

For those following this issue, it might be interesting for you to know there is a new channel layer implementation which is a lightweight layer on top of Redis Pub/Sub.

Redis Pub/Sub is efficient & fast. This new channel layer thinly wraps it so we can borrow its speed/efficiency. If you run a ton of messages through your app's channel layer (particularly if using a lot of group_send()) then I expect you'll see a big performance boost by switching to this new channel layer (we did - we've been running it in production for 6 months and it's been great).

It's called RedisPubSubChannelLayer. See channels_redis's README for info on how to use it. It's still considered alpha-stage, so proceed with caution and try it first in your dev/staging environment. Please file an issue if you find one.

Hope it helps someone!

Disclaimer: I started the RedisPubSubChannelLayer, so I'm biased toward positive feelings about it. But, it has already had other awesome people contribute fixes and improvements (biggest shoutout to @qeternity). 👍
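Per the channels_redis README, switching to the Pub/Sub layer is a settings change: the configuration mirrors the default layer's, with a different `BACKEND` path (the host and port below are placeholders):

```python
# settings.py
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.pubsub.RedisPubSubChannelLayer",
        "CONFIG": {
            "hosts": [("localhost", 6379)],  # placeholder Redis host/port
        },
    },
}
```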
