
Django Channels memory leaks on opening new connections #373

Open
yuriymironov96 opened this issue May 19, 2021 · 27 comments

@yuriymironov96

Hello! First of all, this is a great project and I wanted to thank all the maintainers for keeping it amazing!

For the last two months, the django-channels websockets on our production service have been periodically failing with OOM errors. Having researched it, this looks like a django-channels memory leak.

I think it may be related to new channel connections being opened and then improperly closed in the channels_layer object. Opening multiple browser tabs that lead to a single channels group increases memory usage, and closing them does not release the memory (the disconnect does occur, though).

I have also managed to reproduce it on a simple project with minimal dependencies and steps to reproduce, so please feel free to check it out: https://github.com/yuriymironov96/django-channels-leak. This is a project based on the django-channels tutorial.
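
For anyone who wants to script the reproduction instead of opening tabs by hand, here is a rough, hypothetical driver assuming the sample project's tutorial-style route (the /ws/chat/hello/ path also shows up in the disconnect log later in this thread); the repo's own steps to reproduce remain the reference:

```python
# Hypothetical repro driver (not part of the sample repo): open many WebSocket
# connections to the same chat room to mimic the "20 browser tabs" scenario,
# keep them alive for a while, then close them all.
import asyncio
import websockets  # pip install websockets


async def open_connections(n=20, url="ws://localhost:8000/ws/chat/hello/"):
    sockets = [await websockets.connect(url) for _ in range(n)]
    await asyncio.sleep(30)  # keep the connections open while sampling memory
    for ws in sockets:
        await ws.close()


asyncio.run(open_connections())
```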

Here are dependencies of the sample project:

  • python = "^3.8"
  • channels = "^3.0.3"
  • channels-redis = "^3.2.0"
  • channels-rabbitmq = "^3.0.0"
  • Pympler = "^0.9"
  • daphne = "^3.0.2"

I have tried both channels_redis and channels_rabbitmq, as well as multiple servers (Django debug server, Daphne, Uvicorn), and the issue still persists.

The sample benchmarks are:

  • Server is launched, no browser tabs open: 49.3 MB;
  • First browser tab opened: 51.2 MB;
  • 20 browser tabs opened: 52.2 MB;
  • All browser tabs closed: 52.2 MB;
  • 20 browser tabs opened: 52.6 MB;
  • All browser tabs closed: 52.6 MB;

It may look like a minor leak at this rate, but it scales quickly and occurs frequently due to the high load on our application.
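
For reference, memory figures like the ones above can be sampled with Pympler (one of the dependencies listed earlier); a minimal sketch, which may differ from the instrumentation actually used in the repro repo:

```python
# Minimal Pympler-based memory snapshot; call this at interesting points
# (e.g. after connects/disconnects) to see which object types are growing.
from pympler import muppy, summary


def print_memory_summary():
    all_objects = muppy.get_objects()
    summary.print_(summary.summarize(all_objects))
```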

Could you please have a look at it and share any ideas you have?

@mitgr81

mitgr81 commented May 27, 2021

Thank you for your work on this so far, @yuriymironov96. I've submitted a PR to your sample project with what seems to me like at least a naive work-around; in my testing on this and another project I've been working on, it either fully eliminated or at least drastically reduced the impact of this memory leak.

That work-around is at yuriymironov96/django-channels-leak@ec0a7da and is exceptionally "knowy", as it alters the state of another module's object. I hope folks who know these projects better than I do will, at least, tell me why I'm wrong, or, at best, help drive a discussion toward a sustainable fix for the issue.
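
For readers who don't want to open the commit, the consumer-side half of the work-around is roughly of this shape (a hedged sketch, not the actual diff; the group name is a placeholder, the receive_buffer attribute name matches channels_redis 3.x, and reaching into it from a consumer is exactly the "knowy" part):

```python
# Hedged illustration only: on disconnect, leave the group and also drop any
# per-channel entry the Redis channel layer may still be holding for us.
from channels.generic.websocket import AsyncWebsocketConsumer


class ChatConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self.group_name = "chat_hello"  # placeholder group name
        await self.channel_layer.group_add(self.group_name, self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard(self.group_name, self.channel_name)
        # The "knowy" part: mutate another module's state so nothing keeps
        # buffering messages for a channel that no longer has a consumer.
        buffer = getattr(self.channel_layer, "receive_buffer", None)
        if buffer is not None:
            buffer.pop(self.channel_name, None)
```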

@yuriymironov96
Author

@mitgr81
Thank you for your interest in this issue and thank you for the PR!

I have a question: which parts of the PR are essential for the fix to work? I did a quick check:

  • Copied the contents of the /chat/consumers.py file to my repo;
  • Opened ~50 tabs of the same chat room;
  • Memory usage increased by 4 MB;
  • Focused on the first tab and pressed "close tabs to the right";

Result:

  • The memory is not released;
  • The following traceback appears in the terminal:
... (this traceback looped and filled my entire terminal)
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/routing.py", line 71, in __call__
    return await application(scope, receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/sessions.py", line 47, in __call__
    return await self.inner(dict(scope, cookies=cookies), receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/sessions.py", line 254, in __call__
    return await self.inner(wrapper.scope, receive, wrapper.send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/auth.py", line 181, in __call__
    return await super().__call__(scope, receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/middleware.py", line 26, in __call__
    return await self.inner(scope, receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/routing.py", line 150, in __call__
    return await application(
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/consumer.py", line 94, in app
    return await consumer(scope, receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/consumer.py", line 58, in __call__
    await await_many_dispatch(
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/utils.py", line 58, in await_many_dispatch
    await task
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels_redis/core.py", line 426, in receive
    del self.receive_buffer[channel]
KeyError: 'specific.a77195ec8bb349cda933a4ad48c77266!10d9b0b0168845418b1d3b5e205d79e3'
127.0.0.1:51226 - - [02/Jun/2021:14:44:30] "WSDISCONNECT /ws/chat/hello/" - -
disconnect
2021-06-02 14:44:31,371 ERROR    Exception inside application: 'specific.a77195ec8bb349cda933a4ad48c77266!adddc4dc5e3d498faa9db9669983cfb5'
Traceback (most recent call last):
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels_redis/core.py", line 417, in receive
    done, pending = await asyncio.wait(
  File "/Users/yurii/.pyenv/versions/3.8.6/lib/python3.8/asyncio/tasks.py", line 426, in wait
    return await _wait(fs, timeout, return_when, loop)
  File "/Users/yurii/.pyenv/versions/3.8.6/lib/python3.8/asyncio/tasks.py", line 531, in _wait
    await waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/routing.py", line 71, in __call__
    return await application(scope, receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/sessions.py", line 47, in __call__
    return await self.inner(dict(scope, cookies=cookies), receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/sessions.py", line 254, in __call__
    return await self.inner(wrapper.scope, receive, wrapper.send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/auth.py", line 181, in __call__
    return await super().__call__(scope, receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/middleware.py", line 26, in __call__
    return await self.inner(scope, receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/routing.py", line 150, in __call__
    return await application(
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/consumer.py", line 94, in app
    return await consumer(scope, receive, send)
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/consumer.py", line 58, in __call__
    await await_many_dispatch(
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels/utils.py", line 58, in await_many_dispatch
    await task
  File "/Users/yurii/code/python/django-channels-leak/.venv/lib/python3.8/site-packages/channels_redis/core.py", line 426, in receive
    del self.receive_buffer[channel]
KeyError: 'specific.a77195ec8bb349cda933a4ad48c77266!adddc4dc5e3d498faa9db9669983cfb5'

I will continue researching your solution tomorrow, but maybe you can already point out what I might be missing?

@mitgr81

mitgr81 commented Jun 2, 2021

@yuriymironov96 I think you're hitting the reason I needed to make alterations to hotfixes/channels_redis/core.py, which can also be seen in yuriymironov96/django-channels-leak@ec0a7da. The essence there is to allow for somebody else having removed the channel from the receive_buffer... which I think is incredibly "dirty", but again, I'm hoping to drive a conversation rather than offer a direct fix.
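
I haven't reproduced the exact hotfix here, but the KeyError in the traceback above (raised by del self.receive_buffer[channel]) suggests a change of roughly this shape inside channels_redis's receive() cleanup, tolerating the entry having already been removed elsewhere; the function name below is only for illustration:

```python
# Hedged, excerpt-style sketch of the kind of change implied by the KeyError
# above, for the cleanup done in RedisChannelLayer.receive() in channels_redis/core.py.
def _cleanup_receive_buffer(self, channel):
    # original behaviour (raises KeyError if the entry is already gone):
    #     del self.receive_buffer[channel]
    # tolerant variant: ignore an entry that a consumer-side workaround
    # (or a racing cleanup) has already removed.
    self.receive_buffer.pop(channel, None)
```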

@mitgr81

mitgr81 commented Jun 2, 2021

Also, for what it's worth, I wasn't directly testing for memory being released on socket close; I had altered the sample consumer to disconnect and reconnect after a short time and was looking to make sure memory growth was capped. Probably ultimately close to the same end result, but a slightly different approach.

@yuriymironov96
Author

yuriymironov96 commented Jun 3, 2021

@mitgr81
Thanks again! It was silly of me to forget about the changes to channels_redis/core.py itself.

I have another question: are the changes made to channels_leak/chat/templates/chat/room.html necessary for this fix to work? We have quite a large SPA, and reloading it to handle the sockets issue will not work for us.

As for testing without changes to the client side: having changed channels_redis/core.py and consumers.py, the memory is still leaking. Your fix did its job perfectly and cleared the channel_layer.receive_buffer object, but it looks like the memory is also leaking somewhere else.

@yuriymironov96
Author

Moreover, it looks like there is one more memory leak that concerns the volume of messages transferred through the sockets. Having generated a 500-paragraph Lorem Ipsum message and sent it 5 times (with 2 tabs open), the application claimed about 1 MB of memory and never released it.

@mitgr81

mitgr81 commented Jun 3, 2021

I have another question: are the changes made to channels_leak/chat/templates/chat/room.html necessary for this fix to work?

They're not, that was just to help accelerate the test in my use case.

As for testing without changes to the client side: having changed channels_redis/core.py and consumers.py, the memory is still leaking. Your fix did its job perfectly and cleared the channel_layer.receive_buffer object, but it looks like the memory is also leaking somewhere else.

That matches my findings. It appears that it reduced the magnitude of the leak but did not eliminate it entirely.

Moreover, it looks like there is one more memory leak that concerns the volume of messages transferred through the sockets. Having generated a 500-paragraph Lorem Ipsum message and sent it 5 times (with 2 tabs open), the application claimed about 1 MB of memory and never released it.

Seems like a good place to keep digging; I appreciate the collaboration!

@revoteon

I was able to replicate the memory leak. It is much more pronounced than I expected.

My observations:

  • It grows proportionally to the number of requests
  • It happens with both the in-memory channel layer and the Redis channel layer
  • It happens with Daphne only. I was unable to replicate it with Uvicorn.

So perhaps the issue is related to Daphne?
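
For anyone wanting to rule Daphne in or out the same way, the same ASGI application can be started under Uvicorn without code changes; a hypothetical launcher (the import string mysite.asgi:application is a placeholder for the project's own ASGI path):

```python
# Hypothetical launcher: serve the same ASGI application under Uvicorn
# instead of Daphne to compare memory behaviour between the two servers.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("mysite.asgi:application", host="127.0.0.1", port=8000)
```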

@mitgr81

mitgr81 commented Jun 22, 2021

I could see that. We have swapped from running purely Daphne in our environments to round-robin Daphne/Uvicorn deployments. Anecdotally it's been better, but we also shipped a patch working around this issue at the same time.

@revoteon

revoteon commented Jul 1, 2021

After a week of using Uvicorn instead of Daphne in our production environment, I'm now fully convinced that this leak is related to Daphne, not channels. In my use case, the results are significant:

[Screenshot: memoryusagedaphne.png (2362×664), memory usage graph]

Not only that, the switch to Uvicorn proved to have a significant effect on CPU usage as well: it is lower than that of Daphne. As a result, we have not only addressed the memory leak but also improved website performance.

@carltongibson
Member

OK, let's move this over to Daphne. Thanks @revoteon

@carltongibson carltongibson transferred this issue from django/channels Jul 2, 2021
@wrath625

wrath625 commented Dec 19, 2021

Any movement on this? I've been having memory-consumption crashes and have narrowed things down in my use case to this being the likely culprit. It fits like a glove.

EDIT: I was able to resolve my leak by reading the updated docs, getting good, and fixing my code. I wasn't cleaning this up with a group_discard:

async_to_sync(self.channel_layer.group_add)( ... )
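
For anyone hitting the same thing, the fix described above amounts to pairing every group_add with a group_discard; a hedged sketch with placeholder names, using a synchronous consumer to match the async_to_sync call quoted above:

```python
# Hedged sketch: every group_add in connect() gets a matching group_discard
# in disconnect(), so the channel layer's group membership doesn't grow forever.
from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer


class MyConsumer(WebsocketConsumer):
    def connect(self):
        self.group_name = "some_group"  # placeholder
        async_to_sync(self.channel_layer.group_add)(self.group_name, self.channel_name)
        self.accept()

    def disconnect(self, close_code):
        # the missing cleanup that caused the leak described above
        async_to_sync(self.channel_layer.group_discard)(self.group_name, self.channel_name)
```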

@ax003d

ax003d commented Jul 19, 2022

Same here. It works well with Python 3.6, but when I upgraded to Python 3.8, Daphne began leaking memory. I have these packages installed:
channels==2.4.0
channels-redis==3.1.0

@BSVogler

We ended up here while looking for the source of memory leaks. We are currently using Daphne on a Python 3.11 Docker x86 image. Daphne on ARM seems to be fine, though.

@badrul1

badrul1 commented Jul 28, 2023

Is this still unresolved? I'm trying Daphne for the first time and noticed my memory usage in Docker goes from 4 GB to 6 GB+ and invariably crashes Docker.

@pgrzesik

pgrzesik commented Aug 1, 2023

Hello @BSVogler 👋 Did you manage to find the root cause of that issue? I think I might be running into the exact same situation after Python 3.11 upgrade.

@aaditya-ridecell

Hello,
We're seeing the issue after upgrading to Python 3.8.

@BSVogler

BSVogler commented Aug 3, 2023

Hello @BSVogler 👋 Did you manage to find the root cause of that issue? I think I might be running into the exact same situation after Python 3.11 upgrade.

No, we switched over to hypercorn.

@badrul1

badrul1 commented Aug 3, 2023

My issue appears to have gone away. Not sure how or why.

@carltongibson
Member

carltongibson commented Aug 3, 2023

It would be good if someone could provide a minimal reproduction, just involving Daphne and a simple application, and not e.g. Docker &co.

Likely it's an issue in the application, but it's impossible to tell without anything (small) to reason about.
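
In case it helps anyone put such a reproduction together, here is a hedged sketch of what "just Daphne and a simple application" could look like: a bare ASGI websocket echo app with no Django or Channels, run as e.g. daphne minimal:app (module and file names are placeholders), so any memory growth can be attributed to the server itself:

```python
# minimal.py - hypothetical bare ASGI websocket echo app for isolating Daphne.
# Run with: daphne minimal:app
async def app(scope, receive, send):
    assert scope["type"] == "websocket"  # websocket connections only
    while True:
        event = await receive()
        if event["type"] == "websocket.connect":
            await send({"type": "websocket.accept"})
        elif event["type"] == "websocket.receive":
            # echo text frames back to the client
            await send({"type": "websocket.send", "text": event.get("text", "")})
        elif event["type"] == "websocket.disconnect":
            break
```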

@pgrzesik

pgrzesik commented Aug 3, 2023

@BSVogler Thanks a lot for the update 🙇

@carltongibson I'm currently trying to nail down the issue that we're encountering. If I manage to pin it down to specific behavior of daphne I'll make sure to share a minimal reproducible case.

@aaditya-ridecell

@pgrzesik Thank you for taking this up!
We really cannot find a root cause and might end up removing the channels code.

@pgrzesik

pgrzesik commented Aug 15, 2023

Hey 👋 After digging deeper into it, it seems like the issue on our end is related to some weird interaction between daphne and the ddtrace library on Python 3.11 (Python 3.10 is fine; we've had it running for a long time with the same setup). Without ddtrace we don't experience these segfaults; similarly, with ddtrace but without daphne (e.g. on our other apps), we also don't experience these issues. Given the above, I'm afraid I won't be able to help with a reproduction, as it seems to be more complex than initially thought.

@aaditya-ridecell What Python version are you using?

@carltongibson
Member

Thanks for the effort @pgrzesik -- even a partial result helps narrow things down. 🎁

@aaditya-ridecell

Thanks for looking into this @pgrzesik. We are using Python 3.8 and don't use the ddtrace library.

@augustolima

Hey, any updates guys? I'm experiencing the same thing.

How did you guys fix the issue?

@Natgho

Natgho commented Apr 3, 2024

Hey, is there any update?
