Massive memory usage using Docker Swarm with Windows Server 22 #337
Comments
This issue has been open for 30 days with no updates.
Looking into this. I created an internal ticket (#44339481) for tracking.
@fady-azmy-msft I believe this is already being tracked.
Yes, I believe this is being tracked internally, per communication I've had with my company's Microsoft representative. I used poolmon to track down what was using the non-paged pool memory, and it was the HTab tag; this then appeared to be a known issue with Windows Server 2022. I'd like to leave this open until the Windows Update that fixes the issue is published, as this may help others if they experience the same problem.
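For anyone else trying to identify which pool tag is leaking, a minimal poolmon workflow looks roughly like this (poolmon ships with the Windows Driver Kit; the log file name below is just an example, not something from this thread):

```shell
# Run poolmon from an elevated prompt. In the interactive view:
#   press 'p' to cycle between paged / non-paged / both pools
#   press 'b' to sort entries by bytes in use
poolmon

# Or save a one-off snapshot to a file and search it for the HTab tag:
poolmon -n poolsnap.log
findstr HTab poolsnap.log
```

A steadily growing byte count against a single tag across snapshots taken hours apart is the usual sign of a kernel-side leak.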
This bug should have been fixed with 9B (September's Patch Tuesday). @andyfisher100, can you confirm this no longer repros for you?
@fady-azmy-msft just to confirm, is the patch KB5030216? I have installed the patch on multiple host machines that are part of our Docker swarm. On one of the machines the non-paged pool seems to have stabilised around 10.9 GB, although this still seems a little high. However, on another machine the non-paged pool is currently up to 20.3 GB after 6 days of uptime and seems to be increasing by about 3 GB daily. Using poolmon, it would appear that it is still the HTab pool tag consuming the majority of the non-paged memory.

We also have a support case open with yourselves and have been performing multiple memory dumps for analysis etc. The support engineer has mentioned that handles are being left open by containers. In our Docker setup, the containers managed by swarm are short-lived: each runs one Azure DevOps pipeline job, then the container exits and swarm starts a new container for the next job. This means that a host may have many stopped containers from a swarm service.

The Microsoft support engineer suggested that we open a case with our Docker provider. Currently we use Docker CE/Moby as instructed in the MS documentation here: https://learn.microsoft.com/en-us/virtualization/windowscontainers/quick-start/set-up-environment?tabs=dockerce#windows-server-1. Would you suggest raising an issue on the Moby page, or is this issue sufficient?
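As a possible stopgap while the leak is investigated, periodically clearing out the exited containers described above might limit handle buildup. A sketch, assuming a standard Docker CLI setup (not a fix confirmed in this thread):

```shell
# Remove all stopped containers. Swarm has already replaced exited
# replicas with fresh ones, so only dead build agents are cleaned up.
docker container prune -f

# Check how many exited containers remain afterwards:
docker ps -a --filter "status=exited" --format "{{.ID}} {{.Names}}"
```

This could be run from a scheduled task on each host; it does not address the underlying HTab growth, but it keeps the stopped-container count (and any handles tied to them) from accumulating indefinitely.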
I have been working with MS support and it appears that this issue is now resolved via Windows updates. I can't see anything specific in the Windows Server 2022 patch notes, so I can't say for sure whether it's the patch @fady-azmy-msft mentioned or a later one; the system is patched with KB5032198 (the November cumulative update). Also worth noting that I updated Docker to the very latest version during my patching/testing too.
Describe the bug
We have been using Docker swarm for our Windows containers for a number of years now. We recently upgraded our host nodes to run Windows Server 2022.
We have noticed that since the upgrade there has been a huge spike in memory consumption, and eventually the node becomes unusable, hitting 98% memory used over a couple of weeks. Our servers have between 64 GB and 128 GB of memory.
What we notice is that over a 1-2 week period the non-paged pool memory gradually increases until full. This behaviour usually suggests a memory leak of some kind.
In the end we just have to reboot the host, which is not ideal in a production system.
I believe it may be linked to the way some of our containers run and that swarm is trying to keep track of containers that are no longer running. We use our containers as build agents for Azure DevOps. Some of the containers are configured to "Run Once". In this setup, an Azure DevOps pipeline job runs and, when it finishes, the agent process dies and thus the container also dies. Swarm realises that there is now one less replica and spins up a new, clean build agent.
I don't know if containers that have died are somehow still having memory reserved for them by the OS.
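To put numbers on the gradual growth described above, the non-paged pool counter can be sampled over time with the built-in typeperf tool; a rough sketch (the interval, sample count, and output file name are arbitrary choices, not from this report):

```shell
:: Sample "Pool Nonpaged Bytes" every 60 seconds for 60 samples (~1 hour),
:: writing a CSV that can be graphed to confirm a steady upward trend.
typeperf "\Memory\Pool Nonpaged Bytes" -si 60 -sc 60 -f CSV -o nonpaged.csv
```

Comparing runs taken a day apart would show whether growth matches the roughly 3 GB/day rate reported later in this thread.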
To Reproduce
Expected behaviour
I would expect the memory usage not to steadily increase until the host becomes unusable and requires a reboot.
Configuration:
Additional context
This issue did not seem to occur when the hosts ran Windows Server 2019.