Massive memory usage using Docker Swarm with Windows Server 22 #337

andyfisher100 · 2023-03-17T14:56:35Z

Describe the bug
We have been using Docker Swarm for our Windows containers for a number of years now. We recently upgraded our host nodes to run Windows Server 2022.

We have noticed that since the upgrade there has been a huge spike in memory consumption: over a couple of weeks the node reaches 98% memory used and becomes unusable. Our servers have between 64 GB and 128 GB of memory.

What we notice is that over a 1-2 week period the non-paged pool memory gradually increases until it is full. This behaviour usually suggests a memory leak of some kind.

In the end we just have to reboot the host, which is not ideal in a production system.

I believe it may be linked to the way some of our containers run, and that Swarm is trying to keep track of containers that are no longer running. We use our containers as build agents for Azure DevOps. Some of the containers are configured to "Run Once". In this setup, an Azure DevOps pipeline job runs and, when it finishes, the agent process exits and so the container exits too. Swarm then sees that there is one less replica and spins up a new, clean build agent.

I don't know whether containers that have died somehow still have memory reserved for them by the OS.

To Reproduce

Set up a Docker Swarm configuration running Windows Server 2022
Run a swarm job whose containers stop themselves
Non-paged memory usage will gradually increase
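To put numbers on the last step, the non-paged pool counter can be sampled periodically. A minimal Python sketch, assuming the built-in Windows `typeperf` tool and its default CSV output; the helper names are illustrative and not part of any existing tooling:

```python
import csv
import io
import subprocess

# Standard Windows performance counter for non-paged pool size.
COUNTER = r"\Memory\Pool Nonpaged Bytes"

def parse_typeperf_csv(text):
    """Return (timestamp, bytes) pairs from typeperf CSV output.

    Assumes typeperf's default CSV layout. Rows that are not two-column
    numeric samples (the PDH header, blank lines, status messages) are skipped.
    """
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue
        try:
            value = float(row[1])
        except ValueError:
            continue  # e.g. the "(PDH-CSV 4.0)" header row
        samples.append((row[0], value))
    return samples

def sample_nonpaged_pool(count=1):
    """Run typeperf (Windows only) for `count` samples and parse them."""
    out = subprocess.run(
        ["typeperf", COUNTER, "-sc", str(count)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_typeperf_csv(out)
```

Logging one sample an hour from a scheduled task is enough to see whether the pool only ever grows.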

Expected behaviour
I would expect the memory usage not to increase steadily until the host becomes unusable and requires a reboot.

Configuration:

Edition: Windows Server 2022 Standard Edition
Base Image being used: mcr.microsoft.com/dotnet/framework/runtime:4.8-windowsservercore-ltsc2022
Container engine: Docker
Container Engine version 20.10.9

Additional context
This issue did not seem to occur when the hosts ran Windows Server 2019

microsoft-github-policy-service · 2023-04-21T17:30:20Z

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

fady-azmy-msft · 2023-04-26T18:14:42Z

Looking into this. I created an internal ticket (#44339481) for tracking.

MikeZappa87 · 2023-04-26T20:05:28Z

@fady-azmy-msft I believe this is already being tracked.

andyfisher100 · 2023-04-27T08:28:46Z

Yes, I believe this is being tracked internally, per communication I've had with my company's Microsoft representative. I used poolmon to track down what was using the non-paged pool memory and it was the HTab tag; this then appeared to be a known issue with Windows Server 2022.
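For anyone else chasing this, here is a rough Python sketch for picking the largest non-paged tags out of a poolmon snapshot saved as text. It assumes each row reads Tag, Type, Allocs, Frees, Diff, Bytes, Per Alloc, that poolmon's bracketed change columns can simply be stripped, and that tags contain no embedded spaces; the function name is hypothetical:

```python
import re

# poolmon shows per-refresh changes in brackets, e.g. "(  12)"; drop them.
DELTA = re.compile(r"\(\s*[-+]?\d+\s*\)")

def top_nonpaged_tags(snapshot, n=5):
    """Return the n largest non-paged pool tags as (tag, bytes) pairs."""
    totals = []
    for line in snapshot.splitlines():
        fields = DELTA.sub(" ", line).split()
        # Assumed layout after stripping: Tag Type Allocs Frees Diff Bytes PerAlloc
        if len(fields) != 7 or fields[1] != "Nonp":
            continue  # header lines, paged-pool rows, anything unexpected
        try:
            totals.append((fields[0], int(fields[5])))
        except ValueError:
            continue
    totals.sort(key=lambda item: item[1], reverse=True)
    return totals[:n]
```

On an affected host, HTab would be expected near the top of this list.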

I'd like to leave this open until the Windows Update that fixes the issue is published, as this may help others who experience the same problem.

microsoft-github-policy-service · 2023-05-29T00:49:39Z

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

microsoft-github-policy-service · 2023-06-28T05:36:00Z

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

microsoft-github-policy-service · 2023-07-31T12:13:16Z

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

microsoft-github-policy-service · 2023-08-30T15:31:57Z

This issue has been open for 30 days with no updates.
@MikeZappa87, please provide an update or close this issue.

fady-azmy-msft · 2023-10-02T17:12:05Z

This bug should have been fixed with 9B (September's Patch Tuesday). @andyfisher100 can you confirm this no longer repros for you?

andyfisher100 · 2023-10-11T09:12:20Z

@fady-azmy-msft just to confirm that the patch is KB5030216?

I have installed the patch on multiple host machines that are part of our Docker swarm. On one of the machines the non-paged memory pool seems to have stabilised at around 10.9 GB, although this still seems a little high. However, on another machine the non-paged memory is currently up to 20.3 GB after 6 days of uptime and seems to be increasing by about 3 GB daily. Using poolmon, it would appear that it is still the HTab memory pool consuming the majority of the non-paged memory.
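To turn readings like these into a rough "days until full" estimate, a least-squares line through (day, GB) samples is usually enough. A sketch in Python; the helpers are hypothetical, and the projection assumes growth stays roughly linear, which a leak may not:

```python
def fit_daily_growth(samples):
    """Least-squares (slope, intercept) for (day, gigabytes) samples.

    Assumes growth is roughly linear over the sampled window.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(g for _, g in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day value")
    sxy = sum((d - mean_x) * (g - mean_y) for d, g in samples)
    slope = sxy / sxx  # GB per day
    return slope, mean_y - slope * mean_x

def days_until(limit_gb, samples):
    """Days after the last sample until the fitted line reaches limit_gb."""
    slope, intercept = fit_daily_growth(samples)
    if slope <= 0:
        return None  # flat or shrinking: no projected exhaustion
    last_day = max(d for d, _ in samples)
    return (limit_gb - intercept) / slope - last_day
```

Feed it daily readings from one host and that host's installed memory as `limit_gb`; a `None` result means the fitted trend is flat or falling.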

We also have a support case open with yourselves and have been taking multiple memory dumps for analysis. The support engineer has mentioned that handles are being left open by containers. In our Docker setup, the containers managed by Swarm are short-lived: each runs one Azure DevOps pipeline job, then the container exits and Swarm starts a new container for the next Azure DevOps pipeline job. This means that a host may have many stopped containers from a swarm service.
I have also seen files in the Docker\windowsfilter folder locked by a process when no containers were running, although containers had been running there previously.
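To see how many stopped task containers a node is actually holding, something like the following Python sketch can be run on each host. It relies on the `com.docker.swarm.service.name` label that Swarm puts on task containers plus standard `docker ps` filters; the helper names are hypothetical, and this only counts containers, it does not address the leak:

```python
import subprocess

# Label Swarm attaches to every task container it creates.
SERVICE_LABEL = "com.docker.swarm.service.name"

def exited_task_command(service):
    """docker ps invocation listing exited containers of one swarm service."""
    return [
        "docker", "ps", "--all",
        "--filter", "status=exited",
        "--filter", f"label={SERVICE_LABEL}={service}",
        "--format", "{{.ID}}\t{{.Status}}",
    ]

def parse_ids(output):
    """Container IDs from the tab-separated ID/Status lines."""
    return [line.split("\t", 1)[0] for line in output.splitlines() if line.strip()]

def exited_task_ids(service):
    """Run docker ps on this node and return the exited container IDs."""
    result = subprocess.run(exited_task_command(service),
                            capture_output=True, text=True, check=True)
    return parse_ids(result.stdout)
```

For example, `len(exited_task_ids("build-agents"))` gives the count for a placeholder service named `build-agents`.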

The Microsoft support engineer suggested that we need to open a case with our Docker provider. Currently we use Docker CE/Moby, as instructed in the MS documentation here: https://learn.microsoft.com/en-us/virtualization/windowscontainers/quick-start/set-up-environment?tabs=dockerce#windows-server-1

Would you suggest raising an issue on the Moby page or is this issue sufficient?

andyfisher100 · 2023-12-14T12:35:57Z

I have been working with MS support and it appears that this issue is now resolved via Windows updates. I can't see anything specific in the Windows Server 2022 patch notes, so I can't say for sure whether it's the patch @fady-azmy-msft mentioned or a later one. The system is patched with KB5032198 (November cumulative update).

Also worth noting: I updated Docker to the very latest version during my patching/testing too.

andyfisher100 added the bug Something isn't working label Mar 17, 2023

microsoft-github-policy-service bot added the triage New and needs attention label Mar 17, 2023

fady-azmy-msft assigned MikeZappa87 Mar 20, 2023

fady-azmy-msft removed the triage New and needs attention label Mar 20, 2023

andyfisher100 closed this as completed Dec 14, 2023

This was referenced Jan 31, 2024:

Windows Containers memory leak moby/moby#39476 (Closed)

docker windows is leaking non-paged kernel memory moby/moby#42487 (Open)
