docker checkpoint/restore is slow #2519
Comments
Sounds like you already made the necessary measurements without Docker, so it does sound like a Docker problem, although it is not obvious why it would be so much slower. Have you tried it with Podman, just as an additional data point?
@hanwen-flow I was able to replicate these results locally. It looks like the reason
Checkpoint/restore with Podman is significantly faster.
@rst0git thanks for the analysis. I will look into Podman. However, we are running a SaaS company, and it is not clear whether we can push this change onto our customers.
The main reason to try Podman is to see whether this is a Docker problem or a CRIU problem. If it is just a Docker problem, you can provide a patch to Docker and fix it there.
@adrianreber I believe this problem is related to the migration to the v2 shim (moby/moby#41546) and the implementation is similar to the
FYI, I've been toying with Podman. While the CRIU part of it is plenty fast, the way the rootfs diff is handled seems clumsy and somewhat slow. I'll open a separate issue with Podman.
Description
Checkpoint/restore inside docker is slow.
(Apologies if this is the wrong place to report, but even a closed bug report about this would have saved me quite some time.)
Steps to reproduce the issue:
I made a program to allocate memory here, and tried to checkpoint/restore it both directly on Linux and while running inside a Docker container:
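The linked program is not reproduced in this report. As a rough stand-in, a process with a large resident heap could be sketched like this; the function names `allocate` and `hold_heap` and the 64 MiB chunk size are illustrative assumptions, not taken from the original program.

```python
import mmap
import time

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the running system


def allocate(total_bytes, chunk_bytes=64 * 1024 * 1024):
    """Allocate total_bytes in chunks and write to every page.

    Writing one byte per page forces the kernel to back the memory,
    so a checkpoint has to dump it instead of skipping untouched pages.
    """
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        buf = bytearray(size)
        for offset in range(0, size, PAGE_SIZE):
            buf[offset] = 1
        chunks.append(buf)
        remaining -= size
    return chunks


def hold_heap(gib=5.0):
    """Allocate roughly `gib` GiB, then idle until checkpointed or killed."""
    heap = allocate(int(gib * 1024 ** 3))
    total = sum(len(chunk) for chunk in heap)
    print(f"allocated {total} bytes; waiting", flush=True)
    while True:
        time.sleep(60)
```

Calling `hold_heap(5)` from a small script, either directly on the host or as the container's command, gives a process comparable in size to the one measured here.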
Dumping the program with a 5 GB heap running directly on Linux took about 5 seconds (1 GB/s).
Restoring it took about 1 second.
Dumping the program when running in a Docker container using
docker checkpoint create
took about 40 seconds; restoring the checkpoint took 20 seconds.
CRIU logs and information:
there seem to be no logs under
/var/lib/docker/containers/$containerID/checkpoints/$cpID
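To double-check whether any log files exist under that directory, a small helper along these lines could be used. It is only a sketch: the function names are made up for illustration, the `.log` suffix is an assumption about how log files would be named, and reading `/var/lib/docker` usually requires root.

```python
from pathlib import Path


def list_checkpoint_files(checkpoint_dir):
    """Return (relative path, size in bytes) for each file, sorted by path."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return []
    return sorted(
        (str(path.relative_to(root)), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


def find_logs(checkpoint_dir):
    """Keep only files whose name ends in .log (an assumed naming convention)."""
    return [
        (name, size)
        for name, size in list_checkpoint_files(checkpoint_dir)
        if name.endswith(".log")
    ]
```

An empty result from `find_logs` on the checkpoint directory would match the observation above, while `list_checkpoint_files` shows what was written instead.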
version info: