-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Container snapshots/restore feature and other critical bug fixes #6441
base: main
Are you sure you want to change the base?
Conversation
- Added sandbox.docker_snapshots config (bool) - Mounts a btrfs loop device for docker storage - Can be enabled/disabled (revert docker config change)
This is a great feature, thank you for working on it! It might be good to try to keep the bug fixes separate from new feature(s). One smaller fixes PR and one large feature PR would mean that the fixes PR is faster to review and merge. If you need the fixes too, you can create/rebase the feature branch on top on the fixes branch. It shouldn't cause much trouble when merged, then. Up to you, it just would be better IMHO. Because I think you are correct that the fixes are needed (independently), so it would be nice if we can take them in before the snapshots feature is ready. |
Here I isolated the critical bug fixes: |
Let's remove the dev_container stuff from this PR to keep changes isolated. Feel free to open a second PR for dev_container, but no guarantees there. The maintainers explicitly chose to remove dev_container, and I don't want to add it back without gathering consensus first |
I tried this on mac and it doesn't really work, and it looks like it now requires On a real deployment (SaaS in kubernetes, on-prem corporate, etc..) that serves multiple users, this would need a significantly different implementation (support from the runtime infra itself). What's the plan for making this actually work in a real production deployment? |
What exactly didn't work? any error msg? You can try the test case commented here: |
I'm running OH inside a container and sandboxes as nested containers (and sometimes those sandboxes create nested-nested-containers). I don't understand why people is running OH on the host. Probably not windows users. Docker runs as root on my OH env. |
The runtime operates within the sandbox. While the runtime can trigger the snapshot/restore operation, the actual handling must occur outside the sandbox. Otherwise, you won't be able to capture the entire sandbox storage during the snapshot.
You mean running OH in production like a "OH as a service"? We are snapshotting sandboxes (docker containers) and docker containers are widely used in production. Related: For #6457 we will probably need to snapshot a designated workspace. This is another additional requirement that might also align well with your use case. |
And increased timeout, needed for slow headless docker in docker environments.
It's a sparse file (no real storage allocation)
Make sure to |
@kripper I think we need to break this into multiple PRs The reattach on ExposedPorts is a simple bug, we should be able to merge that fix quickly The snapshot stuff seems super cool, would love to get that in The devcontainer stuff is controversial, and I'd love to have a separate PR where we can discuss whether or not to merge that |
Oh nice looks like a good chunk of it was broken out here! https://github.com/All-Hands-AI/OpenHands/pull/6460/files 🙏 |
Ok. I will remove the DevContainer folder from this PR in the future. My expectation was the other PR to be merged and so to avoid the additional effort of preparing PRs with cherry picked commits. |
PR Summary
This PR introduces a feature to snapshot/restore the sandbox storage + several important fixes and features to enhance the overall functionality of the system.
Key Changes:
docker_runtime_kwargs = { privileged = true }
[Feature] Support running docker-in-docker for OpenHands runtime #5569
New Feature: Container Snapshot and Restore
This feature introduces the ability to snapshot and restore container storage. It mounts Docker's storage directory (
/var/lib/docker
) on a virtual Btrfs volume using a loop device within the OpenHands (OH) environment. This approach has been tested on Windows using VS Code and Dev Containers, with OH running in a Docker container and creating sandboxes in nested containers.Motivation:
The primary goal of this feature is to enable reverting changes made by OH across the entire system. This allows users to effectively undo any system modifications introduced during interactions within OH, promoting a safer and more predictable development and testing environment.
Important Notes:
Current Status:
Usage Instructions:
To enable the snapshot and restore functionality, add the following configuration to your sandbox settings:
If disabled, the Docker configuration will revert upon restarting OH.
Snapshot Creation:
Testing
openhands/storage/docker_snapshots.py
contains a test case (unit-tests in comments) to debug and understand how it works.Feel free to reach out if there are any issues or further questions.