-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix clock changes in latest Jepsen #569
Conversation
Hmm. @dancmeyers, since you added some of these parameters specifically to fix earlier broken-ness in Docker, do you want to weigh in here? I don't want to flip back and forth between two broken states if we can avoid it. |
@aphyr By the way, this PR is just what we applied locally to prevent clock skew in the test nodes from propagating to the host. We had the clock skew nemesis disabled in our tests, so we don't know why it was happening. If someone can figure that out and prevent it, I think it would be a better fix than this. |
The current flipped and flopped values for: volumes:
# no /sys/fs/cgroup
privileged: true
cap_add:
- ALL Evolved to that state:
The docker-compose regression has been fixed but not yet released by docker. In general, Jepsen should be able to follow docker-debian-base: # For a host running bullseye, or a newer cgroups and systemd, you have to use this:
docker run -td --stop-signal=SIGRTMIN+3 \
--tmpfs /run:size=100M --tmpfs /run/lock:size=100M \
-v /sys/fs/cgroup:/sys/fs/cgroup:rw --cgroupns=host \
--name=name jgoerzen/debian-base-whatever In the past, the proposed changes to: volumes:
- "/sys/fs/cgroup:/sys/fs/cgroup:ro"
cap_add:
- NET_ADMIN Have broken some environments. And:
Seems to have been true for some time. As for the container clock skew you are seeing, what a good reminder that clock skew is real and can be consequential! Would it be too simplistic to defensively? (jepsen.os/setup!)
; Try to stop ntpd service in case it is present and running.
(try
(c/su
(c/exec :service :ntpd :stop))
(catch RuntimeException e)) |
I have to admit it's been so long since I touched this that I have nothing useful to add right now. Although we do have some more Jepsen work upcoming, so hopefully it doesn't bite me again then ;) |
So--it sounds like I should close this PR then, to avoid re-breaking Docker in other environments? This may be one of those things where a fork is preferable... |
At the current time, I don't think it's possible to have a So, would recommend being conservative for now, leaving |
@aphyr We have an internal fork already, so it's not really a problem for us anymore. We only use the Docker tests for @nurturenature To be clear, the clock skew we're seeing isn't causing a real failure - it's causing the test setup to fail inside the node container. There is no NTP client running inside the test node to stop. There is one running outside the test node on the host. What we're trying to show is that Jepsen will fail on setup in some environments. For us, the problem is solved via a fork. One thing I want to note is that we were a little confused as to why the test nodes need privileged capabilities. It's possible to only change the clock inside a Docker container without affecting the host as long as the container isn't privileged and/or doesn't have the Adding @IamXander in case he has any other observations. |
Docker released new versions earlier this week that better support OS/systemd containers like Jepsen uses for nodes. It's now possible to use unprivileged containers. P.S. 👍 for a CI with Jepsen. |
This partially reverts a previous commit, which added
privileged: true
andcapabilities: ALL
to Docker test nodes. It also adds mounts which should ensure the containers come up with the same local time as the host.Closes #568