Spurious failures in docker daemon on Windows #426

Open
ecstatic-morse opened this issue Jul 25, 2019 · 2 comments
@ecstatic-morse
Contributor

When doing test runs for #399 on a D4v3 Azure instance, I occasionally observed some spurious regressions/fixes. These appear to be caused by an error in the docker daemon which emits messages like:

error waiting for container: Error response from daemon: i/o timeout

unable to upgrade to tcp, received 500

These messages never appear in the same crate, always in different ones, and one usually does not appear without the other. Additionally, when a run had errors like this, there would often be other crates which failed with no error message (like in #310). I'm unsure whether these are actually correlated, since the logs don't include timestamps, but it seems likely.

The code which emits the second error is here. My best guess is that we are observing a connection timeout (ETIMEDOUT) despite the TCP Keepalive settings earlier. Maybe due to a transient network issue?

@pietroalbini
Member

A temporary solution to make this less painful for people reviewing reports would be to detect when this happens and override the build result to be a spurious BuildFail(DockerIOError). A new spurious FailureReason needs to be added, and an example of overriding the build result can be found here.
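As a rough illustration of the detection step, here is a minimal sketch that scans a captured build log for the two daemon messages quoted at the top of this issue. The names `LogVerdict`, `DOCKER_IO_ERROR_MARKERS` and `classify_log` are hypothetical and not taken from the existing codebase. Wiring the result into a new spurious FailureReason is left out.

```rust
/// Hypothetical verdict for a captured build log (illustration only,
/// not one of the project's real result types).
#[derive(Debug, PartialEq, Eq)]
enum LogVerdict {
    /// The log contains one of the daemon errors quoted in this issue.
    DockerIoError,
    /// None of the known daemon errors were found.
    Clean,
}

/// The two daemon messages quoted in the issue description.
const DOCKER_IO_ERROR_MARKERS: [&str; 2] = [
    "error waiting for container: Error response from daemon: i/o timeout",
    "unable to upgrade to tcp, received 500",
];

/// Returns `DockerIoError` if any line of the log contains a known marker.
fn classify_log(log: &str) -> LogVerdict {
    let hit = log
        .lines()
        .any(|line| DOCKER_IO_ERROR_MARKERS.iter().any(|m| line.contains(m)));
    if hit {
        LogVerdict::DockerIoError
    } else {
        LogVerdict::Clean
    }
}
```

Note that plain substring matching would not catch the crates that fail with no error message at all, so this would only cover the cases where the daemon actually printed something.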

@ecstatic-morse
Contributor Author

ecstatic-morse commented Jul 30, 2019

After switching from a Windows 10 Pro host to a Windows Server one, I have yet to observe this issue after building around 500 crates. It may still be lurking, but it could also be a symptom of process isolation on Windows 10 not being recommended for production use.

In any case, we should start checking for special return codes from docker run. I'm not sure if these are actually returned when the aforementioned errors occur though. I tried checking the {{ .State.Error }} field of docker inspect when I could still reproduce this but nothing was present even when the error occurred.
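For reference, the docker run documentation reserves three exit codes: 125 when the error is with the Docker daemon itself, 126 when the contained command cannot be invoked, and 127 when the contained command cannot be found. Any other value is the exit code of the contained command. A minimal sketch of classifying them follows. The names `DockerRunExit` and `classify_exit_code` are hypothetical, and it remains an assumption that the errors in this issue would surface as 125.

```rust
/// Hypothetical classification of a `docker run` exit code (illustration only).
#[derive(Debug, PartialEq, Eq)]
enum DockerRunExit {
    /// 125: the error is with the Docker daemon itself.
    DaemonError,
    /// 126: the contained command cannot be invoked.
    CommandNotInvokable,
    /// 127: the contained command cannot be found.
    CommandNotFound,
    /// Any other value: the exit code of the contained command.
    Container(i32),
}

/// Maps a raw exit code onto the cases documented for `docker run`.
fn classify_exit_code(code: i32) -> DockerRunExit {
    match code {
        125 => DockerRunExit::DaemonError,
        126 => DockerRunExit::CommandNotInvokable,
        127 => DockerRunExit::CommandNotFound,
        other => DockerRunExit::Container(other),
    }
}
```

One caveat: a command inside the container can itself exit with 125, 126 or 127, so these codes are a hint and not proof of a daemon-side failure.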
