When doing test runs for #399 on a D4v3 Azure instance, I occasionally observed some spurious regressions/fixes. These appear to be caused by an error in the docker daemon which emits messages like:
error waiting for container: Error response from daemon: i/o timeout
unable to upgrade to tcp, received 500
These two messages never appear for the same crate, always for different ones, and one rarely appears without the other. Additionally, when a run hit errors like this, there were often other crates that failed with no error message at all (as in #310). I'm not sure these are actually correlated, since the logs don't include timestamps, but it seems likely.
The code which emits the second error is here. My best guess is that we are observing ECONNTIMEOUT despite the TCP Keepalive settings earlier. Maybe due to a transient network issue?
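For reference, "TCP keepalive" here refers to the socket option the daemon configures on its connections. The real settings live in the Docker daemon's Go code, but as a rough illustration only (with made-up intervals, using the socket2 crate), enabling keepalive from Rust looks something like this:

```rust
// Illustrative only: not the Docker daemon's actual keepalive configuration.
// Requires the socket2 crate (e.g. socket2 = "0.5"); the durations are arbitrary.
use std::time::Duration;

use socket2::{Domain, Socket, TcpKeepalive, Type};

fn main() -> std::io::Result<()> {
    let socket = Socket::new(Domain::IPV4, Type::STREAM, None)?;

    // Probe an idle connection after 30s, then every 30s thereafter.
    let keepalive = TcpKeepalive::new()
        .with_time(Duration::from_secs(30))
        .with_interval(Duration::from_secs(30));
    socket.set_tcp_keepalive(&keepalive)?;

    println!("keepalive enabled on socket");
    Ok(())
}
```

Keepalive only detects a dead peer on an otherwise idle connection; it would not prevent a timeout caused by a transient network outage, which is consistent with the guess above.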
A temporary solution to make this less painful for people reviewing reports would be to detect when this happens and override the build result to a spurious BuildFail(DockerIOError). A new spurious FailureReason would need to be added; an example of overriding the build result can be found here.
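A minimal sketch of that override, assuming simplified stand-ins for the real types (the actual crater enums and the result-overriding hook may look different):

```rust
// Hypothetical sketch: detect the docker daemon i/o errors in the captured
// output and downgrade the result to a spurious failure. `TestResult` and
// `FailureReason` here only mirror the names used above.
#[derive(Debug)]
enum FailureReason {
    Unknown,
    DockerIOError, // the new spurious variant proposed above
}

#[derive(Debug)]
enum TestResult {
    BuildFail(FailureReason),
    TestPass,
}

// Substrings of the error messages observed in the logs.
const DOCKER_IO_ERRORS: &[&str] = &[
    "error waiting for container: Error response from daemon: i/o timeout",
    "unable to upgrade to tcp, received 500",
];

fn override_result(output: &str, result: TestResult) -> TestResult {
    if DOCKER_IO_ERRORS.iter().any(|msg| output.contains(msg)) {
        // Treat the run as spurious rather than a real regression/fix.
        TestResult::BuildFail(FailureReason::DockerIOError)
    } else {
        result
    }
}

fn main() {
    let log = "unable to upgrade to tcp, received 500";
    println!("{:?}", override_result(log, TestResult::TestPass));
}
```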
After switching from a Windows 10 Pro host to a Windows Server one, I have yet to observe this issue after building around 500 crates. It may still be lurking, but it could also be a symptom of process isolation on Windows 10 not being recommended for production use.
In any case, we should start checking for special return codes from docker run. I'm not sure those are actually returned when the aforementioned errors occur, though. While I could still reproduce this, I tried checking the {{ .State.Error }} field of docker inspect, but nothing was present even when the error occurred.
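For what it's worth, a best-effort check could look roughly like the sketch below, which shells out to docker inspect for the State.Error field. The container name is a placeholder, and as noted above the field may be empty even when the daemon reported an i/o error, so it should only be treated as extra diagnostics (docker run itself conventionally exits with 125 when the failure comes from the daemon rather than the contained command).

```rust
use std::process::Command;

// Hypothetical sketch: query a container's State.Error via `docker inspect`.
// Returns Ok(None) if the field is empty or the container no longer exists.
fn container_state_error(container: &str) -> std::io::Result<Option<String>> {
    let output = Command::new("docker")
        .args(["inspect", "--format", "{{ .State.Error }}", container])
        .output()?;

    if !output.status.success() {
        // The container may already have been removed (e.g. `docker run --rm`).
        return Ok(None);
    }

    let error = String::from_utf8_lossy(&output.stdout).trim().to_string();
    Ok(if error.is_empty() { None } else { Some(error) })
}

fn main() -> std::io::Result<()> {
    // "example-container" is a placeholder name/id.
    match container_state_error("example-container")? {
        Some(err) => println!("container reported error: {err}"),
        None => println!("no error recorded in .State.Error"),
    }
    Ok(())
}
```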