When doing test runs for #399 on a D4v3 Azure instance, I occasionally observed some spurious regressions/fixes. These appear to be caused by an error in the docker daemon which emits messages like:
error waiting for container: Error response from daemon: i/o timeout
unable to upgrade to tcp, received 500
These messages never appear in the same crate, always in different ones, and one usually does not appear without the other. Additionally, when a run had errors like this, there would often be other crates which failed with no error message (like in #310). I'm unsure if these are actually correlated since the logs don't include their timestamps, but it seems likely.
The code which emits the second error is here. My best guess is that we are observing a connection timeout (ETIMEDOUT) despite the TCP keepalive settings applied earlier, maybe due to a transient network issue?
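To tell these failures apart from real regressions when reading results, something like the following could scan a crate's log for the two daemon messages quoted above. This is only a minimal sketch: `SPURIOUS_DOCKER_ERRORS` and `is_spurious_docker_failure` are hypothetical names, not existing code, and it assumes the messages show up verbatim in the captured log.

```rust
/// The two Docker daemon messages quoted in this report.
/// (Hypothetical constant; not part of any existing code.)
const SPURIOUS_DOCKER_ERRORS: [&str; 2] = [
    "error waiting for container: Error response from daemon: i/o timeout",
    "unable to upgrade to tcp, received 500",
];

/// Returns true if any line of `log` contains one of the known messages.
/// Assumption: the daemon message appears verbatim somewhere in the line.
fn is_spurious_docker_failure(log: &str) -> bool {
    log.lines().any(|line| {
        SPURIOUS_DOCKER_ERRORS
            .iter()
            .any(|msg| line.contains(*msg))
    })
}
```

A crate flagged this way could be retried or reported separately rather than counted as a regression. Crates that fail with no message at all (as in #310) would not be caught by this check.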