Commit
Fix: Add retries to update container state, increase container state ttl (#644)

In a remote provider, we noticed this:

```
2024-10-21T00:46:11.860396431Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40> - container still running: 489898f6b7814f85
2024-10-21T00:46:11.936984064Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40>:<53d0e79d-8203-46a2-8ea5-a151ef59412c> - Epoch 284: 0%| | 0/67 [00:00<?, ?it/s, v_num=0]
2024-10-21T00:46:11.980022059Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40> - container state not found, stopping container
2024-10-21T00:46:11.980043449Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40> - stopping container.
2024-10-21T00:46:11.991772502Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40> - container stopped.
2024-10-21T00:46:12.170090824Z Unmounting layer: /tmp/taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40/layer-0/merged
```

```
2024-10-21T00:45:41.750579111Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40> - container still running: 489898f6b7814f85
2024-10-21T00:45:41.860215399Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40> - unable to update container state: redislock: not obtained
```

Basically after a single failure we're considering the container lost and killing it. This gives us a little bit more leeway.
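The diff itself is not shown here, so the following is only a minimal sketch of the retry idea described above, not the actual change. The names `updateWithRetry` and `errLockNotObtained`, along with the attempt count and delay, are hypothetical and chosen for illustration. The helper retries a failing state update a bounded number of times before giving up, so that a single `redislock: not obtained` error no longer counts as a lost container.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLockNotObtained is a hypothetical stand-in for the
// "redislock: not obtained" error seen in the logs.
var errLockNotObtained = errors.New("redislock: not obtained")

// updateWithRetry calls update until it succeeds or maxAttempts is reached,
// sleeping for delay between failed attempts. It returns nil on the first
// success, otherwise the last error wrapped with the attempt count.
func updateWithRetry(update func() error, maxAttempts int, delay time.Duration) error {
	if maxAttempts < 1 {
		maxAttempts = 1 // always try at least once
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = update(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("update failed after %d attempts: %w", maxAttempts, err)
}

func main() {
	// Simulated update that fails twice with a lock error, then succeeds.
	calls := 0
	flaky := func() error {
		calls++
		if calls < 3 {
			return errLockNotObtained
		}
		return nil
	}
	err := updateWithRetry(flaky, 5, 10*time.Millisecond)
	fmt.Println("error:", err, "calls:", calls)
}
```

Retries only help if the state key outlives them: the TTL on the container state should comfortably exceed the update interval plus the worst-case time spent retrying, which is presumably why the commit raises the TTL as well.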