Fix: Add retries to update container state, increase container state ttl (#644)

In a remote provider, we noticed this:
```
2024-10-21T00:46:11.860396431Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40>  - container still running: 489898f6b7814f85
2024-10-21T00:46:11.936984064Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40>:<53d0e79d-8203-46a2-8ea5-a151ef59412c>  - Epoch 284:   0%|          | 0/67 [00:00<?, ?it/s, v_num=0]
2024-10-21T00:46:11.980022059Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40>  - container state not found, stopping container
2024-10-21T00:46:11.980043449Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40>  - stopping container.
2024-10-21T00:46:11.991772502Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40>  - container stopped.
2024-10-21T00:46:12.170090824Z Unmounting layer: /tmp/taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40/layer-0/merged
```

```
2024-10-21T00:45:41.750579111Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40>  - container still running: 489898f6b7814f85
2024-10-21T00:45:41.860215399Z <taskqueue-527dc66c-0d37-4614-b078-3ca3f77fe603-4a686b40>  - unable to update container state: redislock: not obtained
```


Basically, after a single failed state update we consider the container lost
and kill it. Increasing the container state TTL gives us a little more leeway.
luke-lombardi authored Oct 21, 2024
1 parent b6c9e64 commit 96f01b0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion pkg/types/scheduler.go
@@ -95,7 +95,7 @@ const (
ContainerResourceUsageEmissionInterval time.Duration = 3 * time.Second
)
const ContainerStateTtlSWhilePending int = 600
-const ContainerStateTtlS int = 60
+const ContainerStateTtlS int = 120
const WorkspaceQuotaTtlS int = 600

type ErrContainerStateNotFound struct {
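
To see why doubling the TTL helps, here is a rough back-of-the-envelope sketch. It assumes the state is refreshed about every 30 seconds, which is only inferred from the log timestamps above (00:45:41 and 00:46:11) and is not confirmed by the diff. `toleratedMisses` is a hypothetical helper, and the model ignores timing jitter.

```go
package main

import "fmt"

// toleratedMisses returns how many consecutive refreshes can fail before the
// state key expires. It assumes the key was last refreshed at t=0, refreshes
// are attempted every intervalS seconds, and the key expires at exactly ttlS.
func toleratedMisses(ttlS, intervalS int) int {
	if ttlS <= 0 || intervalS <= 0 {
		return 0
	}
	// After m misses, the refresh at (m+1)*intervalS rescues the key only if
	// it lands strictly before ttlS.
	m := (ttlS-1)/intervalS - 1
	if m < 0 {
		return 0
	}
	return m
}

func main() {
	const intervalS = 30 // assumption: inferred from the log timestamps

	fmt.Println(toleratedMisses(60, intervalS))  // old ContainerStateTtlS
	fmt.Println(toleratedMisses(120, intervalS)) // new ContainerStateTtlS
}
```

Under these assumptions the old value tolerates no missed refresh at all, which matches the behavior in the logs, while the new value tolerates two.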
