Occasionally I see error messages which look like this:
2024-05-20 16:02:33 STIMELA.kube ERROR: k8s API error while deleting PVC 'wsclean-temp-aff8e055'
──────────────────────────────────────────────────────────────────────────── detailed error report follows ─────────────────────────────────────────────────────────────────────────────
⚠ k8s API error while deleting PVC 'wsclean-temp-aff8e055'
├── ApiException: (404)
│ Reason: Not Found
│ HTTP response headers: HTTPHeaderDict({'Audit-Id': '999f22db-468f-4bc0-be89-d51867837c4d', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json',
│ 'X-Kubernetes-Pf-Flowschema-Uid': 'dbf2ccb2-e6d3-4c03-9de0-69dbbada21da', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'd7fa6cb1-b697-4e51-b97b-da0d669a1b6f', 'Date': 'Mon, 20 May
│ 2024 14:02:33 GMT', 'Content-Length': '246'})
│ HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"persistentvolumeclaims \"wsclean-temp-aff8e055\" not
│ found","reason":"NotFound","details":{"name":"wsclean-temp-aff8e055","kind":"persistentvolumeclaims"},"code":404}
│
│
├── kind: Status
├── apiVersion: v1
├── metadata:
├── status: Failure
├── message: persistentvolumeclaims "wsclean-temp-aff8e055" not found
├── reason: NotFound
├── details:
│ ├── name: wsclean-temp-aff8e055
│ └── kind: persistentvolumeclaims
└── code: 404
I do not yet have a consistent reproducer but I believe it may have something to do with the temporary volume being brought down automatically when the job pod is finished (perhaps because of the lifecycle: step configuration). Consequently, when Stimela attempts to do cleanup there is no PVC to delete and the above error occurs. For reference, the temporary volume was configured as follows:
Good point. If it's defined with lifecycle: step, the k8s backend has no business trying to delete it at the end of the session: the wretched thing should have been dead and buried by then. I suspect it may be a case of the cleanup code being both overzealous and insufficiently clever. There are awkward edges in the k8s API where a resource continues to be returned by list_namespaced_xxx() even though it's in the "Terminating" state, and one needs to jump through extra hoops to detect this condition. I see an attempted jump here, which may be insufficiently jump-y...
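For illustration, here is a rough sketch of what a more tolerant cleanup could look like. It is not Stimela's actual code: the helper names (is_terminating, delete_pvc_if_present, cleanup_pvcs) are made up, and core_api is assumed to behave like the kubernetes Python client's CoreV1Api. The sketch assumes two things: a resource in the "Terminating" state can be recognised by a non-empty metadata.deletion_timestamp, and a 404 on delete simply means the PVC is already gone.

```python
def is_terminating(resource):
    """Return True if the resource already has a deletion timestamp set.

    Assumption: list calls can still return an object that is being
    deleted, and such objects carry metadata.deletion_timestamp.
    """
    return getattr(resource.metadata, "deletion_timestamp", None) is not None


def delete_pvc_if_present(core_api, name, namespace):
    """Delete a PVC, treating "already gone" (HTTP 404) as a non-error.

    Returns True if the delete request was accepted, False if the PVC
    no longer existed. Any other API error is re-raised.
    """
    try:
        core_api.delete_namespaced_persistent_volume_claim(
            name=name, namespace=namespace
        )
    except Exception as exc:
        # kubernetes.client.ApiException exposes the HTTP code as .status;
        # checking the attribute keeps this sketch free of a hard import.
        if getattr(exc, "status", None) == 404:
            return False
        raise
    return True


def cleanup_pvcs(core_api, namespace):
    """Delete all PVCs in the namespace that are not already terminating.

    Returns the names of the PVCs for which a delete was actually issued
    and accepted.
    """
    deleted = []
    pvcs = core_api.list_namespaced_persistent_volume_claim(namespace=namespace)
    for pvc in pvcs.items:
        if is_terminating(pvc):
            continue  # already on its way out, nothing to do
        if delete_pvc_if_present(core_api, pvc.metadata.name, namespace):
            deleted.append(pvc.metadata.name)
    return deleted
```

Even with the deletion_timestamp check in place, the 404 handling is still worth having: the PVC can disappear between the list call and the delete call, so the two checks cover different windows of the same race.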