Update job deletion logic + propagate update times #283
Conversation
Nice find!
Would you mind swapping 573 and 574? There's a slight bug in the metrics where it starts recording before it takes the lock. It's on my to-do list but I haven't found time :/
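(Lines 573 and 574 aren't shown in this thread, so the exact code is an assumption; the shape of the requested swap would be something like the following, with the metrics helper being hypothetical:)
s.mutex.Lock()
start := time.Now() // start measuring only once the lock is held, so lock wait time isn't recorded
j := s.jobs[digest.Hash]
j.LastUpdate = time.Now()
s.mutex.Unlock()
metrics.ObserveUpdateDuration(time.Since(start)) // hypothetical metrics helper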
Nice! Thanks Xander
I actually refactored the metrics generation entirely in the next diff, so I think it would be relevant!
s.mutex.Lock()
j := s.jobs[digest.Hash]
j.LastUpdate = time.Now()
s.mutex.Unlock()
Should we really update LastUpdate here? This is just sending an event to the client, not updating the actual job that we store in memory.
I think that makes sense, right? If I have an operation that is sending progress events, for each one of them I want to make sure the job's liveness property is being refreshed.
If I have a job that takes 10 minutes to run, for example, I don't want LastUpdate to be set only at the start when the job is enqueued, IMO?
Might be that we should make what LastUpdate actually represents clearer.
If I have a job that takes 10 minutes to run for example, I don't want the LastUpdate to be set only at the start when the job is enqueued IMO?
Why not? To me, this is why we have all those conditions in shouldDeleteJob:
// In-progress job with no connected streams: delete after expiryTime without updates.
if !j.Done && len(j.Streams) == 0 && timeSinceLastUpdate > expiryTime {
	return true
}
// In-progress job, possibly with streams still attached: delete after 2*expiryTime.
if !j.Done && timeSinceLastUpdate > 2*expiryTime {
	return true
}
i.e. if the job is still in progress and has no streams left, we keep it for the expiration time (1h); while it still has some streams, we keep it for 2h.
I'm specifically thinking about this condition:
timeSinceLastUpdate := time.Since(j.LastUpdate)
if j.Done && len(j.Streams) == 0 && timeSinceLastUpdate > retentionTime {
	// return true to delete
	return true
}
If my job takes 10 minutes to run and we don't refresh LastUpdate with progress events, then once that job is done we remove it almost immediately (LastUpdate still points at enqueue time, so timeSinceLastUpdate may already exceed retentionTime the moment Done is set), even if a client might attempt to reconnect.
My understanding is that we are retaining the jobs to smooth that out.
To me, once the job is done, i.e. when we update the Done field in the job object, we should update the LastUpdate field as well, because that's an update to the job, right?
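(A minimal sketch of that suggestion, reusing the field names from the snippets above; the scheduler type and the markDone helper are hypothetical:)
// markDone flips the job to done and refreshes its liveness timestamp in one
// locked section, so the retention clock starts from completion rather than enqueue.
func (s *scheduler) markDone(hash string) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	j := s.jobs[hash]
	j.Done = true
	j.LastUpdate = time.Now() // completing the job counts as an update to the job
}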
I do see Maxime's point here. Streaming an update isn't really an update. Checking for listeners and setting the last update when we reassign a job should fix the issue without this bit anyway, right?
The stream check was only on the expiration path, not on the deletion routine. Not sure if we're expecting any remaining stream once the action has been completed, but it's probably harmless, even safer, just in case.
* Change update logic to match previous behaviour, update time in a few locations.
* Version + Changelog
---------
Co-authored-by: Hamish Pitkeathly <[email protected]>
The previous deletion logic would always check that the streams were zeroed before deleting the job: 0b744c9#diff-030e4836d83d50a740c86eb47a2780e7a81f170cc5902279ff28ad5122081d37L593
This diff updates the logic to match the previous behaviour, and updates the LastUpdate time in a few places so it's a more accurate measurement.
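(Putting the conditions quoted in the review thread together, the resulting check presumably looks roughly like this; it's a sketch assembled from the snippets above, and the job type, field layout, and threshold names are assumptions:)
func shouldDeleteJob(j *job, expiryTime, retentionTime time.Duration) bool {
	timeSinceLastUpdate := time.Since(j.LastUpdate)
	// Finished job with no remaining streams: delete once the retention window has passed.
	if j.Done && len(j.Streams) == 0 && timeSinceLastUpdate > retentionTime {
		return true
	}
	// In-progress job with no streams: delete after expiryTime without updates.
	if !j.Done && len(j.Streams) == 0 && timeSinceLastUpdate > expiryTime {
		return true
	}
	// In-progress job, possibly with streams still attached: delete after 2*expiryTime.
	if !j.Done && timeSinceLastUpdate > 2*expiryTime {
		return true
	}
	return false
}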