Use sequence counters in cron helpers to limit execution time #144

LegalizeAdulthood · 2024-10-25T16:28:58Z

Even though cron runs the housekeeping tasks infrequently, a large number of documents can still mean many thousands of HTTP HEAD requests in a single run, so one cron job can take a long time to finish. Setting a limit on cron job execution time would let the scanning housekeeping proceed in chunks.
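A time limit like this could take the form of a deadline checked before each request. The sketch below assumes a hypothetical helper `process_with_budget` that takes the items, a per-item handler (standing in for the HTTP HEAD request), and a budget in seconds. It uses a monotonic clock so wall-clock adjustments cannot stretch or cut the run short. The clock is injectable only to make the sketch testable.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def process_with_budget(
    items: Iterable[T],
    handle: Callable[[T], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call handle() on items until the time budget is used up.

    Returns the number of items handled. The deadline is checked before
    each item, so one slow request can overrun the budget by at most its
    own duration.
    """
    deadline = clock() + budget_seconds
    handled = 0
    for item in items:
        if clock() >= deadline:
            break  # out of time: leave the rest for the next cron run
        handle(item)
        handled += 1
    return handled
```

A cron helper might call it as `process_with_budget(documents, check_link, budget_seconds=300)`, where `check_link` is a hypothetical function that issues the HEAD request.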

To support incremental scanning, associate a sequence number with each entity that results in a network request, and pair each sequence number with a timestamp. When an entity is scanned, its sequence number is set to that of the current scan, and through it to the current scan's timestamp. When the scanning time limit has been exceeded, the cron helper exits.
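A minimal in-memory sketch of the sequence number/timestamp pairing, assuming hypothetical `ScanRun`, `Entity`, and `ScanLog` types (a real implementation would keep these in the database). Each cron run gets a new sequence number with its start time; each entity checked during that run is stamped with the run's sequence number, and the loop stops as soon as the supplied `out_of_time` callback reports the limit is exceeded.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional


@dataclass
class ScanRun:
    """One sequence number/timestamp pair, created at the start of each run."""
    seq: int
    started_at: datetime


@dataclass
class Entity:
    """Anything whose housekeeping needs a network request (hypothetical model)."""
    url: str
    last_scan_seq: Optional[int] = None  # None means "never scanned"


@dataclass
class ScanLog:
    """Hands out increasing sequence numbers, each paired with a timestamp."""
    runs: List[ScanRun] = field(default_factory=list)

    def start_run(self, now: Optional[datetime] = None) -> ScanRun:
        seq = self.runs[-1].seq + 1 if self.runs else 1
        run = ScanRun(seq, now or datetime.now(timezone.utc))
        self.runs.append(run)
        return run


def scan_chunk(
    entities: Iterable[Entity],
    run: ScanRun,
    check: Callable[[Entity], None],
    out_of_time: Callable[[], bool],
) -> int:
    """Check entities in order, stamping each with the run's sequence number.

    Entities not reached before the time limit keep their older sequence
    number, so the next run will prefer them.
    """
    scanned = 0
    for entity in entities:
        if out_of_time():
            break
        check(entity)                   # e.g. the HTTP HEAD request
        entity.last_scan_seq = run.seq  # the timestamp is reachable via the run
        scanned += 1
    return scanned
```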

On the next scan, a new sequence number/timestamp pair is created. Scanning preference is given to items with a lower sequence number, so as successive chunks are scanned, the process wraps around to the least recently scanned items.
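To see the wrap-around, here is a small simulation under simplifying assumptions: a fixed chunk size stands in for the time limit, every item starts at sequence number 0, and ties are broken by name so the order is deterministic. The functions `next_chunk` and `simulate` are hypothetical names for illustration.

```python
from typing import Dict, List


def next_chunk(last_seq: Dict[str, int], chunk_size: int) -> List[str]:
    """Pick the chunk_size items with the lowest sequence numbers (ties by name)."""
    return sorted(last_seq, key=lambda name: (last_seq[name], name))[:chunk_size]


def simulate(names: List[str], chunk_size: int, runs: int) -> List[List[str]]:
    """Run several scans, each with a new sequence number, recording each chunk."""
    last_seq = {name: 0 for name in names}  # simplification: all start at 0
    history = []
    for seq in range(1, runs + 1):         # a new sequence number per cron run
        chunk = next_chunk(last_seq, chunk_size)
        for name in chunk:
            last_seq[name] = seq           # stamp scanned items with this run
        history.append(chunk)
    return history


print(simulate(["a", "b", "c", "d", "e"], chunk_size=2, runs=4))
# [['a', 'b'], ['c', 'd'], ['e', 'a'], ['b', 'c']]
```

After the third run every item has been scanned once, and the fourth run picks up `b` and `c`, the items scanned longest ago.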

Some care must be taken when bootstrapping the sequence/timestamp housekeeping: items that have no timestamp (i.e., have never been scanned) should be scanned before items that have been scanned but whose timestamp is the oldest.
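If the sequence number is stored as a nullable column, with NULL meaning "never scanned", the bootstrap preference can be expressed directly in the selection query. This sketch uses Python's built-in `sqlite3` with a hypothetical `documents` table and `last_scan_seq` column. SQLite already sorts NULLs first in ascending order, but the explicit `last_scan_seq IS NOT NULL` sort key states the intent and also works on databases such as PostgreSQL, where ascending order puts NULLs last.

```python
import sqlite3
from typing import List


def pick_chunk(conn: sqlite3.Connection, limit: int) -> List[str]:
    """Return up to `limit` URLs: never-scanned first, then oldest scan first."""
    rows = conn.execute(
        """
        SELECT url FROM documents
        ORDER BY last_scan_seq IS NOT NULL,  -- 0 (never scanned) sorts first
                 last_scan_seq,              -- then the oldest scan first
                 url                         -- deterministic tie-break
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with a mix of scanned and unscanned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (url TEXT PRIMARY KEY, last_scan_seq INTEGER)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [("a", 3), ("b", None), ("c", 1), ("d", None), ("e", 2)],
    )
    return conn


print(pick_chunk(demo_connection(), 3))  # ['b', 'd', 'c']
```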

LegalizeAdulthood added the enhancement and performance labels on Oct 25, 2024
