Use sequence counters in cron helpers to limit execution time #144

Open
LegalizeAdulthood opened this issue Oct 25, 2024 · 0 comments

Even when cron schedules the housekeeping tasks infrequently, a large number of documents can mean many thousands of HTTP HEAD requests in a single run, which makes the cron job take a long time. A limit on cron job execution time should be set so that the scanning housekeeping is performed in chunks.

To support incremental scanning, use a sequence id associated with each entity that results in a network request. Associate the sequence number with a timestamp. When an entity is scanned, its sequence number is set to the sequence number of the current scan. When the scanning time limit has been exceeded, the cron helper will exit.

On the next scan, a new sequence number/timestamp pair is created. Scanning preference is given to items with a lower sequence number. As chunks of items are scanned, the process will wrap around to the least recently scanned items.
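
A minimal sketch of the time-limited chunked scan described above, written in Python purely for illustration. The names `Item`, `begin_scan`, `scan_chunk`, and the `check` callback (standing in for the HTTP HEAD request) are all hypothetical and not taken from the project; storage is an in-memory list rather than whatever database the cron helpers actually use.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    url: str
    scan_seq: int = 0  # hypothetical field: sequence number of the last scan


def begin_scan(scan_times: Dict[int, float]) -> int:
    """Create the next sequence number and record its timestamp."""
    seq = max(scan_times, default=0) + 1
    scan_times[seq] = time.time()
    return seq


def scan_chunk(
    items: List[Item],
    scan_seq: int,
    check: Callable[[str], None],
    time_limit_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Scan lowest-sequence items first; stop once the time limit is reached."""
    deadline = clock() + time_limit_s
    scanned = 0
    for item in sorted(items, key=lambda i: i.scan_seq):
        if clock() >= deadline:
            break  # time budget spent; remaining items wait for the next run
        check(item.url)           # placeholder for the network request
        item.scan_seq = scan_seq  # mark as visited by this scan
        scanned += 1
    return scanned
```

Because each scanned item takes the newest sequence number, it sorts to the back of the queue on the next run, which is what produces the wrap-around behavior.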

Some care must be taken to bootstrap the sequence/timestamp housekeeping to prefer scanning items first that have no timestamp (e.g. have never been scanned) vs. those items that have been scanned but whose timestamp is the oldest.
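
One possible way to express that bootstrap preference is a sort key that places never-scanned items ahead of everything else. This is only a sketch in Python; `Entity`, `last_scan_seq`, and the use of `None` to mean "never scanned" are assumptions made for illustration, not the project's schema.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entity(NamedTuple):
    name: str
    last_scan_seq: Optional[int]  # assumption: None means never scanned


def scan_priority(entity: Entity) -> Tuple[bool, int]:
    """Sort key: never-scanned items first, then lowest sequence number."""
    never_scanned = entity.last_scan_seq is None
    return (not never_scanned, entity.last_scan_seq or 0)


def scan_order(entities: List[Entity]) -> List[Entity]:
    """Return entities in the order they should be scanned."""
    return sorted(entities, key=scan_priority)
```

In a SQL-backed store the same ordering could be obtained by sorting on whether the column is null before sorting on the sequence number itself; the exact syntax depends on the database.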
