Even when cron schedules the housekeeping tasks infrequently, a large number of documents can require many thousands of HTTP HEAD requests, which makes a single cron run take a long time. The cron job's execution time should be capped so that the scanning housekeeping is performed in chunks.
To support incremental scanning, use a sequence ID associated with each entity that results in a network request. Associate the sequence number with a timestamp. When an entity is scanned, its sequence number is set to the current scan timestamp. When the scanning time limit has been exceeded, the cron helper exits.
On the next scan, a new sequence number/timestamp pair is created. Scanning preference is given to items with a lower sequence number. As chunks of items are scanned, the process wraps around to the least recently scanned items.
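A minimal sketch of one time-limited scan pass, to illustrate the wrap-around behaviour described above. It is not taken from the project's code: `Entity`, `run_scan`, `check`, and the `clock` parameter are hypothetical names, and it assumes every entity already carries a sequence number.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Entity:
    """Hypothetical record for anything that needs a network check."""
    url: str
    last_scan_seq: int  # sequence number of the scan that last checked it


def run_scan(
    entities: Iterable[Entity],
    scan_seq: int,
    budget_seconds: float,
    check: Callable[[Entity], None],
    clock: Callable[[], float] = time.monotonic,
) -> List[Entity]:
    """Scan lowest-sequence entities first; stop once the time budget is spent."""
    deadline = clock() + budget_seconds
    scanned: List[Entity] = []
    # sorted() is stable, so entities with equal sequence numbers keep their order.
    for entity in sorted(entities, key=lambda e: e.last_scan_seq):
        if clock() >= deadline:
            break  # time limit exceeded: exit and leave the rest for the next run
        check(entity)  # e.g. issue the HTTP HEAD request (assumed callback)
        entity.last_scan_seq = scan_seq  # mark as covered by the current scan
        scanned.append(entity)
    return scanned
```

Because each scanned entity is stamped with the newest sequence number, the entities skipped in one run sort to the front of the next run, which is what produces the wrap-around.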
Some care must be taken when bootstrapping the sequence/timestamp housekeeping: items that have no timestamp (i.e. have never been scanned) should be scanned first, ahead of items that have been scanned but have the oldest timestamps.
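One possible way to express that bootstrap preference is a sort key that places missing timestamps ahead of every real one. This is only a sketch: `scan_priority`, `scan_order`, and the use of `None` for "never scanned" are assumptions, not existing code.

```python
from typing import Dict, List, Optional, Tuple


def scan_priority(last_scanned: Optional[float]) -> Tuple[int, float]:
    """Sort key: never-scanned items (None) first, then oldest timestamp first."""
    if last_scanned is None:
        return (0, 0.0)  # group 0 sorts before every scanned item
    return (1, last_scanned)


def scan_order(timestamps: Dict[str, Optional[float]]) -> List[str]:
    """Return item names in the order they should be scanned."""
    return sorted(timestamps, key=lambda name: scan_priority(timestamps[name]))


# "b" has never been scanned, so it comes before the oldest scanned item "c".
order = scan_order({"a": 100.0, "b": None, "c": 50.0})
print(order)  # ['b', 'c', 'a']
```

Using a tuple key avoids comparing `None` with a number directly, which Python 3 would reject with a `TypeError`.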