Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs: Add every_hour and every_day extentions to the cron expression. #76373

Closed
wants to merge 1 commit into from

Conversation

miretskiy
Copy link
Contributor

@miretskiy miretskiy commented Feb 10, 2022

Cron expressions used by scheduled jobs support extensions such as
@hourly or @daily. These cron expressions execute cron schedule
on top of every hour or at midnight respectively.

Spreading out cron schedules execution times might be beneficial.
Even though, ultimately, the responsiblity of not overloading the system
falls on the system itself (e.g. jobs system slowing down execution, or
perhaps an admission control type system), having an easy way to spread
out the schedule execution times might be beneficial.

This PR adds support to two extension to the crontab mnemonics:
@every_hour which executes the schedule every hour on a randmly
chosen minute, and @every_day which executes the schedule every day
at a randomly chosen time.

Release Notes (enterprise change): Extend scheduled jobs to support
cront tab expression extension to specify job execution at random
minute every hour (@every_hour), or at a random time every day (@every_day).

…ion.

Cron expressions used by scheduled jobs support extensions such as
`@hourly` or `@daily`.  These cron expressions execute cron schedule
on top of every hour or at midnight respectively.

Spreading out cron schedules execution times might be beneficial.
Even though, ultimately, the responsiblity of not overloading the system
falls on the system itself (e.g. jobs system slowing down execution, or
perhaps an admission control type system), having an easy way to spread
out the schedule execution times might be beneficial.

This PR adds support to two extension to the crontab mnemonics:
`@every_hour` which executes the schedule every hour on a randmly
chosen minute, and `@every_day` which executes the schedule every day
at a randomly chosen time.

Release Notes (enterprise change): Extend scheduled jobs to support
cront tab expression extension to specify job execution at random
minute every hour (`@every_hour`), or at a random time every day (`@every_day`).
@miretskiy miretskiy requested review from dt and otan February 10, 2022 15:06
@miretskiy miretskiy requested a review from a team as a code owner February 10, 2022 15:06
@miretskiy miretskiy requested review from shermanCRL and removed request for a team February 10, 2022 15:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@miretskiy miretskiy changed the title jobs: Add every_hour and every_day extentions to the cron express… jobs: Add every_hour and every_day extentions to the cron expression. Feb 10, 2022
@shermanCRL
Copy link
Contributor

Hmm, intuitively as a user I am not sure I’d want to say “any time within the hour is fine”. I assume people want some predictability. And, FWIW, I find the syntax a little too magical.

If the problem to solve is jitter or randomness, the goal being to not have crazy spikes, perhaps there is a syntax for that? This looks like an example: https://stackoverflow.com/questions/9049460/cron-jobs-and-random-times-within-given-hours. I think it’d be good if the cron expression explicitly indicates randomness.

Or, the user can take responsibility for spreading things out themselves: https://stackoverflow.com/questions/30207888/cron-expression-for-every-hour-starting-from-specific-time

@HonoreDB
Copy link
Contributor

I'm in favor of random/unspecified time being the default because syncing all the jobs up is such a common foot-gun, and it's not always trivial to do the randomization on the application side.

@HonoreDB
Copy link
Contributor

Apparently this has been on my mind for at least 6 years. javan/whenever#615

@dt
Copy link
Member

dt commented Feb 10, 2022

Another option could be to allow just the operator to express their desired recurrence interval instead of when it should run (i.e. what a cron string says) and then add jitter to that. e.g. @every(24h,30m), and use that to compute NextRun.

@miretskiy
Copy link
Contributor Author

Hmm, intuitively as a user I am not sure I’d want to say “any time within the hour is fine”. I assume people want some predictability. And, FWIW, I find the syntax a little too magical.

I don't know if it's more or less magical than @hourly or @daily or */5 3-7 * * * which we already support.

If the problem to solve is jitter or randomness, the goal being to not have crazy spikes, perhaps there is a syntax for that? This looks like an example: https://stackoverflow.com/questions/9049460/cron-jobs-and-random-times-within-given-hours. I think it’d be good if the cron expression explicitly indicates randomness.

Again, randomness is not a "standard" extension. SO, how we call it or format it is really up to us.

Or, the user can take responsibility for spreading things out themselves: https://stackoverflow.com/questions/30207888/cron-expression-for-every-hour-starting-from-specific-time

That was our intension; and that's why it hasn't been implemented so far. Honestly, I don't have too strong of an opinion here. Clearly, when creating new TTL job @otan could do basically what this PR does and pick a random minute.
The users can also do that -- but... why...? we support @hourly, it's not a scratch to imagine that every hour might be useful.

But, as @HonoreDB, I do feel it's a foot gun that's just so easy to trigger (and yes, the real solution is admission control).

@miretskiy
Copy link
Contributor Author

Another option could be to allow just the operator to express their desired recurrence interval instead of when it should run (i.e. what a cron string says) and then add jitter to that. e.g. @every(24h,30m), and use that to compute NextRun.

What's the second arg? Like a range for jitter?

@miretskiy
Copy link
Contributor Author

Hmm, intuitively as a user I am not sure I’d want to say “any time within the hour is fine”. I assume people want some predictability. And, FWIW, I find the syntax a little too magical.

I don't know if it's more or less magical than @hourly or @daily or */5 3-7 * * * which we already support.

@shermanCRL Also, if they users feel strongly when their schedule runs, they already can do that -- using @hourly or expressing their exact intent with a full crontab syntax. And that's the default behavior.

@dt
Copy link
Member

dt commented Feb 10, 2022

What's the second arg? Like a range for jitter?

yep. Could do something like @every(24h~30m) too, as long as we can split it then pass both halves to ParseDuration.

@miretskiy
Copy link
Contributor Author

yep. Could do something like @every(24h~30m) too, as long as we can split it then pass both halves to ParseDuration.

This is probably a bit more involved. Like: what's the starting time of such schedule?
Perhaps something like @every_25h (or @every_3m) define the target frequency of the schedule.
And the start time is the current time -- and thus, you kinda add jitter automatically. The only thing is, the
execution times will vary because we haven't fixed the start time... But maybe it's a feature?

@otan
Copy link
Contributor

otan commented Feb 11, 2022

I think this does create a "natural" jitter, which is what this kind of solves, but doesn't work so well for cases like, e.g., restoring from a dump, where all tables/crons would be created upfront. maybe @randomly_every_hour? hah!

Could do something like @every(24h~30m)

i think is what was being aimed for, but agreed it's not as straightforward.

@otan otan removed their request for review February 16, 2022 22:50
@shermanCRL shermanCRL removed their request for review January 24, 2023 16:18
@miretskiy miretskiy closed this Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants