Celery Beat does not dispatch tasks properly after a process restart when TIME_ZONE setting is not UTC and USE_TZ=False #798

Open
wencakisa opened this issue Aug 28, 2024 · 4 comments

wencakisa commented Aug 28, 2024

Summary:

We hit a production issue: whenever we deploy new code (which restarts all processes, including beat), the scheduled periodic tasks are not dispatched for exactly 1 hour after the restart. After that they run on schedule with no further delays, until the next deploy.

  • Celery Version: 5.4.0
  • Celery-Beat Version: 2.7.0

Exact steps to reproduce the issue:

  1. Set USE_TZ=False in your Django settings
  2. Change the time zone configuration to use Europe/London

Detailed information

We found the exact root cause of this and it is a complex combination of:

  1. The Django timezone settings
  2. The last_run_at field of the PeriodicTask model
  3. The Celery code that determines whether the task "is before the last run"

So, we have the following Django settings:

USE_TZ = False
TIME_ZONE = "Europe/London"
CELERY_TIME_ZONE = TIME_ZONE  # "Europe/London"

This leads to the following:

  • Datetime objects that are passed within the Django app are timezone-naive
  • The datetime objects are stored in the DB as London local time, without any offset information

Something to note here: London normally has the same offset as UTC, but because of DST it is currently one hour ahead of UTC.

I have a periodic task that runs every minute. If I start the task for the first time, everything goes as expected and the task is dispatched every minute.

However, if I kill the beat process and run it again - the task is not dispatched until exactly 1 hour and 1 minute after that.

We traced this to how last_run_at is handled. The field on PeriodicTask objects is saved as a timezone-naive value (expected, because USE_TZ is set to False), but when the beat process checks the last run time it treats that value as UTC instead of interpreting it in the London time zone.

Here's the exact code that does that (https://github.com/celery/celery/blob/f3a2cf45a69b443cac6c79a5c85583c8bd91b0a3/celery/schedules.py#L470-L473):

def is_before_last_run(year, month, day):
    return self.maybe_make_aware(datetime(year, month, day)) < last_run_at

where maybe_make_aware makes the passed datetime object timezone-aware, but defaults to UTC, rather than the specified timezone, which leads to the issue (https://github.com/celery/celery/blob/f3a2cf45a69b443cac6c79a5c85583c8bd91b0a3/celery/utils/time.py#L308):

def maybe_make_aware(dt, tz=None):
    """Convert dt to aware datetime, do nothing if dt is already aware."""
    if is_naive(dt):
        dt = to_utc(dt)
        return localize(
            dt, timezone.utc if tz is None else timezone.tz_or_local(tz),
        )
    return dt
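
To make the effect concrete, here is a minimal stdlib-only sketch of the skew. It does not call Celery. The fixed +01:00 offset `BST` is an assumption standing in for Europe/London during DST, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for Europe/London during DST.
BST = timezone(timedelta(hours=1), "BST")

# Naive wall-clock value, as stored in last_run_at when USE_TZ=False.
stored_last_run_at = datetime(2024, 8, 28, 12, 0)

# What the value really means: 12:00 London time (11:00 UTC).
intended = stored_last_run_at.replace(tzinfo=BST)
# What it becomes when the naive value is assumed to be UTC.
misread = stored_last_run_at.replace(tzinfo=timezone.utc)

# Beat restarts one minute after the last run, at 12:01 London time.
now = datetime(2024, 8, 28, 12, 1, tzinfo=BST)

skew = misread - intended            # how far the last run is pushed forward
last_run_in_future = misread > now   # the last run appears not to have happened yet
time_until_past = misread - now      # time until the misread value is in the past
```

In this sketch the misread last run sits one hour ahead of the real one, so right after a restart it still lies in the future, which is consistent with the roughly one-hour pause described above.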

This is most probably the root cause for all of these:

The same issue manifests in the opposite direction when you use a time zone that is behind UTC, for example America/New_York (which is currently 4 hours behind UTC).

If you do that, the task is dispatched immediately after the process starts, even though it is scheduled to run every minute. This makes sense, because the same comparison functions are executed, but they assume UTC.
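
Both directions can be sketched with one small helper. `misread_skew` is a hypothetical name introduced here for illustration, and the fixed offsets are assumptions standing in for the real zones during DST.

```python
from datetime import datetime, timedelta, timezone


def misread_skew(naive_local, local_tz):
    """Return how far a naive local timestamp shifts when it is read as UTC.

    Positive: the timestamp lands in the future (task looks "not due yet").
    Negative: the timestamp lands in the past (task looks overdue).
    """
    as_utc = naive_local.replace(tzinfo=timezone.utc)
    as_local = naive_local.replace(tzinfo=local_tz)
    return as_utc - as_local


# Assumption: fixed offsets stand in for the real zones during DST.
BST = timezone(timedelta(hours=1), "BST")    # Europe/London in summer
EDT = timezone(timedelta(hours=-4), "EDT")   # America/New_York in summer

last_run = datetime(2024, 8, 28, 12, 0)
london_skew = misread_skew(last_run, BST)
new_york_skew = misread_skew(last_run, EDT)
```

A positive skew (London) delays dispatch after a restart, while a negative skew (New York) makes the last run look older than it is, so the task appears overdue.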

The workaround we found for our case is to reset last_run_at to None on every deployment. That way there is no past datetime to compare with, so the tasks start executing as normal, and further scheduled executions are correct.

from django_celery_beat.models import PeriodicTask
PeriodicTask.objects.update(last_run_at=None)

This is actually what the documentation suggests after changing the time zone configuration. In our case, however, we did not change the configuration at all; it has been the same from the start. We need the reset only to make the beat process treat each task as if it had never run, which makes the tasks dispatch as expected.

If you use USE_TZ=True and TIME_ZONE="UTC", you won't have this issue.
However, switching our settings to these ☝🏻 default values is not possible at the moment, so I think this deserves careful consideration and, if possible, a fix.

The fix itself should be relatively easy: when comparing a datetime with last_run_at, respect the configured time zone and make the passed object aware in that time zone rather than always in UTC.
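
A rough sketch of that idea, using only the standard library. This is not Celery's code and does not mirror its internals exactly: `make_aware_in`, the extra `last_run_at` and `tz` parameters, and the fixed `BST` offset are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def make_aware_in(dt, tz):
    """Attach tz to a naive datetime; return aware datetimes unchanged."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Fine for datetime.timezone and zoneinfo objects;
        # pytz zones would need tz.localize(dt) instead.
        return dt.replace(tzinfo=tz)
    return dt


def is_before_last_run(year, month, day, last_run_at, tz):
    """Compare midnight of the given date, in tz, against last_run_at."""
    candidate = make_aware_in(datetime(year, month, day), tz)
    return candidate < make_aware_in(last_run_at, tz)


# Assumption: a fixed +01:00 offset stands in for Europe/London during DST.
BST = timezone(timedelta(hours=1), "BST")
last_run_at = datetime(2024, 8, 28, 0, 30)  # naive, 00:30 London time

# Interpreting the date in the configured zone: midnight London < 00:30 London.
local_result = is_before_last_run(2024, 8, 28, last_run_at, BST)

# Interpreting the date as UTC: midnight UTC is 01:00 London, after the last run.
utc_midnight = datetime(2024, 8, 28, tzinfo=timezone.utc)
utc_result = utc_midnight < make_aware_in(last_run_at, BST)
```

The two results disagree for the same date, which is the kind of mismatch the proposed change is meant to remove.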

@ChanXing2023

USE_TZ=True

ThankCat commented Sep 8, 2024

I don't understand why developers ignore such questions

Member

Nusnus commented Sep 8, 2024

> I don't understand why developers ignore such questions

We do not ignore anything my friend.
We just have other priorities.

Contributing a possible solution is a wonderful method to get more attention/prioritization.

@ChanXing2023

You can learn from this
#801
