beat fails to start with UnicodeDecodeError in celery_schedulers/redis_scheduler #227

Open
jaraco opened this issue Aug 25, 2018 · 9 comments

jaraco commented Aug 25, 2018

Traceback (most recent call last):
  File "/app/.heroku/venv/bin/vr_beat", line 17, in <module>
    sys.exit(start_celerybeat())
  File "/app/.heroku/venv/lib/python3.6/site-packages/vr/server/commands.py", line 52, in start_celerybeat
    management.call_command('celerybeat', pidfile=None)
  File "/app/.heroku/venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 120, in call_command
    return command.execute(*args, **defaults)
  File "/app/.heroku/venv/lib/python3.6/site-packages/djcelery/management/base.py", line 75, in execute
    super(CeleryCommand, self).execute(*args, **options)
  File "/app/.heroku/venv/lib/python3.6/site-packages/django/core/management/base.py", line 445, in execute
    output = self.handle(*args, **options)
  File "/app/.heroku/venv/lib/python3.6/site-packages/newrelic/api/function_trace.py", line 109, in literal_wrapper
    return wrapped(*args, **kwargs)
  File "/app/.heroku/venv/lib/python3.6/site-packages/djcelery/management/commands/celerybeat.py", line 24, in handle
    beat.run(*args, **options)
  File "/app/.heroku/venv/lib/python3.6/site-packages/celery/bin/beat.py", line 79, in run
    return beat().run()
  File "/app/.heroku/venv/lib/python3.6/site-packages/celery/apps/beat.py", line 83, in run
    self.start_scheduler()
  File "/app/.heroku/venv/lib/python3.6/site-packages/celery/apps/beat.py", line 104, in start_scheduler
    c.reset(self.startup_info(beat)))))
  File "/app/.heroku/venv/lib/python3.6/site-packages/celery/apps/beat.py", line 125, in startup_info
    scheduler = beat.get_scheduler(lazy=True)
  File "/app/.heroku/venv/lib/python3.6/site-packages/celery/beat.py", line 507, in get_scheduler
    lazy=lazy)
  File "/app/.heroku/venv/lib/python3.6/site-packages/celery/utils/imports.py", line 53, in instantiate
    return symbol_by_name(name)(*args, **kwargs)
  File "/app/.heroku/venv/lib/python3.6/site-packages/celery_schedulers/redis_scheduler.py", line 34, in __init__
    self.entries = pickle.loads(pickled)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
lxc-start: tools/lxc_start.c: main: 366 The container failed to start.

jaraco commented Aug 25, 2018

The problem is that pickles written by Python 2.7 can't necessarily be loaded on Python 3.6: the datetime state is stored as a Python 2 byte string, and Python 3's unpickler decodes those strings as ASCII by default:

celery-schedulers master $ python2.7
>>> import pickle
>>> import datetime
>>> dat = pickle.dumps(datetime.datetime.now(), protocol=0)
>>> dat
"cdatetime\ndatetime\np0\n(S'\\x07\\xe2\\x08\\x18\\x16 \\x1c\\x0eP%'\np1\ntp2\nRp3\n."
celery-schedulers master $ python3
>>> import pickle
>>> dat = b"cdatetime\ndatetime\np0\n(S'\\x07\\xe2\\x08\\x18\\x16 \\x1c\\x0eP%'\np1\ntp2\nRp3\n."
>>> pickle.loads(dat)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

jaraco commented Aug 25, 2018

Or more concisely:

$ python2.7 -c "import sys, pickle, datetime; pickle.dump(datetime.datetime.now(), sys.stdout)" | python -c "import sys, pickle; print(pickle.load(sys.stdin.buffer))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

jaraco commented Aug 25, 2018

It's an open bug in Python.

jaraco commented Aug 25, 2018

The workaround for datetime is to use encoding='bytes':

celery-schedulers master $ python2.7 -c "import sys, pickle, datetime; pickle.dump(datetime.datetime.now(), sys.stdout)" | python -c "import sys, pickle; print(pickle.load(sys.stdin.buffer, encoding='bytes'))"
2018-08-24 22:41:50.747077

Unfortunately, if you do that with the celerybeat schedule pickle, another issue arises:

celery-schedulers master $ rwt -- -i read-redis.py
Collecting celery<4
  Using cached https://files.pythonhosted.org/packages/b1/27/cd2e097208f60a0512b8b3201f8bd93c1db359c3fa35cab2f405aa1d45a7/celery-3.1.26.post2-py2.py3-none-any.whl
Collecting redis
  Using cached https://files.pythonhosted.org/packages/3b/f6/7a76333cf0b9251ecf49efff635015171843d9b977e4ffcf59f9c4428052/redis-2.10.6-py2.py3-none-any.whl
Collecting pytz>dev (from celery<4)
  Using cached https://files.pythonhosted.org/packages/30/4e/27c34b62430286c6d59177a0842ed90dc789ce5d1ed740887653b898779a/pytz-2018.5-py2.py3-none-any.whl
Collecting billiard<3.4,>=3.3.0.23 (from celery<4)
Collecting kombu<3.1,>=3.0.37 (from celery<4)
  Using cached https://files.pythonhosted.org/packages/4d/0e/73ac0df8949bdb6ad81812110962b38e6bb4243c3bfdad2d9e90b5509e2e/kombu-3.0.37-py2.py3-none-any.whl
Collecting amqp<2.0,>=1.4.9 (from kombu<3.1,>=3.0.37->celery<4)
  Using cached https://files.pythonhosted.org/packages/ed/09/314d2788aba0aa91f2578071a6484f87a615172a98c309c2aad3433da90b/amqp-1.4.9-py2.py3-none-any.whl
Collecting anyjson>=0.3.3 (from kombu<3.1,>=3.0.37->celery<4)
Installing collected packages: pytz, billiard, amqp, anyjson, kombu, celery, redis
Successfully installed amqp-1.4.9 anyjson-0.3.3 billiard-3.3.0.23 celery-3.1.26.post2 kombu-3.0.37 pytz-2018.5 redis-2.10.6
Traceback (most recent call last):
  File "read-redis.py", line 10, in <module>
    doc = pickle.loads(data, encoding='bytes')
  File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-7jvgcc5w/celery/schedules.py", line 372, in __init__
    self.hour = self._expand_cronspec(hour, 24)
  File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-7jvgcc5w/celery/schedules.py", line 420, in _expand_cronspec
    min=min_, max=max_ - 1 + min_, value=number))
ValueError: Invalid crontab pattern. Valid range is 0-23. '42' was found.

That's because '*' is getting decoded as b'*', i.e. bytes([42]). So the error message is really misleading: it didn't find '42'; it found the byte for '*' and (not unreasonably) treated it as a number rather than a text string.
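
A quick illustration of where the '42' comes from (just the byte value; the call path through _expand_cronspec is the one shown in the traceback above):

>>> ord('*')
42
>>> list(b'*')  # on Python 3, iterating a bytes object yields integers
[42]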

jaraco commented Aug 25, 2018

The resolution of the upstream ticket is "don't use pickle". So maybe that's the solution.

It does appear that using jsonpickle works around the issue:

celery-schedulers master $ python2.7 -m rwt -q jsonpickle -- -c "import sys, jsonpickle, datetime; sys.stdout.write(jsonpickle.dumps(datetime.datetime.now()))" > pickle.json
celery-schedulers master $ cat pickle.json | rwt -q jsonpickle -- -c "import sys, jsonpickle; print(jsonpickle.loads(sys.stdin.read()))"
2018-08-24 22:59:37.790738

Maybe next I'll test to see if the whole document in redis could be serialized and deserialized by jsonpickle, but not tonight.
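
A rough sketch of what that test would look like, assuming the key name and connection seen elsewhere in this thread (schedule.json is just an illustrative filename):

# On Python 2: load the existing pickle and re-serialize it with jsonpickle.
import pickle

import jsonpickle
import redis

db = redis.StrictRedis('localhost', db=1)  # assumption: substitute the real redis host
doc = pickle.loads(db.get('celerybeat_schedule'))
with open('schedule.json', 'w') as f:
    f.write(jsonpickle.dumps(doc))

# Then on Python 3:
#   doc = jsonpickle.loads(open('schedule.json').read())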

jaraco commented Aug 25, 2018

I tried serializing the objects on Python 2 and deserializing on Python 3 using jsonpickle, and while the operation succeeded, it did not provide an accurate representation on Python 3:

>>> doc
{'clean_old_builds': <list_iterator object at 0x10c0c33c8>, 'filesystem_scooper': <list_iterator object at 0x10c9bb0b8>, 'procs_scooper': <list_iterator object at 0x10c0c3400>, 'test_all_the_things': <list_iterator object at 0x10c0c34a8>}
>>> doc['clean_old_builds']
<list_iterator object at 0x10c0c33c8>
>>> list(doc['clean_old_builds'])
[('name', 'clean_old_builds'), ('schedule', <crontab: 0 12 * * * (m/h/d/dM/MY)>), ('options', {'expires': 120}), ('total_run_count', 1197), ('last_run_at', datetime.datetime(2018, 8, 25, 12, 0, 0, 96747)), ('app', None), ('args', ()), ('kwargs', {}), ('task', 'vr.server.tasks.clean_old_builds')]

I'm now recommending that, to upgrade the beat proc from Python 2 to Python 3, the celerybeat_schedule data be flushed.

lbolla commented Aug 27, 2018

If we go with the option of "flushing" the existing messages (+1 from me), we should also consider dropping pickle altogether and using a more portable serialization format (JSON, msgpack, etc.).

jaraco commented Aug 27, 2018

I found the cause of jsonpickle not working: jsonpickle/jsonpickle#216. If that bug is repaired, we could probably rely on jsonpickle for a more portable, JSON-based serialization that works across Pythons. We could also write a custom handler for ScheduleEntry objects that works around the bug in jsonpickle 0.9.6.
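
A rough sketch of the custom-handler idea, using jsonpickle's handler API; the flatten/restore logic below is illustrative only and hasn't been tested against jsonpickle/jsonpickle#216:

import jsonpickle
import jsonpickle.handlers
from celery.beat import ScheduleEntry

class ScheduleEntryHandler(jsonpickle.handlers.BaseHandler):
    # Flatten a ScheduleEntry via its __dict__ and rebuild it the same way.

    def flatten(self, obj, data):
        # Let jsonpickle handle the attribute values (datetimes, schedules, ...).
        data['entry'] = self.context.flatten(vars(obj), reset=False)
        return data

    def restore(self, data):
        state = self.context.restore(data['entry'], reset=False)
        entry = ScheduleEntry.__new__(ScheduleEntry)
        entry.__dict__.update(state)
        return entry

jsonpickle.handlers.register(ScheduleEntry, ScheduleEntryHandler)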

When you suggest using JSON, I suspect you mean explicitly hand-encoding each type of object in the schedule entries. I'd worry that any such approach would be brittle, as the serialization would have to keep tabs on the exact structure of the objects being stored (which could change with upstream changes).

The jsonpickle project seeks to solve this issue generically, by providing a robust and human-readable serialization for any serializable Python types. It would be less likely to break if there were changes in the object structure.

> the option of "flushing" the existing messages

I'm actually unsure if this is an option. I was presuming that these schedules get created automatically, but I'm not sure they do. We'd have to verify that they do before flushing them. What I could do is save the redis pickle, restart the beat proc, and see if the schedules come back on their own... and if not, restore the pickle.
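
A sketch of that save/restore safety net (the backup filename is illustrative; the connection mirrors the one in the next comment):

import redis

db = redis.StrictRedis('localhost', db=1)  # assumption: substitute the real redis host
backup = db.get('celerybeat_schedule')     # keep a copy of the current pickle
with open('celerybeat_schedule.bak', 'wb') as f:
    f.write(backup)

db.delete('celerybeat_schedule')  # then restart the beat proc and watch

# If the schedules do not come back on their own, restore the old pickle:
#   with open('celerybeat_schedule.bak', 'rb') as f:
#       db.set('celerybeat_schedule', f.read())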

jaraco commented Aug 27, 2018

I see the definitions in the settings CELERYBEAT_SCHEDULE, so I'm pretty sure they'll recreate themselves.
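
For reference, one of those definitions has roughly this shape (reconstructed from the entry shown two comments up; the authoritative values live in vr.server's settings):

from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'clean_old_builds': {
        'task': 'vr.server.tasks.clean_old_builds',
        'schedule': crontab(minute=0, hour=12),   # "0 12 * * *"
        'options': {'expires': 120},
    },
    # ...plus filesystem_scooper, procs_scooper, test_all_the_things
}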

I stopped the Py2 version, deleted the celerybeat_schedule, started the py3 version, then confirmed it had recreated the schedule:

>>> db = redis.StrictRedis(yougov_pubsub, db=1)
>>> db.get('celerybeat_schedule')
b"(dp1\nS'procs_scooper'\np2\nccelery.beat\nScheduleEntry\np3\n(g2\nS'vr.server.tasks.procs_scooper'\np4\ncdatetime\ndatetime\np5\n(S'\\x07\\xe2\\x08\\x1b\\r\\x00\\x00\\x01cS'\ntRp6\nI30132\nccelery.schedules\ncrontab\np7\n(S'*/30'\np8\nS'*'\nS'*'\nS'*'\nS'*'\ntRp9\n(t(dp10\n(dp11\nS'expires'\np12\nI120\nstRp13\nsS'test_all_the_things'\np14\ng3\n(g14\nS'vr.server.tasks.uptest_all_procs'\np15\ng5\n(S'\\x07\\xe2\\x08\\x1b\\x0c3\\n\\n\\xacT'\ntRp16\nI28724\nccelery.schedules\nschedule\np17\n(cdatetime\ntimedelta\np18\n(I0\nI3600\nI0\ntRp19\nI00\nNtRp20\n(tg10\n(dp21\ng12\nI120\nstRp22\nsS'filesystem_scooper'\np23\ng3\n(g23\nS'vr.server.tasks.filesystem_scooper'\np24\ng5\n(S'\\x07\\xe2\\x08\\x1b\\x0c\\x00\\x00\\x01l\\xba'\ntRp25\nI3768\ng7\n(I0\nS'*/4'\np26\nS'*'\nS'*'\nS'*'\ntRp27\n(tg10\n(dp28\ng12\nI120\nstRp29\nsS'clean_old_builds'\np30\ng3\n(g30\nS'vr.server.tasks.clean_old_builds'\np31\ng5\n(S'\\x07\\xe2\\x08\\x1b\\x0c\\x00\\x00\\x01|\\x00'\ntRp32\nI1199\ng7\n(I0\nI12\nS'*'\nS'*'\nS'*'\ntRp33\n(tg10\n(dp34\ng12\nI120\nstRp35\ns."
>>> db.delete('celerybeat_schedule')
1
>>> db.get('celerybeat_schedule')
b'\x80\x03}q\x00(X\x12\x00\x00\x00filesystem_scooperq\x01ccelery.beat\nScheduleEntry\nq\x02(h\x01X"\x00\x00\x00vr.server.tasks.filesystem_scooperq\x03cdatetime\ndatetime\nq\x04C\n\x07\xe2\x08\x1b\r\r$\x0bj}q\x05\x85q\x06Rq\x07K\x00ccelery.schedules\ncrontab\nq\x08(K\x00X\x03\x00\x00\x00*/4q\tX\x01\x00\x00\x00*q\nh\nh\ntq\x0bRq\x0c)}q\r}q\x0eX\x07\x00\x00\x00expiresq\x0fKxstq\x10Rq\x11X\r\x00\x00\x00procs_scooperq\x12h\x02(h\x12X\x1d\x00\x00\x00vr.server.tasks.procs_scooperq\x13h\x04C\n\x07\xe2\x08\x1b\r\r$\x0bj\x8cq\x14\x85q\x15Rq\x16K\x00h\x08(X\x04\x00\x00\x00*/30q\x17h\nh\nh\nh\ntq\x18Rq\x19)h\r}q\x1ah\x0fKxstq\x1bRq\x1cX\x13\x00\x00\x00test_all_the_thingsq\x1dh\x02(h\x1dX \x00\x00\x00vr.server.tasks.uptest_all_procsq\x1eh\x04C\n\x07\xe2\x08\x1b\r\r$\x0bj\xccq\x1f\x85q Rq!K\x00ccelery.schedules\nschedule\nq"cdatetime\ntimedelta\nq#K\x00M\x10\x0eK\x00\x87q$Rq%\x89N\x87q&Rq\')h\r}q(h\x0fKxstq)Rq*X\x10\x00\x00\x00clean_old_buildsq+h\x02(h+X \x00\x00\x00vr.server.tasks.clean_old_buildsq,h\x04C\n\x07\xe2\x08\x1b\r\r$\x0bj\xd4q-\x85q.Rq/K\x00h\x08(K\x00K\x0ch\nh\nh\ntq0Rq1)h\r}q2h\x0fKxstq3Rq4u.'
