race condition when closing a task #11

Open
TomasTomecek opened this issue Apr 19, 2016 · 0 comments

worker log:

2016-04-19 09:46:42 [INFO    ] Waking up task #24993 [...].
2016-04-19 09:46:42 [INFO    ] Task #24993 exited with status 0
2016-04-19 09:46:42 [INFO    ] Task has finished: 24993
2016-04-19 09:46:42 [INFO    ] kill_group: Process (pgrp 9253) exited
2016-04-19 09:47:03 [WARNING ] Closing interrupted tasks: [24993]
2016-04-19 09:47:03 [ERROR   ] <Fault 1: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/kobo/django/xmlrpc/dispatcher.py", line 95, in _marshaled_dispatch
    response = self._dispatch(method, params)
  File "/usr/lib64/python2.6/SimpleXMLRPCServer.py", line 418, in _dispatch
    return func(*params)
  File "/usr/lib/python2.6/site-packages/kobo/hub/decorators.py", line 24, in _new_func
    return func(request, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/kobo/hub/xmlrpc/worker.py", line 111, in interrupt_tasks
    task.interrupt_task(recursive=True)
  File "/usr/lib/python2.6/site-packages/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/kobo/hub/models.py", line 737, in interrupt_task
    task.interrupt_task(recursive=True)
  File "/usr/lib/python2.6/site-packages/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/kobo/hub/models.py", line 733, in interrupt_task
    raise Exception("Cannot interrupt task %d, state is %s" % (self.id, self.state))
Exception: Cannot interrupt task 24994, state is 3

At 09:46:42, the process for task 24993 exited with status 0, so kobo correctly marked the task as finished and cleaned up:

2016-04-19 09:46:42 [INFO    ] Task #24993 exited with status 0
2016-04-19 09:46:42 [INFO    ] Task has finished: 24993
2016-04-19 09:46:42 [INFO    ] kill_group: Process (pgrp 9253) exited

I believe this is where the race condition occurs. In the next cycle, we can see:

2016-04-19 09:47:03 [WARNING ] Closing interrupted tasks: [24993]

which is incorrect, since the task should already be closed. Checking the code:

if task_info["state"] == TASK_STATES["OPEN"] and task_info["id"] not in self.pid_dict:
    # an interrupted task appears to be open, but running task manager doesn't track it in it's pid list
    # this happens after a power outage, for example
    interrupted_list.append(task_info["id"])
    finished_tasks.add(task_info["id"])
    continue

This condition evaluates to true: the task no longer has an entry in pid_dict, so the worker considers it finished, yet the hub still reports it as open even though it is supposed to be closed.
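To make the suspected window concrete, here is a minimal, self-contained sketch of that check. It is not kobo's code: `find_interrupted` is a hypothetical helper, and the `TASK_STATES` values are placeholders chosen for illustration. It only shows that a task which has left `pid_dict` but is still OPEN on the hub side gets classified as interrupted.

```python
# Placeholder values for illustration; the real constants live in kobo.
TASK_STATES = {"FREE": 0, "ASSIGNED": 1, "OPEN": 2, "CLOSED": 3}


def find_interrupted(task_list, pid_dict):
    """Hypothetical helper: return ids the worker would treat as interrupted."""
    interrupted_list = []
    for task_info in task_list:
        # Same condition as in the snippet quoted above.
        if task_info["state"] == TASK_STATES["OPEN"] and task_info["id"] not in pid_dict:
            interrupted_list.append(task_info["id"])
    return interrupted_list


# Worker-side view: the process for 24993 has exited, so it is gone from pid_dict.
pid_dict = {}

# Hub-side view fetched in the next cycle: still OPEN, because the close
# (for whatever reason) has not been recorded yet.
hub_tasks = [{"id": 24993, "state": TASK_STATES["OPEN"]}]

interrupted = find_interrupted(hub_tasks, pid_dict)
print(interrupted)
```

If the hub had already recorded the task as closed, or if the task were still tracked in `pid_dict`, the list would be empty.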

So why isn't it closed? Looking at the code, this is how tasks are supposed to be closed:

try:
    task.run()
except ShutdownException:
    ...

thread.stop()
if failed:
    hub.worker.fail_task(task.task_id, task.result)
else:
    hub.worker.close_task(task.task_id, task.result)

Since that code contains very little logging, it's hard to figure out what could possibly go wrong.
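One way to narrow this down might be to log around the close/fail calls. The following is only a sketch: `finish_task` and the logger name are hypothetical, and it assumes `hub.worker.close_task` and `hub.worker.fail_task` behave as in the snippet above.

```python
import logging

logger = logging.getLogger("worker")  # hypothetical logger name


def finish_task(hub, task, failed):
    """Hypothetical wrapper: report the task result to the hub, logging each step."""
    action = "fail_task" if failed else "close_task"
    logger.info("Task #%s: calling %s", task.task_id, action)
    try:
        if failed:
            hub.worker.fail_task(task.task_id, task.result)
        else:
            hub.worker.close_task(task.task_id, task.result)
    except Exception:
        # Record the traceback, then re-raise so behavior is unchanged.
        logger.exception("Task #%s: %s raised", task.task_id, action)
        raise
    logger.info("Task #%s: %s returned", task.task_id, action)
```

With timestamps on both the "calling" and "returned" lines, the worker log would show whether the close request was sent and completed before the next cycle listed the task as interrupted.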
