Uncatched exception "Connection refused" #28

Open
laurentguilbert opened this issue Jan 13, 2014 · 11 comments

@laurentguilbert

Hello,

First of all, thanks for the great work; JobTastic is really neat and handy.

I tried running delay_or_eager and delay_or_fail with a stopped broker (RabbitMQ), but it seems that the resulting error isn't caught.

What I get is a 500 "[Errno 111] Connection refused".

I'm working on a very simple test case with the bare minimum. My setup is Celery (3.0) along with JobTastic (0.2.2) and amqp (1.0.13).

I figure this shouldn't be a hard fix, so let me know if you're aware of this issue or if I'm doing something wrong here. I'll be glad to contribute.

@winhamwr
Contributor

Hello Laurent,

Thanks for creating an issue. For handling connection errors, we're using _get_possible_broker_errors_tuple to try to piggyback on Celery's own error-detection code. I'm surprised that Celery's connection_errors doesn't contain that error, hrm.
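
For reference, here is a minimal sketch of how a helper in that spirit can collect the broker-error classes from the kombu connection, and how a delay_or_fail-style caller might use them. This assumes a Celery 3.x-style app; it is not Jobtastic's actual implementation, and publish_or_report and the app instance are made-up names.

```python
import logging

from celery import Celery

logger = logging.getLogger(__name__)

# Assumed Celery app for the sketch; the broker URL is illustrative.
app = Celery('sketch', broker='amqp://guest@localhost//')


def possible_broker_errors(celery_app):
    """Collect the exception classes kombu associates with the broker transport."""
    # app.connection() builds a kombu Connection lazily (no network I/O yet);
    # connection_errors and channel_errors are tuples of exception classes.
    conn = celery_app.connection()
    return tuple(conn.connection_errors) + tuple(conn.channel_errors)


def publish_or_report(task, *args, **kwargs):
    """Hypothetical delay_or_fail-style wrapper: never let a dead broker raise."""
    broker_errors = possible_broker_errors(task.app)
    try:
        return task.apply_async(args=args, kwargs=kwargs)
    except broker_errors as exc:
        # e.g. "[Errno 111] Connection refused" when RabbitMQ is stopped.
        logger.warning('Broker unavailable, task not queued: %r', exc)
        return None
```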

This is definitely something I'd like to have fixed, so contributions (or any insight into the problem) would be very much appreciated. The goal of delay_or_fail is precisely to protect you from that kind of exception.

Thanks
-Wes

@laurentguilbert
Author

Thanks for your quick answer.

I'll try to fix it and send you a pull request as soon as I can.

@laurentguilbert
Author

After a while I figured out what's happening.

Inside task.run and task.on_success the state of the task is updated whether it's eager or not.

So if your result backend is not running, delay_or_eager will fail and never run locally.

I figure a simple fix would be to check against task_is_eager inside task.run and task.on_success (the latter seems to be deprecated along with delay_or_run).

Let me know what you think, and if you agree I'll submit a pull request!
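
A rough sketch of the guard being proposed, assuming a plain Celery Task subclass rather than Jobtastic's JobtasticTask. Celery's standard self.request.is_eager flag stands in for the task_is_eager check mentioned above, and the state update shown is a stand-in for whatever the real on_success records.

```python
from celery import Task


class GuardedTask(Task):
    """Illustrative base class, not Jobtastic's JobtasticTask."""
    abstract = True

    def on_success(self, retval, task_id, args, kwargs):
        if self.request.is_eager:
            # Eager runs (delay_or_eager falling back to apply()) may have no
            # working result backend, so skip the state update entirely.
            return
        self.update_state(task_id=task_id, state='SUCCESS', meta={'result': retval})
```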

@winhamwr
Contributor

Inside task.run and task.on_success the state of the task is updated whether it's eager or not.

Ooooh. That makes sense, and I hadn't fully considered the case of a broken result backend. delay_or_fail and delay_or_eager are designed to work around broken task brokers. Hrm.

So one way of handling this would be to document that even if you use delay_or_FOO, if your result backend is down and your task tries to record progress or record a result, you're still going to get an exception. I think that's very reasonable. Any task that actually needs to return a result can't really be counted as having run if we don't have a way of storing/retrieving that result, so there's no way to fail gracefully.

One necessary change for delay_or_eager runs where the result backend is down: we shouldn't throw an exception in the on_success handler. We should catch the exception ourselves, log a warning message and just keep going.

And also in delay_or_fail, the simulate_async_error call should catch that backend exception, log a warning message and keep going.

For both of those cases, we need to find something equivalent to _get_possible_broker_errors_tuple, but that finds result backend errors. Hopefully Celery internals have something like this?
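
As far as I know, Celery doesn't expose a single tuple of result-backend errors the way kombu does for broker connections, so a sketch like the following has to be conservative. The helper name is hypothetical, and IOError/socket.error reflect the errors reported later in this thread rather than an exhaustive list.

```python
import socket


def possible_backend_errors(celery_app):
    """Hypothetical counterpart to _get_possible_broker_errors_tuple for the result backend."""
    conn = celery_app.connection()
    broker_errors = tuple(conn.connection_errors) + tuple(conn.channel_errors)
    # A down result backend tends to surface as IOError/socket.error
    # (on Python 3 both are aliases of OSError; listing both is harmless).
    return broker_errors + (IOError, socket.error)
```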

Does that make sense to you?

Thanks
-Wes

@winhamwr
Contributor

Oops, and you're totally right that the code in task.run also needs to recover from a broken result backend.

@laurentguilbert
Author

Well, it seems that _get_possible_broker_errors_tuple catches all the exceptions I encountered with a broken result backend. In fact, I think IOError should cover it.

Is it really necessary to record an eager task's state? By design, Celery's task.apply requires no connection at all.

delay_or_fail will throw an exception at mark_as_failure though. This seems alright, since you can't expect a coherent result here without a working backend. An alternative would be to return a failed EagerResult with the backend exception; I'm not sure about this.
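
The EagerResult alternative floated here could look roughly like this: the failed result is built entirely in memory, so nothing touches the broken backend. This is a sketch, not Jobtastic's simulate_async_error, and the helper name is made up.

```python
import uuid

from celery import states
from celery.result import EagerResult


def failed_result_for(exc):
    """Build a FAILURE result locally, without writing to the result backend."""
    return EagerResult(str(uuid.uuid4()), exc, states.FAILURE)


# Usage sketch: a delay_or_fail-style helper could return this when the backend
# (or broker) is unreachable; callers then see result.failed() == True.
```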

@winhamwr
Contributor

Hi Laurent,

By design, Celery's task.apply requires no connection at all.

I think that was a good design choice in Celery itself, but for the kind of user-facing tasks that Jobtastic deals with (where we care about progress and almost always also about the result), it adds complexity. A JobtasticTask goes to great lengths to make a task behave the same regardless of the way it's run, which means storing the results in the backend. That allows your other code (probably something in a rendered template doing AJAX requests) to not care about whether we were in the delay case, the eager case, or the fail case.

I think the biggest problem you've identified is that someone using Jobtastic just for caching, thundering-herd avoidance, and the fault-tolerance helpers might have a task whose results they don't care about at all. In their case, breaking the task on our internal update_progress call hurts them when they could otherwise have gone along their merry way with a broken result backend. I think we should recover from a result backend error around that call automatically.
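
That automatic recovery could be as small as wrapping the progress update. update_progress is Jobtastic's method; the error tuple and wrapper name here are assumptions (see the earlier sketches).

```python
import logging
import socket

logger = logging.getLogger(__name__)

# Errors a broken result backend is likely to raise (assumed, not a Jobtastic API).
BACKEND_ERRORS = (IOError, socket.error)


def update_progress_safely(task, completed, total):
    """Record progress if possible; degrade to a warning when the backend is down."""
    try:
        task.update_progress(completed, total)
    except BACKEND_ERRORS as exc:
        logger.warning(
            'Result backend unavailable; progress %s/%s not recorded: %r',
            completed, total, exc,
        )
```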

delay_or_fail will throw an exception at mark_as_failure though.

I think that's another place where we should catch the exception, log a warning and keep going. delay_or_fail is great for something like a report I'm running for a user. If there's something wrong with my broker or backend, I don't want to run that report in my web process. It's better to just give the user an error message and have them try again later. Right now, if it's the result backend that's broken, we don't meet that goal. There will be an exception thrown in the web process.

The other necessary fix is:

One necessary change for delay_or_eager runs where the result backend is down: we shouldn't throw an exception in the on_success handler. We should catch the exception ourselves, log a warning message and just keep going.

Does that make sense to you? Basically, I think we should try to recover wherever it makes sense.

Thanks
-Wes

@laurentguilbert
Author

Yes, makes perfect sense.

@edwardotis

Hi all,

Any updates on this issue? I ran into the delay_or_eager bug recently with Celery 3.1.

Thanks,

Ed

@winhamwr
Contributor

Hello Ed,

No updates so far. Waiting for either a pull request or some magical time fairy to let me write it :)

I still haven't upgraded to Celery 3.1 on my major project, so until that happens, I'm unlikely to be the one to write the pull request.

Thanks
-Wes

@edwardotis

Gotcha, thanks for the update, Wes.
