post-mortem DHH crash #1712

chadwhitacre · 2013-12-03T20:19:31Z

Right after https://www.youtube.com/watch?v=p1E-svVd9Xc we crashed. Manifested as a drained Aspen thread pool:

CPU load isn't bad, response times spike.

chadwhitacre · 2013-12-03T20:31:21Z

If 40 were too many threads I would expect CPU load to reflect that. Spiking response times make me think that we hit some slow queries which backed up our threads. What other parameters are we missing? Network I/O on the box?
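A rough back-of-the-envelope sketch of why slow queries alone could drain the pool while CPU stays idle. It assumes each request holds one thread for its whole duration; the 40 threads come from the comment above, while the latencies and the `max_throughput` helper are hypothetical numbers and names chosen for illustration, not measurements from the incident.

```python
def max_throughput(threads, seconds_per_request):
    """Upper bound on requests/second when every request occupies one
    thread for its full duration (Little's law: L = lambda * W)."""
    return threads / seconds_per_request


POOL_SIZE = 40  # thread count mentioned in the thread

# Hypothetical latencies: a healthy request vs. one stuck behind a slow query.
healthy = max_throughput(POOL_SIZE, 0.25)
degraded = max_throughput(POOL_SIZE, 2.0)

print("healthy capacity:  %.0f req/s" % healthy)
print("degraded capacity: %.0f req/s" % degraded)
```

Under these assumptions the threads spend their time waiting on I/O rather than computing, so capacity can fall by a large factor without CPU load rising, which would be consistent with the graphs described above.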

chadwhitacre · 2013-12-04T14:50:40Z

I'm not going to debug this. :-(

At least it's noted, alongside #1541.

zbynekwinkler · 2013-12-04T17:00:03Z

I think we should strive to fail faster (200s is way too long). I don't think that is possible with Python threads: I don't know of a way to cancel a running thread from "outside" after a certain time :(. Logging requests that took too long would make sense in this context, but that information is already available, in some form, in the Papertrail logs.
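A minimal sketch of both ideas, assuming Python 3 and the standard library only (this is not Aspen's API, and `timed`, `call_with_deadline`, and `SLOW_THRESHOLD` are hypothetical names). The caller can stop waiting after a deadline and slow calls get logged, but, as noted above, the worker thread itself is not cancelled and keeps occupying its slot until it finishes.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

log = logging.getLogger("slow_requests")

SLOW_THRESHOLD = 0.05  # seconds; hypothetical value for illustration


def timed(handler, threshold=SLOW_THRESHOLD):
    """Wrap a handler so that calls slower than `threshold` are logged."""
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return handler(*args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            if elapsed > threshold:
                log.warning("slow request: %s took %.3fs",
                            handler.__name__, elapsed)
    return wrapper


def call_with_deadline(pool, handler, deadline, *args):
    """Return (True, result), or (False, None) once `deadline` seconds pass.

    Only the caller gives up. The worker thread is NOT cancelled: it keeps
    running and holds its pool slot until the handler returns.
    """
    future = pool.submit(handler, *args)
    try:
        return True, future.result(timeout=deadline)
    except FutureTimeout:
        return False, None
```

Because the worker keeps running, this only lets the client see an error sooner; actually freeing the thread would need a timeout inside the blocking call itself (for example on the database or socket operation).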

chadwhitacre closed this as completed Dec 4, 2013
