We see regular timeouts when requesting raw content from GitHub. The default timeout is set to 1000 ms. In the discussion that started on Slack, we have several tracks to explore:

- extend the timeout to x000 ms
  - drawback: the timeout might still occur, probably less frequently, but this only pushes the problem somewhere else
- implement a retry in the Downloader: if a timeout occurs, retry a few times before really failing (see the sketch after the transcript below)
- add an MD cache
Initial discussion transcript:
@kptdobe GitHub was too slow to answer (>1s):
GET https://raw.githubusercontent.com/davidnuescheler/lr-landing/d90a42b1babf33d430450e05cbd3dc1edc6b7135/index.md timed out after 1000 ms

@rofe this is happening quite often...

@stefan-guggisberg To be precise: it's Fastly's connect timeout, i.e. setting up a secure connection to raw.github.com took more than 1s, which is quite long.

@MarquiseRosier This looks like dispatch territory? If it happens too often, maybe we can speed it up there. Perhaps using helix-fetch?

@kptdobe I do not understand. The action triggering this error is
/helix-pages/51eb728d800d5307212e9f96a65a6f9ff7ec1e47/html
so how would that be Fastly? Wherever it is, it happens "frequently"; we should consider having a retry or something to handle that case.

@stefan-guggisberg Ok, the error you quoted led me on the wrong track. We do have a raw.github.com origin configured in the helix-pages service. The connect timeout for this origin is 1000 ms. That's why I thought it was Fastly experiencing a timeout while connecting to the origin. In such a situation Fastly returns a 503. After looking into it I saw that Fastly returned a 504, so my assumption was wrong 🙂

@stefan-guggisberg Here's the problem: https://dashboard.epsagon.com/spans/0e6c4b50-cb13-7b34-b4a1-0207aa2d6547?tab=graph

@stefan-guggisberg The request to https://raw.githubusercontent.com/davidnuescheler/lr-landing/d90a42b1babf33d430450e05cbd3dc1edc6b7135/index.md took 1780 ms. So it's our timeout of 1000 ms that we pass to our request library. And yes, we might want to consider increasing it: https://github.com/adobe/helix-pipeline/blob/master/docs/secrets.md#HTTP_TIMEOUT

@kptdobe I am not sure about the timeout increase. It happens "rarely". If you increase the timeout, it might still happen, but more "rarely". So we push the problem to something harder to analyse. We should think about it. I'll create a ticket tomorrow to follow up and start the discussion.

@stefan-guggisberg Well, you said it happens "frequently" 🙂
I agree that we shouldn't increase the timeout unless there's no other option. I am reluctant to increase it.
David mentioned a couple of times that we need some sort of md cache 😉

@trieloff I wouldn't take it for granted that an MD cache would solve much. Much of the stuff that is cacheable is already cached at a higher level, so an MD cache might just add another layer of caching with low cache efficiency.
But we could implement retry logic in the downloader, which would simply try again n times, with exponential backoff.
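Since retry with exponential backoff is the track the transcript converges on, here is a minimal sketch of what it could look like, assuming a fetch-compatible client such as @adobe/helix-fetch with AbortController support. The function name `fetchWithRetry`, the option names, and the defaults are illustrative, not the actual Downloader API:

```js
// Minimal sketch: retry with exponential backoff around a fetch-compatible
// client. Assumes @adobe/helix-fetch exports fetch and AbortController;
// names and defaults below are illustrative, not the real Downloader API.
const { fetch, AbortController } = require('@adobe/helix-fetch');

async function fetchWithRetry(url, { retries = 3, timeout = 1000, backoff = 100 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeout);
    try {
      // any HTTP response (including 5xx) is returned as-is;
      // only network errors and timeouts trigger a retry
      return await fetch(url, { signal: controller.signal });
    } catch (e) {
      if (attempt === retries) {
        throw e; // out of attempts: surface the last error
      }
      // wait backoff * 2^attempt ms before the next attempt
      await new Promise((resolve) => setTimeout(resolve, backoff * (2 ** attempt)));
    } finally {
      clearTimeout(timer);
    }
  }
}
```

With `retries = 3` and `timeout = 1000`, the worst case is four attempts (roughly 4 s of connect time plus 100 + 200 + 400 ms of backoff) before the request really fails, which keeps the common case fast while absorbing the occasional slow connect without raising the per-attempt timeout.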
Do we have a way of getting the response-time distribution (ms to first byte would be ideal) from Coralogix? Once we know the real response-time distribution, we can set an informed timeout with a target failure rate.
In addition, I think the retry strategy is something that could be added both to a downloader and to an MD cache. In the downloader it would be easier, because the downloader already exists.
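To make the "informed timeout" idea concrete, here is a hypothetical helper (the function name and the `ttfbSamples` input are assumptions, not an existing API) that derives a timeout from exported latency samples by taking the quantile that leaves the target failure rate above the cutoff:

```js
// Hypothetical: pick a timeout (ms) from observed time-to-first-byte samples
// so that roughly `targetFailureRate` of requests would still exceed it.
function timeoutForFailureRate(samplesMs, targetFailureRate) {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // index of the (1 - targetFailureRate) quantile
  const idx = Math.min(
    sorted.length - 1,
    Math.ceil((1 - targetFailureRate) * sorted.length) - 1,
  );
  return sorted[idx];
}

// e.g. accept ~1 timeout per 1000 requests: use the 99.9th percentile
// const timeout = timeoutForFailureRate(ttfbSamples, 0.001);
```

Combined with the retry sketch above, the two knobs trade off against each other: a tighter per-attempt timeout with retries can give a lower overall failure rate than a single long timeout.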