Retry list/download operation depending on the HTTP response #300

orbitz · 2024-12-18T09:25:02Z

Is your feature request related to a problem? Please describe.
We are hosting our own LIST and DOWNLOAD proxies and sometimes an operation might fail due to a temporary problem, for example a 502 or a connection failure in the case of a transient network hiccup. As far as I have seen, tenv does not support this, or at least it does not expose environment variables to configure the behaviour.

Describe the solution you'd like
For our own software we support a few retries with exponential backoff. We have found this works for us, depending on the failure case. I think the ability to do this and configure it via env variables would be useful.

Describe alternatives you've considered
We considered executing the install ourselves and retrying if it fails however the interface we provide to users is meant to not know they are using tenv underneath, they just call tofu, so it is not necessarily feasible to ensure an install is done prior to the execution because we do not know when a customer will issue the command.

Additional context

The text was updated successfully, but these errors were encountered:

kvendingoldo · 2024-12-18T10:17:29Z

retries with exponential backoff sound like a good idea!

dvaumoron · 2024-12-23T21:58:18Z

I am not sure that this feature should be embed in tenv code base, however you can use the tenvlib package to manage this kind of use case.

orbitz · 2024-12-26T07:51:00Z

Any argument as to why? I'm guessing 90%+ usage of tenv is via the tenv CLI. And currently that means there is no good retry mechanism for that usage.

dvaumoron · 2024-12-26T12:52:53Z

a good retry mecanism for a CLI usage ? I don't get your point, the user can lauch the command again in case of network/server temporary failure

why does it worth the cost of increasing maintenance effort ?

orbitz · 2024-12-27T08:57:32Z

It simplifies the interface for the user, generally these failures are ephemeral and just retrying will resolve it (depending on the failure).
If you use tenv indirectly via tofu or terraform where it fetches and installs as part of the tofu or terraform command, then the user has to distinguish between tenv failing and and tofu or terraform failing, so it's harder to determine if you even should retry, whereas tenv has the knowledge already so it can make that decision. tenv is already doing magic behind the scenes to make using tofu and terraform feel like a seamless experience, but part of doing network calls, I would argue, require including retries in them to maintain that seamless experience, so networks are fickle beasts.
I guess I don't know Go well enough to say, but I would imagine adding retry for these operations is just wrapping it in a retry function? Is there more to it than that?

dvaumoron · 2025-01-02T17:19:19Z

I disagree on that point, a user without response will wonder why his command hang (this is the reason behind the choice to display tenv output when there is an installation, we don't want the user to kill a "slow" process, instead of being a true proxy as it try to be in other case)
this point seem interesting, however you seem to forget that tofu (or terraform) are designed to be idempotent, so there should not be issue to re run a command. Moreover tenv use colorized output in proxy mode to allows to know where the error come from (proxied output stay unchanged), although I guess some improvement could be done there (better error message and option to change the color to avoid missing the difference when the default display color is the same).
it depends on what you call "wrapping it in a retry function", because "tenv has the knowledge already so it can make that decision" mean you want a logic, although what kind ? network failure ? server failure depending on http code ? Is it useless to launch again a signature check ? Not really with Cosign, some failure are linked to server stability, so there are decision to make to do that properly.

Again, I think that could become a lot of work for a small benefit

orbitz · 2025-01-02T19:30:23Z

Points 1 and 2 are perhaps easy for a human for interpret, but for automation it is harder. Did tofu init fail because of an ephemeral network error or because of a configuration problem?

For point 3, if you believe it is too much complexity for tenv, I'm in no position to disagree.

I think there is a lot of value here. I believe you are mostly considering the use case of humans directly interacting with tenv, however if you're using tenv in automation, you're using it hundreds of times per hour and those ephemeral failures add up. Additionally, tenv downloads multiple network operations, it's not a great experience to have to re-run the whole operation is just one of those fails (listing, downloading main binary, downloading signatures).

That being said, how does this ticket end? Is there further discussion to be had or is the decision made, in which case we can close the ticket?

kvendingoldo added the enhancement New feature or request label Dec 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry list/download operation depending on the HTTP response #300

Retry list/download operation depending on the HTTP response #300

orbitz commented Dec 18, 2024

kvendingoldo commented Dec 18, 2024

dvaumoron commented Dec 23, 2024

orbitz commented Dec 26, 2024

dvaumoron commented Dec 26, 2024 •

edited

Loading

orbitz commented Dec 27, 2024

dvaumoron commented Jan 2, 2025

orbitz commented Jan 2, 2025

Retry list/download operation depending on the HTTP response #300

Retry list/download operation depending on the HTTP response #300

Comments

orbitz commented Dec 18, 2024

kvendingoldo commented Dec 18, 2024

dvaumoron commented Dec 23, 2024

orbitz commented Dec 26, 2024

dvaumoron commented Dec 26, 2024 • edited Loading

orbitz commented Dec 27, 2024

dvaumoron commented Jan 2, 2025

orbitz commented Jan 2, 2025

dvaumoron commented Dec 26, 2024 •

edited

Loading