Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry list/download operation depending on the HTTP response #300

Open
orbitz opened this issue Dec 18, 2024 · 7 comments
Open

Retry list/download operation depending on the HTTP response #300

orbitz opened this issue Dec 18, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@orbitz
Copy link

orbitz commented Dec 18, 2024

Is your feature request related to a problem? Please describe.
We are hosting our own LIST and DOWNLOAD proxies and sometimes an operation might fail due to a temporary problem, for example a 502 or a connection failure in the case of a transient network hiccup. As far as I have seen, tenv does not support this, or at least it does not expose environment variables to configure the behaviour.

Describe the solution you'd like
For our own software we support a few retries with exponential backoff. We have found this works for us, depending on the failure case. I think the ability to do this and configure it via env variables would be useful.

Describe alternatives you've considered
We considered executing the install ourselves and retrying if it fails however the interface we provide to users is meant to not know they are using tenv underneath, they just call tofu, so it is not necessarily feasible to ensure an install is done prior to the execution because we do not know when a customer will issue the command.

Additional context

@kvendingoldo
Copy link
Collaborator

retries with exponential backoff sound like a good idea!

@kvendingoldo kvendingoldo added the enhancement New feature or request label Dec 18, 2024
@dvaumoron
Copy link
Contributor

I am not sure that this feature should be embed in tenv code base, however you can use the tenvlib package to manage this kind of use case.

@orbitz
Copy link
Author

orbitz commented Dec 26, 2024

Any argument as to why? I'm guessing 90%+ usage of tenv is via the tenv CLI. And currently that means there is no good retry mechanism for that usage.

@dvaumoron
Copy link
Contributor

dvaumoron commented Dec 26, 2024

a good retry mecanism for a CLI usage ? I don't get your point, the user can lauch the command again in case of network/server temporary failure

why does it worth the cost of increasing maintenance effort ?

@orbitz
Copy link
Author

orbitz commented Dec 27, 2024

  1. It simplifies the interface for the user, generally these failures are ephemeral and just retrying will resolve it (depending on the failure).
  2. If you use tenv indirectly via tofu or terraform where it fetches and installs as part of the tofu or terraform command, then the user has to distinguish between tenv failing and and tofu or terraform failing, so it's harder to determine if you even should retry, whereas tenv has the knowledge already so it can make that decision. tenv is already doing magic behind the scenes to make using tofu and terraform feel like a seamless experience, but part of doing network calls, I would argue, require including retries in them to maintain that seamless experience, so networks are fickle beasts.
  3. I guess I don't know Go well enough to say, but I would imagine adding retry for these operations is just wrapping it in a retry function? Is there more to it than that?

@dvaumoron
Copy link
Contributor

  1. I disagree on that point, a user without response will wonder why his command hang (this is the reason behind the choice to display tenv output when there is an installation, we don't want the user to kill a "slow" process, instead of being a true proxy as it try to be in other case)
  2. this point seem interesting, however you seem to forget that tofu (or terraform) are designed to be idempotent, so there should not be issue to re run a command. Moreover tenv use colorized output in proxy mode to allows to know where the error come from (proxied output stay unchanged), although I guess some improvement could be done there (better error message and option to change the color to avoid missing the difference when the default display color is the same).
  3. it depends on what you call "wrapping it in a retry function", because "tenv has the knowledge already so it can make that decision" mean you want a logic, although what kind ? network failure ? server failure depending on http code ? Is it useless to launch again a signature check ? Not really with Cosign, some failure are linked to server stability, so there are decision to make to do that properly.

Again, I think that could become a lot of work for a small benefit

@orbitz
Copy link
Author

orbitz commented Jan 2, 2025

Points 1 and 2 are perhaps easy for a human for interpret, but for automation it is harder. Did tofu init fail because of an ephemeral network error or because of a configuration problem?

For point 3, if you believe it is too much complexity for tenv, I'm in no position to disagree.

I think there is a lot of value here. I believe you are mostly considering the use case of humans directly interacting with tenv, however if you're using tenv in automation, you're using it hundreds of times per hour and those ephemeral failures add up. Additionally, tenv downloads multiple network operations, it's not a great experience to have to re-run the whole operation is just one of those fails (listing, downloading main binary, downloading signatures).

That being said, how does this ticket end? Is there further discussion to be had or is the decision made, in which case we can close the ticket?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants