Retry failed external connections #98

Open
hexylena opened this issue Jun 11, 2018 · 10 comments

@hexylena
Member

We've been running our automated tool installation, and we occasionally see transient network failures that cause the Jenkins job to fail. Wrapping all bioblend / external network connections in a retry would be extremely helpful (we use https://github.com/litl/backoff for some of this here).

E.g.

(512/720) repository iwtomics_loadandplot already installed at revision ce633cc8f5f9. Skipping.
Traceback (most recent call last):
  File "/home/centos/shiningpanda/jobs/023ca033/virtualenvs/d41d8cd9/bin/shed-tools", line 11, in <module>
    sys.exit(main())
  File "/home/centos/shiningpanda/jobs/023ca033/virtualenvs/d41d8cd9/lib/python2.7/site-packages/ephemeris/shed_tools.py", line 744, in main
    install_tool_manager.install_repositories()
  File "/home/centos/shiningpanda/jobs/023ca033/virtualenvs/d41d8cd9/lib/python2.7/site-packages/ephemeris/shed_tools.py", line 572, in install_repositories
    repository = self.get_changeset_revision(repository)
  File "/home/centos/shiningpanda/jobs/023ca033/virtualenvs/d41d8cd9/lib/python2.7/site-packages/ephemeris/shed_tools.py", line 720, in get_changeset_revision
    installable_revisions = ts.repositories.get_ordered_installable_revisions(repository['name'], repository['owner'])
  File "/home/centos/shiningpanda/jobs/023ca033/virtualenvs/d41d8cd9/lib/python2.7/site-packages/bioblend/toolshed/repositories/__init__.py", line 149, in get_ordered_installable_revisions
    r = self._get(url=url, params=params)
  File "/home/centos/shiningpanda/jobs/023ca033/virtualenvs/d41d8cd9/lib/python2.7/site-packages/bioblend/galaxy/client.py", line 136, in _get
    status_code=r.status_code)
bioblend.ConnectionError: GET: error 502: '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">\n<html>\n\n<head>\n<title>Galaxy</title>\n<style type="text/css">\nbody\n{\n    font: 75% verdana, "Bitstream Vera Sans", geneva, arial, helvetica, helve, sans-serif;\n    background: white url(//error/content_bg.png) top repeat-x;\n    color: #303030;\n    padding: 0;\n    border: 0;\n    margin: 0;\n    margin-right: 0;\n    margin-left: 0;\n}\n\ndiv.pageTitle\n{\n    font-size: x-large;\n    font-weight: bold;\n}\n\ndiv.pageTitle a:link, div.pageTitle a:visited, div.pageTitle a:active, div.pageTitle a:hover\n{\n    text-decoration: none;\n    color: #ece7f2;\n}\n/*a:link, a:visited, a:active\n{\n}*/\ntd.masthead\n{\n    vertical-align: middle;\n    background: #023858 url(//error/masthead_bg.png) bottom;\n    height: 40px;\n    padding-left: 10px;\n}\ntd.content\n{\n    vertical-align: top;\n    padding: 10px;\n}\na:link, a:visited, a:active\n{\n    color: #303030;\n}\n</style>\n</head>\n<table width="100%" border="0" cellspacing="0" cellpadding="0">\n    <tr>\n        <td class="masthead"><div class="pageTitle"><a target="_blank" href="http://galaxyproject.org">Galaxy</a></div></td>\n    </tr>\n    <tr>\n        <td class="content">\n            <h2>The Tool Shed could not be reached</h2>\n            <p>\n            \n            \n                            You are seeing this message because a request to The Tool Shed timed out or was refused. This may\n                be a temporary issue which could be resolved by retrying the operation you were performing. If you receive this\n                message repeatedly or for an extended amount of time, please                contact an administrator.\n                        \n            </p>\n        </td>\n    </tr>\n</table>\n\n</html>\n', 0 attempts left: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
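
A minimal sketch of the kind of retry wrapper requested above, using only the standard library. The `retry` decorator, its parameter names, and the doubling delay are illustrative assumptions, not code from ephemeris or bioblend; the backoff library linked above offers a ready-made decorator that could be used instead.

```python
import functools
import time


def retry(exceptions, max_tries=3, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call on `exceptions`, doubling the delay each time.

    Hypothetical helper for illustration only; `sleep` is injectable so the
    behaviour can be checked without actually waiting.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries:
                        raise  # out of attempts: surface the original error
                    # Wait base_delay, 2*base_delay, 4*base_delay, ...
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

In shed-tools this would presumably wrap calls such as the Tool Shed lookup in the traceback, catching the connection error raised there, so that a single 502 does not abort an installation run that is already hundreds of repositories in.
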
@rhpvorderman
Contributor

I agree with this. We should probably do this when refactoring shed-tools (#91). The code is rather spaghetti-like right now. I volunteer, but it is not yet clear when I will have time to do this; probably sometime in July or August.

mtangaro added a commit to indigo-dc/ansible-role-galaxycloud-tools that referenced this issue Aug 5, 2018
@rhpvorderman
Contributor

I have been thinking about this. Wouldn't a global BIOBLEND_RETRY environment variable, or some other setting in bioblend, be preferable? It would be a lot more effective than handling bioblend's errors separately in every tool that uses it.
What do you think about this, @erasche?
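
A rough sketch of how such a setting could be read, assuming the variable holds a plain integer. `BIOBLEND_RETRY` is only the name proposed in this comment and `retries_from_env` is a hypothetical helper; nothing here is an existing bioblend feature. Falling back to 0 (no retries) on a missing or malformed value is likewise an assumption.

```python
import os


def retries_from_env(environ=None, name="BIOBLEND_RETRY", default=0):
    """Return the retry count from the environment, or `default`.

    Hypothetical helper: missing or non-integer values fall back to
    `default`, and negative values are clamped to 0.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return max(value, 0)
```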

@hexylena
Member Author

hexylena commented Aug 6, 2018

Yeah that could work @rhpvorderman. Something sort of like logging?

import bioblend
bioblend.clientConfig(retry=5)

You have a good point that it'd be easier to fix there than everything that depends on it. @nsoranzo do you have an opinion on this?

@nsoranzo
Member

nsoranzo commented Aug 6, 2018

For GET requests it could be added to BioBlend; for POST it is probably not a good idea. Not sure about PUT and DELETE.

@hexylena
Member Author

hexylena commented Aug 6, 2018

Obviously it will default to 0 so we don't add a footgun for bioblend users.

  • Most of the DELETEs I see in bioblend are for objects uniquely identified by ID. Those should be fine to retry N times.
  • PUT ought to be safe; re-updating an object should not fail. The only case where I can imagine that not working is the quota API, but the quota API should be fixed.
  • POST we could do if we implemented UUIDs on requests, but that's a huge overhaul, so maybe we add a retry_dangerous=False in the client config?

@nsoranzo
Member

nsoranzo commented Aug 6, 2018

For a start we can add a retry_idempotent, i.e. excluding POST and forcing it to 0 for the quota PUT requests.
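
To make that rule concrete, here is a small sketch of the per-request decision, assuming `retry_idempotent` is a retry count. The function name `retries_for`, the `/quotas` path check, and the treatment of unlisted methods are assumptions made for illustration; this is not bioblend code.

```python
# Methods discussed in this thread as safe to repeat.
IDEMPOTENT_METHODS = frozenset({"GET", "PUT", "DELETE"})


def retries_for(method, path, retry_idempotent=0):
    """Return how many times a failed request may be retried.

    Hypothetical policy: POST is never retried, PUT requests to the quota
    API are forced to 0, and other idempotent methods get
    `retry_idempotent` retries.
    """
    method = method.upper()
    if method not in IDEMPOTENT_METHODS:
        return 0  # POST and anything unlisted: repeating may duplicate work
    if method == "PUT" and "/quotas" in path:
        return 0  # assumed quota endpoint; excluded as discussed above
    return max(retry_idempotent, 0)
```

With the default of 0 nothing changes for existing users; a caller opting in with, say, `retry_idempotent=5` would get retries on GET, DELETE, and non-quota PUT only.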

@hexylena
Member Author

hexylena commented Aug 6, 2018

@rhpvorderman you can do some of this already via

from bioblend import galaxy

gx = galaxy.GalaxyInstance(...)
gx.max_get_attempts = 5

@hexylena
Member Author

hexylena commented Aug 6, 2018

@nsoranzo I'm working up a PR now. Would allowing the user to choose which methods to retry be desirable? retry_idempotent sounds good though.

@nsoranzo
Member

nsoranzo commented Aug 6, 2018

I think users may not be familiar enough with REST API concepts to make the best choice. It depends on whether there is a use case.

@hexylena
Member Author

hexylena commented Aug 6, 2018

How about something like this (it does not have to be exactly this)? galaxyproject/bioblend#266 has retry_idempotent, so users can use that.

I think my only use case is things like installation, but there I guess we may need more complex/intelligent retry logic.
