Recover from network disruption instead of errors and quitting without data #5

Open
hillaryj opened this issue Mar 11, 2021 · 0 comments
Labels
bug This issue or pull request addresses broken functionality

Comments

@hillaryj
Contributor

🐛 Summary

When the network configuration changes while the Docker image is running, the scanner exits and all data collected up to that point is lost. As a first step, we could catch the error chain that leads to the exit and retry the connection after a short wait, which might let the scan recover gracefully.
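
The "wait and retry" idea could look roughly like the sketch below. It is not taken from the scanner's code: `retry_with_backoff` and its parameters are hypothetical names, and the choice of which exceptions count as retryable (for example `requests.exceptions.ConnectionError` or the `pyppeteer.errors.NetworkError` seen in the log) is an assumption that would need testing against a real VPN change.

```python
import time


def retry_with_backoff(func, *, attempts=4, base_delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts are used up.

    Hypothetical helper: waits base_delay, then 2x, then 4x ... between
    attempts, and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the caller decide what to do.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

In the scanner, the wrapped callable would presumably be the `hash_url` call that appears in the traceback, e.g. `retry_with_backoff(lambda: hasher.hash_url(url), retry_on=(ConnectionError,))`, where `hasher` and `url` stand in for the scanner's own objects.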

A full run takes roughly 2 hours in my local environment, so losing the connection without being able to retrieve the data or resume is a significant and frustrating setback.

To reproduce

Steps to reproduce the behavior:

  1. Start execution per the instructions in the README
  2. Disconnect or change VPN connections in a way that interrupts the established headless browser
  3. The scan fails with a chain of errors and exits (see the log output below)

Expected behavior

Either:

  • Output the data collected so far
  • Resume processing when the Docker container starts up again, if the restart is close to when data collection started
  • Wait and retry the internet connection, then resume processing
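
The first two options could be covered by writing each result to disk as soon as it is produced and skipping already-recorded domains on restart. The following is a minimal sketch under stated assumptions, not the scanner's actual output code: `load_completed`, `append_result`, `scan_all`, the `scan_one` callback, and the `Result` column are all hypothetical; only the `Domain Name` key is taken from the traceback.

```python
import csv
import os


def load_completed(path, key="Domain Name"):
    """Return the set of domains already present in the results file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {row[key] for row in csv.DictReader(f)}


def append_result(path, row, fieldnames):
    """Append one row and force it to disk before moving on."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
        f.flush()
        os.fsync(f.fileno())


def scan_all(domains, scan_one, path, fieldnames=("Domain Name", "Result")):
    """Scan domains in order, skipping any that a previous run recorded."""
    done = load_completed(path)
    for domain in domains:
        if domain in done:
            continue
        # scan_one may raise; rows written so far stay on disk regardless.
        result = scan_one(domain)
        append_result(path, {"Domain Name": domain, "Result": result}, fieldnames)
```

With this shape, a crash loses at most the domain in progress, and rerunning `scan_all` with the same file picks up where the previous run stopped. Limiting resumption to recent runs, as the second option suggests, would need an extra freshness check (for instance on the file's modification time), which is left out here.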

Any helpful log output or screenshots


vdp-scanner_1          | 2021-03-05 16:35:16,262 WARNING Falling back to HTTPS without TLS verification for 'GODIRECT.GOV'
vdp-scanner_1          | Traceback (most recent call last):
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
vdp-scanner_1          |     httplib_response = self._make_request(
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 382, in _make_request
vdp-scanner_1          |     self._validate_conn(conn)
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
vdp-scanner_1          |     conn.connect()
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/urllib3/connection.py", line 411, in connect
vdp-scanner_1          |     self.sock = ssl_wrap_socket(
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 428, in ssl_wrap_socket
vdp-scanner_1          |     ssl_sock = _ssl_wrap_socket_impl(
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 472, in _ssl_wrap_socket_impl
vdp-scanner_1          |     return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
vdp-scanner_1          |   File "/usr/local/lib/python3.9/ssl.py", line 500, in wrap_socket
vdp-scanner_1          |     return self.sslsocket_class._create(
vdp-scanner_1          |   File "/usr/local/lib/python3.9/ssl.py", line 1040, in _create
vdp-scanner_1          |     self.do_handshake()
vdp-scanner_1          |   File "/usr/local/lib/python3.9/ssl.py", line 1309, in do_handshake
vdp-scanner_1          |     self._sslobj.do_handshake()
vdp-scanner_1          | ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)
vdp-scanner_1          |
vdp-scanner_1          | During handling of the above exception, another exception occurred:
vdp-scanner_1          |
vdp-scanner_1          | Traceback (most recent call last):
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
vdp-scanner_1          |     resp = conn.urlopen(
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
vdp-scanner_1          |     retries = retries.increment(
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/urllib3/util/retry.py", line 573, in increment
vdp-scanner_1          |     raise MaxRetryError(_pool, url, error or ResponseError(cause))
vdp-scanner_1          | urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='godirect.gov', port=443): Max retries exceeded with url: /vulnerability-disclosure-policy (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
vdp-scanner_1          |
vdp-scanner_1          | During handling of the above exception, another exception occurred:
vdp-scanner_1          |
vdp-scanner_1          | Traceback (most recent call last):
vdp-scanner_1          |   File "/task/vdp_scanner.py", line 106, in check_for_vdp
vdp-scanner_1          |     result = self._hasher.hash_url(urlunparse(url))
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/hash_http_content/hasher.py", line 272, in hash_url
vdp-scanner_1          |     raise err
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/hash_http_content/hasher.py", line 258, in hash_url
vdp-scanner_1          |     resp = requests.get(url, timeout=self._timeout, verify=verify)
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/requests/api.py", line 76, in get
vdp-scanner_1          |     return request('get', url, params=params, **kwargs)
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/requests/api.py", line 61, in request
vdp-scanner_1          |     return session.request(method=method, url=url, **kwargs)
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
vdp-scanner_1          |     resp = self.send(prep, **send_kwargs)
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
vdp-scanner_1          |     r = adapter.send(request, **kwargs)
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/requests/adapters.py", line 514, in send
vdp-scanner_1          |     raise SSLError(e, request=request)
vdp-scanner_1          | requests.exceptions.SSLError: HTTPSConnectionPool(host='godirect.gov', port=443): Max retries exceeded with url: /vulnerability-disclosure-policy (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
vdp-scanner_1          |
vdp-scanner_1          | During handling of the above exception, another exception occurred:
vdp-scanner_1          |
vdp-scanner_1          | Traceback (most recent call last):
vdp-scanner_1          |   File "/task/vdp_scanner.py", line 307, in <module>
vdp-scanner_1          |     main()
vdp-scanner_1          |   File "/task/vdp_scanner.py", line 301, in main
vdp-scanner_1          |     scanner.process_domain(domain_info)
vdp-scanner_1          |   File "/task/vdp_scanner.py", line 149, in process_domain
vdp-scanner_1          |     vdp_result = self.check_for_vdp(domain_info["Domain Name"])
vdp-scanner_1          |   File "/task/vdp_scanner.py", line 114, in check_for_vdp
vdp-scanner_1          |     result = self._hasher.hash_url(urlunparse(url), verify=False)
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/hash_http_content/hasher.py", line 296, in hash_url
vdp-scanner_1          |     processed = self._handlers.get(content_type, self._handle_plaintext)(
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/hash_http_content/hasher.py", line 216, in _handle_html
vdp-scanner_1          |     page_contents: str = asyncio.get_event_loop().run_until_complete(
vdp-scanner_1          |   File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
vdp-scanner_1          |     return future.result()
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/pyppeteer/page.py", line 803, in content
vdp-scanner_1          |     return await frame.content()
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/pyppeteer/frame_manager.py", line 384, in content
vdp-scanner_1          |     return await self.evaluate('''
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/pyppeteer/frame_manager.py", line 308, in evaluate
vdp-scanner_1          |     return await context.evaluate(
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/pyppeteer/execution_context.py", line 53, in evaluate
vdp-scanner_1          |     handle = await self.evaluateHandle(
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/pyppeteer/execution_context.py", line 108, in evaluateHandle
vdp-scanner_1          |     _rewriteError(e)
vdp-scanner_1          |   File "/.venv/lib/python3.9/site-packages/pyppeteer/execution_context.py", line 237, in _rewriteError
vdp-scanner_1          |     raise type(error)(msg)
vdp-scanner_1          | pyppeteer.errors.NetworkError: Execution context was destroyed, most likely because of a navigation.
vdp-scanner_1          | [I:pyppeteer.launcher] terminate chrome process...
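
In the log above, the final `NetworkError` is raised inside `check_for_vdp` and propagates through `process_domain` up to `main`, which ends the whole run. One possible mitigation is to isolate failures per domain so that a single error is recorded and the loop continues. The sketch below is illustrative only: `process_all` and the `process_one` callback are hypothetical names, and catching `Exception` broadly is a simplification that a real fix would likely narrow.

```python
import logging

logger = logging.getLogger(__name__)


def process_all(domains, process_one):
    """Process every domain, recording failures instead of aborting the run."""
    results, failures = {}, {}
    for domain in domains:
        try:
            results[domain] = process_one(domain)
        except Exception as err:  # broad on purpose; this is only a sketch
            logger.warning("Skipping %s after error: %r", domain, err)
            failures[domain] = repr(err)
    return results, failures
```

The `failures` mapping could then be retried at the end of the run or written out alongside the successful results.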
@hillaryj hillaryj added the bug This issue or pull request addresses broken functionality label Mar 11, 2021
@hillaryj hillaryj mentioned this issue Mar 11, 2021