[bug] : Model not loading while using existing container image to setup MME on sagemaker #170
Description
Checklist
- I've prepended issue tag with type of change: [bug]
- (If applicable) I've attached the script to reproduce the bug
- (If applicable) I've documented below the DLC image/dockerfile this relates to
- (If applicable) I've documented below the tests I've run on the DLC image
- I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
- I've built my own container based off DLC (and I've attached the code used to build my own image)
Concise Description:
Getting this error when invoking an MME on SageMaker set up using the `763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.3.0-cpu-py37-ubuntu18.04` container image:
```
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=14448): Max retries exceeded with url: /v1/models/d2295a7526f9df36354b8a2c4adc4f63 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f70966dba50>: Failed to establish a new connection: [Errno 111] Connection refused'))

Traceback (most recent call last):
  File "/sagemaker/python_service.py", line 157, in _handle_load_model_post
    self._wait_for_model(model_name)
  File "/sagemaker/python_service.py", line 247, in _wait_for_model
    response = session.get(url)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
```
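The traceback shows the container's Python service polling `GET /v1/models/<name>` on a local TensorFlow Serving port and getting connection refused, i.e. the per-model TF Serving process never came up. One common cause (an assumption here, not confirmed from this issue) is a model archive missing the integer version directory TF Serving requires. A minimal sketch to sanity-check an archive's layout locally; the helper name and demo archive are hypothetical:

```python
import io
import os
import tarfile
import tempfile

def has_versioned_savedmodel(tar_path):
    """Check that a model.tar.gz contains saved_model.pb under an
    integer version directory (e.g. 1/saved_model.pb), the layout
    TensorFlow Serving expects when loading a SavedModel."""
    with tarfile.open(tar_path) as tar:
        for name in tar.getnames():
            parts = name.strip("/").split("/")
            if (len(parts) >= 2
                    and parts[-1] == "saved_model.pb"
                    and parts[-2].isdigit()):
                return True
    return False

# Demo: build a tiny archive with the expected versioned layout.
tmp_dir = tempfile.mkdtemp()
archive = os.path.join(tmp_dir, "model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    info = tarfile.TarInfo("1/saved_model.pb")
    payload = b"placeholder"
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

print(has_versioned_savedmodel(archive))  # True
```

Running this against the real archive uploaded to S3 would quickly rule the layout in or out as the cause.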
DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.3.0-cpu-py37-ubuntu18.04
Current behavior:
Model loading fails with the connection-refused error above, and the invocation returns an error.
Expected behavior:
The model should load and return a prediction.
Additional context:
I have set up an MME using the above-mentioned container and invoke the endpoint from a Lambda function. The model files are placed in S3 in the correct directory structure, with a version number.
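For reference, a minimal sketch of how the Lambda side might invoke the MME. The endpoint name and model archive name are placeholders (not taken from this issue); with multi-model endpoints the model is selected per request via the `TargetModel` parameter of `invoke_endpoint`:

```python
import json

# Placeholder values — substitute your own endpoint and model archive.
endpoint_name = "my-mme-endpoint"
target_model = "my-model.tar.gz"  # key relative to the endpoint's S3 model prefix

# TF Serving REST-style payload for the default serving signature.
payload = json.dumps({"instances": [[1.0, 2.0, 3.0]]})

request = {
    "EndpointName": endpoint_name,
    "ContentType": "application/json",
    "TargetModel": target_model,
    "Body": payload,
}

# Uncomment inside the Lambda handler (requires AWS credentials):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**request)
# print(json.loads(response["Body"].read()))
print(sorted(request))
```

On the first invocation for a given `TargetModel`, SageMaker downloads and loads that model into the container, which is the code path that fails in the traceback above.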