This repository has been archived by the owner on May 23, 2024. It is now read-only.

[bug]: Model not loading while using existing container image to set up MME on SageMaker #170

Open
abhi1793 opened this issue Oct 8, 2020 · 1 comment

Comments


abhi1793 commented Oct 8, 2020

Checklist

Concise Description:
Getting this error when invoking an MME on SageMaker that was set up using the 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.3.0-cpu-py37-ubuntu18.04 container image.

urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=14448): Max retries exceeded with url: /v1/models/d2295a7526f9df36354b8a2c4adc4f63 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f70966dba50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
File "/sagemaker/python_service.py", line 157, in _handle_load_model_post
self._wait_for_model(model_name)
File "/sagemaker/python_service.py", line 247, in _wait_for_model
response = session.get(url)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)

DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.3.0-cpu-py37-ubuntu18.04
Current behavior:

Expected behavior:
The model should load and return a prediction.
Additional context:
I have set up an MME using the above-mentioned container and am invoking the endpoint from a Lambda function. The model files are placed in S3 in the correct directory structure, with a version number.
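For context, an invocation like the one described above would look roughly like the following. This is a hedged sketch, not the reporter's actual Lambda code: the endpoint name, payload shape, and the `target_model_key` helper are assumptions; only `TargetModel` selecting the artifact by its S3 key is how MME invocation works.

```python
import json

def target_model_key(model_id):
    # An MME selects a model by the artifact's S3 key relative to the
    # prefix configured on the SageMaker model, e.g. "<id>.tar.gz".
    # (Assumed naming convention for illustration.)
    return f"{model_id}.tar.gz"

def invoke_mme(endpoint_name, model_id, payload):
    # boto3 is imported lazily so this module also loads outside AWS.
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model_key(model_id),  # picks the model on the MME
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

The first invocation of a given `TargetModel` triggers the container's load-model path, which is where the `_wait_for_model` call in the traceback above fails.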

@ajaykarpur ajaykarpur transferred this issue from aws/deep-learning-containers Oct 12, 2020
@ajaykarpur
Contributor

How large is your model? After the load-model request is sent, the container waits for a period of time to ensure the model is available to the model server. If the model is very large, however, the container might not wait long enough for the model to load, causing a ConnectionError.
