This repository has been archived by the owner on May 23, 2024. It is now read-only.

Container build not installing/finding dependencies contained in model.tar file #167

Open
taylorsweet opened this issue Sep 17, 2020 · 7 comments

Comments

@taylorsweet

taylorsweet commented Sep 17, 2020

Issue: inference.py dependencies aren't installed in the SageMaker TensorFlow Serving container.
Resulting error: ModuleNotFoundError: No module named 'nltk'

Versioning details
SageMaker env: conda_python3
TensorFlow version: 2.3.0
TensorFlow Serving container versions: 2.0 (also tried 2.1, 2.2, 2.3)

Directory structure containing model & dependencies (prior to tarring):
  |--1
     |--variables
     |  |--variables.data-00000-of-00001
     |  |--variables.index
     |--saved_model.pb
     |--code
        |--inference.py
        |--requirements.txt
        |--word_vectors.txt
        |--bigram.pkl

I have also tried deploying from a separate directory with a code > lib > external_module layout, where the lib directory contains the nltk module itself rather than a requirements file. Neither of these approaches works; both return the same "module not found" error.

Deployment from a SageMaker notebook using the Python SDK:

from sagemaker.tensorflow.serving import Model

tensorflow_serving_model = Model(model_data=model_data,
                                 role=role,
                                 framework_version='2.0',
                                 entry_point='inference.py')  # running without the entry point works as expected

tensorflow_serving_model.deploy(initial_instance_count=1,
                                instance_type='ml.c4.xlarge')

requirements.txt
nltk==3.4.5
more_itertools==8.2.0
gensim==3.8.3

Note: there are no issues with the model file itself. When I instantiate the Model() instance without specifying the inference.py entry point, my model runs successfully and I get predictions back after passing an ndarray.
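(For reference, the working no-entry-point path looks roughly like this; the instance type matches the deployment above and the input shape is only a placeholder:)

import numpy as np

# Same Model(...) call as above, just without entry_point.
predictor = tensorflow_serving_model.deploy(initial_instance_count=1,
                                            instance_type='ml.c4.xlarge')

# Predictions come back when passing an ndarray (placeholder shape).
result = predictor.predict(np.zeros((1, 128), dtype='float32'))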

Thoughts on how to get nltk (and other dependencies) loaded on the serving container? Thank you!!

@chuyang-deng
Contributor

chuyang-deng commented Sep 18, 2020

Hi @taylorsweet, it looks like your /code directory is under the /1 directory. In order for the container to install the dependencies you bring in, please make sure the /code dir is at the same level as /<model_version>.

Therefore, in your case, the directory structure should look like this:

/opt/ml/model/:
  |--model_name
        |--1
            |--variables
            |--saved_model.pb
  |--code
        |--inference.py
        |--requirements.txt
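For illustration, a minimal sketch of building an archive with that layout and uploading it (the local paths, model name, and S3 prefix here are just placeholders):

import tarfile

import sagemaker

# Assumed local layout (placeholder paths):
#   export/cnn_model/1/variables/..., export/cnn_model/1/saved_model.pb
#   export/code/inference.py, export/code/requirements.txt
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('export/cnn_model', arcname='cnn_model')  # model_name/<version>/ at the top level
    tar.add('export/code', arcname='code')            # code/ as a sibling of model_name/

# Upload and use the returned S3 URI as model_data when constructing Model(...)
model_data = sagemaker.Session().upload_data('model.tar.gz', key_prefix='tf-serving-model')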

@taylorsweet
Author

taylorsweet commented Sep 18, 2020

Thanks Chuyang!

The directory that I'm tarring now looks like this, but I'm getting the same error:
  |--cnn_model
     |--1
        |--variables
        |--saved_model.pb
  |--code
     |--inference.py
     |--requirements.txt
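(To confirm what actually ended up in the archive, the members can be listed; a quick sketch, assuming the file is named model.tar.gz:)

import tarfile

# code/requirements.txt should show up at the top level, as a sibling of cnn_model/,
# not nested under cnn_model/1/.
with tarfile.open('model.tar.gz', 'r:gz') as tar:
    for name in tar.getnames():
        print(name)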

@taylorsweet
Author

(screenshot attached)

@taylorsweet
Author

(screenshot attached)

@Abd-elr4hman

@taylorsweet @chuyang-deng I have a similar issue!
I am trying to deploy a TensorFlow model and provide the post-processing in an inference.py file...

I previously managed to deploy the model, invoke it, and do the post-processing in a Jupyter notebook with the following code:

model = Model(
    name=name_from_base('tf-yolov4'),
    model_data=model_artifact,
    role=role,
    framework_version='2.3'
)

Now I want to do the post-processing by providing an inference.py file, so I followed the docs here:
https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html#sagemaker-tensorflow-docker-containers

and used this snippet:

from sagemaker.tensorflow.serving import Model

tensorflow_serving_model = Model(entry_point='inference.py',
                        dependencies=['requirements.txt'],
                        model_data=model_artifact,
                        role='MySageMakerRole')
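For context, the interface those docs describe for inference.py in the TensorFlow Serving container is a pair of input_handler/output_handler functions (or a single handler). A minimal sketch, with purely illustrative logic standing in for the real YOLOv4 post-processing:

import json

import numpy as np


def input_handler(data, context):
    # Turn the incoming request into the JSON payload TF Serving expects.
    payload = json.loads(data.read().decode('utf-8'))
    return json.dumps({'instances': np.asarray(payload).tolist()})


def output_handler(data, context):
    # Post-process the TF Serving response before returning it to the client.
    response = json.loads(data.content.decode('utf-8'))
    boxes = response['predictions']  # illustrative only; real YOLOv4 decoding would go here
    return json.dumps({'predictions': boxes}), context.accept_header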

The dependencies I added:

numpy
tensorflow

My problem is:
the deployment process, when I call

predictor = tensorflow_serving_model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')

doesn't complete, and when I checked CloudWatch I found the following:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
    super().init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/sagemaker/python_service.py", line 414, in <module>
    resources = ServiceResources()
  File "/sagemaker/python_service.py", line 400, in __init__
    self._python_service_resource = PythonServiceResource()
  File "/sagemaker/python_service.py", line 83, in __init__
    self._handler, self._input_handler, self._output_handler = self._import_handlers()
  File "/sagemaker/python_service.py", line 278, in _import_handlers
    spec.loader.exec_module(inference)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/ml/model/code/inference.py", line 2, in <module>
    import numpy as np

and

ModuleNotFoundError: No module named 'numpy'

which led me to believe that my inference.py was picked up by the container, but the requirements.txt file I provided was not installed, hence "No module named 'numpy'"!

I've also tarred my model file as follows:

model.tar.gz/
  |--[model_version_number]
     |--variables
     |--saved_model.pb
  |--code
     |--inference.py
     |--requirements.txt

and when I use

tensorflow_serving_model = Model(
                        model_data=model_artifact,
                        role='MySageMakerRole')

it does deploy my model successfully but completely ignores my code/ directory.
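One variant that may be worth trying (a sketch, assuming inference.py and requirements.txt sit together in a local code/ directory): pass source_dir so the SDK repacks both files into the archive's code/ folder, rather than passing requirements.txt through dependencies:

from sagemaker.tensorflow.serving import Model

# Assumed local layout (placeholder paths):
#   code/inference.py
#   code/requirements.txt   <- numpy etc., to be installed at container startup
tensorflow_serving_model = Model(entry_point='inference.py',
                                 source_dir='code',
                                 model_data=model_artifact,
                                 role='MySageMakerRole',
                                 framework_version='2.3')

predictor = tensorflow_serving_model.deploy(initial_instance_count=1,
                                            instance_type='ml.m5.xlarge')

The idea is that the repacked archive then carries code/requirements.txt, which the container looks for when it starts up.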

@fhuthmacher

I'm facing exactly the same issue. Were you able to resolve it?

@zulexemplar

Hi, I am having the same problem. Can someone help, please?
