Feature: Support multiple inference.py files and universal inference.… #228
@@ -164,8 +164,6 @@ For example:

 ## Pre/Post-Processing

-**NOTE: There is currently no support for pre-/post-processing with multi-model containers.**
-
 SageMaker TensorFlow Serving Container supports the following Content-Types for requests:

 * `application/json` (default)
@@ -181,7 +179,8 @@ The container will convert data in these formats to [TensorFlow Serving REST API
 and will send these requests to the default serving signature of your SavedModel bundle.

 You can also add customized Python code to process your input and output data. To use this feature, you need to:
-1. Add a python file named `inference.py` to the code directory inside your model archive.
+1. Add a Python file named `inference.py` to the `code` directory inside your model archive.
+2. If you want to use a universal `inference.py` file with a multi-model endpoint, include it in the `code` directory at the S3 URI where the model archives are stored.
 2. In `inference.py`, implement either a pair of `input_handler` and `output_handler` functions or a single `handler` function. Note that if a `handler` function is implemented, `input_handler` and `output_handler` will be ignored.

 To implement pre/post-processing handler(s), you will need to make use of the `Context` object created by the Python service. The `Context` is a `namedtuple` with the following attributes:
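For illustration, a minimal `inference.py` sketch with a separate `input_handler`/`output_handler` pair (the handler bodies below are placeholders, not part of this PR):

```python
# inference.py -- a minimal sketch; the handler bodies are placeholders.

def input_handler(data, context):
    """Pre-process request data before it is forwarded to TensorFlow Serving."""
    if context.request_content_type == "application/json":
        # Pass the JSON body through unchanged as a TFS REST request.
        return data.read().decode("utf-8")
    raise ValueError("Unsupported content type: {}".format(context.request_content_type))


def output_handler(response, context):
    """Post-process the TensorFlow Serving response before returning it to the client."""
    response_content_type = context.accept_header
    prediction = response.content
    return prediction, response_content_type
```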
@@ -359,35 +358,36 @@ def _process_output(data, context):
 You can also bring in external dependencies to help with your data processing. There are 2 ways to do this:
 1. If your model archive contains `code/requirements.txt`, the container will install the Python dependencies at runtime using `pip install -r`.
+2. If you invoke a multi-model endpoint, only the universal `requirements.txt` file's Python dependencies will be installed at runtime. Dependencies specified in any `requirements.txt` file inside a model archive will not be installed.
 2. If you are working in a network-isolated environment, or if you don't want to install dependencies at runtime every time your Endpoint starts or a Batch Transform job runs, you can put pre-downloaded dependencies under the `code/lib` directory in your model archive; the container will then add the modules to the Python path (see the sketch after this list). Note that if both `code/lib` and `code/requirements.txt` are present in the model archive, `requirements.txt` will be ignored.
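As a hypothetical illustration of the pre-downloaded `code/lib` route, dependencies could be staged with pip's `--target` flag before the archive is built; the script name and paths here are assumptions:

```python
# prepare_lib.py -- hypothetical sketch: pre-download dependencies into code/lib
# before building the model archive, so nothing is installed at runtime.
import subprocess

# pip's --target flag places the packages under the given directory.
subprocess.check_call(
    ["pip", "install", "--target", "code/lib", "-r", "code/requirements.txt"]
)
```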

 Your untarred model directory structure may look like this if you are using `requirements.txt`:

-model1
+/opt/ml/models/model1/model
     |--[model_version_number]
         |--variables
         |--saved_model.pb
-model2
+/opt/ml/models/model2/model
     |--[model_version_number]
         |--assets
         |--variables
         |--saved_model.pb
-code
+/opt/ml/code
     |--inference.py
     |--requirements.txt

 Your untarred model directory structure may look like this if you have downloaded modules under `code/lib`:

-model1
+/opt/ml/models/model1/model
     |--[model_version_number]
         |--variables
         |--saved_model.pb
-model2
+/opt/ml/models/model2/model
     |--[model_version_number]
         |--assets
         |--variables
         |--saved_model.pb
-code
+/opt/ml/code
     |--lib
         |--external_module
     |--inference.py
@@ -672,7 +672,7 @@ Only 90% of the ports will be utilized and each loaded model will be allocated w
 For example, if the ``SAGEMAKER_SAFE_PORT_RANGE`` is between 9000 and 9999, the maximum number of models that can be loaded to the endpoint at the same time would be 449 ((9999 - 9000) * 0.9 / 2, rounded down).
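A quick check of that arithmetic:

```python
# Port budget from the example above: 90% of the range, two ports per model.
ports = 9999 - 9000
print(int(ports * 0.9 / 2))  # -> 449
```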

 ### Using Multi-Model Endpoint with Pre/Post-Processing
-Multi-Model Endpoint can be used together with Pre/Post-Processing. Each model will need its own ``inference.py`` otherwise default handlers will be used. An example of the directory structure of Multi-Model Endpoint and Pre/Post-Processing would look like this:
+Multi-Model Endpoint can be used together with Pre/Post-Processing. Each model can either have its own ``inference.py`` or use a universal ``inference.py``. If both a model-specific and a universal ``inference.py`` file are provided, the model-specific file is used. If both are absent, the default handlers are used. An example of the directory structure of a Multi-Model Endpoint with a model-specific ``inference.py`` file looks like this:

 /opt/ml/models/model1/model
     |--[model_version_number]
@@ -687,7 +687,20 @@ Multi-Model Endpoint can be used together with Pre/Post-Processing. Each model w
     |--lib
         |--external_module
     |--inference.py
[Review comment] How to provide model-specific inference.py via SM SDK MME? Can you add a notebook in SM examples?
[Reply] I can create a notebook for SM examples which will demonstrate the usage of model-specific inference.py files.
[Reply] +1
[Reply] Updated the directory structure
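As a hypothetical aside, a model archive matching the model-specific layout above could be packaged like this (the `export/123` source path and file names are assumptions, not part of this PR):

```python
# package_model.py -- hypothetical sketch of building one model archive whose
# extracted layout matches the model-specific structure above.
import tarfile

with tarfile.open("model1.tar.gz", "w:gz") as tar:
    tar.add("export/123", arcname="123")  # the [model_version_number] directory
    tar.add("code/inference.py", arcname="code/inference.py")
    tar.add("code/requirements.txt", arcname="code/requirements.txt")
```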

+Another example of the directory structure of a Multi-Model Endpoint with a universal ``inference.py`` file is as follows:

+/opt/ml/models/model1/model
+    |--[model_version_number]
+        |--variables
+        |--saved_model.pb
+/opt/ml/models/model2/model
+    |--[model_version_number]
+        |--assets
+        |--variables
+        |--saved_model.pb
+/opt/ml/code
+    |--requirements.txt
+    |--inference.py

[Review comment] Same as above, remove
[Reply] Updated

 ## Contributing

 Please read [CONTRIBUTING.md](https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/CONTRIBUTING.md)
@@ -17,6 +17,7 @@
 import os
 import subprocess
 import grpc
+import sys

 import falcon
 import requests
@@ -26,8 +27,8 @@ | |
import tfs_utils | ||
|
||
SAGEMAKER_MULTI_MODEL_ENABLED = os.environ.get("SAGEMAKER_MULTI_MODEL", "false").lower() == "true" | ||
MODEL_DIR = "models" if SAGEMAKER_MULTI_MODEL_ENABLED else "model" | ||
INFERENCE_SCRIPT_PATH = f"/opt/ml/{MODEL_DIR}/code/inference.py" | ||
MODEL_DIR = "" if SAGEMAKER_MULTI_MODEL_ENABLED else "model/" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why needs to change the dir structures? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem with this path is that This problem has been highlighted in other Github issues (#212 and #211) as well and a PR (#215) was created to solve the issue but it was not merged. We can use the path |
||
INFERENCE_SCRIPT_PATH = f"/opt/ml/{MODEL_DIR}code/inference.py" | ||
|
||
SAGEMAKER_BATCHING_ENABLED = os.environ.get("SAGEMAKER_TFS_ENABLE_BATCHING", "false").lower() | ||
MODEL_CONFIG_FILE_PATH = "/sagemaker/model-config.cfg" | ||
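For illustration, the paths the new constants resolve to in each mode (not part of the diff):

```python
# Illustration only: paths produced by the new MODEL_DIR / INFERENCE_SCRIPT_PATH logic.
for multi_model in (True, False):
    model_dir = "" if multi_model else "model/"
    print(f"/opt/ml/{model_dir}code/inference.py")
# multi-model:  /opt/ml/code/inference.py
# single-model: /opt/ml/model/code/inference.py
```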
@@ -77,6 +78,7 @@ def __init__(self):
            # between each grpc port and channel
            self._setup_channel(grpc_port)

+        self._default_handlers_enabled = False
        if os.path.exists(INFERENCE_SCRIPT_PATH):
            # Single-Model Mode & Multi-Model Mode both use one inference.py
            self._handler, self._input_handler, self._output_handler = self._import_handlers()

@@ -85,6 +87,7 @@ def __init__(self):
            )
        else:
            self._handlers = default_handler
+            self._default_handlers_enabled = True

        self._tfs_enable_batching = SAGEMAKER_BATCHING_ENABLED == "true"
        self._tfs_default_model_name = os.environ.get("TFS_DEFAULT_MODEL_NAME", "None")
@@ -143,6 +146,7 @@ def _handle_load_model_post(self, res, data):  # noqa: C901
        # validate model files are in the specified base_path
        if self.validate_model_dir(base_path):
            try:
+                self._import_custom_modules(model_name)
                tfs_config = tfs_utils.create_tfs_config_individual_model(model_name, base_path)
                tfs_config_file = "/sagemaker/tfs-config/{}/model-config.cfg".format(model_name)
                log.info("tensorflow serving model config: \n%s\n", tfs_config)
@@ -221,6 +225,17 @@ def _handle_load_model_post(self, res, data):  # noqa: C901
                }
            )

+    def _import_custom_modules(self, model_name):
+        inference_script_path = "/opt/ml/models/{}/model/code/inference.py".format(model_name)
+        python_lib_path = "/opt/ml/models/{}/model/code/lib".format(model_name)
+        if os.path.exists(python_lib_path):
+            log.info("add Python code library path")
+            sys.path.append(python_lib_path)
+        if os.path.exists(inference_script_path):
+            handler, input_handler, output_handler = self._import_handlers(inference_script_path)
+            model_handlers = self._make_handler(handler, input_handler, output_handler)
+            self.model_handlers[model_name] = model_handlers
+
    def _cleanup_config_file(self, config_file):
        if os.path.exists(config_file):
            os.remove(config_file)
@@ -264,8 +279,20 @@ def _handle_invocation_post(self, req, res, model_name=None):

        try:
            res.status = falcon.HTTP_200

-            res.body, res.content_type = self._handlers(data, context)
+            handlers = self._handlers
+            if SAGEMAKER_MULTI_MODEL_ENABLED and model_name in self.model_handlers:
+                inference_script_path = "/opt/ml/models/{}/model/code/" \
+                                        "inference.py".format(model_name)
+                log.info("Inference script found at path {}.".format(inference_script_path))
+                log.info("Inference script exists, importing handlers.")
+                handlers = self.model_handlers[model_name]
+            elif not self._default_handlers_enabled:
+                log.info("Universal inference script found at path "
+                         "{}.".format(INFERENCE_SCRIPT_PATH))
+                log.info("Universal inference script exists, importing handlers.")
+            else:
+                log.info("Inference script does not exist, using default handlers.")
+            res.body, res.content_type = handlers(data, context)
        except Exception as e:  # pylint: disable=broad-except
            log.exception("exception handling request: {}".format(e))
            res.status = falcon.HTTP_500
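Restating the handler precedence this hunk implements, as a standalone sketch (the parameter names mirror the service attributes; this function is not part of the diff):

```python
# Illustrative restatement of the handler resolution order above.
def resolve_handlers(model_handlers, model_name, universal_handlers,
                     default_handlers, multi_model_enabled, default_enabled):
    if multi_model_enabled and model_name in model_handlers:
        return model_handlers[model_name]  # 1. model-specific code/inference.py
    if not default_enabled:
        return universal_handlers          # 2. universal inference.py
    return default_handlers                # 3. built-in default handlers
```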
@@ -276,8 +303,7 @@ def _setup_channel(self, grpc_port):
        log.info("Creating grpc channel for port: %s", grpc_port)
        self._channels[grpc_port] = grpc.insecure_channel("localhost:{}".format(grpc_port))

-    def _import_handlers(self):
-        inference_script = INFERENCE_SCRIPT_PATH
+    def _import_handlers(self, inference_script=INFERENCE_SCRIPT_PATH):
        spec = importlib.util.spec_from_file_location("inference", inference_script)
        inference = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(inference)
[Review comment] Why is the extra subdir `model` needed?
[Reply] I think that is the directory structure which is expected when we create the endpoint. We might need to confirm with the hosting team regarding this directory structure.
[Reply] I think this is what the directory structure will look like on the platform once the files are downloaded to disk. Better not to confuse users, since this is referring to the directory structure of the archive.