This repository has been archived by the owner on May 23, 2024. It is now read-only.

Feature: Support multiple inference.py files and universal inference.… #228

Closed
wants to merge 12 commits into from

@sachanub sachanub commented Aug 15, 2022

Design doc: https://quip-amazon.com/biizAu4KYIuP/Multi-Model-Endpoint-Model-Specific-Inference-Files

Issue #, if available: https://t.corp.amazon.com/D30094580

Description of changes:

  1. Added support for multiple inference.py files when SageMaker multi-model mode is enabled.
  2. Modified python_service.py and serve.py to use the universal inference.py by default and model-specific inference.py files when available.
  3. A universal requirements.txt file is used.
  4. Changed the directory structure of the universal inference.py file.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 9110754
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: b19568b
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 1a5d9e4
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: fa6d180
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: d99b720
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 77378c6
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 96678b6
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: d2361b7
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 8860ba2
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: d50072a
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 3ef74b0
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 31a6f1f
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: dfe1bee
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 5ba8af2
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: d3f7cea
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: aba0963
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 919de5c
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: f9a736f
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 5e4ffa9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: d046d40
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: bd336a8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 7334fb8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 4647b73
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 008c2b0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: f171de9
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: a456ba9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: a7e9024
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 54db583
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 7b4e572
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)



@pytest.mark.skip_gpu
def test_specific_versions():
    MODEL_NAME = MODEL_NAMES[0]
Contributor

Why do we test model0 only?

Author

Model0 is half_plus_three and Model1 is half_plus_two. We test only model0 here because half_plus_three has two versions available in the repo, whereas I found only one version of half_plus_two.

Comment on lines 312 to 318
def _get_number_of_gpu_on_host(self):
    nvidia_smi_exist = os.path.exists("/usr/bin/nvidia-smi")
    if nvidia_smi_exist:
        return len(subprocess.check_output(['nvidia-smi', '-L'])
                   .decode('utf-8').strip().split('\n'))

    return 0

Contributor

@waytrue17 waytrue17 Aug 24, 2022

These are not relevant. Could you try to pull from the latest commit?

Author

I have removed them from the recent commit.

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 32d6b9a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 8048712
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: ab13c3f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 4d9b478
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 275c8d9
  • Result: FAILED
  • Build Logs (available for 30 days)


…py file along with universal requirements.txt file
@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 116fb22
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 5cde4a4
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 46328d1
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 195239d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-tensorflow-serving-container-pr
  • Commit ID: 002292a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


README.md Outdated
2. If you are working in a network-isolation situation or if you don't want to install dependencies at runtime everytime your Endpoint starts or Batch Transform job runs, you may want to put pre-downloaded dependencies under `code/lib` directory in your model archive, the container will then add the modules to the Python path. Note that if both `code/lib` and `code/requirements.txt` are present in the model archive, the `requirements.txt` will be ignored.

Your untarred model directory structure may look like this if you are using `requirements.txt`:

model1
/opt/ml/models/model1/model

Why is the extra model subdirectory needed?

Author

I think that is the directory structure which is expected when we create the endpoint. We might need to confirm with the hosting team regarding this directory structure.

I think this is what the directory structure will look like on the platform once the files are downloaded to disk. Better not to confuse users, since this section refers to the directory structure of the archive.

@@ -687,7 +687,20 @@ Multi-Model Endpoint can be used together with Pre/Post-Processing. Each model w
|--lib
|--external_module
|--inference.py

How can a model-specific inference.py be provided via the SM SDK for MME? Can you add a notebook to the SM examples?

Author

I can create a notebook for SM examples which will demonstrate the usage of model-specific inference.py files.

+1

Author

Updated the directory structure

@@ -26,8 +27,8 @@
import tfs_utils

SAGEMAKER_MULTI_MODEL_ENABLED = os.environ.get("SAGEMAKER_MULTI_MODEL", "false").lower() == "true"
-MODEL_DIR = "models" if SAGEMAKER_MULTI_MODEL_ENABLED else "model"
-INFERENCE_SCRIPT_PATH = f"/opt/ml/{MODEL_DIR}/code/inference.py"
+MODEL_DIR = "" if SAGEMAKER_MULTI_MODEL_ENABLED else "model/"

Why does the directory structure need to change?

Author

The problem with this path is that /opt/ml/models is a read-only file system for multi-model endpoints. Assuming the S3 bucket is laid out as shown below, the universal inference.py can't be downloaded to /opt/ml/models/code/inference.py.

[image: directory structure in the S3 bucket]

This problem has also been raised in other GitHub issues (#212 and #211), and a PR (#215) was created to solve it but was never merged.

We can use the path /opt/ml/code/inference.py for the universal inference.py file in the case of a multi-model endpoint.
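
A minimal sketch of how the default script path could be derived from the changed `MODEL_DIR` constant. The helper names `multi_model_enabled` and `universal_script_path` are hypothetical, and the way `MODEL_DIR` is spliced into the f-string is an assumption, since the updated `INFERENCE_SCRIPT_PATH` line is not shown in the diff excerpt:

```python
import os
from typing import Mapping


def multi_model_enabled(env: Mapping[str, str]) -> bool:
    """Mirror the SAGEMAKER_MULTI_MODEL check from the diff above."""
    return env.get("SAGEMAKER_MULTI_MODEL", "false").lower() == "true"


def universal_script_path(env: Mapping[str, str]) -> str:
    """Build the default inference.py path (hypothetical helper).

    Assumption: MODEL_DIR already carries its trailing slash, so it is
    interpolated without an extra separator.
    """
    model_dir = "" if multi_model_enabled(env) else "model/"
    return f"/opt/ml/{model_dir}code/inference.py"


if __name__ == "__main__":
    # Multi-model endpoint: the script sits outside the read-only /opt/ml/models.
    print(universal_script_path({"SAGEMAKER_MULTI_MODEL": "true"}))
    # Single-model endpoint: the existing location is unchanged.
    print(universal_script_path({}))
    # Value for the current process environment.
    print(universal_script_path(os.environ))
```

Under these assumptions the single-model path stays at /opt/ml/model/code/inference.py, so only multi-model endpoints see a different location.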

README.md Outdated

/opt/ml/models/model1/model

Same as above, remove /opt/ml/models

Author

Updated

@sachanub sachanub closed this Feb 23, 2023
5 participants