Replies: 15 comments
-
Hi Antoine, thanks for reaching out! I couldn't reproduce this with an arbitrary estimator. Could you please provide the exact sagemaker-python-sdk code that you used to run into this?
-
Hi @knakad, version of the sagemaker package = 1.42.1. My code looks like this:
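Roughly (a sketch; the S3 path, role, and entry point are placeholders):

```python
from sagemaker.mxnet import MXNetModel

# model_server_workers is deliberately left at its default of None
model = MXNetModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="MySageMakerRole",
    entry_point="inference.py",
    framework_version="1.4.1",
    py_version="py3",
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge")
```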
For more info, I printed the result of the function MXNetModel.prepare_container_def:
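For example (assuming the model object above; this shows the shape of the check, not verbatim output):

```python
container_def = model.prepare_container_def(instance_type="ml.c5.2xlarge")
print(container_def["Environment"])
# -> no SAGEMAKER_MODEL_SERVER_WORKERS key when model_server_workers is None
```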
As expected, there is no mention of the SAGEMAKER_MODEL_SERVER_WORKERS variable, because the model_server_workers parameter is None; I think this is the correct behaviour. My guess is that the problem is in the code loaded on the inference instance (sagemaker_inference_toolkit or sagemaker-mxnet-serving-container), which does not correctly handle the case where SAGEMAKER_MODEL_SERVER_WORKERS is unset. I have printed the environment variables from the inference instance.
-
I tried to reproduce this, and got this in my logs:
"Default workers per model: 4" does match the number of CPUs. edit: did some more digging - the inference toolkit does not have a defined default for the number of workers, and the underlying model server's default is the number of CPUs, so from a code standpoint, things look as though they should align with the documentation as well. |
-
That is strange!!
-
I have deployed again with:
Why do my logs show a number of CPUs equal to 1? I see that in your log you have:
According to https://aws.amazon.com/ec2/physicalcores/, c5.2xlarge has a physical core count of 4. Is c5.2xlarge the same as ml.c5.2xlarge?
-
Actually, when I print the number of CPUs from my deployed script, I get different results:
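For reference, the standard-library calls in question can disagree inside a container (the first two report what the OS exposes; the last reflects the CPUs this process is actually allowed to use, and is Linux-only):

```python
import multiprocessing
import os

print(multiprocessing.cpu_count())       # CPUs the OS reports
print(os.cpu_count())                    # same value as above
print(len(os.sched_getaffinity(0)))      # CPUs usable by this process (Linux)
```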
-
In my previous response, I had been running batch transform jobs (I think with ml.m4.xlarge instances) because I happened to have those handy. I tried again, this time modifying this notebook for a basic endpoint deployment, and saw what you got:
I tried this with an ml.c5.2xlarge, an ml.c5.xlarge, and an ml.m4.2xlarge. Since I replicated the issue you're encountering when deploying to an endpoint but not when using batch transform, I'm going to reach out to the team that owns SageMaker Hosting and see if they have any insight.
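For comparison, the batch transform path (the one that reported the expected worker count) looked roughly like this sketch, with the model object and S3 input as placeholders:

```python
transformer = model.transformer(instance_count=1, instance_type="ml.m4.xlarge")
transformer.transform("s3://my-bucket/batch-input/")
transformer.wait()
```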
-
Hi @laurenyu, at least you have been able to reproduce it! ;-) Small question: do you know which image you are using with batch transform? Is it the same as 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3?
-
I've passed along this issue link, so we'll keep updates here for now. For batch transform, I was using the same image, which is why I wonder if there is something happening with the hosting platform rather than the MXNet serving image itself.
-
I ran a simple piece of Java code, Runtime.getRuntime().availableProcessors(), on the SageMaker inference instance to get the number of CPUs, and it returns 1. This should be investigated on the SageMaker side.
-
This happens because Runtime.getRuntime().availableProcessors() returns 1 in a Docker environment by default (container-aware JVMs can derive the CPU count from cgroup CPU shares, and Docker's default of 1024 shares maps to a single CPU). More details:
-
I have the same problem deploying models via AWS CDK. Image: "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.6.0-gpu-py3" Log:
@laurenyu Is there any update on this issue?
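In the meantime, a hedged workaround sketch for the CDK path (assumption: the aws_cdk.aws_sagemaker L1 constructs inside an existing Stack; role ARN and model data URL are placeholders): setting SAGEMAKER_MODEL_SERVER_WORKERS explicitly side-steps the container's CPU detection.

```python
from aws_cdk import aws_sagemaker as sagemaker

# `self` here is the enclosing Stack; values below are placeholders.
model = sagemaker.CfnModel(
    self, "InferenceModel",
    execution_role_arn="arn:aws:iam::123456789012:role/SageMakerRole",
    primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
        image="763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.6.0-gpu-py3",
        model_data_url="s3://my-bucket/model.tar.gz",
        environment={"SAGEMAKER_MODEL_SERVER_WORKERS": "4"},
    ),
)
```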
-
I have the same problem as @romavlasov when deploying with ml.g4dn.xlarge. |
-
@ldong87 Did you find any workaround?
-
@romavlasov it's possible they confuse vCPUs and physical CPUs under virtualization. I tried with ml.g4dn.2xlarge and the default number of workers looks OK to me.
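A workaround sketch with the Python SDK (assumption: a framework Model such as PyTorchModel; paths, role, and entry point are placeholders): pass model_server_workers explicitly so the worker count no longer depends on what the container detects.

```python
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="MySageMakerRole",
    entry_point="inference.py",
    framework_version="1.6.0",
    py_version="py3",
    model_server_workers=4,  # pin explicitly instead of relying on detection
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
```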
-
Describe the bug
The documentation states that when I deploy a model with model_server_workers = None, the number of model server workers should default to the number of CPUs of the instance.
However, what I found is that when I deploy my model on an ml.c5.2xlarge (8 vCPUs, one physical CPU I guess), it only uses 1 worker (see logs below).
If I pass the parameter into the deploy function, it correctly sets "Default workers per model" to the number I specified through the model_server_workers parameter.
In conclusion, either the documentation is out of date, or the behaviour when model_server_workers = None does not work.
To reproduce
Deploy any model on an ml.c5.2xlarge and check the log for the entry "Default workers per model".
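A minimal repro sketch (assumption: an already-trained framework estimator named estimator, SDK 1.42.1 as above):

```python
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.2xlarge",
    # model_server_workers left unset -> documented default is the CPU count,
    # but the endpoint log shows "Default workers per model: 1"
)
```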
Expected behavior
The "Default workers per model" entry equals the number of CPUs of the instance (8 for an ml.c5.2xlarge).
Screenshots or logs
This is an extract of the log from the endpoint:
System information
SageMaker Python SDK version: 1.42.1. Framework: MXNet (mxnet-inference:1.4.1-cpu-py3 image).