Replies: 15 comments
-
Hi Antoine, thanks for reaching out! I couldn't reproduce this with an arbitrary estimator. Could you please provide the exact sagemaker-python-sdk code that you used to run into this?
-
Hi @knakad, version of the sagemaker package = 1.42.1. My code looks like this:
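Roughly (a sketch; the S3 path, role, and entry point are placeholders):

```python
from sagemaker.mxnet import MXNetModel

# model_server_workers is deliberately left at its default of None
model = MXNetModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="MySageMakerRole",
    entry_point="inference.py",
    framework_version="1.4.1",
    py_version="py3",
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge")
```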
For more info, I printed the result of the function MXNetModel.prepare_container_def:
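For example (assuming the model object above; this shows the shape of the check, not verbatim output):

```python
container_def = model.prepare_container_def(instance_type="ml.c5.2xlarge")
print(container_def["Environment"])
# -> no SAGEMAKER_MODEL_SERVER_WORKERS key when model_server_workers is None
```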
As expected, there is no mention of the SAGEMAKER_MODEL_SERVER_WORKERS variable, because the model_server_workers parameter is None; I think this is the correct behaviour. My guess is that the problem is in the code loaded on the inference instance (sagemaker_inference_toolkit or sagemaker-mxnet-serving-container), which does not correctly handle the case where SAGEMAKER_MODEL_SERVER_WORKERS is unset. I have printed the environment variables from the inference instance.
-
I tried to reproduce this, and got this in my logs:
"Default workers per model: 4" does match the number of CPUs. edit: did some more digging - the inference toolkit does not have a defined default for the number of workers, and the underlying model server's default is the number of CPUs, so from a code standpoint, things look as though they should align with the documentation as well. |
-
That is strange!!
-
I have deployed again with:
Why do my logs show a number of CPUs equal to 1? I see that in your log you have:
According to https://aws.amazon.com/ec2/physicalcores/, c5.2xlarge has a physical core count of 4. Is c5.2xlarge the same as ml.c5.2xlarge?
-
Actually, when I print the number of CPUs from my deployed script, I get different results:
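For reference, the standard-library calls in question can disagree inside a container (the first two report what the OS exposes; the last reflects the CPUs this process is actually allowed to use, and is Linux-only):

```python
import multiprocessing
import os

print(multiprocessing.cpu_count())       # CPUs the OS reports
print(os.cpu_count())                    # same value as above
print(len(os.sched_getaffinity(0)))      # CPUs usable by this process (Linux)
```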
-
In my previous response, I had been running batch transform jobs (I think with ml.m4.xlarge instances) because I happened to have those handy. I tried again, this time modifying this notebook for a basic endpoint deployment, and saw what you got:
I tried this with an ml.c5.2xlarge, an ml.c5.xlarge, and an ml.m4.2xlarge. Since I replicated the issue you're encountering when deploying to an endpoint but not when using batch transform, I'm going to reach out to the team that owns SageMaker Hosting and see if they have any insight.
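For comparison, the batch transform path (the one that reported the expected worker count) looked roughly like this sketch, with the model object and S3 input as placeholders:

```python
transformer = model.transformer(instance_count=1, instance_type="ml.m4.xlarge")
transformer.transform("s3://my-bucket/batch-input/")
transformer.wait()
```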
-
Hi @laurenyu, at least you have been able to reproduce it! ;-) Small question: do you know which image you are using with batch transform? Is it the same as 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3?
-
I've passed along this issue link, so we'll keep updates here for now. For batch transform, I was using the same image, which is why I wonder if there is something happening with the hosting platform rather than the MXNet serving image itself.
-
I ran a simple piece of Java code, Runtime.getRuntime().availableProcessors(), on the SageMaker inference instance to get the number of CPUs, and it returns 1. This should be investigated on the SageMaker side.
-
This happens because Runtime.getRuntime().availableProcessors() returns 1 in a Docker environment by default (container-aware JVMs can derive the CPU count from cgroup CPU shares, and Docker's default of 1024 shares maps to a single CPU). More details:
-
I have the same problem deploying models via AWS CDK. Image: "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.6.0-gpu-py3" Log:
@laurenyu Is there any update on this issue?
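In the meantime, a hedged workaround sketch for the CDK path (assumption: the aws_cdk.aws_sagemaker L1 constructs inside an existing Stack; role ARN and model data URL are placeholders): setting SAGEMAKER_MODEL_SERVER_WORKERS explicitly side-steps the container's CPU detection.

```python
from aws_cdk import aws_sagemaker as sagemaker

# `self` here is the enclosing Stack; values below are placeholders.
model = sagemaker.CfnModel(
    self, "InferenceModel",
    execution_role_arn="arn:aws:iam::123456789012:role/SageMakerRole",
    primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
        image="763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.6.0-gpu-py3",
        model_data_url="s3://my-bucket/model.tar.gz",
        environment={"SAGEMAKER_MODEL_SERVER_WORKERS": "4"},
    ),
)
```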
-
I have the same problem as @romavlasov when deploying with ml.g4dn.xlarge. |
-
@ldong87 Did you find any workaround?
-
@romavlasov it's possible they confuse vCPUs and physical CPUs under virtualization. I tried with ml.g4dn.2xlarge and the default number of workers looks OK to me.
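A workaround sketch with the Python SDK (assumption: a framework Model such as PyTorchModel; paths, role, and entry point are placeholders): pass model_server_workers explicitly so the worker count no longer depends on what the container detects.

```python
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="MySageMakerRole",
    entry_point="inference.py",
    framework_version="1.6.0",
    py_version="py3",
    model_server_workers=4,  # pin explicitly instead of relying on detection
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
```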
-
Describe the bug
The documentation states that when I deploy a model with model_server_workers = None, the number of model server workers should default to the number of CPUs of the instance.
However, what I found is that when I deploy my model on an ml.c5.2xlarge (8 vCPUs, one physical CPU I guess), it only uses 1 worker (see logs below).
If I pass the parameter into the deploy function, it correctly sets "Default workers per model" to the number I specified through the model_server_workers parameter.
In conclusion, either the documentation is out of date, or the behaviour when model_server_workers = None does not work.
To reproduce
Deploy any model on an ml.c5.2xlarge and check the log for the entry "Default workers per model".
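A minimal repro sketch (assumption: an already-trained framework estimator named estimator, SDK 1.42.1 as above):

```python
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.2xlarge",
    # model_server_workers left unset -> documented default is the CPU count,
    # but the endpoint log shows "Default workers per model: 1"
)
```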
Expected behavior
The "Default workers per model" entry equals the number of CPUs of the instance (8 for an ml.c5.2xlarge).
Screenshots or logs
This is an extract of the log from the endpoint:
System information
SageMaker Python SDK version: 1.42.1. Framework: MXNet (mxnet-inference:1.4.1-cpu-py3 image).