IndexError: list index out of range #229

Open
Chen188 opened this issue Dec 5, 2022 · 0 comments

Describe the bug
TensorFlow Serving throws the following error when multiple TFS instances are enabled via the SAGEMAKER_TFS_INSTANCE_COUNT environment variable:

INFO:__main__:tensorflow version info:
TensorFlow ModelServer: 2.8.3-rc1+dev.sha.no_git
TensorFlow Library: 2.8.3
INFO:__main__:tensorflow serving command: tensorflow_model_server --port=9000 --rest_api_port=8501 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0    --per_process_gpu_memory_fraction=0.2667
INFO:__main__:started tensorflow serving (pid: 26)
Traceback (most recent call last):
  File "/sagemaker/serve.py", line 502, in <module>
    ServiceManager().start()
  File "/sagemaker/serve.py", line 483, in start
    self._start_tfs()
  File "/sagemaker/serve.py", line 326, in _start_tfs
    p = self._start_single_tfs(i)
  File "/sagemaker/serve.py", line 420, in _start_single_tfs
    self._tfs_grpc_ports[instance_id],
IndexError: list index out of range
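
The indexing that fails is self._tfs_grpc_ports[instance_id]. A minimal sketch of the failure mode, assuming the container sizes its gRPC port list from a single-instance default while the start loop iterates SAGEMAKER_TFS_INSTANCE_COUNT times (the variable names below are hypothetical, modeled on the serve.py frames in the traceback):

tfs_instance_count = 3   # from SAGEMAKER_TFS_INSTANCE_COUNT
tfs_grpc_ports = [9000]  # default port range yields one port, not three

for instance_id in range(tfs_instance_count):
    # Raises IndexError once instance_id >= len(tfs_grpc_ports),
    # matching self._tfs_grpc_ports[instance_id] in the traceback above.
    port = tfs_grpc_ports[instance_id]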

To reproduce

from sagemaker.tensorflow.serving import TensorFlowModel

# model_data (S3 URI of the model artifact) and role (IAM execution role)
# are defined earlier in the notebook.
model_local_batch = TensorFlowModel(
    source_dir='sm-code-pb', entry_point='inference.py',
    model_data=model_data,
    role=role,
    framework_version='2.8',
    env={
        'SAGEMAKER_TFS_INSTANCE_COUNT': '3',  # number of TFS instances; 3 is good for 16G GPU mem
    }
)
instance_type = 'local_gpu'  # 'local' for CPU instance

predictor_local_batch = model_local_batch.deploy(initial_instance_count=1, instance_type=instance_type)

If SAGEMAKER_SAFE_PORT_RANGE is also passed in env, the issue is resolved (see the sketch below).
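
A sketch of the workaround env, assuming the container accepts the range as 'lower-upper'; the 9000-9999 value is illustrative, not a documented default:

env={
    'SAGEMAKER_TFS_INSTANCE_COUNT': '3',
    # Workaround: give the container an explicit port range wide enough for
    # one gRPC and one REST port per TFS instance. Example value only.
    'SAGEMAKER_SAFE_PORT_RANGE': '9000-9999',
}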

Expected behavior
Multiple TFS instances should start without the user having to pass SAGEMAKER_SAFE_PORT_RANGE manually.


System information
  • DLC image: 727897471807.dkr.ecr.cn-north-1.amazonaws.com.cn/tensorflow-inference:2.8-gpu
  • Custom Docker image (Y/N): N

