create endpoint with InferenceAmiVersion #602

Merged: 3 commits, Nov 21, 2024

CoderHam (Contributor) commented Nov 20, 2024

This PR changes client.py, mainly the create_endpoint function. The changes cover deployment of both fine-tuned and pre-trained models, work around a validation parameter issue in Python 3.6, and add error handling around model deletion.

Summary of Changes:

  • Import Updates: The import statements have been streamlined, removing redundant imports from the generation and typing modules.
  • create_endpoint Function:
    • A new variable useBoto is introduced to determine the deployment method based on the presence of s3_models_dir.
    • The function now includes a try-except block to handle potential errors when deleting the model.
    • The logic for setting the role variable has been improved, considering the useBoto flag.
    • The model deployment process has been updated to handle validation parameter issues in Python 3.6.
    • For pre-trained models, the PR adds a new deployment method using boto to set the InferenceAmiVersion.
  • S3 URL Parsing: The PR replaces the parse_s3_url function with lazy_sagemaker().s3.parse_s3_url for parsing S3 URLs, ensuring consistency and compatibility.
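
The boto deployment path summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the helper names (choose_use_boto, build_endpoint_config, deploy_with_boto, delete_model_quietly) are hypothetical, the rule that useBoto is true when s3_models_dir is absent is inferred from the description, and the sketch assumes the SageMaker model already exists. The boto3 SageMaker calls delete_model, create_endpoint_config and create_endpoint, and the InferenceAmiVersion field of a production variant, are the only real API surface used.

```python
from typing import Any, Dict, Optional


def choose_use_boto(s3_models_dir: Optional[str]) -> bool:
    # Assumption: no S3 models directory means a pre-trained model,
    # which is the case the PR deploys through boto.
    return not s3_models_dir


def build_endpoint_config(
    endpoint_name: str,
    model_name: str,
    instance_type: str,
    n_instances: int,
    inference_ami_version: str,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's create_endpoint_config."""
    return {
        "EndpointConfigName": endpoint_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": n_instances,
                # The field this PR is about: pins the inference AMI.
                "InferenceAmiVersion": inference_ami_version,
            }
        ],
    }


def deploy_with_boto(sm_client: Any, config: Dict[str, Any]) -> None:
    """Create the endpoint config, then an endpoint that uses it."""
    sm_client.create_endpoint_config(**config)
    sm_client.create_endpoint(
        EndpointName=config["EndpointConfigName"],
        EndpointConfigName=config["EndpointConfigName"],
    )


def delete_model_quietly(sm_client: Any, model_name: str) -> bool:
    """Delete a model, reporting failure instead of raising."""
    try:
        sm_client.delete_model(ModelName=model_name)
        return True
    except Exception as exc:  # broad on purpose: cleanup should not abort the caller
        print(f"Could not delete model {model_name}: {exc}")
        return False
```

With a real client this would be driven by something like sm_client = boto3.client("sagemaker") and an AMI version string such as "al2-ami-sagemaker-inference-gpu-2" (an example value, not necessarily the one the PR uses). Building the request as a plain dict first keeps the InferenceAmiVersion setting easy to inspect and test without AWS access.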

Detailed Changes:

  • Import Updates:
    • Removed: TokenLikelihood from the generation module.
    • Removed: Tuple from the typing module.
  • create_endpoint Function:
    • Added: useBoto variable to determine the deployment method.
    • Added: try-except block to handle potential errors when deleting the model.
    • Modified: The logic for setting the role variable now considers the useBoto flag.
    • Modified: The model deployment process now handles validation parameter issues in Python 3.6.
    • Added: A new deployment method for pre-trained models using boto to set the InferenceAmiVersion.
  • S3 URL Parsing:
    • Replaced: parse_s3_url with lazy_sagemaker().s3.parse_s3_url for parsing S3 URLs.
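
For readers unfamiliar with the S3 URL parsing step, the following is a standard-library stand-in that shows the kind of bucket/key split involved. It is not the PR's code and not the SageMaker SDK implementation: the PR calls lazy_sagemaker().s3.parse_s3_url, and this sketch only assumes that such a helper returns a (bucket, key) pair and rejects non-s3 schemes.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_url(url: str) -> Tuple[str, str]:
    """Split an s3:// URL into (bucket, key prefix)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"Expected an 's3' scheme, got {parsed.scheme!r} in {url}")
    # netloc is the bucket; the path keeps its trailing slash but loses the leading one.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key_prefix = parse_s3_url("s3://my-bucket/models/v1/")
```

Here bucket is "my-bucket" and key_prefix is "models/v1/"; the bucket name and path are placeholders chosen for the example.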

@CoderHam CoderHam marked this pull request as ready for review November 20, 2024 17:33
@CoderHam CoderHam merged commit 756515a into main Nov 21, 2024
4 checks passed
@CoderHam CoderHam deleted the hemant-aws-sagemaker branch November 21, 2024 20:15