
tests: Implement integration tests covering JumpStart PrivateHub workflows #4883

Merged — 17 commits merged into aws:master on Oct 8, 2024

Conversation

@malav-shastri (Collaborator) commented Oct 3, 2024

Issue #, if available:

Description of changes:
Adding 6 new integ test cases for PrivateHub functionalities/workflows

Added tests:
test_private_hub
test_hub_model_reference
test_jumpstart_hub_model
test_jumpstart_hub_gated_model
test_jumpstart_gated_model_inference_component_enabled
test_instatiating_model

Testing done:

  • all the newly added tests are passing
  • integ tests
  • unit tests
  • black -l 100 .
  • flake8

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@malav-shastri requested a review from a team as a code owner on October 3, 2024 07:34
@malav-shastri changed the title from "Ch integ tests" to "tests: Implement integration tests covering JumpStart PrivateHub workflows" on Oct 3, 2024
@Captainia (Collaborator) left a comment:

There are some errors in the integ test hook: "Hub with name PySDK-HubTest-7d6019c6-e043-484e-83c4-fa08706ec4f2 does not exist." Could be because the tests are run in parallel.

sagemaker_session.delete_hub(hub["HubName"])


def _delete_hub_contents(sagemaker_session, test_hub_name):
Collaborator:

nit: this deletes only the model references (not models or other contents); recommend taking an optional arg for which content type to delete
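For illustration, a sketch of that suggestion; the Session helpers (list_hub_contents, delete_hub_content_reference) and their signatures are assumed from the surrounding diff, not verified:

from sagemaker.jumpstart.types import HubContentType


def _delete_hub_contents(
    sagemaker_session, test_hub_name, content_type=HubContentType.MODEL_REFERENCE
):
    # Delete all hub contents of one type; defaults to model references.
    contents = sagemaker_session.list_hub_contents(
        hub_name=test_hub_name, hub_content_type=content_type.value
    )
    for summary in contents["HubContentSummaries"]:
        # Plain Model contents would need delete_hub_content with an explicit
        # version instead of delete_hub_content_reference.
        sagemaker_session.delete_hub_content_reference(
            hub_name=test_hub_name,
            hub_content_type=content_type.value,
            hub_content_name=summary["HubContentName"],
        )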

JUMPSTART_LOGGER.info("starting test")
JUMPSTART_LOGGER.info(f"get identity {get_sm_session().get_caller_identity_arn()}")

model_id = "catboost-classification-model"
Collaborator:

can we make TEST_MODEL_IDS an enum, so you don't need to repeat the strings?
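For illustration, one possible shape (values taken from the TEST_MODEL_IDS set later in this diff; class and member names are made up):

from enum import Enum


class TestModelId(str, Enum):
    # Central registry of model IDs used across the PrivateHub integ tests.
    CATBOOST_CLASSIFICATION = "catboost-classification-model"
    COMPLEX_LINEART = "huggingface-txt2img-conflictx-complex-lineart"
    LLAMA_2_7B = "meta-textgeneration-llama-2-7b"
    LLAMA_3_2_1B = "meta-textgeneration-llama-3-2-1b"
    CATBOOST_REGRESSION = "catboost-regression-model"


model_id = TestModelId.CATBOOST_CLASSIFICATION.value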

# Createhub
create_hub_response = hub_instance.create(
    description="This is a Test Private Hub.",
    display_name="malavhs Test hub",
Collaborator:

Let's not commit this to mainline

Collaborator Author:

thanks

)
models.extend(response["hub_content_summaries"])

return models[0]["hub_content_arn"]
Collaborator:

If we only care about the first model arn, do we need to paginate all the responses?

Collaborator Author:

You are right, we don't need to paginate here. In fact this function only returns a single model, because I am filtering on the exact model id; but the output is a list, so I am accessing the first element.
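Given that, the helper could collapse to something like this sketch (response keys assumed from the snippet above):

def get_public_hub_model_arn(hub: Hub, model_id: str) -> str:
    # The exact-match filter yields at most one summary, so the first
    # page suffices and no pagination loop is needed.
    response = hub.list_sagemaker_public_hub_models(filter=f"model_id == {model_id}")
    return response["hub_content_summaries"][0]["hub_content_arn"]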

Member:

can you move this to the unit tests? integ tests are only for when we need to create new resources.

Collaborator Author:

@evakravi sorry, I didn't get that. This is the utility function to get the public hub content ARN, which in turn we use for creating the model reference in the Hub; why do we need to move it to unit tests?

list_hub_response = sagemaker_session.list_hubs(name_contains=HUB_NAME_PREFIX)

for hub in list_hub_response["HubSummaries"]:
    if hub["HubName"] != SM_JUMPSTART_PUBLIC_HUB_NAME:
Collaborator:

While we don't delete the public hub, should we also restrict deletion to only the hubs that are associated with a PySDK integ test? This could delete hubs we don't want to delete in the account that runs the integ tests.

Collaborator Author:

I am already filtering to only the hubs related to PySDK integ tests on line 142:
sagemaker_session.list_hubs(name_contains=HUB_NAME_PREFIX)

Contributor:

Concurrent runs of the test will cause an issue with this clean-up strategy. Should it clean up only the hub created by this specific test?

Collaborator Author:

Yeah, I actually had to change this later on; I have updated the commit.

Member:

Isn't the suite id unique to each run? That would prevent overlapping test sessions on the same account/region from interfering with each other.

Collaborator Author:

@evakravi yes, but I wasn't deleting only the unique hub created by a specific pytest session; I was deleting all the hubs created by PySDK tests, regardless of which session created them. PySDK runs integ tests with pytest tests/integ -m 'not local_mode and not cron and not slow_test' -n 240, which means there are 240 parallel workers, each corresponding to a separate pytest session. Because the cleanup wasn't scoped to the hub created by a specific test session, it would delete hubs that were still being used by other pytest sessions and tests running in parallel.
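For context, a sketch of how a session-unique hub name makes per-session cleanup safe; HUB_NAME_PREFIX and the env-var constants are assumed from the surrounding diff:

import os
import uuid


def _setup_session_hub_name() -> str:
    # One hub name per pytest session, so the 240 parallel workers never collide.
    suite_id = os.environ.setdefault(ENV_VAR_JUMPSTART_SDK_TEST_SUITE_ID, str(uuid.uuid4()))
    hub_name = f"{HUB_NAME_PREFIX}{suite_id}"  # e.g. PySDK-HubTest-<uuid>
    os.environ[ENV_VAR_JUMPSTART_SDK_TEST_HUB_NAME] = hub_name
    return hub_name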

@AWS-pratab (Contributor) left a comment:

We should add tests for the fine-tuning/training paths as well.


JUMPSTART_TAG = "JumpStart-SDK-Integ-Test-Suite-Id"

SM_JUMPSTART_PUBLIC_HUB_NAME = "SageMakerPublicHub"
Contributor:

nit: Likely this is already defined somewhere in src

Comment on lines +41 to +47
TEST_MODEL_IDS = {
    "catboost-classification-model",
    "huggingface-txt2img-conflictx-complex-lineart",
    "meta-textgeneration-llama-2-7b",
    "meta-textgeneration-llama-3-2-1b",
    "catboost-regression-model",
}
Contributor:

I think the integ tests run in PDX. Double check these are available in the PDX region.

@malav-shastri (Collaborator Author) commented Oct 3, 2024:

These should be; I have chosen these models from the existing JumpStart hub integ tests.

model_id=model_id,
role=get_sm_session().get_caller_identity_arn(),
sagemaker_session=get_sm_session(),
hub_name=os.environ[ENV_VAR_JUMPSTART_SDK_TEST_HUB_NAME],
Contributor:

Can we test via the HubArn path as well? We noticed an issue around that once.

Collaborator Author:

Sorry, didn't get it. This is the HubArn path, right? We ask customers to provide hub_name in the JumpStart model parameter, but we convert it into a HubArn right after that. Sure, customers can provide the ARN directly to the model class, but in that case we just leave it as-is and it gets passed on to the rest of the code.
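For illustration, an explicit ARN-path test could construct the model like this sketch (the account, region, and hub name in the ARN are placeholders):

from sagemaker.jumpstart.model import JumpStartModel

hub_arn = "arn:aws:sagemaker:us-west-2:123456789012:hub/PySDK-HubTest-example"
model = JumpStartModel(
    model_id=model_id,
    role=get_sm_session().get_caller_identity_arn(),
    sagemaker_session=get_sm_session(),
    hub_name=hub_arn,  # an ARN is passed through as-is rather than resolved from a name
)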

)

# uses ml.m5.4xlarge instance
model.deploy(
Contributor:

We should assert the success status of the endpoint by adding a wait step that polls on status.

Member:

+1

Contributor:

+1, we may even want to consider sending it one request using the default payload, although that's optional.

Collaborator Author:

I think I missed the default payload request here. Let me add it, similar to the other inference tests.
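A sketch of both suggestions together; the endpoint_in_service waiter is a standard boto3 SageMaker waiter, while retrieve_example_payload is assumed to be available as in other JumpStart inference tests:

import boto3

# Poll until the endpoint reaches InService; the waiter raises on failure states.
sm_client = boto3.client("sagemaker", region_name=get_sm_session().boto_region_name)
sm_client.get_waiter("endpoint_in_service").wait(EndpointName=predictor.endpoint_name)

# Smoke-test the endpoint with the model's default example payload.
payload = model.retrieve_example_payload()
response = predictor.predict(payload)
assert response is not None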

@malav-shastri (Collaborator Author):

> We should add tests for the fine-tuning/training paths as well.

I have purposely skipped it because training isn't supported for model references, and training for JumpStart models is already covered in the test suite.

@@ -1036,13 +1036,15 @@ def _get_deployment_configs(
image_uri=image_uri,
region=self.region,
model_version=self.model_version,
hub_arn=self.hub_arn,
Member:

Can we add a unit test for this? It seems the current coverage didn't catch this bug.

Collaborator Author:

Synced with @evakravi offline; this needs to be covered through unit tests, and I'll add it as a fast follow.

)

from sagemaker.jumpstart.constants import JUMPSTART_DEFAULT_REGION_NAME


def _setup():
    print("Setting up...")
    os.environ.update({ENV_VAR_JUMPSTART_SDK_TEST_SUITE_ID: get_test_suite_id()})
    test_suit_id = get_test_suite_id()
Member:

nit: test_suite_id

hub = Hub(
    hub_name=os.environ[ENV_VAR_JUMPSTART_SDK_TEST_HUB_NAME], sagemaker_session=get_sm_session()
)
hub.create(description=test_hub_description)
Member:

should we necessarily create a Hub every time a JS integ test is run? Does this bring any problems?

@malav-shastri (Collaborator Author) commented Oct 4, 2024:

Tbh I can't think of any problems with this strategy; we're cleaning it up at the end. Do you think we should approach it differently?

def _delete_hubs(sagemaker_session):
    # list Hubs created by PySDK integration tests
    list_hub_response = sagemaker_session.list_hubs(
        name_contains=os.environ[ENV_VAR_JUMPSTART_SDK_TEST_HUB_NAME]
Member:

can we create a utility to get the hub name from the env var?

Collaborator Author:

Just to confirm, you mean a function to get os.environ[ENV_VAR_JUMPSTART_SDK_TEST_HUB_NAME]?
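If so, a minimal sketch (function name is illustrative):

import os


def get_test_hub_name() -> str:
    # Single accessor for the session-scoped test hub name.
    return os.environ[ENV_VAR_JUMPSTART_SDK_TEST_HUB_NAME]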

Contributor:

question: would there ever be several hubs here?
The _setup() method seems to create only one hub, and since the name is set there can only be one, can it not?

Collaborator Author:

@JGuinegagne yeah, there'll be only one. I changed this recently; previously I was deleting all the hubs starting with a specific prefix, but that was messing up concurrent pytest executions. Let me not use list_hubs here, thanks.

assert model.inference_component_name == predictor.component_name


def test_instatiating_model(setup, add_models):
Member:

I wonder if we can run all the tests currently for JumpStart, but for PrivateHub.

Risk is that this test coverage drifts from the non-private-hub tests as new models/features are added.

@malav-shastri (Collaborator Author) commented Oct 4, 2024:

> I wonder if we can run all the tests currently for JumpStart but for PrivateHub.

I am not sure. Are you thinking about reusing the test code of the JumpStart public hub tests and integrating PrivateHub workflow triggers there? Wouldn't that be too complicated? Keeping them separate improves readability, and not all features available for content type Model are available for content type ModelReference.

> Risk is that this test coverage drifts from the non-private-hub tests as new models/features are added.

Sorry, I can't understand what you mean by this.

Contributor:

I think Evan is suggesting defining a single set of model tests, and systematically testing them against the public hub and a private hub. That would be substantial rework from your current PR though.

Contributor:

fix typo please: test_instantiating_model

Collaborator Author:

> I think Evan is suggesting defining a single set of model tests, and systematically testing them against the public hub and a private hub. That would be substantial rework from your current PR though.

+1, that would require more effort and rework. I can take it as an improvement, but I want to understand the benefit: is it better design, or just a personal preference?
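The benefit would mainly be drift protection: one test body asserted against both hubs. A hypothetical shape, with fixture handling simplified and names made up:

import os

import pytest


@pytest.mark.parametrize("hub_source", ["public", "private"])
def test_model_deploys(setup, add_models, hub_source):
    # None selects the default public JumpStart hub; otherwise the same
    # assertions run against the session's private hub.
    hub_name = None if hub_source == "public" else os.environ[ENV_VAR_JUMPSTART_SDK_TEST_HUB_NAME]
    model = JumpStartModel(
        model_id="catboost-classification-model",
        role=get_sm_session().get_caller_identity_arn(),
        sagemaker_session=get_sm_session(),
        hub_name=hub_name,
    )
    # ... shared deploy/predict assertions ...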


if hub["HubName"] != SM_JUMPSTART_PUBLIC_HUB_NAME:
# delete all hub contents first
_delete_hub_contents(sagemaker_session, hub["HubName"])
sagemaker_session.delete_hub(hub["HubName"])
Contributor:

Careful: if there is more than one hub to delete, you might get throttled by the TPS limit of 1.

Collaborator Author:

there'll be only 1 hub at a time

hub_name=test_hub_name,
hub_content_type=HubContentType.MODEL_REFERENCE.value,
hub_content_name=models["HubContentName"],
)
Contributor:

careful with throttling here.
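One way to guard these delete loops against the low TPS limit, as a sketch:

import time

from botocore.exceptions import ClientError


def _call_with_backoff(fn, max_attempts=5, base_delay=1.0):
    # Retry a hub API call with exponential backoff when throttled.
    for attempt in range(max_attempts):
        try:
            return fn()
        except ClientError as e:
            if e.response["Error"]["Code"] != "ThrottlingException" or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2**attempt)  # 1s, 2s, 4s, ...

Usage would be e.g. _call_with_backoff(lambda: sagemaker_session.delete_hub(hub["HubName"])).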



@pytest.fixture(scope="session")
def add_models():
Contributor:

nit: consider renaming to add_model_references

Collaborator Author:

sure, thanks




hub_name=os.environ[ENV_VAR_JUMPSTART_SDK_TEST_HUB_NAME], sagemaker_session=get_sm_session()
)

# Create Model Reference
Contributor:

nit: unnecessary comment

# Describe Model
describe_model_response = hub_instance.describe_model(model_name=model_id)
assert describe_model_response is not None
assert type(describe_model_response) == DescribeHubContentResponse
Contributor:

optional: consider asserting that

  • the HubContentName corresponds to the model_id
  • the HubContentVersion corresponds to the latest version in the public hub
  • the HubContentType is ModelReference

Collaborator:

+1, could you address this if you have time? I think what we are currently checking is pretty shallow, no harm to assert these in the tests.
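The deeper assertions could look like this sketch (attribute names are assumed from DescribeHubContentResponse and may need adjusting):

assert describe_model_response.hub_content_name == model_id
assert describe_model_response.hub_content_type == "ModelReference"
# Comparing against the latest public-hub version would additionally require
# describing the same content in SageMakerPublicHub.
assert describe_model_response.hub_content_version is not None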

@@ -115,6 +116,20 @@ def download_file(local_download_path, s3_bucket, s3_key, s3_client) -> None:
s3_client.download_file(s3_bucket, s3_key, local_download_path)


def get_public_hub_model_arn(hub: Hub, model_id: str) -> str:
    filter_value = f"model_id == {model_id}"
    response = hub.list_sagemaker_public_hub_models(filter=filter_value)
Contributor:

question: why not use a describe_ method?

Or do you intend to test discovery through list? In that case, please rename the method accordingly.

@malav-shastri (Collaborator Author) commented Oct 4, 2024:

This is a util function for these tests, not a test itself. Using a describe method would get us a public hub content ARN with the model version at the end, and Create Model Reference doesn't accept an ARN with a model version in it. On the other hand, we have implemented this list method so that it returns the model name and a public hub ARN that the create_model_reference API call accepts.
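Illustratively (ARN shapes are assumed for this sketch, not taken from the PR):

# versioned ARN returned by the describe path (rejected by Create Model Reference):
#   arn:aws:sagemaker:<region>:aws:hub-content/SageMakerPublicHub/Model/<model_id>/<version>
# unversioned ARN assembled from the list path (accepted):
#   arn:aws:sagemaker:<region>:aws:hub-content/SageMakerPublicHub/Model/<model_id>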


@pintaoz-aws merged commit d18e41b into aws:master on Oct 8, 2024
14 checks passed