Add support for autotune limits in service_shard_update #3830

Open
wants to merge 3 commits into base: master

Conversation

karmeleon

Autotune limits were added to PaaSTA at large in #3744; this adds support for them to service_shard_update. Also pulls in the commit from #3829 to make min/max instance values optional, and adds a new --dry-run param that skips committing and pushing changes to yelpsoa-configs.

Confirmed that the new kwargs work as expected and generate valid configs. If anyone knows a way to do this that doesn't involve a mess of nested if statements, I'm all ears lol

Comment on lines -197 to -200
"min_instances": args.min_instance_count,
"max_instances": args.prod_max_instance_count
    if deploy_prefix == "prod"
    else args.non_prod_max_instance_count,
Author

@nemacysts (from #3829) how would you suggest testing paasta's behavior with no max/min instances?

Member

@karmeleon sorry, yesterday was a busy day!

i don't think we've set up the local paasta playground for autoscaling, but we should be able to create an instance in infrastage (or any test cluster) and see what happens

lemme do that real fast, actually :)

Member

whoops - replied in the wrong place: #3829 (comment)

Member

(i.e., we'd need to tweak paasta to set a default min/max instances if one is not set explicitly or by autotune)

Member

PAASTA-18283

Author

thanks for checking! to clarify, we're totally okay with min/max instances both being at 1 for brand new shards, since they typically see traffic ramp up gradually enough for autotune to deal with it. do you think it's safe to ship this branch now, or should we wait for that ticket to be completed?

Member

i'm trying to verify that autotune will do the "right" thing and set up min/max instances - but when i ran the rightsizer scripts locally it didn't seem to... that said, i'm not sure if that's because i need to wait for splunk to have more data

Author

no worries, think we're gonna go in and manually update live configs to add autotune min/maxes to stabilize things a bit

Member

whoops, sorry - forgot about this. i think i hallucinated that autotune would fix up the autoscaling config: autotune has not correctly added min/max instances to my test instance :(

Member

(i.e., we'll either want to revert the {min,max}_instances changes here or tackle PAASTA-18283)

Member

nemacysts left a comment

generally lgtm - we'll probably want to back out the min/max instance changes + fix the autotune limits typo, but otherwise i don't see any concerns here

Comment on lines +42 to +48
parser.add_argument(
    "-d",
    "--dry-run",
    help="Do not commit changes to git",
    action="store_true",
    dest="dry_run",
)
Member

++

(i'm also a fan of --for-real flags so that dry-run is the default, but that would be a little more annoying to roll out here)
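For comparison, here is a minimal sketch of the two flag styles discussed above. The `--dry-run` arguments mirror the diff; the `--for-real` parser and the `should_commit` and `maybe_commit` helpers are hypothetical names used only for illustration and are not part of this PR.

```python
import argparse


def build_dry_run_parser():
    """Opt-out style: changes are committed unless --dry-run is passed."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d",
        "--dry-run",
        help="Do not commit changes to git",
        action="store_true",
        dest="dry_run",
    )
    return parser


def build_for_real_parser():
    """Opt-in style (hypothetical): nothing is committed unless --for-real is passed."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--for-real",
        help="Actually commit changes to git",
        action="store_true",
        dest="for_real",
    )
    return parser


def should_commit(args):
    """Return True when the parsed args ask for changes to be committed."""
    if hasattr(args, "for_real"):
        return args.for_real
    return not args.dry_run


def maybe_commit(args, commit):
    """Run the commit callable only when the flags allow it."""
    if should_commit(args):
        commit()
        return True
    return False
```

With the opt-in style a forgotten flag results in a no-op rather than an unintended push, which is the appeal mentioned above; the cost is that every existing caller would need to start passing the new flag.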

args.autotune_max_disk,
)
):
instance_config["autotune"] = {}
Member

Suggested change
- instance_config["autotune"] = {}
+ instance_config["autotune_limits"] = {}

is the key we want instead


might also be nice to do something like:

limit_config = instance_config["autotune_limits"]
...
limit_config["cpus"]["min"] = args.autotune_min_cpus
...

so that we don't need to repeat instance_config["autotune_limits"] so many times :)
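A rough sketch of that aliasing idea, shown for CPU limits only. The function name `add_cpu_limits` is hypothetical, and `SimpleNamespace` stands in for the parsed argparse namespace; the `autotune_min_cpus` and `autotune_max_cpus` attribute names follow the snippets in this review.

```python
from types import SimpleNamespace


def add_cpu_limits(instance_config, args):
    """Write CPU autotune limits into instance_config, skipping unset values."""
    if args.autotune_min_cpus is None and args.autotune_max_cpus is None:
        return instance_config  # nothing to add, so don't create empty keys

    # Bind the nested dicts once instead of re-indexing instance_config each time.
    limit_config = instance_config.setdefault("autotune_limits", {})
    cpu_limits = limit_config.setdefault("cpus", {})

    if args.autotune_min_cpus is not None:
        cpu_limits["min"] = args.autotune_min_cpus
    if args.autotune_max_cpus is not None:
        cpu_limits["max"] = args.autotune_max_cpus
    return instance_config


# Stand-in for the parsed argparse namespace (assumed attribute names).
example_args = SimpleNamespace(autotune_min_cpus=0.5, autotune_max_cpus=None)
example_config = add_cpu_limits({}, example_args)
```

Using `setdefault` means the alias also works when `autotune_limits` already exists on the instance, which may or may not matter for this script.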

Member

nemacysts, Apr 25, 2024

i think maybe we could trim this code down by doing something like:

limit_config["cpus"] = {
    "min": args.autotune_min_cpus,
    "max": args.autotune_max_cpus
}
limit_config["mem"] = {
    "min": args.autotune_min_mem,
    "max": args.autotune_max_mem
}
limit_config["disk"] = {
    "min": args.autotune_min_disk,
    "max": args.autotune_max_disk
}

and then adding a processing step that removes any min/max keys with None values - and any resource (e.g., cpus, mem, disk) keys with no subkeys

(that said, maybe the code to do that processing step wouldn't look too different and doing this would be a waste of time :p)
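For reference, the processing step described above could be sketched roughly as follows. `prune_limits` is a hypothetical helper name, and the input shape is assumed to match the `limit_config` dicts in the suggestion.

```python
def prune_limits(limit_config):
    """Drop min/max entries that are None, then drop resources left empty."""
    pruned = {}
    for resource, bounds in limit_config.items():
        # Compare against None explicitly so a legitimate 0 is kept.
        kept = {name: value for name, value in bounds.items() if value is not None}
        if kept:
            pruned[resource] = kept
    return pruned


example = prune_limits(
    {
        "cpus": {"min": 0.5, "max": None},
        "mem": {"min": None, "max": None},
        "disk": {"min": 1024, "max": 4096},
    }
)
```

Here `example` keeps only `cpus.min` and both `disk` bounds. The caller would still need to skip writing `autotune_limits` entirely when the pruned result is empty.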

3 participants