Add support for autotune limits in service_shard_update #3830

Open · wants to merge 3 commits into base: master
126 changes: 118 additions & 8 deletions paasta_tools/contrib/service_shard_update.py
@@ -39,6 +39,13 @@ def parse_args():
action="store_true",
dest="verbose",
)
parser.add_argument(
"-d",
"--dry-run",
help="Do not commit changes to git",
action="store_true",
dest="dry_run",
)
Comment on lines +42 to +48

Member:

++

(i'm also a fan of --for-real flags so that dry-run is the default, but that would be a little more annoying to roll out here)
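For illustration, the alternative the reviewer mentions might look roughly like this (a sketch only, not part of this PR; the flag name and wiring are assumptions):

    # Hypothetical opt-in flag: dry-run becomes the default behavior
    parser.add_argument(
        "--for-real",
        help="Actually commit changes to git (default is a dry run)",
        action="store_true",
        dest="for_real",
    )
    ...
    # later, commit only when explicitly requested
    if changes_made and args.for_real:
        updater.commit_to_remote()
        trigger_deploys(args.service)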

parser.add_argument(
"--source-id",
help="String to attribute the changes in the commit message.",
@@ -56,22 +63,19 @@ def parse_args():
"--min-instance-count",
help="If a deploy group is added, the min_instance count to create it with",
required=False,
default=1,
dest="min_instance_count",
)
parser.add_argument(
"--prod-max-instance-count",
help="If a deploy group is added, the prod max_instance count to create it with",
required=False,
default=100,
type=int,
dest="prod_max_instance_count",
)
parser.add_argument(
"--non-prod-max-instance-count",
help="If a deploy group is added, the non-prod max_instance count to create it with",
required=False,
default=5,
type=int,
dest="non_prod_max_instance_count",
)
@@ -115,6 +119,48 @@ def parse_args():
type=int,
dest="timeout_server_ms",
)
parser.add_argument(
"--autotune-min-cpus",
help="Minimum number of CPUs Autotune should give the shard",
required=False,
type=float,
dest="autotune_min_cpus",
)
parser.add_argument(
"--autotune-max-cpus",
help="Maximum number of CPUs Autotune should give the shard",
required=False,
type=float,
dest="autotune_max_cpus",
)
parser.add_argument(
"--autotune-min-mem",
help="Minimum amount of memory Autotune should give the shard",
required=False,
type=int,
dest="autotune_min_mem",
)
parser.add_argument(
"--autotune-max-mem",
help="Maximum amount of memory Autotune should give the shard",
required=False,
type=int,
dest="autotune_max_mem",
)
parser.add_argument(
"--autotune-min-disk",
help="Minimum amount of disk Autotune should give the shard",
required=False,
type=int,
dest="autotune_min_disk",
)
parser.add_argument(
"--autotune-max-disk",
help="Maximum amount of disk Autotune should give the shard",
required=False,
type=int,
dest="autotune_max_disk",
)
return parser.parse_args()


@@ -194,14 +240,28 @@ def main(args):

instance_config = {
"deploy_group": f"{deploy_prefix}.{args.shard_name}",
"min_instances": args.min_instance_count,
"max_instances": args.prod_max_instance_count
if deploy_prefix == "prod"
else args.non_prod_max_instance_count,
Comment on lines -197 to -200

Author:

@nemacysts (from #3829) how would you suggest testing paasta's behavior with no max/min instances?

Member:

@karmeleon sorry, yesterday was a busy day!

i don't think we've set up the local paasta playground for autoscaling, but we should be able to create an instance in infrastage (or any test cluster) and see what happens

lemme do that real fast, actually :)

Member:

whoops - replied in the wrong place: #3829 (comment)

Member:

(i.e., we'd need to tweak paasta to set a default min/max instances if one is not set explicitly or by autotune)

Member:

PAASTA-18283

Author:

thanks for checking! to clarify, we're totally okay with min/max instances both being at 1 for brand new shards, since they typically see traffic ramp up gradually enough for autotune to deal with it. do you think it's safe to ship this branch now, or should we wait for that ticket to be completed?

Member:

i'm trying to verify that autotune will do the "right" thing and set up min/max instances - but when i ran the rightsizer scripts locally it didn't seem to... that said, i'm not sure if it's because i need to wait for splunk to have more data

Author:

no worries, think we're gonna go in and manually update live configs to add autotune min/maxes to stabilize things a bit

Member:

whoops, sorry - forgot about this: i think i hallucinated that autotune would fix up the autoscaling config: my test instance has not had autotune add min/max instances correctly :(

Member:

(i.e., we'll either want to revert the {min,max}_instances changes here or tackle PAASTA-18283)
"env": {
"PAASTA_SECRET_BUGSNAG_API_KEY": "SECRET(bugsnag_api_key)",
},
}

if args.min_instance_count is not None:
instance_config["min_instances"] = args.min_instance_count

if (
args.prod_max_instance_count is not None
and deploy_prefix == "prod"
):
instance_config["max_instances"] = args.prod_max_instance_count

if (
args.non_prod_max_instance_count is not None
and deploy_prefix != "prod"
):
instance_config[
"max_instances"
] = args.non_prod_max_instance_count

if args.metrics_provider is not None or args.setpoint is not None:
instance_config["autoscaling"] = {}
if args.metrics_provider is not None:
Expand All @@ -214,6 +274,56 @@ def main(args):
instance_config["cpus"] = args.cpus
if args.mem is not None:
instance_config["mem"] = args.mem
if any(
(
args.autotune_min_cpus,
args.autotune_max_cpus,
args.autotune_min_mem,
args.autotune_max_mem,
args.autotune_min_disk,
args.autotune_max_disk,
)
):
instance_config["autotune"] = {}
Member:

Suggested change:

    -instance_config["autotune"] = {}
    +instance_config["autotune_limits"] = {}

is the key we want instead

might also be nice to do something like:

    limit_config = instance_config["autotune_limits"]
    ...
    limit_config["cpus"]["min"] = args.autotune_min_cpus
    ...

so that we don't need to repeat instance_config["autotune_limits"] so many times :)

Member (@nemacysts, Apr 25, 2024):

i think maybe we could trim this code down by doing something like:

    limit_config["cpus"] = {
        "min": args.autotune_min_cpus,
        "max": args.autotune_max_cpus
    }
    limit_config["mem"] = {
        "min": args.autotune_min_mem,
        "max": args.autotune_max_mem
    }
    limit_config["disk"] = {
        "min": args.autotune_min_disk,
        "max": args.autotune_max_disk
    }

and then adding a processing step that removes any min/max keys with None values - and any resource (e.g., cpus, mem, disk) keys with no subkeys

(that said, maybe the code to do that processing step wouldn't look too different and doing this would be a waste of time :p)
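A minimal sketch of that cleanup step (a hypothetical helper, not part of this PR, assuming the autotune_limits key suggested above):

    def prune_autotune_limits(limit_config):
        # Drop min/max entries that were never set on the command line,
        # then drop any resource left with no bounds at all.
        for resource in ("cpus", "mem", "disk"):
            bounds = limit_config.get(resource, {})
            limit_config[resource] = {
                k: v for k, v in bounds.items() if v is not None
            }
            if not limit_config[resource]:
                del limit_config[resource]
        return limit_config

With something like this, the per-resource if-chains below could collapse into the three dict literals from the comment above, followed by a single prune_autotune_limits(limit_config) call.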

if (
args.autotune_min_cpus is not None
or args.autotune_max_cpus is not None
):
instance_config["autotune"]["cpus"] = {}
if args.autotune_min_cpus is not None:
instance_config["autotune"]["cpus"][
"min"
] = args.autotune_min_cpus
if args.autotune_max_cpus is not None:
instance_config["autotune"]["cpus"][
"max"
] = args.autotune_max_cpus
if (
args.autotune_min_mem is not None
or args.autotune_max_mem is not None
):
instance_config["autotune"]["mem"] = {}
if args.autotune_min_mem is not None:
instance_config["autotune"]["mem"][
"min"
] = args.autotune_min_mem
if args.autotune_max_mem is not None:
instance_config["autotune"]["mem"][
"max"
] = args.autotune_max_mem
if (
args.autotune_min_disk is not None
or args.autotune_max_disk is not None
):
instance_config["autotune"]["disk"] = {}
if args.autotune_min_disk is not None:
instance_config["autotune"]["disk"][
"min"
] = args.autotune_min_disk
if args.autotune_max_disk is not None:
instance_config["autotune"]["disk"][
"max"
] = args.autotune_max_disk
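For reference, if the script were run with, say, --autotune-min-cpus 0.5, --autotune-max-cpus 4, and --autotune-min-mem 512 (illustrative values only), the block above would build:

    instance_config["autotune"] = {
        "cpus": {"min": 0.5, "max": 4.0},
        "mem": {"min": 512},
    }

(with the key renamed to autotune_limits if the review suggestion above is taken).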
# If the service config does not contain definitions for the shard in each ecosystem
# Add the missing definition and write to the corresponding config
if args.shard_name not in config_file.keys():
@@ -244,7 +354,7 @@ def main(args):
log.info(f"{args.shard_name} is in smartstack config already, skipping.")

# Only commit to remote if changes were made
if changes_made:
if changes_made and not args.dry_run:
updater.commit_to_remote()
trigger_deploys(args.service)
else: