Add support for autotune limits in service_shard_update #3830
base: master
@@ -39,6 +39,13 @@ def parse_args():
         action="store_true",
         dest="verbose",
     )
+    parser.add_argument(
+        "-d",
+        "--dry-run",
+        help="Do not commit changes to git",
+        action="store_true",
+        dest="dry_run",
+    )
     parser.add_argument(
         "--source-id",
         help="String to attribute the changes in the commit message.",
@@ -56,22 +63,19 @@ def parse_args():
         "--min-instance-count",
         help="If a deploy group is added, the min_instance count to create it with",
         required=False,
-        default=1,
         dest="min_instance_count",
     )
     parser.add_argument(
         "--prod-max-instance-count",
         help="If a deploy group is added, the prod max_instance count to create it with",
         required=False,
-        default=100,
         type=int,
         dest="prod_max_instance_count",
     )
     parser.add_argument(
         "--non-prod-max-instance-count",
         help="If a deploy group is added, the non-prod max_instance count to create it with",
         required=False,
-        default=5,
         type=int,
         dest="non_prod_max_instance_count",
     )
@@ -115,6 +119,48 @@ def parse_args():
         type=int,
         dest="timeout_server_ms",
     )
+    parser.add_argument(
+        "--autotune-min-cpus",
+        help="Minimum number of CPUs Autotune should give the shard",
+        required=False,
+        type=float,
+        dest="autotune_min_cpus",
+    )
+    parser.add_argument(
+        "--autotune-max-cpus",
+        help="Maximum number of CPUs Autotune should give the shard",
+        required=False,
+        type=float,
+        dest="autotune_max_cpus",
+    )
+    parser.add_argument(
+        "--autotune-min-mem",
+        help="Minimum amount of memory Autotune should give the shard",
+        required=False,
+        type=int,
+        dest="autotune_min_mem",
+    )
+    parser.add_argument(
+        "--autotune-max-mem",
+        help="Maximum amount of memory Autotune should give the shard",
+        required=False,
+        type=int,
+        dest="autotune_max_mem",
+    )
+    parser.add_argument(
+        "--autotune-min-disk",
+        help="Minimum amount of disk Autotune should give the shard",
+        required=False,
+        type=int,
+        dest="autotune_min_disk",
+    )
+    parser.add_argument(
+        "--autotune-max-disk",
+        help="Maximum amount of disk Autotune should give the shard",
+        required=False,
+        type=int,
+        dest="autotune_max_disk",
+    )
     return parser.parse_args()


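A note on the argument changes above: with the `default=` values removed, and no defaults on the new autotune flags, argparse leaves every omitted option as `None`, which is what the `is not None` checks later in the diff rely on. The sketch below is a minimal standalone illustration, not the PR's code. `build_parser` is a hypothetical helper, the loop is just one possible way to avoid six near-identical `add_argument` calls, and `type=int` on `--min-instance-count` is an assumption added here (the lines shown in the diff set no `type` for that flag, so a value passed on the command line would arrive as a string).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the flags shown in the diff."""
    parser = argparse.ArgumentParser()
    # Assumption: type=int is added here; the diff shows no type for this flag.
    parser.add_argument("--min-instance-count", type=int, dest="min_instance_count")
    # Register the six autotune flags in a loop; cpus are floats, mem/disk are ints,
    # matching the types used in the diff.
    for resource, value_type in (("cpus", float), ("mem", int), ("disk", int)):
        for bound in ("min", "max"):
            parser.add_argument(
                f"--autotune-{bound}-{resource}",
                type=value_type,
                dest=f"autotune_{bound}_{resource}",
            )
    return parser


# Only one flag is passed; everything else stays None because no default is set.
example_args = build_parser().parse_args(["--autotune-min-cpus", "0.5"])
print(example_args.autotune_min_cpus)   # 0.5
print(example_args.autotune_max_cpus)   # None
print(example_args.min_instance_count)  # None
```

Because an omitted flag is `None` rather than a number, the script can tell "not specified" apart from an explicit value and skip writing the key entirely.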
@@ -194,14 +240,28 @@ def main(args):

                     instance_config = {
                         "deploy_group": f"{deploy_prefix}.{args.shard_name}",
-                        "min_instances": args.min_instance_count,
-                        "max_instances": args.prod_max_instance_count
-                        if deploy_prefix == "prod"
-                        else args.non_prod_max_instance_count,
Comment on lines -197 to -200

@nemacysts (from #3829) how would you suggest testing paasta's behavior with no max/min instances?

@karmeleon sorry, yesterday was a busy day! i don't think we've setup the local paasta playground for autoscaling, but we should be able to create an instance in infrastage (or any test cluster) and see what happens
lemme do that real fast, actually :)

whoops - replied in the wrong place: #3829 (comment)

(i.e., we'd need to tweak paasta to set a default min/max instances if one is not set explicitly or by autotune)

PAASTA-18283

thanks for checking! to clarify, we're totally okay with min/max instances both being at 1 for brand new shards, since they typically see traffic ramp up gradually enough for autotune to deal with it. do you think it's safe to ship this branch now, or should we wait for that ticket to be completed?

i'm trying to verify that autotune will do the "right" thing and setup min/max instances - but when i ran the rightsizer scripts locally it didn't seem to...that said, i'm not sure if it's cause i need to wait for splunk to have more data

no worries, think we're gonna go in and manually update live configs to add autotune min/maxes to stabilize things a bit

whoops, sorry - forgot about this: i think i hallucinated that autotune would fixup the autoscaling config: my test instance has not had autotune add min/max instances correctly :(

(i.e., we'll either want to revert the {min,max}_instances changes here or tackle PAASTA-18283)
                         "env": {
                             "PAASTA_SECRET_BUGSNAG_API_KEY": "SECRET(bugsnag_api_key)",
                         },
                     }
+
+                    if args.min_instance_count is not None:
+                        instance_config["min_instances"] = args.min_instance_count
+
+                    if (
+                        args.prod_max_instance_count is not None
+                        and deploy_prefix == "prod"
+                    ):
+                        instance_config["max_instances"] = args.prod_max_instance_count
+
+                    if (
+                        args.non_prod_max_instance_count is not None
+                        and deploy_prefix != "prod"
+                    ):
+                        instance_config[
+                            "max_instances"
+                        ] = args.non_prod_max_instance_count
+
                     if args.metrics_provider is not None or args.setpoint is not None:
                         instance_config["autoscaling"] = {}
                         if args.metrics_provider is not None:
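For readers following the thread on the removed `min_instances`/`max_instances` lines: the replacement logic only writes a key when the matching flag was actually passed, so a run with none of the three flags produces an instance config with no instance bounds at all (the case the reviewers were worried about). A minimal sketch of that behavior follows. `instance_count_overrides` is a hypothetical helper name used only for illustration, and it is written to mirror the three `if` blocks in the diff rather than to replace them.

```python
from typing import Dict, Optional


def instance_count_overrides(
    deploy_prefix: str,
    min_count: Optional[int],
    prod_max: Optional[int],
    non_prod_max: Optional[int],
) -> Dict[str, int]:
    """Return only the min/max instance keys that were explicitly requested."""
    overrides: Dict[str, int] = {}
    if min_count is not None:
        overrides["min_instances"] = min_count
    # Pick the max that applies to this ecosystem, as the diff does with its
    # separate prod / non-prod checks.
    max_count = prod_max if deploy_prefix == "prod" else non_prod_max
    if max_count is not None:
        overrides["max_instances"] = max_count
    return overrides


print(instance_count_overrides("prod", 1, 100, 5))         # {'min_instances': 1, 'max_instances': 100}
print(instance_count_overrides("stage", None, 100, None))  # {}
```

The second call shows the no-bounds outcome: a prod-only max does not apply to a non-prod deploy prefix, so nothing is written.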
@@ -214,6 +274,56 @@ def main(args):
                         instance_config["cpus"] = args.cpus
                     if args.mem is not None:
                         instance_config["mem"] = args.mem
+                    if any(
+                        (
+                            args.autotune_min_cpus,
+                            args.autotune_max_cpus,
+                            args.autotune_min_mem,
+                            args.autotune_max_mem,
+                            args.autotune_min_disk,
+                            args.autotune_max_disk,
+                        )
+                    ):
+                        instance_config["autotune"] = {}
Suggested change

autotune_limits is the key we want instead

might also be nice to do something like:

    limit_config = instance_config["autotune_limits"]
    ...
    limit_config["cpus"]["min"] = args.autotune_min_cpus
    ...

so that we don't need to repeat instance_config["autotune_limits"]

i think maybe we could trim this code down by doing something like:

    limit_config["cpus"] = {
        "min": args.autotune_min_cpus,
        "max": args.autotune_max_cpus
    }
    limit_config["mem"] = {
        "min": args.autotune_min_mem,
        "max": args.autotune_max_mem
    }
    limit_config["disk"] = {
        "min": args.autotune_min_disk,
        "max": args.autotune_max_disk
    }

and then adding a processing step that removes any min/max keys with None values

(that said, maybe the code to do that processing step wouldn't look too different and doing this would be a waste of time :p)
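The reviewer's idea (fill in every min/max first, then strip whatever was not set) could look roughly like the sketch below. This is an illustration under assumptions, not code from the PR: `build_autotune_limits` is a hypothetical helper, the `autotune_limits` key follows the review suggestion rather than the `autotune` key in the diff, and `SimpleNamespace` stands in for the parsed argparse namespace.

```python
from types import SimpleNamespace
from typing import Any, Dict


def build_autotune_limits(args: Any) -> Dict[str, Dict[str, Any]]:
    """Collect autotune min/max values per resource, dropping unset (None) entries."""
    limits: Dict[str, Dict[str, Any]] = {}
    for resource in ("cpus", "mem", "disk"):
        bounds = {
            bound: getattr(args, f"autotune_{bound}_{resource}")
            for bound in ("min", "max")
        }
        # Processing step: keep only bounds that were explicitly provided.
        kept = {bound: value for bound, value in bounds.items() if value is not None}
        if kept:
            limits[resource] = kept
    return limits


# Stand-in for the argparse result: only two of the six flags were passed.
example_args = SimpleNamespace(
    autotune_min_cpus=0.5,
    autotune_max_cpus=None,
    autotune_min_mem=None,
    autotune_max_mem=4096,
    autotune_min_disk=None,
    autotune_max_disk=None,
)

instance_config: Dict[str, Any] = {}
limit_config = build_autotune_limits(example_args)
if limit_config:
    instance_config["autotune_limits"] = limit_config
print(instance_config)  # {'autotune_limits': {'cpus': {'min': 0.5}, 'mem': {'max': 4096}}}
```

Whether this is shorter than the explicit `if` ladder in the diff is a judgment call, as the reviewer notes; the main difference is that adding a new resource becomes a one-word change.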
+                        if (
+                            args.autotune_min_cpus is not None
+                            or args.autotune_max_cpus is not None
+                        ):
+                            instance_config["autotune"]["cpus"] = {}
+                            if args.autotune_min_cpus is not None:
+                                instance_config["autotune"]["cpus"][
+                                    "min"
+                                ] = args.autotune_min_cpus
+                            if args.autotune_max_cpus is not None:
+                                instance_config["autotune"]["cpus"][
+                                    "max"
+                                ] = args.autotune_max_cpus
+                        if (
+                            args.autotune_min_mem is not None
+                            or args.autotune_max_mem is not None
+                        ):
+                            instance_config["autotune"]["mem"] = {}
+                            if args.autotune_min_mem is not None:
+                                instance_config["autotune"]["mem"][
+                                    "min"
+                                ] = args.autotune_min_mem
+                            if args.autotune_max_mem is not None:
+                                instance_config["autotune"]["mem"][
+                                    "max"
+                                ] = args.autotune_max_mem
+                        if (
+                            args.autotune_min_disk is not None
+                            or args.autotune_max_disk is not None
+                        ):
+                            instance_config["autotune"]["disk"] = {}
+                            if args.autotune_min_disk is not None:
+                                instance_config["autotune"]["disk"][
+                                    "min"
+                                ] = args.autotune_min_disk
+                            if args.autotune_max_disk is not None:
+                                instance_config["autotune"]["disk"][
+                                    "max"
+                                ] = args.autotune_max_disk
                     # If the service config does not contain definitions for the shard in each ecosystem
                     # Add the missing definition and write to the corresponding config
                     if args.shard_name not in config_file.keys():
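One detail in the hunk above that may be worth a second look: the outer guard uses `any(...)` on the raw values, which tests truthiness, while the inner checks use `is not None`. The two agree unless a flag is passed with a value of zero. Whether a zero limit is ever meaningful for Autotune is not something this page establishes, so treat the following only as a sketch of the difference; `any_flag_set` is a hypothetical helper name.

```python
from typing import Any, Iterable


def any_flag_set(values: Iterable[Any]) -> bool:
    """True if at least one value was explicitly provided, including zero."""
    return any(value is not None for value in values)


# Simulates passing only `--autotune-min-cpus 0`.
flag_values = (0.0, None, None, None, None, None)

print(any(flag_values))           # False: 0.0 is falsy, so the outer guard would be skipped
print(any_flag_set(flag_values))  # True: the value was provided, it just happens to be zero
```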
@@ -244,7 +354,7 @@ def main(args):
             log.info(f"{args.shard_name} is in smartstack config already, skipping.")

         # Only commit to remote if changes were made
-        if changes_made:
+        if changes_made and not args.dry_run:
             updater.commit_to_remote()
             trigger_deploys(args.service)
         else:
++
(i'm also a fan of --for-real flags so that dry-run is the default, but that would be a little more annoying to rollout here)
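To make the alternative in the last comment concrete, here is a rough sketch of the inverted convention, where nothing is committed unless the caller opts in. It is not part of this PR: the `--for-real` flag, `parse_example_args`, and `should_commit` are hypothetical names, and the sketch leaves out the actual commit and deploy-trigger calls.

```python
import argparse
from typing import List


def parse_example_args(argv: List[str]) -> argparse.Namespace:
    """Hypothetical parser where dry-run is the default behavior."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--for-real",
        help="Actually commit changes to git (default is a dry run)",
        action="store_true",
        dest="for_real",
    )
    return parser.parse_args(argv)


def should_commit(changes_made: bool, for_real: bool) -> bool:
    """Commit only when there is something to commit and the caller opted in."""
    return changes_made and for_real


print(should_commit(True, parse_example_args([]).for_real))              # False
print(should_commit(True, parse_example_args(["--for-real"]).for_real))  # True
```

The trade-off the reviewer mentions is the rollout cost: every existing caller of the script would need to start passing the new flag, whereas `--dry-run` as added in this PR leaves current invocations unchanged.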