Get app tests working for app-autoscaler #19

Merged
merged 25 commits into from
May 6, 2024
25 commits
18ed1d0
Adding additional bosh dns aliases
cweibel Apr 24, 2024
7626ff3
Remove extra unused domains
cweibel Apr 24, 2024
632aebf
Try again?
cweibel Apr 24, 2024
058d3cd
Adding extra domains
cweibel Apr 24, 2024
2e4f9dc
Adding certs for operator and metricsforwarder
cweibel Apr 25, 2024
c8f9ce4
Adding rr certs
cweibel Apr 26, 2024
5929e6d
Remove tls from broker...
cweibel Apr 26, 2024
920f24a
Remove tls from api...
cweibel Apr 26, 2024
4ff9772
Trying to get cpuutil to pass with more cpu
cweibel Apr 29, 2024
41480e6
Comma for the win
cweibel Apr 29, 2024
a6fdcb2
Attempt to parallelize some of the tests
cweibel Apr 29, 2024
d5781c0
Force use of supplied org and space
cweibel Apr 29, 2024
3147e7b
Take out parallel tests
cweibel Apr 29, 2024
392bb60
Remove login/bsg to see of api tests pass
cweibel Apr 29, 2024
e90f168
remove existing user
cweibel Apr 30, 2024
ad0c4b6
Splitting config file for app tests from the other two
cweibel May 1, 2024
e9479d0
semicolon
cweibel May 1, 2024
dc97f5f
Determining how many ways can you screw up bash if/then/else syntax, …
cweibel May 1, 2024
59392f6
Remove extra comma from app config
cweibel May 1, 2024
12c2c95
Setting timeout to 2 hours, using built in skip for cpuutil
cweibel May 1, 2024
5f02871
Bumping memory of test app
cweibel May 2, 2024
b9933c5
Bumping memory of test app
cweibel May 2, 2024
c651ad7
Switch back to main
cweibel May 2, 2024
8c7435e
Spelling fix
cweibel May 2, 2024
0c962be
removing unused APP_CPU_ENTITLEMENT
cweibel May 2, 2024
93 changes: 87 additions & 6 deletions README.md
@@ -11,6 +11,7 @@ This is done in a few steps:
7. Acceptance Tests
8. Manual Tests
9. Install plugin
10. Maintenance & Misc


Overall install instructions are at https://github.com/cloudfoundry/app-autoscaler-release/blob/main/README.md. While the overall process is correct, many of the details, such as user names and the base yaml files to use, are incorrect. These are corrected in this set of instructions. With that out of the way, let's deploy!
@@ -182,7 +183,7 @@ OK

## 6. Enable the broker for an organization

There are two ways of enabling service access: via the pipeline and manually
There are two ways of enabling service access: via the pipeline and manually:


### Enable via the pipeline (Preferred)
@@ -216,19 +217,23 @@ The app-autoscaler-release as part of their releases includes a tarball with a p

These contain three types of tests:

- broker - ~5 minutes to run
- broker - ~5 minutes to run
- api - ~5 minutes to run
- app - currently paused since custom metrics are not working and the test takes ~1 hour to run
- app - ~1 hour to run

In the pipeline for `cg-deploy-autoscaler`, each of these tests are configured to run after the deployment of development, staging and production as `acceptance-tests-{api,broker,app}-{development,staging,production}`. These tests are configured to not run in parallel as each runs a `cleanup.sh` script at the end which deletes orgs with the naming convention `ASATS-*`.
In the pipeline for `cg-deploy-autoscaler`, each of these tests is configured to run after the deployment of development, staging and production as `acceptance-tests-{api,broker,app}-{development,staging,production}`. These tests are configured not to run in parallel because each runs a `cleanup.sh` script at the end, which deletes orgs with the naming convention `ASATS-*`. Also, resist the urge to add `--nodes=4 --flake-attempts=3` as ginkgo options to `bin/test`; the app tests in particular fail frequently with these enabled.

There is also a set of 3 acceptance tests for development debugging which are commented out, along with a custom resource towards the bottom. These exist to debug the bash and other scripting without having to wait for a deployment to development to finish first. Please comment these back out before submitting PRs.

Finally, the "app" tests for custom metrics require the org/space to be able to communicate with the autoscaler API endpoint registered by route registrar. The default CF application security group does not allow public URL access, so the test is customized to create, at runtime, an organization called `ASATS-Autoscaler-Acceptance-Tests` which has `cf bind-security-group public_networks_egress ...` applied to it. The other tests (broker and api) use an org/space which is named and created by the acceptance tests themselves. The "broker" tests fail if the `ASATS-Autoscaler-Acceptance-Tests` org is used. This is why there are two different sets of config files in `acceptance-tests.sh`.
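
The split between the two sets of config files can be pictured with a small shell sketch. This is only an illustration: the function name and both config file names are hypothetical and are not taken from `acceptance-tests.sh`; only the suite names come from the text above.

```shell
#!/bin/sh
# Hypothetical sketch: choose a config file per acceptance-test suite.
# The file names are illustrative assumptions, not the real ones.
config_for_suite() {
  case "$1" in
    app)
      # app tests reuse the pre-created org that has the egress security group bound
      echo "app-config.json"
      ;;
    api|broker)
      # api and broker tests let the suite name and create its own org/space
      echo "default-config.json"
      ;;
    *)
      echo "unknown suite: $1" >&2
      return 1
      ;;
  esac
}

config_for_suite app
config_for_suite broker
```

Keeping the choice in one place makes it harder to accidentally point the broker tests at the shared org, which the text above notes causes them to fail.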


## 8. Manual Tests


Create a service instance:
### CPU Example

This first test shows how to create an Autoscaler Policy based on CPU. Start by creating a service instance:

```
cf create-service app-autoscaler autoscaler-free-plan my-autoscaler
@@ -307,7 +312,7 @@ memory usage: 256M

Since there is no load on this hello-world style app, you'll then see the number of application instances drop from 4, to 3, to 2, to 1 over the course of 8 or so minutes.

### To test the scale up, we'll need to add load and drop the threshold
#### To test the scale up, we'll need to add load and drop the threshold

Start by creating a new policy with an 11% scale up threshold:

@@ -384,6 +389,72 @@ memory usage: 256M

Yay!

### Scheduled Example

The policy below uses an artificially low threshold so that the application scales to 4 app instances during a 30-minute window:

```
cat << POLICY > my_policy.json
{
  "instance_min_count": 1,
  "instance_max_count": 4,
  "scaling_rules":
  [
    {
      "metric_type": "memoryused",
      "breach_duration_secs": 60,
      "threshold": 0,
      "operator": ">",
      "cool_down_secs": 60,
      "adjustment": "+1"
    }
  ],
  "schedules": {
    "timezone": "America/New_York",
    "specific_date": [
      {
        "start_date_time": "2024-04-23T16:30",
        "end_date_time": "2024-04-23T17:00",
        "instance_min_count": 1,
        "instance_max_count": 4,
        "initial_min_instance_count": 2
      }
    ]
  }
}
POLICY
```

The timezones that can be used are listed at [https://docs.oracle.com/middleware/12211/wcs/tag-ref/MISC/TimeZones.html](https://docs.oracle.com/middleware/12211/wcs/tag-ref/MISC/TimeZones.html)

Unbind then bind the service instance to the app to have the policy applied, then verify the policy is in effect:

```
cf unbind-service my_cf3_app my-autoscaler
cf bind-service my_cf3_app my-autoscaler -c my_policy.json
cf asp my_cf3_app
```

Checking back after 5 PM, the following scaling events had occurred:

```
cf ash my_cf3_app

Retrieving scaling event history for app my_cf3_app...
Scaling Type    Status       Instance Changes    Time                         Action                                                    Error
dynamic         succeeded    3->4                2024-04-23T16:32:13-04:00    +1 instance(s) because memoryused > 0MB for 60 seconds
dynamic         succeeded    2->3                2024-04-23T16:30:13-04:00    +1 instance(s) because memoryused > 0MB for 60 seconds
dynamic         succeeded    1->2                2024-04-23T16:26:13-04:00    +1 instance(s) because memoryused > 0MB for 60 seconds
```

Starting from the bottom, you can see that when the policy was applied at 4:26 PM it immediately bumped the instances to 2 because of the `"initial_min_instance_count": 2`. Then, starting at 4:30 PM, it added 1 app instance at a time until the max of 4 was reached.

Nifty!


### Looking for more examples?

All of the possible policy configurations can be found at [https://github.com/cloudfoundry/app-autoscaler-release/blob/main/docs/policy.md](https://github.com/cloudfoundry/app-autoscaler-release/blob/main/docs/policy.md) (note that custom metrics do not currently work).

## 9. Install plugin

@@ -423,4 +494,14 @@ cpu 1percentage 2023-09-20T14:00:48-04:00
...
```

## 10. Maintenance & Misc

In no particular order:

- There are two CAs and a series of certs which are maintained in credhub. Rotating these certs is no different from rotating the CF ones.
- There is an ops file called `route-registrar-tls.yml` which lays the groundwork to add TLS to the endpoints registered by route registrar, along with a separate set of `-rr-` certs. The current implementation only supports mTLS, which the CF gorouters don't seem to support yet. The two CAs created in the autoscaler deployment are also included in the Gorouter's list of CAs, enabled by an ops file in cg-deploy-cf.
- Scaling of the autoscaler VMs themselves will need to be figured out as customers begin to use this more; right now the sizing is optimized to keep costs down.
- We will also need to circle back to the RDS instances and will likely want to add REINDEX jobs for the tables which store metrics, scaling history and other data with a high frequency of deletes.
- The data retention settings for metrics history, scaling history and others are kept at the defaults defined in the spec; an example of this can be seen [here](https://github.com/cloudfoundry/app-autoscaler-release/blob/main/jobs/operator/spec#L215-L217).
- Policies with recurring or date schedules still require scaling rules with a metric type defined. If you want to force an app to scale at a particular time, make sure that the scaling rules are easy to achieve (i.e., cpu > 0).
- The dynamic_policy_test.go tests for disk will fail with the default 128MB of memory in Staging and Production (oddly, they work fine in development); this was bumped in the configuration file to 1024 MB for the `app` tests.
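
The note above about schedules still needing scaling rules could be guarded with a crude pre-flight check before binding a policy. The sketch below is an assumption-laden illustration: the function name is hypothetical, it is not part of the autoscaler CLI plugin, and it only does a text match rather than parsing the JSON.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of the autoscaler tooling.
# Crude text match only: it does not parse JSON or validate the rules themselves.
policy_has_rules_for_schedule() {
  policy_file="$1"
  if grep -q '"schedules"' "$policy_file" && ! grep -q '"scaling_rules"' "$policy_file"; then
    echo "schedules defined without scaling_rules: $policy_file" >&2
    return 1
  fi
  return 0
}
```

One way to use it would be `policy_has_rules_for_schedule my_policy.json && cf bind-service my_cf3_app my-autoscaler -c my_policy.json`, so the bind only runs when the check passes.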
44 changes: 44 additions & 0 deletions bosh/opsfiles/bosh-dns-cf-deployment-name.yml
@@ -63,3 +63,47 @@
      instance_group: log-cache
      network: default
      query: '*'

- type: replace
  path: /addons/name=bosh-dns-aliases/jobs/name=bosh-dns-aliases/properties/aliases/-
  value:
    domain: app-autoscaler.metricsforwarder.service.cf.internal
    targets:
    - deployment: app-autoscaler
      domain: bosh
      instance_group: metricsforwarder
      network: default
      query: '*'

- type: replace
  path: /addons/name=bosh-dns-aliases/jobs/name=bosh-dns-aliases/properties/aliases/-
  value:
    domain: app-autoscaler.operator.service.cf.internal
    targets:
    - deployment: app-autoscaler
      domain: bosh
      instance_group: operator
      network: default
      query: '*'


- type: remove
  path: /addons/name=bosh-dns-aliases/jobs/name=bosh-dns-aliases/properties/aliases/domain=((deployment_name)).autoscalerpostgres.service.cf.internal

- type: remove
  path: /domains/metricsgateway

- type: remove
  path: /domains/metricsserver

- type: remove
  path: /domains/postgres


- type: replace
  path: /domains/metricsforwarder?
  value: app-autoscaler.metricsforwarder.service.cf.internal

- type: replace
  path: /domains/operator?
  value: app-autoscaler.operator.service.cf.internal
144 changes: 134 additions & 10 deletions bosh/opsfiles/certificates.yml
@@ -1,4 +1,5 @@
# Used
# Reset default duration
## Used
- type: remove
path: /variables/name=eventgenerator_client_cert/options/duration?
- type: remove
@@ -13,28 +14,151 @@
path: /variables/name=scheduler_client_cert/options/duration?
- type: remove
path: /variables/name=scheduler_server_cert/options/duration?
- type: remove
path: /variables/name=loggr_syslog_agent_metrics/options/duration?
- type: remove
path: /variables/name=loggr_syslog_agent_cache_tls/options/duration?
- type: remove
path: /variables/name=loggr_syslog_agent_tls/options/duration?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_api_tls/options/duration?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_metrics/options/duration?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_tls/options/duration?
- type: remove
path: /variables/name=metricsforwarder_autoscaler_metricsforwarder_loggregator_tls/options/duration?
- type: replace
path: /variables/name=app_autoscaler_ca_cert/options/duration?
value: 3650
- type: replace
path: /variables/name=metric_scraper_ca/options/duration?
value: 3650
# Not used, but covering bases
## Not used, but covering bases
- type: remove
path: /variables/name=apiserver_server_cert/options/duration?
- type: remove
path: /variables/name=loggr_syslog_agent_cache_tls/options/duration?
path: /variables/name=servicebroker_server_cert/options/duration?


# Reset default key_length

- type: remove
path: /variables/name=loggr_syslog_agent_metrics/options/duration?
path: /variables/name=eventgenerator_client_cert/options/key_length?
## Used
- type: remove
path: /variables/name=loggr_syslog_agent_tls/options/duration?
path: /variables/name=eventgenerator_client_cert/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_api_tls/options/duration?
path: /variables/name=eventgenerator_server_cert/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_metrics/options/duration?
path: /variables/name=loggregator_agent_metrics_tls/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_tls/options/duration?
path: /variables/name=scalingengine_client_cert/options/key_length?
- type: remove
path: /variables/name=metricsforwarder_autoscaler_metricsforwarder_loggregator_tls/options/duration?
path: /variables/name=scalingengine_server_cert/options/key_length?
- type: remove
path: /variables/name=scheduler_client_cert/options/key_length?
- type: remove
path: /variables/name=scheduler_server_cert/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_agent_metrics/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_agent_cache_tls/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_agent_tls/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_api_tls/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_metrics/options/key_length?
- type: remove
path: /variables/name=loggr_syslog_binding_cache_tls/options/key_length?
- type: remove
path: /variables/name=metricsforwarder_autoscaler_metricsforwarder_loggregator_tls/options/key_length?
- type: remove
path: /variables/name=servicebroker_server_cert/options/duration?
path: /variables/name=app_autoscaler_ca_cert/options/key_length?
- type: remove
path: /variables/name=metric_scraper_ca/options/key_length?
## Not used, but covering bases
- type: remove
path: /variables/name=apiserver_server_cert/options/key_length?
- type: remove
path: /variables/name=servicebroker_server_cert/options/key_length?







# Add certs for route_registrar to use
- type: replace
  path: /variables/-
  value:
    name: operator_server_rr_cert
    options:
      alternative_names:
      - app-autoscaler.operator.service.cf.internal
      ca: app_autoscaler_ca_cert
      common_name: app-autoscaler.operator.service.cf.internal
    type: certificate
    update_mode: converge

- type: replace
  path: /variables/-
  value:
    name: metricsforwarder_server_rr_cert
    options:
      alternative_names:
      - app-autoscaler.metricsforwarder.service.cf.internal
      ca: app_autoscaler_ca_cert
      common_name: app-autoscaler.metricsforwarder.service.cf.internal
    type: certificate
    update_mode: converge

- type: replace
  path: /variables/-
  value:
    name: scalingengine_server_rr_cert
    options:
      alternative_names:
      - app-autoscaler.scalingengine.service.cf.internal
      ca: app_autoscaler_ca_cert
      common_name: app-autoscaler.scalingengine.service.cf.internal
    type: certificate
    update_mode: converge

- type: replace
  path: /variables/-
  value:
    name: apiserver_server_rr_cert
    options:
      alternative_names:
      - app-autoscaler.apiserver.service.cf.internal
      ca: app_autoscaler_ca_cert
      common_name: app-autoscaler.apiserver.service.cf.internal
    type: certificate
    update_mode: converge

- type: replace
  path: /variables/-
  value:
    name: scheduler_server_rr_cert
    options:
      alternative_names:
      - app-autoscaler.autoscalerscheduler.service.cf.internal
      ca: app_autoscaler_ca_cert
      common_name: app-autoscaler.autoscalerscheduler.service.cf.internal
    type: certificate
    update_mode: converge

- type: replace
  path: /variables/-
  value:
    name: eventgenerator_server_rr_cert
    options:
      alternative_names:
      - app-autoscaler.eventgenerator.service.cf.internal
      ca: app_autoscaler_ca_cert
      common_name: app-autoscaler.eventgenerator.service.cf.internal
    type: certificate
    update_mode: converge
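
The six `-rr-` certificate entries above differ only in the variable name and hostname, so entries like these could be generated rather than written by hand. The following is a minimal sketch and not how this PR produced them: the function name is hypothetical, the YAML nesting is assumed from the usual BOSH variable layout, and the name/hostname pairs are copied from the entries above (note that `scheduler` maps to `autoscalerscheduler`).

```shell
#!/bin/sh
# Hypothetical generator: emit one /variables/- ops-file entry
# per "variable-prefix:hostname-component" pair.
emit_rr_certs() {
  for pair in \
    operator:operator \
    metricsforwarder:metricsforwarder \
    scalingengine:scalingengine \
    apiserver:apiserver \
    scheduler:autoscalerscheduler \
    eventgenerator:eventgenerator
  do
    # Text before the colon names the variable, text after it names the host.
    name="${pair%%:*}"
    host="app-autoscaler.${pair#*:}.service.cf.internal"
    cat <<EOF
- type: replace
  path: /variables/-
  value:
    name: ${name}_server_rr_cert
    options:
      alternative_names:
      - ${host}
      ca: app_autoscaler_ca_cert
      common_name: ${host}
    type: certificate
    update_mode: converge

EOF
  done
}

emit_rr_certs
```

Any generated output would still need to be reviewed against the hand-written ops file before use.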
4 changes: 4 additions & 0 deletions bosh/opsfiles/releases.yml
@@ -5,3 +5,7 @@
version: "12.2.2"
url: "https://bosh.io/d/github.com/cloudfoundry-incubator/app-autoscaler-release?v=12.2.2"
sha1: "a1fffce71219318d1fb27ec5fc3ff84e757337ed"

# Not used
- type: remove
  path: /releases/name=postgres