This repository has been archived by the owner on Dec 1, 2021. It is now read-only.

while scaling - affects the running AI response - causing spikes and slowness #472

Open
danupo068 opened this issue Apr 20, 2019 · 3 comments

Comments

@danupo068

Description
When using the app autoscaler, the overall throughput of the app drops during the scaling process. The app instances (AIs) that already exist show higher response times while the new app instances are coming up. This is having a large impact on our production systems that use the autoscaler. We would appreciate your insights, especially regarding large-scale production environments.
Observations:
Some queries on scaling_events are significantly affecting performance, especially because some tables do not have indexes.
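
To illustrate the missing-index observation, here is a minimal sketch that compares query plans before and after adding an index. Everything in it is an assumption made for illustration: the column names (`app_id`, `ts`, `old_instances`, `new_instances`), the index name, and the use of an in-memory SQLite database as a stand-in. It does not reflect the actual app-autoscaler schema or database.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row is (id, parent, notused, detail); keep only the detail text.
    return " | ".join(row[3] for row in rows)


def demo_index_effect():
    """Show the plan for a per-app history lookup without and with an index."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout, NOT the real app-autoscaler schema.
    conn.execute(
        "CREATE TABLE scaling_events ("
        "app_id TEXT, ts INTEGER, old_instances INTEGER, new_instances INTEGER)"
    )
    conn.executemany(
        "INSERT INTO scaling_events VALUES (?, ?, ?, ?)",
        ((f"app-{i % 100}", i, 1, 2) for i in range(10_000)),
    )
    sql = "SELECT * FROM scaling_events WHERE app_id = ? ORDER BY ts DESC LIMIT 10"

    before = query_plan(conn, sql, ("app-7",))  # expected to be a full table scan
    conn.execute(
        "CREATE INDEX idx_scaling_events_app_ts ON scaling_events (app_id, ts)"
    )
    after = query_plan(conn, sql, ("app-7",))  # expected to search via the index
    conn.close()
    return before, after


if __name__ == "__main__":
    plan_before, plan_after = demo_index_effect()
    print("without index:", plan_before)
    print("with index:   ", plan_after)
```

On the real database, the equivalent step would be to run the database's own plan tool (for example `EXPLAIN` in PostgreSQL) against the slow queries and check whether they scan the whole table.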

@cf-gitbot
Collaborator

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/165481535

The labels on this github issue will be updated when the story is started.

@danupo068 changed the title from "while scaling - the running AI response times have increased" to "while scaling - affects the running AI response - causing spikes and slowness" on Apr 20, 2019
@cdlliuy
Contributor

cdlliuy commented Apr 23, 2019

@danupo068

  1. Which version of app-autoscaler are you using in your production system?
  2. app-autoscaler does not inject any data into your application instances. Could you provide more detailed information about what happened?
  3. The scaling history query only runs internally, between autoscaler components. Yes, the missing index may be an issue, but it should not affect the performance of the app instances.

In summary, more information would be welcome.

@boyang9527
Contributor

@cdlliuy let us run some experiments on that. This might be related to the health check of new app instances: the default type is "port" rather than "http". If "port" is used, CF will consider an instance ready when it actually is not, which causes failures or long delays. Another potential cause is load balancing: new instances need time to warm up, so they have longer response times than existing instances, while the router distributes requests in a round-robin way.
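
The difference between the two health check types can be sketched with a small local experiment. This is only an illustration under stated assumptions: `WarmingUpHandler`, `port_check`, `http_check`, and `run_demo` are hypothetical names, the "app" is a toy HTTP server that answers 503 until a flag is flipped, and the checks only approximate what the platform does (a TCP connect for "port", an HTTP GET expecting status 200 for "http").

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class WarmingUpHandler(BaseHTTPRequestHandler):
    """Toy app: the port is open immediately, but it answers 503 until ready."""

    ready = False  # flipped to True once the (hypothetical) app has warmed up

    def do_GET(self):
        self.send_response(200 if WarmingUpHandler.ready else 503)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def port_check(host, port):
    """Approximates a "port" health check: succeeds if a TCP connection opens."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False


def http_check(host, port, path="/"):
    """Approximates an "http" health check: succeeds only on HTTP 200."""
    conn = http.client.HTTPConnection(host, port, timeout=1)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def run_demo():
    """Return (port_ok, http_ok) pairs while warming up and after warm-up."""
    server = HTTPServer(("127.0.0.1", 0), WarmingUpHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        WarmingUpHandler.ready = False
        while_warming = (port_check(host, port), http_check(host, port))
        WarmingUpHandler.ready = True
        after_warmup = (port_check(host, port), http_check(host, port))
    finally:
        server.shutdown()
        server.server_close()
    return while_warming, after_warmup


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the port check passes while the app is still returning 503, so an instance judged by a port check alone would start receiving traffic before it can serve it.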

These two may explain why the overall response time increases, but they cannot explain the increased response time for existing instances. We need to investigate.

@danupo068 FYI, see the above. If you can provide more information, it will be much easier for us to diagnose: for example, which language runtime you are using, the health check type, the scaling rules, etc.
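
The reasoning above can be checked with a small deterministic model. It is a sketch under invented assumptions: the instance names and latency figures (50 ms warm, 400 ms cold) are made up for illustration, and `simulate_round_robin` is a hypothetical helper, not router code.

```python
from statistics import mean


def simulate_round_robin(latencies_ms, requests):
    """Send `requests` to the instances in strict round-robin order.

    `latencies_ms` maps an instance name to its fixed per-request latency.
    Returns (overall mean latency, per-instance mean latency).
    """
    names = list(latencies_ms)
    samples = {name: [] for name in names}
    for i in range(requests):
        name = names[i % len(names)]  # round-robin pick
        samples[name].append(latencies_ms[name])
    overall = mean(lat for per_instance in samples.values() for lat in per_instance)
    per_instance_mean = {name: mean(vals) for name, vals in samples.items() if vals}
    return overall, per_instance_mean


# Invented numbers: two warm instances, then two cold ones added by scaling.
before_scaling = {"warm-0": 50, "warm-1": 50}
during_scaling = {"warm-0": 50, "warm-1": 50, "cold-0": 400, "cold-1": 400}

if __name__ == "__main__":
    print(simulate_round_robin(before_scaling, 400))
    print(simulate_round_robin(during_scaling, 400))
```

In this model the overall mean rises because half of the requests land on cold instances, while the per-instance mean of the warm instances stays the same. A slowdown measured on the existing instances themselves would therefore need another explanation.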


4 participants