new operational mode - percent with CPU #3351
@spacetourist, I finally took some time to review both the initial issue, as well as this PR. Thanks for the rich description, as well as the data tables, they were really useful in understanding what's wrong here.
As I see it, the bigger issue here is around the "r" flag, freeswitch-enabled destinations set aside. There is something intuitively off about the `100 - 100 / MAX * 100` formula which I cannot find a solid explanation for (we should plot it!), but it seems that any relative disparity in the inputs is softened or normalized by this formula, effectively bringing the outputs a lot closer to each other. For example, putting all your data in a single table:
Final | Float | Transf-2 | Max-Load | Transf-1 | Sessions |
---|---|---|---|---|---|
91 | 91.11 | 100 - 100 / 1125 * 100 | 1125 | .75 * (2500 - (1100 - 100)) | 1100 |
91 | 91.66 | 100 - 100 / 1200 * 100 | 1200 | .75 * (2500 - (1000 - 100)) | 1000 |
92 | 92.15 | 100 - 100 / 1275 * 100 | 1275 | .75 * (2500 - (900 - 100)) | 900 |
92 | 92.59 | 100 - 100 / 1350 * 100 | 1350 | .75 * (2500 - (800 - 100)) | 800 |
92 | 92.98 | 100 - 100 / 1425 * 100 | 1425 | .75 * (2500 - (700 - 100)) | 700 |
Like, how on earth did we obtain a 1% difference between least-loaded/most-loaded in the output, coming from a 57% difference between least-loaded/most-loaded in the inputs? The reduction was done in two steps: from 57% -> 26.6% -> 1%. What is intrinsically wrong with this formula and can we mathematically change it in order to obtain better weights?
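The two transformations in the table can be reproduced with a few lines of C. This is a minimal sketch rather than the module's code: the function and parameter names are hypothetical, the idle-CPU fraction of 0.75 and the profile size of 100 are read off the Transf-1 column, and the load of 100 in the second step is read off the Transf-2 column.

```c
/* Step 1 ("Transf-1"): pseudo max_load, scaled by the idle-CPU fraction.
 * cpu_idle is assumed to be a fraction in [0, 1]; psz is the profile size. */
static double pseudo_max_load(double cpu_idle, int max_sess, int sess, int psz)
{
    return cpu_idle * (max_sess - (sess - psz));
}

/* Step 2 ("Transf-2"): the relative formula, 100 - load / MAX * 100. */
static double relative_weight(double load, double max_load)
{
    return 100.0 - load / max_load * 100.0;
}

/* Relative spread between two values, in percent (e.g. 1100 vs 700 sessions). */
static double spread_pct(double high, double low)
{
    return (high / low - 1.0) * 100.0;
}
```

Feeding the 1100-session and 700-session rows through both steps gives max loads of 1125 and 1425 and weights that truncate to 91 and 92, so the input spread of about 57% shrinks to about 26.6% after the first step and to roughly 1-2% after the second.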
Now, you also sensed this problem based on your empirical evidence (why are my calls going to the more loaded FS?!) and the `PERCENT_WITH_CPU` approach is a two-fold improvement:
- first, you change the computation from 2 steps into 1 step. There is no more of that "pseudo max_load" intermediary value, which helps preserve more of the original ratios.
- secondly, you perform the `.cpu_idle` multiplication in the last step, after the `100 - 100 * X` formula, which will also help to reflect more of the FS instance data into the final weight.
While I am 100% for merging this new "c" (CPU) flag / exclusive with "r" right away, I will leave you a fun question about the "r" mode in general and whether we should enable it in the first place: If I give you two FS instances, one running on a Raspberry Pi at 1/2 calls (50%) and another one on a super-server at 500/1000 calls (50%), would you "relatively" balance a call to any of them? Or is the situation not so relative, after all? :)
Morning @liviuchircu - I'm now happy with the state of this PR with no immediate plans for further changes. Having said that, I have the following ideas for future improvements to this module:
There are also a number of algorithm changes which might be worth a look in time, in particular addressing issues such as your fun question from above. To accommodate systems of wildly different capacities we ought to be looking at the impact of the allocation as well as the pre-allocation state. Using your example, the Pi may have the lowest pre-allocation load, but with that call occupying a whopping 50% of the overall capacity we'd be able to reevaluate the decision. Obviously anyone actually running that wild mix of instance sizes would be asking for trouble anyway, but there is clearly a lot more that can be done to increase the module's flexibility. At this time I'm reasonably confident that these changes will solve my problem, so I'm keen to get your feedback, cheers
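The idea of weighing the impact of the allocation, rather than only the pre-allocation state, can be sketched as follows. Everything here is hypothetical and not part of the module: `struct dest`, `load_pct()` and `pick_by_post_allocation()` are illustrative names, and the selection rule (lowest load after accepting one more call) is just one possible reading of the idea.

```c
#include <stddef.h>

/* Hypothetical view of one destination: active calls and total capacity. */
struct dest {
    int sessions;
    int max_sessions;
};

/* Utilisation in percent for a given session count. */
static double load_pct(int sessions, int max_sessions)
{
    return 100.0 * sessions / max_sessions;
}

/* Return the index of the destination whose load would be lowest AFTER
 * accepting one more call, or -1 if no destination has spare capacity. */
static int pick_by_post_allocation(const struct dest *d, size_t n)
{
    int best = -1;
    double best_load = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (d[i].max_sessions <= 0 || d[i].sessions >= d[i].max_sessions)
            continue; /* unusable or already full */

        double after = load_pct(d[i].sessions + 1, d[i].max_sessions);
        if (best < 0 || after < best_load) {
            best = (int)i;
            best_load = after;
        }
    }
    return best;
}
```

With the Raspberry Pi at 1/2 and the super-server at 500/1000, both sit at 50% before the call, but the Pi would jump to 100% while the server would move to 50.1%, so this rule picks the server.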
Hello @spacetourist,
We've discussed this internally and while the new "Integrated Estimation" mode seems to solve your concrete problem, the module will still lack flexibility when it comes to different ratios of `sessions` to `max-sessions`. For example, with a sufficiently high `max_sessions` (e.g. in the order of thousands), and `current_sessions` in the order of hundreds, the new mode's formula will still output relatively similar weights, without giving the user any control to change it.
So we backtracked a bit and concluded that the problem can be alleviated while the `max_load` is being computed, during `lb_update_max_loads()`. In order to give full control to the user over their FreeSWITCH sessions scaling (some want 100 max sessions, others 500 or even 2500!), we could add a new `freeswitch_sessions_exponent` (default: `1`, no change), that would be applied as a power to the current `Sessions` value. Here is how such an exponent would modify the output `max_load`:
The picture shows 4 possible exponent settings: `1`, `1.01`, `1.05` and `1.1`, which already create a dramatic change in the relative difference between the output `max_load` values.
In your case, probably a "1.1" value of the modparam would suffice, and it would fix your scenario with typical Max-Sessions values of `2500`. The exponentiation would be added to this code section:
    if (psz < dst->fs_sock->stats.max_sess) {
        dst->rmap[ri].max_load =
            (dst->fs_sock->stats.id_cpu / (float)100) *
            (dst->fs_sock->stats.max_sess -
             (powf(dst->fs_sock->stats.sess, new_modparam) - psz));
This is just a working example as we get closer to the final solution, but the idea remains: the relative mode should be made to work as-is, rather than inventing new, obscure flags. And there is no need to leak all kinds of random information (CPU load? current_sessions? etc.) into `get_dst_load()`, which is ultimately meant to provide a couple of algorithms for interpreting the `max_load` of a destination, nothing more.
Hi @liviuchircu - that's an interesting idea but I'm not sure it solves some key aspects of my issue. The main issue is the sheer volume of calls I'm dealing with - I exceed 200/cps on a couple of instances so must take that into account between heartbeats to avoid allocating all of those calls to the same instance until the next execution of I also have several OpenSIPs instances feeding into the same bank of FreeSWITCH servers meaning that the profile size is not really relevant to the calculation. In some ways having the max load score being a close contest isn't a bad thing here provided the "s" flag is also enabled as calls will be distributed randomly to those instances until the next heartbeat clarifies the real active session counts. I'll give this some more thought as I agree there are further improvements we could make to:
It may be that what I'm looking for falls too far outside the scope of the module authors' intentions to implement something general here, but I'm keen to work towards a solution that is flexible enough to be both tunable and applicable to the wider community.
* ✨ new operational mode - percent with CPU
* 🐛 syntax errors
* 🐛 cherry pick duplicate
* 🐛 prevent divide by zero
* 📝 improve log message
* 📝 improve log message
* 🐛 incorrect type
* 📝 improve log message
* 📝 improve log message
* 📝 improve log message
* 📝 improve log message
* Dev cpufactor (#1)
* new operational mode - percent with CPU
* 🐛 fix print of str type
* remove comment
* modify character choice for new flag
* document new integrated estimation flag usage
* ✨ create CPU factor option flag
* document new integrated estimation flag usage
* 📝 CPU factor is optional, improve description
* capture docs in template rather than README directly
I wanted to follow up on this PR now that I have finally moved forward and have this running in production. My results will of course be anecdotal, but I am pleased to report that the modifications have had the desired effect: even with four distinct OpenSIPS instances (no data sharing) in front of a bank of 13 media servers, I'm seeing call loads balance to within 50 calls (4k-8k concurrent). I'm only currently pushing ~25% of the traffic through the load balancer, so I expect the lines to converge further in the coming weeks. The smoothing effect of having even this proportion of my load allocated is really beneficial, and outside of peak moments I see very flat load across the media servers. Thanks for all the assistance in getting this patch working properly, I'll provide another update and some charts once I have all the traffic using this system. 🚀
Summary
Implements a further load_balancer module strategy for distributing calls more evenly when dealing with high request volumes.
Details/Solution
The change caches the heartbeat data into the module and performs the following calculation for each request:
( 100 - ( 100 * ( current_sessions + sessions_since_last_heartbeat ) / max_sessions ) ) * CPU Idle factor
This disregards the dialog profile counts and allocates simply based on the last known call stats and any changes that have been made locally. The intention is to distribute calls to the last known least loaded server whilst not overloading a single system given the latency of the heartbeat data. AFAIK the minimum on both sides is 1s. For a system handling hundreds of calls per second to shared destinations this aims to balance the individual routing decisions more evenly.
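The per-request calculation can be sketched as a standalone C function. This is an illustration rather than the code in the patch: `percent_with_cpu()` is a hypothetical name, the session sum is grouped before the division (an assumption about the intended precedence of the formula above), the CPU idle factor is assumed to be a fraction in [0, 1], and the zero-capacity guard follows the "prevent divide by zero" commit only in spirit.

```c
/* Score a destination from cached heartbeat data plus local allocations.
 * Higher is better; 0 means no usable capacity. */
static int percent_with_cpu(int current_sessions,
                            int sessions_since_last_heartbeat,
                            int max_sessions,
                            double cpu_idle)
{
    if (max_sessions <= 0)
        return 0; /* avoid dividing by zero */

    /* Percentage of capacity in use, counting calls allocated locally
     * since the last heartbeat refreshed current_sessions. */
    double used = 100.0 *
        (current_sessions + sessions_since_last_heartbeat) / max_sessions;

    /* Free percentage, reduced by the idle-CPU factor in a single step. */
    double score = (100.0 - used) * cpu_idle;

    return score > 0.0 ? (int)score : 0;
}
```

For the 1100-session and 700-session instances from the earlier table (2500 max sessions, 0.75 idle factor), this yields 42 and 54, a spread of roughly 29% where the two-step relative mode produced 91 and 92. Each call routed locally bumps `sessions_since_last_heartbeat`, so the score of the chosen destination drops between heartbeats instead of staying fixed.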
Compatibility
This should not impact the other module features.
Closing issues
Closes #3297