
QueuedThreadPool will not reserve a thread with jobs waiting #13208


Open
wants to merge 16 commits into
base: jetty-12.1.x

Conversation

gregw
Contributor

@gregw gregw commented Jun 4, 2025

Fix #13187 by:

  • Modified QTP's ReservedThreadExecutor so that it will not reserve a thread if there are queued jobs waiting.
  • Deprecating many meaningless getters that combined the concept of available for execute(task) with available for tryExecute(task).
  • Improved javadoc

gregw added 2 commits June 4, 2025 12:59
@gregw gregw requested a review from lachlan-roberts June 4, 2025 03:26
// If we have more jobs queued than the number of pending reserved threads,
// then we have some real jobs in the queue.
// So we will not start a reserved thread as it may stop a real job from being executed.
if (isTaskWaiting())
Contributor

I am borderline -1 on this change.

This fundamentally changes Jetty's behavior from EXECUTE_PRODUCE_CONSUME to PRODUCE_EXECUTE_CONSUME when it's under enough load that tasks start getting queued.

In my understanding, that is supposed to help lower max latency at the expense of higher average latency. This sounds reasonable, but how much effect does that really have? And is that a good thing? Wouldn't that have an impact on throughput which could lead to a snowball effect?

Plus, this is fundamentally racy if the number of threads has been tuned to be just enough to handle the load: in such a case, spawning a new reserved thread becomes essentially random, and, depending on the exact timing of this check versus the queuing of the job, you still have the issue of a reserved thread idling out while there is a job waiting in the queue.
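
(For readers less familiar with the two modes, here is a rough, illustrative sketch of the difference. This is not the actual AdaptiveExecutionStrategy code; the TryExecutor interface below is a simplified stand-in for Jetty's org.eclipse.jetty.util.thread.TryExecutor.)

import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Illustrative sketch only, NOT the real AdaptiveExecutionStrategy.
class StrategySketch
{
    // Simplified stand-in: tryExecute() succeeds only if a reserved thread is
    // immediately available to take over.
    interface TryExecutor { boolean tryExecute(Runnable task); }

    private final Supplier<Runnable> producer; // produces tasks, null when idle
    private final Executor executor;           // the shared thread pool (QTP)
    private final TryExecutor reserved;        // the reserved thread executor

    StrategySketch(Supplier<Runnable> producer, Executor executor, TryExecutor reserved)
    {
        this.producer = producer;
        this.executor = executor;
        this.reserved = reserved;
    }

    void produceLoop()
    {
        Runnable task;
        while ((task = producer.get()) != null)
        {
            // EXECUTE_PRODUCE_CONSUME: hand the production loop over to a
            // reserved thread and consume the task on this (cache-hot) thread.
            if (reserved.tryExecute(this::produceLoop))
            {
                task.run();
                return;
            }
            // PRODUCE_EXECUTE_CONSUME: no reserved thread was available, so
            // queue the task for another thread and keep producing here.
            executor.execute(task);
        }
    }
}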

Contributor Author

@lorban I understand your reluctance, so I've moved the javadoc changes to a different PR for 12.1.x and we can take our time on this one.

So here is my thinking:

  • Reserving a thread is always an optional optimisation. We can run with maxReservedThreads (as it should be called) of 0.
  • When running in a constrained environment of limited threads, then reserving a thread is less important than getting real work done.

Perhaps this should only be conditional on us reaching maxThreads. Prior to that, it is better to start a thread to reserve it than to avoid doing so. However, I doubt we will have jobs queued (at least not for long) if we are at max threads.
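
(As a rough illustration of the first point: a pool can run with no reserved threads at all via QueuedThreadPool's existing setter. A minimal sketch, with arbitrary pool sizes:)

import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class NoReservedThreads
{
    public static void main(String[] args) throws Exception
    {
        // Reserved threads are an optional optimisation, so the pool can be
        // configured with none at all (sizes here are arbitrary).
        QueuedThreadPool pool = new QueuedThreadPool(200, 8);
        pool.setReservedThreads(0); // 0 = no reserved threads
        pool.start();
    }
}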

Contributor

I'm not sure I'm aligned with your view of the importance this optimization could have.

My fear is that a system that needs all its threads to serve a load with the optimized path might not be able to serve that same load with the unoptimized path, and would enter a spiral of death by switching to a slower code path when it reaches a certain load.

Maybe I'm over-worrying, but I think I won't be able to find peace of mind without seeing the results of some limit tests showing how QTP+AdaptiveExecutionStrategy behave when they have to handle a load too big for the thread pool for a short duration: do they temporarily degrade performance or do they collapse? Can they recover once the load goes down again or did they go to the sorry place, never coming back?

Contributor Author

+1 for more perf/stress testing.... if only we knew somebody to do that :)

It was interesting that one of the tests that needed to be adjusted was checking how timeouts worked if jobs were queued in the QTP. The test failed because, with this change, we could handle one more request than without it: instead of being reserved, a thread took one more job off the queue and executed it. That felt like a good change to me.

@lorban
Contributor

lorban commented Jun 4, 2025

Definitely, QTP's getters and javadoc need updating so I'm happy with that change.

But the "not reserving a thread if there are queued jobs waiting" part is IMHO problematic and gives me the feeling of a generic solution trying to solve a special-case problem.

My view of the problem is that we have a finite resource (a number of threads) that is shared between different consumers (execution of 'normal' jobs and reservations for 'special' jobs) fighting in a concurrent context. And we're not happy that sometimes it's not the ideal candidate that wins the resource race so we'd like to introduce some extra fairness.

I'm very opinionated about this kind of tweak, as it's never possible to be totally fair without a serious reduction in performance, and the unwanted side effects of tweaks that supposedly add some level of fairness on the cheap are hard to foresee.

@gregw
Contributor Author

gregw commented Jun 5, 2025

I've moved the javadoc changes to #13212 so they can get merged quickly.

gregw and others added 9 commits June 5, 2025 11:31
…1.x/13187/qtpReluctantReservation

# Conflicts:
#	jetty-core/jetty-http2/jetty-http2-tests/src/test/java/org/eclipse/jetty/http2/tests/BlockedWritesWithSmallThreadPoolTest.java
#	jetty-core/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java
Signed-off-by: Ludovic Orban <[email protected]>
@lorban
Contributor

lorban commented Jun 10, 2025

I've written an attempt at a limit test to see how this change impacts performance; see the new AESLimit class.

On my machine, it seems that with this change:

  • the latency doubles up to the 99.9th percentile
  • then lowers by ~20% up to the 99.999th
  • then doubles again up to the max latency

but throughput increases by ~40%:

  • with the patch, AdaptiveExecutionStrategy reports pec=4_777_050,epc=2_376_119
  • without the patch AdaptiveExecutionStrategy reports pec=1_486_968,epc=3_178_115

Assuming the test I wrote is sane, it's clear that AdaptiveExecutionStrategy successfully trades throughput for latency, and that the patch noticeably degrades that tradeoff. Since the test I wrote does almost nothing in the executed task, I can't say for sure whether the extra throughput would be maintained when the executed task suffers cache misses due to the context change.

I'm not sure how deep we should dig in, but it seems apparent that if we value EPC, this change has a negative impact.

@lorban
Contributor

lorban commented Jun 10, 2025

After discussing with @sbordet, we agreed that the AESLimit test was not measuring what it was supposed to. Another attempt has been made, and the fundamental end result is similar (even if the details are widely different): this change degrades latency overall.

We need to agree on whether the latest test is measuring the right things before moving on.

@gregw gregw removed this from Jetty 12.1.0 Jun 11, 2025
@gregw
Contributor Author

gregw commented Jun 11, 2025

bumped to 12.1.1 at least

@gregw
Contributor Author

gregw commented Jun 13, 2025

@sbordet @lorban There are only a few explanations for this PR causing increased latency:

  1. The test/benchmark is wrong
  2. The testing infrastructure is not stable and it was happenstance
  3. The extra test to check if there are tasks waiting is expensive... but it is a lock-free check, conducted only by a small minority of threads between jobs. Although, I guess it would be worthwhile to somehow measure how many of the running threads were once reserved, so we know how many are going to do this check
  4. The reduction of available reserved threads is impacting performance, which would be OK, as it indicates they are useful!

I currently think it is likely to be 1 or 2. But it would be interesting to measure 3.... if that can be done without adding additional latency :) It could probably be done by inspecting the stack traces of all threads?
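
(A rough sketch of the stack-trace inspection meant in point 3. The "ReservedThread" class-name filter is an assumption about Jetty's internal naming, and such a snapshot is only approximate, but it adds no locking to the hot path:)

import java.util.Arrays;
import java.util.Map;

public class ReservedThreadSnapshot
{
    // Count live threads whose current stack mentions a "ReservedThread"
    // frame, i.e. threads presently parked in (or dispatched from) the
    // reserved-thread code path.
    public static long countReservedThreads()
    {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        return traces.values().stream()
            .filter(stack -> Arrays.stream(stack)
                .anyMatch(frame -> frame.getClassName().contains("ReservedThread")))
            .count();
    }

    public static void main(String[] args)
    {
        System.out.println("Threads currently in reserved-thread code: " + countReservedThreads());
    }
}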

@lorban
Contributor

lorban commented Jun 20, 2025

@sbordet @lorban There are only a few explanations for this PR causing increased latency:

1. The test/benchmark is wrong

It's always a possibility that the benchmark isn't measuring what we think it's measuring, or that it's not representative of reality.

2. The testing infrastructure is not stable and it was happenstance

Always a possibility too.

3. The extra test to check if there are tasks waiting is expensive... but it is a lock-free check, conducted only by a small minority of threads between jobs. Although, I guess it would be worthwhile to somehow measure how many of the running threads were once reserved, so we know how many are going to do this check

Here, the benchmark explicitly configures a QTP of 4 threads (1 selector, 1 acceptor and 2 spares) with maxReservedThreads of 1, so that what's measured is what happens when the handling jobs are fighting against the reserved thread for one of the 2 spare threads.

Maybe we should not care too much about this constrained environment, but it was the original question that started this perf exercise.
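
(For concreteness, a minimal sketch of that constrained setup, not the actual AESLimit benchmark code, assuming the standard Server, ServerConnector and QueuedThreadPool APIs:)

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class ConstrainedServerSketch
{
    public static void main(String[] args) throws Exception
    {
        // 4 threads total: 1 acceptor + 1 selector + 2 spares, with at most
        // 1 reserved thread competing with handling jobs for the spares.
        QueuedThreadPool pool = new QueuedThreadPool(4, 4);
        pool.setReservedThreads(1);

        Server server = new Server(pool);
        ServerConnector connector = new ServerConnector(server, 1, 1); // 1 acceptor, 1 selector
        connector.setPort(8080);
        server.addConnector(connector);
        // handler omitted for brevity; a real benchmark would set one
        server.start();
    }
}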

4. The reduction of available reserved threads is impacting performance, which would be OK, as it indicates they are useful!

To be strictly precise: the reduction of available reserved threads is negatively impacting latency, but it seems it may have a positive impact on throughput. But I agree that we should, and do, value latency over throughput, so I assume you saw the result of this benchmark as a degradation (assuming the benchmark is right, of course).

I currently think it is likely to be 1 or 2. But it would be interesting to measure 3.... if that can be done without adding additional latency :) It could probably be done by inspecting the stack traces of all threads?

The benchmark can already do that on the cheap by adding the async profiler to the JMH profiler list. IIRC we did that, and the impact wasn't related to any form of contention but rather to the fact that a reserved thread was less often available in the constrained environment of the benchmark.
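
(For reference, one hedged way to attach async-profiler to such a run programmatically via JMH's Options API. This assumes JMH's bundled AsyncProfiler integration and a locally resolvable async-profiler library; AESLimit is the test class mentioned above:)

import org.openjdk.jmh.profile.AsyncProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class AESLimitRunner
{
    public static void main(String[] args) throws Exception
    {
        // Run the AESLimit benchmark with async-profiler attached via JMH's
        // profiler hook (the native library must be resolvable locally).
        Options opt = new OptionsBuilder()
            .include("AESLimit")
            .addProfiler(AsyncProfiler.class, "output=flamegraph")
            .build();
        new Runner(opt).run();
    }
}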

Development

Successfully merging this pull request may close these issues.

Missing API to increase QueuedThreadPool maxThreads by leased threads + QoSHandler bug of exceeding maxRequestCount by 1
3 participants