
QueuedThreadPool will not reserve a thread with jobs waiting #13208


Open
wants to merge 16 commits into
base: jetty-12.1.x

Conversation

gregw
Contributor

@gregw gregw commented Jun 4, 2025

Fix #13187 by:

  • Modified QTP's ReservedThreadExecutor so that it will not reserve a thread if there are queued jobs waiting.
  • Deprecating many meaningless getters that combined the concept of available for execute(task) with available for tryExecute(task).
  • Improved javadoc

gregw added 2 commits June 4, 2025 12:59
@gregw gregw requested a review from lachlan-roberts June 4, 2025 03:26
// If we have more jobs queued than the number of pending reserved threads,
// then we have some real jobs in the queue.
// So we will not start a reserved thread as it may stop a real job from being executed.
if (isTaskWaiting())
Contributor

I am borderline -1 on this change.

This fundamentally changes Jetty's behavior from EXECUTE_PRODUCE_CONSUME to PRODUCE_EXECUTE_CONSUME when it's under enough load that tasks start getting queued.

In my understanding, that is supposed to help lower max latency at the expense of higher average latency. This sounds reasonable, but how much effect does that really have? And is that a good thing? Wouldn't that have an impact on throughput which could lead to a snowball effect?

Plus, this is fundamentally racy if the number of threads has been tuned to be just enough to handle the load: in such a case, spawning a new reserved thread becomes essentially random, and, depending on the exact timing of this check versus the queuing of the job, you still have the issue of a reserved thread idling out while there is a job waiting in the queue.
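
(For readers less familiar with the two modes, here is a rough, illustrative sketch of the difference. This is not the actual AdaptiveExecutionStrategy code; the TryExecutor interface below is a simplified stand-in for Jetty's org.eclipse.jetty.util.thread.TryExecutor.)

import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Illustrative sketch only, NOT the real AdaptiveExecutionStrategy.
class StrategySketch
{
    // Simplified stand-in: tryExecute() succeeds only if a reserved thread is
    // immediately available to take over.
    interface TryExecutor { boolean tryExecute(Runnable task); }

    private final Supplier<Runnable> producer; // produces tasks, null when idle
    private final Executor executor;           // the shared thread pool (QTP)
    private final TryExecutor reserved;        // the reserved thread executor

    StrategySketch(Supplier<Runnable> producer, Executor executor, TryExecutor reserved)
    {
        this.producer = producer;
        this.executor = executor;
        this.reserved = reserved;
    }

    void produceLoop()
    {
        Runnable task;
        while ((task = producer.get()) != null)
        {
            // EXECUTE_PRODUCE_CONSUME: hand the production loop over to a
            // reserved thread and consume the task on this (cache-hot) thread.
            if (reserved.tryExecute(this::produceLoop))
            {
                task.run();
                return;
            }
            // PRODUCE_EXECUTE_CONSUME: no reserved thread was available, so
            // queue the task for another thread and keep producing here.
            executor.execute(task);
        }
    }
}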

Contributor Author

@lorban I understand your reluctance, so I've moved the javadoc changes to a different PR for 12.1.x and we can take our time on this one.

So here is my thinking:

  • Reserving a thread is always an optional optimisation. We can run with maxReservedThreads (as it should be called) of 0.
  • When running in a constrained environment of limited threads, then reserving a thread is less important than getting real work done.

Perhaps this should only be conditional on us reaching maxThreads. Prior to that, it is better to start a thread to reserve it than to avoid doing so. However, I doubt we will have jobs queued (at least not for long) if we are at max threads.
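
(As a rough illustration of the first point: a pool can run with no reserved threads at all via QueuedThreadPool's existing setter. A minimal sketch, with arbitrary pool sizes:)

import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class NoReservedThreads
{
    public static void main(String[] args) throws Exception
    {
        // Reserved threads are an optional optimisation, so the pool can be
        // configured with none at all (sizes here are arbitrary).
        QueuedThreadPool pool = new QueuedThreadPool(200, 8);
        pool.setReservedThreads(0); // 0 = no reserved threads
        pool.start();
    }
}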

Contributor

I'm not sure I'm aligned with your view of the importance this optimization could have.

My fear is that a system that needs all its threads to serve a load with the optimized path might not be able to serve that same load with the unoptimized path, and would enter a spiral of death by switching to a slower code path when it reaches a certain load.

Maybe I'm over-worrying, but I think I won't be able to find peace of mind without seeing the results of some limit tests showing how QTP+AdaptiveExecutionStrategy behave when they have to handle a load too big for the thread pool for a short duration: do they temporarily degrade performance or do they collapse? Can they recover once the load goes down again or did they go to the sorry place, never coming back?

Contributor Author

+1 for more perf/stress testing.... if only we knew somebody to do that :)

It was interesting that one of the tests that needed to be adjusted was checking how timeouts worked if jobs were queued in the QTP. The test failed because, with this change, we could handle one more request than without it: instead of being reserved, a thread took one more job off the queue and executed it. That felt like a good change to me.

@lorban
Contributor

lorban commented Jun 4, 2025

Definitely, QTP's getters and javadoc need updating so I'm happy with that change.

But the "not reserving a thread if there are queued jobs waiting" part is IMHO problematic and gives me the feeling of a generic solution trying to solve a special-case problem.

My view of the problem is that we have a finite resource (a number of threads) that is shared between different consumers (execution of 'normal' jobs and reservations for 'special' jobs) fighting in a concurrent context. And we're not happy that sometimes it's not the ideal candidate that wins the resource race so we'd like to introduce some extra fairness.

I'm very opinionated about this kind of tweak, as it's never possible to be totally fair without a serious reduction in performance, and the unwanted side effects of tweaks that supposedly add some level of fairness on the cheap are hard to foresee.

@gregw
Contributor Author

gregw commented Jun 5, 2025

I've moved the javadoc changes to #13212 so they can get merged quickly.

gregw and others added 9 commits June 5, 2025 11:31
…1.x/13187/qtpReluctantReservation

# Conflicts:
#	jetty-core/jetty-http2/jetty-http2-tests/src/test/java/org/eclipse/jetty/http2/tests/BlockedWritesWithSmallThreadPoolTest.java
#	jetty-core/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java
Signed-off-by: Ludovic Orban <[email protected]>
@lorban
Contributor

lorban commented Jun 10, 2025

I've written an attempt at a limit test to see how this change impacts performance; see the new AESLimit class.

On my machine, it seems that with this change:

  • the latency doubles up to the 99.9th percentile
  • then lowers by ~20% up to the 99.999th
  • then doubles again up to the max latency

but throughput increases by ~40%:

  • with the patch, AdaptiveExecutionStrategy reports pec=4_777_050,epc=2_376_119
  • without the patch AdaptiveExecutionStrategy reports pec=1_486_968,epc=3_178_115

Assuming the test I wrote is sane, it's clear that AdaptiveExecutionStrategy successfully trades throughput for latency, and that the patch noticeably degrades that tradeoff. Since the test I wrote does almost nothing in the executed task, I can't say for sure whether the extra throughput would be maintained when the executed task suffers cache misses due to the context change.

I'm not sure how deep we should dig in, but it seems apparent that if we value EPC, this change has a negative impact.

@lorban
Contributor

lorban commented Jun 10, 2025

After discussing with @sbordet, we agreed that the AESLimit test was not measuring what it was supposed to. Another attempt has been made, and the fundamental end result is similar (even if the details are widely different): this change degrades latency overall.

We need to agree on whether the latest test is measuring the right things before moving on.

@gregw gregw removed this from Jetty 12.1.0 Jun 11, 2025
@gregw
Contributor Author

gregw commented Jun 11, 2025

bumped to 12.1.1 at least

@gregw
Contributor Author

gregw commented Jun 13, 2025

@sbordet @lorban There are only a few explanations for this PR causing increased latency:

  1. The test/benchmark is wrong
  2. The testing infrastructure is not stable and it was happenstance
  3. The extra test to check if there are tasks waiting is expensive... but it is a lock-free check, conducted only by a small minority of threads between jobs. Although, I guess it would be worthwhile to somehow measure how many of the running threads were once reserved, so we know how many are going to do this check
  4. The reduction of available reserved threads is impacting performance, which would be OK, as it indicates they are useful!

I currently think it is likely to be 1 or 2. But it would be interesting to measure 3.... if that can be done without adding additional latency :) It could probably be done by inspecting the stack traces of all threads?
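
(A rough sketch of the stack-trace inspection meant in point 3. The "ReservedThread" class-name filter is an assumption about Jetty's internal naming, and such a snapshot is only approximate, but it adds no locking to the hot path:)

import java.util.Arrays;
import java.util.Map;

public class ReservedThreadSnapshot
{
    // Count live threads whose current stack mentions a "ReservedThread"
    // frame, i.e. threads presently parked in (or dispatched from) the
    // reserved-thread code path.
    public static long countReservedThreads()
    {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        return traces.values().stream()
            .filter(stack -> Arrays.stream(stack)
                .anyMatch(frame -> frame.getClassName().contains("ReservedThread")))
            .count();
    }

    public static void main(String[] args)
    {
        System.out.println("Threads currently in reserved-thread code: " + countReservedThreads());
    }
}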

@lorban
Contributor

lorban commented Jun 20, 2025

@sbordet @lorban There are only a few explanations for this PR causing increased latency:

1. The test/benchmark is wrong

It's always a possibility that the benchmark isn't measuring what we think it's measuring, or that it's not representative of reality.

2. The testing infrastructure is not stable and it was happenstance

Always a possibility too.

3. The extra test to check if there are tasks waiting is expensive... but it is a lock-free check, conducted only by a small minority of threads between jobs. Although, I guess it would be worthwhile to somehow measure how many of the running threads were once reserved, so we know how many are going to do this check

Here, the benchmark explicitly configures a QTP of 4 threads (1 selector, 1 acceptor and 2 spares) with maxReservedThreads of 1, so that what's measured is what happens when the handling jobs are fighting against the reserved thread for one of the 2 spare threads.

Maybe we should not care too much about this constrained environment, but it was the original question that started this perf exercise.
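
(For concreteness, a minimal sketch of that constrained setup, not the actual AESLimit benchmark code, assuming the standard Server, ServerConnector and QueuedThreadPool APIs:)

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class ConstrainedServerSketch
{
    public static void main(String[] args) throws Exception
    {
        // 4 threads total: 1 acceptor + 1 selector + 2 spares, with at most
        // 1 reserved thread competing with handling jobs for the spares.
        QueuedThreadPool pool = new QueuedThreadPool(4, 4);
        pool.setReservedThreads(1);

        Server server = new Server(pool);
        ServerConnector connector = new ServerConnector(server, 1, 1); // 1 acceptor, 1 selector
        connector.setPort(8080);
        server.addConnector(connector);
        // handler omitted for brevity; a real benchmark would set one
        server.start();
    }
}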

4. The reduction of available reserved threads is impacting performance, which would be OK, as it indicates they are useful!

To be strictly precise: the reduction of available reserved threads is negatively impacting latency, but it seems it may have a positive impact on throughput. But I agree that we should, and do, value latency over throughput, so I assume you saw the result of this benchmark as a degradation (assuming the benchmark is right, of course).

I currently think it is likely to be 1 or 2. But it would be interesting to measure 3.... if that can be done without adding additional latency :) It could probably be done by inspecting the stack traces of all threads?

The benchmark can already do that on the cheap by adding the async profiler to the JMH profiler list. IIRC we did that, and the impact wasn't related to any form of contention but rather to the fact that a reserved thread was less often available in the constrained environment of the benchmark.
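
(For reference, one hedged way to attach async-profiler to such a run programmatically via JMH's Options API. This assumes JMH's bundled AsyncProfiler integration and a locally resolvable async-profiler library; AESLimit is the test class mentioned above:)

import org.openjdk.jmh.profile.AsyncProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class AESLimitRunner
{
    public static void main(String[] args) throws Exception
    {
        // Run the AESLimit benchmark with async-profiler attached via JMH's
        // profiler hook (the native library must be resolvable locally).
        Options opt = new OptionsBuilder()
            .include("AESLimit")
            .addProfiler(AsyncProfiler.class, "output=flamegraph")
            .build();
        new Runner(opt).run();
    }
}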

Development

Successfully merging this pull request may close these issues.

Missing API to increase QueuedThreadPool maxThreads by leased threads + QoSHandler bug of exceeding maxRequestCount by 1
3 participants