Increase default stack size limit on 64-bit systems #55185

MilesCranmer · 2024-07-20T16:01:33Z

This increases the default stack size limit on 64-bit systems from 4 MB to 8 MB, matching glibc and typical modern Linux and macOS machines, as well as the stack size limit of the root Julia process. Note that this is a limit rather than an allocation, and only results in memory usage if used. A larger limit by itself does not change the memory usage of Julia [1] [2].

Since the root task already has an 8 MB default limit, a different stack size limit in tasks can lead to some hard-to-debug errors in multithreaded code which can't be reproduced in serial versions.

#55184 will also help address this issue, so that a user can manually adjust their own stack limits with a documented API. However I think an 8 MB stack size limit is a better default, and matches the default in a variety of systems (you can check your system's default with ulimit -s). 64-bit systems have 16 exabytes of virtual address space available, and stack size limits do not inherently affect performance, see my note here. They only affect performance if one actually uses the larger stack size with deeper function calls.

Also see some stack size limit experiments here: https://discourse.julialang.org/t/experiments-with-julia-stack-sizes-and-enzyme/116511/2 which look at function nesting limits (which I have ran into myself when using AD libraries).

One alternative is for the stack size to be system-dependent, and computed based on the same information used by ulimit -s. However, I think this will make bugs harder to reproduce if workflows get close to stack size limits. A single limit across 64-bit systems seems reasonable (as is done currently).

Fixes #54998. cc @ViralBShah @nsajko

It seems like there is an 8 MB default for the signal stack too:

julia/src/signals-unix.c

Lines 40 to 41 in 3290904

    
           // 8M signal stack, same as default stack size (though we barely use this) 
        
           static const size_t sig_stack_size = 8 * 1024 * 1024;

vtjnash · 2024-07-20T17:05:03Z

This comes with a high risk of blowing up the page tables which blows up kernel memory usage on Linus and causes OOM kills with half the number of tasks. We could reduce this default, but increasing seems like a bad idea, as applications that benefit from it are likely to be more substantially benefited from a rewrite anyways

MilesCranmer · 2024-07-20T17:34:14Z

Edit: A stack size limit of 8 MB has no effect on OOM errors; see experiments in #55185 (comment).

This is virtual memory rather than physical memory, so it would only cause these issues if someone were to actually use a larger stack. But the limit itself just determines what is allowed/disallowed by a user.

By the way, I very much agree about rewriting code to avoid deep stacks. However, even Julia inference itself can get very deep – as seen in EnzymeAD/Enzyme.jl#1156 (comment) which can cause AD tools like Enzyme to run into stack overflows for normal code. (Hence why it would be nice for Task(f, n) to be documented in #55184)

I guess the main quality of life thing that would be nice with an 8 MB stack for tasks is that the root task in Julia already has an 8 MB stack. This inconsistency means that sometimes a bug you run into in multithreading code can't be reproduced by the serial version. (And depending on the package, sometimes the debug info doesn't make it clear that a stack overflow was hit).

I think this is actually why the bug I had in that Enzyme issue was hard to reproduce, as the stack size required for compilation was about 6 MB. So if I was the compilation was performed from the root task, it was fine, but if the compilation was performed from a Task, it would hit the stack overflow.

Maybe the best thing to do in the future would be to have the default Task stack as having 8 MB minus the stack size at the call site, so that the calling code can have stack overflows at exactly the same point. But for now I think 8 MB would be better just to match things up.

vtjnash · 2024-07-20T18:13:33Z

Julia inference is indeed one of those things that has needed to be re-written for several reasons, though it is a bit of a slog to fix that

MilesCranmer · 2024-07-20T19:03:25Z

Got it. I mean, in the meantime it would be nice to at least have #55184 so it's clear there are officially supported ways to work around such issues.

I do think it would be nice to have 8 MB stack in tasks to make limits consistent with the root task. Since it's virtual memory, there wouldn't be inherent performance changes, right? Or does libuv do anything different with virtual address space?

Otherwise, I guess it comes down to which is the worse footgun:

Running into a stackoverflow error in a thread, but not the root task.
Getting an OOM error due to actively using 2x deeper stacks.

Both are annoying but I think (1) is worse because it depends on whether a secondary thread or the root thread reaches a function first – since they have 2x different stack size limits. And having stack overflows in a secondary thread can result in confusing debugging info.

(2) seems less of an issue (though still problematic) because if you are hitting OOM errors with 8 MB stacks, you should already notice high memory consumption with a 4 MB stack.

(1) is easier to run into too – you only need a single deep stack, and be calling it from a single task. But (2) requires an additional condition – you have to also be spawning a lot of tasks, all of which are deep.

With a 8 MB stack size limit, we would basically swap (1) for (2). What do you think?

MilesCranmer · 2024-07-20T19:21:59Z

Also: OOM errors are loud, and show up in the root task. But if a secondary task is the only one to experience a stackoverflow, and the user doesn't explicitly have a istaskfailed in their code to check workers, they might end up with some tasks silently failing in the background

julia> function test_recursion_depth(maxdepth, depth=0, args...)
           depth >= maxdepth && return nothing
           print("\33[2K\rHello from depth $(depth)")
           test_recursion_depth(maxdepth, depth + 1, args...)
           print(devnull, "$(depth) $(args)")  # Just to prevent LLVM removing args
       end;

julia> test_recursion_depth(60_000)  # Works fine since root task is 8 MB
Hello from depth 59999
julia> t = Threads.@spawn test_recursion_depth(60_000)  # Crashes since thread is 4 MB
Task (runnable) @0x0000000280b0c1a0

Hello from depth 43325
julia>

Which means you might have non-root threads crash and not realise, apart from reduced performance.

vtjnash · 2024-07-20T19:51:18Z

Which means you might have non-root threads crash and not realise

That is also usually some sort of implementation bug, either with failing to call wait/fetch if the program is structured as a task nursery, bind if it interacts with a Channel, or calling Base.errormonitor if it is a background worker

This proposal does nothing to actually guarantee task space, as that may be already consumed by any arbitrary amount of other code already. So if the code is using recursion badly, then that needs fixed in the user code, as no amount of stack space will ever be sufficient to correct for it.

MilesCranmer · 2024-07-20T20:01:18Z

Of course – a footgun still requires the programmer to pull the trigger. Having smaller footguns is still a good thing though!

Two questions:

Why was a 4 MB stack size chosen?
In Julia, does a larger stack size limit, *by itself*, impact performance? Or is it only the active use of that limit, similar to other languages?

MilesCranmer · 2024-07-20T20:17:22Z

Even 8 MB is comparatively small to other languages when you consider the larger stack frame size in Julia. Here's C++:

#include <iostream>

void test_recursion_depth(long long maxdepth, long long depth = 0) {
    if (depth >= maxdepth) return;
    std::cout << "\33[2K\rHello from depth " << depth << std::flush;
    test_recursion_depth(maxdepth, depth + 1);
    std::cout << ""; // Prevent compiler optimizations
}

int main() { test_recursion_depth(1000000); }

which goes up to 174,271 on my machine. In Julia, the analogous code goes up to 86,649. And within a thread, it only goes up to 42,984.

oscardssmith · 2024-07-20T20:21:55Z

I think a lot of the reason Julia opts for a smaller stack is that when creating lots of tasks, you don't want too high of a memory footprint.

MilesCranmer · 2024-07-20T20:23:39Z

This isn't the stack size though, it's the stack size limit. Changing the limit by itself would have no effect on memory. See #54998 (comment)

MilesCranmer · 2024-07-20T20:39:48Z

Here, you can try for yourself by creating a 10 TB stack for a task:

julia> Task(() -> sleep(10), 10 * 1024 ^ 4)  |> schedule |> fetch

In other words, if you don't actually use the larger stack, there are no extra allocations.

At the same time, if you are launching many many tasks, those tasks probably do something small, and aren't going to make function calls 50,000 recursions deep (or else you would have other problems).

The benefit of a larger default task stack size limit is so you don't run into hard-to-debug errors like described above – due to the significant mismatch in stack size limits between root and secondary threads. Especially since Julia inference involves some deep recursive calls, sometimes it's not even the user's fault, and they end up with a stackoverflow in a thread without any error in the root process.

ViralBShah · 2024-07-21T06:02:06Z

Is it possible to make this runtime configurable for folks who need it / want to experiment with it, with appropriate warnings?

nsajko · 2024-07-21T07:07:31Z

if you don't actually use the larger stack, there are no extra allocations

Perhaps you're correct, I don't know, but I notice you ignored the first sentence by vtjnash above:

This comes with a high risk of blowing up the page tables which blows up kernel memory usage on Linus and causes OOM kills with half the number of tasks.

To try to properly test the statement by @vtjnash, I devised an experiment like this, trying to see for what values will I get OOM:

function make_task(stack_size::Int)
    f = () -> sleep(10)
    r = Task(f, stack_size)
    r.sticky = false
    r
end

function make_tasks(task_count::Int, stack_size::Int)
    [make_task(stack_size) for _ ∈ 1:task_count]
end

function run_tasks(tasks)
    foreach(schedule, tasks)
end

function experiment(task_count::Int, stack_size::Int)
    tasks = make_tasks(task_count, stack_size)
    run_tasks(tasks)
    tasks
end

I get this:

julia> experiment(30000, 4*1024*1024)
ERROR: OutOfMemoryError()
Stacktrace:
  [1] _Task
    @ ./boot.jl:523 [inlined]
  [2] Task
    @ ./task.jl:5 [inlined]
  [3] make_task
    @ ./REPL[1]:3 [inlined]
  [4] #3
    @ ./none:-1 [inlined]
  [5] iterate
    @ ./generator.jl:48 [inlined]
  [6] collect_to!
    @ ./array.jl:829 [inlined]
  [7] collect_to_with_first!
    @ ./array.jl:807 [inlined]
  [8] collect(itr::Base.Generator{UnitRange{Int64}, var"#3#4"{Int64}})
    @ Base ./array.jl:781
  [9] make_tasks
    @ ./REPL[2]:2 [inlined]
 [10] experiment(task_count::Int64, stack_size::Int64)
    @ Main ./REPL[4]:2
 [11] top-level scope
    @ REPL[5]:1

I think there may be a bug in the Task construction, because I think the system has plenty of memory left when OutOfMemoryError is thrown. Not sure what to take away from this.

MilesCranmer · 2024-07-21T20:55:22Z

That's a good idea.

Here is a modified version with printing so we can test the # of tasks before OOM errors:

const TASK_NUM = Ref(0)

function make_task(stack_size::Int)
    TASK_NUM[] += 1
    print("\33[2K\rHello from task ", TASK_NUM[])
    f = () -> sleep(10)
    r = Task(f, stack_size)
    r.sticky = false
    r
end

function make_tasks(task_count::Int, stack_size::Int)
    TASK_NUM[] = 0
    [make_task(stack_size) for _ ∈ 1:task_count]
end

function run_tasks(tasks)
    foreach(schedule, tasks)
end

function experiment(task_count::Int, stack_size::Int)
    tasks = make_tasks(task_count, stack_size)
    run_tasks(tasks)
    tasks
end

experiment(1000000, parse(Int, ARGS[1]) * 1024 * 1024)

with this I get

> julia --startup-file=no test_memory.jl 4  # 4 MB tasks (default)
Hello from task 29983
ERROR: OutOfMemoryError()

and with an 8 MB stack,

> julia --startup-file=no test_memory.jl 8  # 8 MB tasks
Hello from task 29983
ERROR: OutOfMemoryError()

So, it doesn't seem to change things

Increasing the stack size limit, I get identical behavior for:

Stack size limit per task	Tasks before OOM error
4 MB	29983
8 MB	29983
16 MB	29983
32 MB	29983
64 MB	29983
128 MB	29983
256 MB	29983
512 MB	29983
1024 MB	29983
2048 MB	29983
4096 MB	29983
8192 MB	16334

Then, when I reach 8192 MB of stack size limit per task, only then does the OOM starts to occur earlier, down to 16,334 tasks. I see the same behavior on Julia 1.6.7 through 1.11-rc1.

So, my feeling is that changing from 4 MB to 8 MB is pretty harmless, because the OOM error seems to be primarily a function of the number of tasks rather than the stack size limit per task (which shouldn't affect things anyways, unless we were asking for terabytes of address space per task. Virtual address space is huge).

I think it's a clear win for making debugging easier, as threads will not experience stack overflows 2x earlier than the root task. Wdyt?

MilesCranmer · 2024-07-21T23:25:06Z

With my PR #55201 we can also test this with Threads.@spawn, to see if changing the stack size limit there is any different from regular Tasks:

function make_task(stack_size::Int)
    TASK_NUM[] += 1
    print("\33[2K\rHello from task ", TASK_NUM[])
    Threads.@spawn reserved_stack=stack_size sleep(100)
end

The results are identical to the above, with regular Task objects, in that there is no effect on the maximum number of tasks for OOM, until you reach the 8 GB stack size-per-thread regime (1000x larger than what is being considered in this PR).

I tested --threads=1 and --threads=4, and there was also no difference. There was also no difference between :interactive and :default thread pools.

MilesCranmer · 2024-07-22T01:34:11Z

I have reproduced these experiments on a Linux machine with a very different memory profile than my mac. The number of tasks before OOM error is nearly the same (29,888 vs 29,982) – up to ~4,000 MB stack size limit per task.

At precisely a 3,742 MB stack size limit is the point at which I start to see a reduction in the max number of tasks. But before that, it's a flat 29,982 task limit on my linux machine.

So from these experiments it seems like an adjusted stack size of 8 MB per task does not actually change the occurrence of OOM errors, across Julia versions (tested v1.6.7 - v1.11.0-rc1) and operating systems (tested Linux and macOS – both 64-bit).

cc @vtjnash

MilesCranmer · 2024-07-23T14:21:49Z

Also… should this be flagged in a bug report? 30,000 tasks before OOM error seems small, no? And clearly the main contributor is not the stack size, must be something else.

PallHaraldsson · 2024-07-23T17:03:00Z

Why limit this to 64-bit systems? Given on 32-bit: "glibc i386, x86_64 7.4 MB".

Is it to limit testing? It seems if good for 64-bit should also be for 32-bit, since only a limit.

MilesCranmer · 2024-07-23T18:10:45Z

So, 32-bit systems have only 2^32 bytes of available virtual address space, which is 4 GB. This means that stack space limits are actually something to worry about for 32-bit. Even the current value of 2 MB is perhaps a bit large on 32-bit.

However, 64-bit systems have 2^64 bytes of available virtual address space, which is 16 exabytes. Basically it's so large we don't need to worry about it at all. The practical reason for setting the default stack size limit to number of MBs is to discourage users from using large stacks. But since the root task in Julia already has an 8 MB stack size limit, it would make life much simpler if threads have 8 MB stack size limits too.

fingolfin

This seems sensible to me and I see no real downside. All arguments against that I heard so far seem (up to my understanding -- of course I may have misunderstood something) to either miss that this is about a limit, not actual allocations; discuss hypothetical issues (e.g. with OOM) that are disproven by experiments (and to make also don't make sense from a theoretical point of view). And while I agree that if you need many tasks with large stacks then perhaps you need to go back to the drawing board, I think this kind of argument can equally be brought against a 4 MB stack (why not make the stack just 2MB, matching 32 bit system, or just 1 MB?). Given this flexible nature, I find the argument "this matches the default stack size of the main thread on most 64bit systems" to be very compelling.

ViralBShah · 2024-07-25T13:34:56Z

I also feel that if we merge this now - it is still early in the 1.12 release cycle, and we will have enough time to react to issues, or undo if necessary.

vtjnash · 2024-07-25T14:23:12Z

A smaller number would also be beneficial, since it would permit doing more optimizations. We are at sort of an unfortunately large size right now, where the optimizations possible aren't quite as substantial.

Considering vtjnash's comment

LilithHafner · 2024-07-25T14:32:17Z

How substantial are those possible optimizations?

vtjnash · 2024-07-25T16:54:03Z

It doesn't fix his issue though, it merely slightly pushes off when the underlying issue needs to be fixed, and in the meantime makes the time for it to finally crash slightly longer, the eventual stacktrace slower, and more likely for tooling to fail that tries to point at the actual cause of failure (as they tend to have limits in the 10k frame ranges).

MilesCranmer · 2024-07-25T22:47:00Z

Can we measure these things? Since the effect on OOM errors evidently does not appear until 1000x increases in stack size limits, perhaps the same will be true for these other theoretical optimizations.

With #55184 merged I think it’s good to note that this is just the default behavior. If someone needs custom stack size limits, for some nonstandard OS where having small stack size limits is very important, there’s now a documented way for an advanced user to do that. But otherwise I think the default stack size limit should be the same as the root task, otherwise it’s a needless footgun.

MilesCranmer · 2024-07-26T00:56:37Z

Ok I just spun up a Windows machine in AWS to test this. Running the code from this, The results are as follows:

Stack size limit per task	Tasks before OOM error
1 MB	499
2 MB	499
4 MB	499
8 MB	499
10 MB	499
12 MB	499
13 MB	378

So, again, it seems that changing from 4 MB default to 8 MB default does not effect the OOM error. Something else, other than the stack size limit, is the main cause.

If Windows is still a concern though, maybe a compromise is that we could raise it to 8 MB on Linux and macOS, and leave it at 4 MB on Windows?

(That being said, the maximum number of tasks on Windows before an OOM is tiny regardless of the stack size limit... Any idea why? Is this just an anomalous measurement from my cloud environment?)

MilesCranmer · 2024-07-26T01:15:09Z

this PR which will Tasks less reliable on Windows machines (where the OOM is more aggressive)

Why is this the case? This doesn't seem supported by the above experiments. Does Julia does something non-standard with call stacks that can cause reliability issues when asking for more virtual address space?

Maybe you could share an example of this behavior so we can analyze it?

MilesCranmer · 2024-07-26T01:34:08Z

For the record – I would also be reasonably content with a 4MB/4MB stack size limit for both the main thread and secondary threads. But, of course, that's not possible, because 8MB is the default for the main thread in most modern systems. And I think 8MB/8MB is much better than a 8MB/4MB which is a big footgun.

I think the very best solution would be to have the stack size of a Task be operating system dependent using the system call version of ulimit -s so that you always have this consistency. But that would be a breaking change due to the reduced stack size limit on some systems (and would cause reproducibility issues from different behavior on different systems), so would have to wait for 2.0. So before that point, I think assuming 8 MB/8 MB makes a lot of sense as a default, and it's also not a breaking change, since it's just a higher limit for threads.

nsajko · 2024-07-26T05:15:44Z

@MilesCranmer regarding our experiments above, they are not valid, because setting the second parameter of the Task constructor to a nonzero value causes eager allocation of the stack, see:

#55005 (comment)

This also explains the overly pessimistic results.

MilesCranmer · 2024-07-26T12:09:31Z

That makes much more sense, thanks! I'll try with a custom build.

MilesCranmer · 2024-07-26T14:14:08Z

New experiments:

const TASK_NUM = Ref(0)
function make_task()
    TASK_NUM[] += 1
    TASK_NUM[] % 1000 == 0 && print("\33[2K\rHello from task ", TASK_NUM[])
    r = Task(() -> sleep(10^10))
    r.sticky = false
    r
end
experiment(task_count::Int) = (TASK_NUM[] = 0; [make_task() for _ ∈ 1:task_count])
experiment(1000000000)

Now, rather than using Task(f, n), I actually re-build Julia for each stack size. Here are the experiments, on a Linux machine with 50GB of physical memory:

`JL_STACK_SIZE`	Tasks before OOM error
4 MB	92.7 million
8 MB	92.7 million
128 MB	92.7 million
1024 MB	92.7 million

So I basically get the same number of tasks, within statistical noise, before OOM.

giordano · 2024-07-26T14:25:07Z

I actually re-build Julia for each stack size.

If that helps with your experiments, if you want to only change

julia/src/options.h

Line 113 in 1dee000

#define JL_STACK_SIZE (4*1024*1024)

I think you should be to just run make -j -C src, instead of rebuilding whole of Julia.

MilesCranmer · 2024-07-26T15:19:26Z

Thanks.

I also tried with an explicit schedule of the tasks. In that case it is a flat 15.9 million tasks before OOM from 4MB default to 16MB default stack size limit – but seems like no statistically significant difference.

JeffBezanson · 2024-07-26T18:36:08Z

If we really wanted to, we could probably manage to limit the stack size of the main task to something less than the default, but I don't see a really compelling reason that 4mb is the right size, so we might as well go with this.

vtjnash · 2024-07-26T22:49:17Z

reasonably content with a 4MB/4MB stack size limit for both the main thread and secondary threads. But, of course, that's not possible

That is just an rlimit, which is trivial to set from Julia (normally we only query it), though I don't see much value in doing so

So I basically get the same number of tasks, within statistical noise, before OOM

That is because Julia stops honoring it once you hit about 10k Tasks and changes to a different internal Task mode (slow, because we cannot optimize it while this limit is above 2MB), because it is trying to avoiding a different limit in the kernel

vtjnash · 2024-07-26T22:50:29Z

it's a flat 29,982 task limit on my linux machine

That is a Julia-imposed limit, not a kernel limit, so you aren't really measuring what you set out to measure

MilesCranmer · 2024-07-26T23:25:37Z

@vtjnash here are the new experiments after our realization that Task(f, n) eagerly allocates the stack: #55185 (comment)

vtjnash · 2024-07-27T02:35:47Z

Right now we just fail to allocate them entirely after about 10k (the exact number is OS depending) and have some tricks to lie about it instead so that you don't notice too easily. This lie is necessary because the 4MB default is just a little too large to effectively pool the allocations. This is all transparent to you in your tests, as it was supposed to be hidden from you, but it does negate your attempts to test it

MilesCranmer · 2024-07-27T12:18:53Z

I’m not sure I follow. If the testing has an issue perhaps it would be easier if you describe how it should be modified? Earlier you explained how a larger stack size limit could increase occurrence of OOM errors, so these experiments are designed to test the magnitude of that potential problem. So far it seems to not be an issue in practice though.

ViralBShah · 2024-07-27T12:50:30Z

IIUC, if we are not allocating after 10k threads, then what is the harm to increase the limit per thread anyways?

@MilesCranmer Just out of curiosity, if we reduce the main thread to 4MB also, does it fix the enzyme crash?

MilesCranmer · 2024-07-27T13:31:32Z

@MilesCranmer Just out of curiosity, if we reduce the main thread to 4MB also, does it fix the enzyme crash?

I can check. My guess is that it would just cause the type inference stack overflow to show up in the main thread in addition to the secondary, which would be much easier for debugging.

Basically the 2x mismatch in stack size limits between the main thread and secondary threads resulted in this race conditioned stack overflow that I found quite challenging to debug. The difficulty was compounded by caching which meant the secondary thread could run fine so long as the main thread was the one to compile the code first. (This was the main trigger for me making this PR.)

A stack size limit of 1 MB, 2 MB, etc, would all be fine, so long as they are the same as the secondary thread stack size limit. This PR sets it to 8 MB since that’s the usual main thread stack size limit, but in principle I’d be ok with reducing both. It’s just that reducing the main thread stack size would be a major breaking change so is Julia 2.0 material.

And also from all the experiments I’ve ran, it seems Julia’s call stacks are normal in that they only allocate physical memory when used, not when they are instantiated. So, based on the numbers, there seems to be no downside.

JeffBezanson · 2024-07-30T16:58:57Z

I agree we should make stack sizes the same for all tasks; we don't have to promise anything about whether you get stack overflows, but it should at least not depend on which task the code runs on.

reducing the main thread stack size would be a major breaking change

This is interesting to think about tangentially --- I personally don't think changes involving resource use can count as breaking changes. For example, the representation of an object getting larger could lead to more OOMs, but changing the representation of an object is clearly allowed at least as a minor version change. A reasonable stack size decrease, to me, would be similar.

MilesCranmer · 2024-07-30T18:10:06Z

Yeah, I guess there's sort of a blurred decision boundary here. The main thread stack size of 8.0 MB changing to 7.5 MB seems in line with normal minor version changes, whereas 8.0 MB to 1.0 MB would mean some libraries that rely on deep stacks may need to be rewritten entirely.

vtjnash · 2024-07-30T19:59:52Z

Fwiw, it is actually abnormal for others threads to have as much stack space as the main thread. In glibc the default is potentially as low as 16kb! So if you use threads, at least some subset of your tasks may have much smaller stacks that are impossible to increase

oscardssmith · 2024-07-30T20:26:56Z

I think we're in a very different situation here since Julia doesn't let you control (natively at least) what thread your task spawns on.

MilesCranmer · 2024-07-30T20:29:39Z

glibc the default is potentially as low as 16kb

Also, from the post you refer to it looks like this is from an old operating system HP-UX, where I am guessing the main thread size is probably equivalently tiny.

JeffBezanson · 2024-08-02T20:24:14Z

One problem here is that, while we'd all in theory be OK with reducing thread 1's stack to make them consistent, we can't do that because the user might have manually set the limit with ulimit to make their program work. There might also be problems with e.g. java or native interop if we artificially limit the thread 1 stack. So that means either (1) we are permanently stuck with 8mb stacks, or (2) we have to accept some tasks having different stack sizes.

Based on the experiments here so far, (1) looks fine, but 8mb is really very big compared to how much stack typical programs use. It's entirely possible we will get some advantages in the future by reducing the size. For example we currently allocate one stack at a time with mmap, but we should be pooling them and allocating many in a single mapping. At that point, decreasing the stack size directly increases the number of tasks we can efficiently handle. Unfortunately we won't have those performance numbers until that is actually implemented. But I'm not sure it will always be the best tradeoff to promise consistent stack sizes for all tasks.

JeffBezanson · 2024-08-06T21:42:37Z

I will merge this for now in case it's helpful to anyone. However in the future we may try to optimize tasks further, and if we can show good numbers from shrinking the stacks again we'll consider doing it.

MilesCranmer · 2024-08-07T17:28:33Z

Sounds very reasonable to me

This increases the default stack size limit on 64-bit systems from 4 MB to 8 MB, matching glibc and typical modern Linux and macOS machines, as well as the stack size limit of the root Julia process.

increase default stack size on 64-bit systems

f374e0b

MilesCranmer mentioned this pull request Jul 20, 2024

Increase stack size limit to 8 MB? #54998

Closed

Merge branch 'master' into increase-stack-size

74db28b

This was referenced Jul 21, 2024

Make stack size configurable in Threads.@spawn #55200

Open

Enable reserved_stack argument to Threads.@spawn #55201

Open

fingolfin approved these changes Jul 25, 2024

View reviewed changes

LilithHafner previously approved these changes Jul 25, 2024

View reviewed changes

JeffBezanson approved these changes Jul 26, 2024

View reviewed changes

JeffBezanson merged commit e4678ab into JuliaLang:master Aug 6, 2024
12 of 13 checks passed

MilesCranmer deleted the increase-stack-size branch August 7, 2024 17:27

	// 8M signal stack, same as default stack size (though we barely use this)
	static const size_t sig_stack_size = 8 * 1024 * 1024;

Increase default stack size limit on 64-bit systems #55185

Increase default stack size limit on 64-bit systems #55185

Conversation

MilesCranmer commented Jul 20, 2024 • edited Loading

vtjnash commented Jul 20, 2024

MilesCranmer commented Jul 20, 2024 • edited Loading

vtjnash commented Jul 20, 2024

MilesCranmer commented Jul 20, 2024 • edited Loading

MilesCranmer commented Jul 20, 2024 • edited Loading

vtjnash commented Jul 20, 2024

MilesCranmer commented Jul 20, 2024 • edited Loading

MilesCranmer commented Jul 20, 2024 • edited Loading

oscardssmith commented Jul 20, 2024

MilesCranmer commented Jul 20, 2024 • edited Loading

MilesCranmer commented Jul 20, 2024 • edited Loading

ViralBShah commented Jul 21, 2024

nsajko commented Jul 21, 2024

MilesCranmer commented Jul 21, 2024 • edited Loading

MilesCranmer commented Jul 21, 2024 • edited Loading

MilesCranmer commented Jul 22, 2024 • edited Loading

MilesCranmer commented Jul 23, 2024

PallHaraldsson commented Jul 23, 2024

MilesCranmer commented Jul 23, 2024 • edited Loading

fingolfin left a comment

Choose a reason for hiding this comment

ViralBShah commented Jul 25, 2024

vtjnash commented Jul 25, 2024

LilithHafner commented Jul 25, 2024

vtjnash commented Jul 25, 2024

MilesCranmer commented Jul 25, 2024 • edited Loading

MilesCranmer commented Jul 26, 2024 • edited Loading

MilesCranmer commented Jul 26, 2024 • edited Loading

MilesCranmer commented Jul 26, 2024 • edited Loading

nsajko commented Jul 26, 2024

MilesCranmer commented Jul 26, 2024

MilesCranmer commented Jul 26, 2024 • edited Loading

giordano commented Jul 26, 2024

MilesCranmer commented Jul 26, 2024 • edited Loading

JeffBezanson commented Jul 26, 2024

vtjnash commented Jul 26, 2024 • edited Loading

vtjnash commented Jul 26, 2024

MilesCranmer commented Jul 26, 2024

vtjnash commented Jul 27, 2024

MilesCranmer commented Jul 27, 2024 • edited Loading

ViralBShah commented Jul 27, 2024

MilesCranmer commented Jul 27, 2024 • edited Loading

JeffBezanson commented Jul 30, 2024

MilesCranmer commented Jul 30, 2024 • edited Loading

vtjnash commented Jul 30, 2024

oscardssmith commented Jul 30, 2024

MilesCranmer commented Jul 30, 2024 • edited Loading

JeffBezanson commented Aug 2, 2024

JeffBezanson commented Aug 6, 2024

MilesCranmer commented Aug 7, 2024

MilesCranmer commented Jul 20, 2024 •

edited

Loading

MilesCranmer commented Jul 20, 2024 •

edited

Loading

MilesCranmer commented Jul 20, 2024 •

edited

Loading

MilesCranmer commented Jul 20, 2024 •

edited

Loading

MilesCranmer commented Jul 20, 2024 •

edited

Loading

MilesCranmer commented Jul 20, 2024 •

edited

Loading

MilesCranmer commented Jul 20, 2024 •

edited

Loading

MilesCranmer commented Jul 20, 2024 •

edited

Loading

MilesCranmer commented Jul 21, 2024 •

edited

Loading

MilesCranmer commented Jul 21, 2024 •

edited

Loading

MilesCranmer commented Jul 22, 2024 •

edited

Loading

MilesCranmer commented Jul 23, 2024 •

edited

Loading

MilesCranmer commented Jul 25, 2024 •

edited

Loading

MilesCranmer commented Jul 26, 2024 •

edited

Loading

MilesCranmer commented Jul 26, 2024 •

edited

Loading

MilesCranmer commented Jul 26, 2024 •

edited

Loading

MilesCranmer commented Jul 26, 2024 •

edited

Loading

MilesCranmer commented Jul 26, 2024 •

edited

Loading

vtjnash commented Jul 26, 2024 •

edited

Loading

MilesCranmer commented Jul 27, 2024 •

edited

Loading

MilesCranmer commented Jul 27, 2024 •

edited

Loading

MilesCranmer commented Jul 30, 2024 •

edited

Loading

MilesCranmer commented Jul 30, 2024 •

edited

Loading