V Concurrency Considerations #11649
-
I very, very much want to see channels in V! They're one of the best ideas in Go; they're both simple and powerful, which is rare.
-
@ylluminate See related discussion: #1868
-
@elimisteve
-
@ylluminate @cristian-ilies-vasile @elimisteve thank you for summarizing the high-level goals/requirements - they're pretty much what I envision 😉. I'll keep an eye on this to see how everything will evolve.
-
An amendment:
should read:
-
@dumblob
-
https://golang.org/doc/go1.14#runtime
-
@cristian-ilies-vasile that's definitely interesting (see the specification), but it's still by far not fully preemptible. It's called asynchronously preemptible precisely because it still just inserts safe points; starting with Go 1.14 they are inserted in many more places, but still carefully enough to not increase the overhead much. They actually tuned it so that the overhead is not measurable in the majority of cases, though in some cases the overhead seems to be enormous, as seen from this benchmark and its equivalent in plain C, which doesn't suffer from any such bottleneck. It's also good to note here that the implementation of "asynchronously preemptible" in Go is really tricky and definitely not universally applicable (hehe) to other languages (e.g. because of the problem on Windows, which they kind of "solved" for Go's semantics and internals). Though I generally like the Go design with "safe points", I deliberately didn't go into details like this when describing the "interleaving between concurrency and parallelism" outlined in #1868 (comment), because it's a lot of work with "little benefit": as you can see, even Go tackles preemptiveness (and to date still just partially) only now, after many years of development by hundreds of engineers, many of them full-time Go devs.
-
Not sure if this adds any value to this conversation, but ScyllaDB is built on top of this async framework: https://github.com/scylladb/seastar Maybe instead of fibers etc. this kind of library solution could be used and added to vlib?
-
Regarding structured concurrency, I would suggest reading https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ first, which explains reasons to use structured concurrency instead of go-style concurrency.
-
@pquentin thanks for the pointer - didn't know about the Trio concurrency library (and the "nursery concept"). It's a very good approach to concurrency programming indeed. In the context of V I have these observations:
-
@pquentin Interesting concept, but preventing goroutines from being spawned while the main thread/goroutine continues executing is extremely limiting and would forfeit a huge percentage of the value gained from having them in the first place. The argument made against goroutines is honestly a severe straw man, as the normal and common way to achieve a nursery-like pattern in Go is to use a WaitGroup, or to pass a "done channel" (

The fact that general solutions exist in the ecosystem is eventually admitted in the middle of the 4th footnote:

That said, the point about stack traces only going back to the point where the goroutine was launched, rather than going all the way back to
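For context, here is roughly what that WaitGroup pattern looks like transliterated into V. This is a sketch assuming vlib's sync.WaitGroup API (new_waitgroup/add/done/wait); names like `worker` are made up for illustration.

```v
import sync

// Sketch: the common Go WaitGroup pattern, expressed in V.
fn worker(id int, mut wg sync.WaitGroup) {
	defer {
		wg.done() // signal completion on every exit path
	}
	println('worker ${id} done')
}

fn main() {
	mut wg := sync.new_waitgroup()
	wg.add(3)
	for i in 0 .. 3 {
		go worker(i, mut wg)
	}
	wg.wait() // the nursery-like join point
}
```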
-
I disagree. The argument against unstructured Go is not a straw man, just like the arguments against "goto" a generation ago weren't straw men. Enforced structure allows new ways of reasoning about your code, thus enabling you to code the Happy Eyeballs algorithm in 40 lines of code instead of 400. This directly translates to fewer bugs. Yes, cool, the ecosystem has solutions, but if it's way easier not to use them (e.g. because they require a heap of boilerplate in every function call, can't handle cancellation, and whatnot) they're ultimately not helpful. In Trio, "waiting for your child tasks" and "leaving the scope of a nursery" are exactly the same thing and require zero lines of code (just the end of a block; in Python that's a de-indent), which again helps reduce the bug count and reduces the cognitive load on the programmer.

@dumblob The

In current V (or Go) you'd use a

with

IMHO that second line shouldn't exist. Not for files, not for locks, and not for nurseries. In Go, starting a goroutine is as easy as it gets, but cleaning it up is not and needs to be done manually, especially when error handling is involved. Making coroutine creation slightly more difficult (by requiring a scoped nursery object to attach them to) is a no-brainer when the dividend you get from this is trivial cleanup.

I don't understand that sentence.
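To illustrate the difference being argued here, a minimal sketch of tying "waiting for the children" to scope exit in today's V, using defer as a stand-in for a real nursery. The vlib WaitGroup calls are real; the pattern itself is hypothetical, not an existing V feature.

```v
import sync

// Sketch: scope exit *is* the join point, approximated with defer.
fn child(mut wg sync.WaitGroup) {
	defer {
		wg.done()
	}
	// ... child work ...
}

fn parent() {
	mut wg := sync.new_waitgroup()
	defer {
		wg.wait() // runs on every exit from parent(): the "nursery" join
	}
	wg.add(2)
	go child(mut wg)
	go child(mut wg)
	// every return path below this point still waits for both children
}
```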
-
Exactly. That's the reason why I'm curious how we would implement the long-running "background" tasks using the nursery concept in V, as there are currently no means in V guaranteeing the existence of a proper
-
Having no guarantees is detrimental to a whole lot of other constructs too. Locks, for instance, or not leaving a file handle open when I'm done using it. So that's not an argument against nurseries; that's an argument for statically-bounded visibility of objects, with a destructor that is guaranteed to be called exactly once. As V doesn't seem to have that concept, perhaps the first step should be to add it.
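For comparison, the closest thing today's V offers is defer, which guarantees the cleanup runs on every exit path but still makes you write it by hand. A sketch using vlib's os module (the function name is made up):

```v
import os

// Sketch: defer gives guaranteed-but-manual cleanup; a destructor
// tied to static visibility would make the second line implicit.
fn peek(path string) ! {
	mut f := os.open(path)!
	defer {
		f.close() // the manual "second line" a destructor would remove
	}
	data := f.read_bytes(16) // use the handle while it is open
	println(data.len)
}
```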
-
@shelby3 How, specifically?
-
Consider a variable that maintains a value used for security being left in an indeterminate state by a data race. Targeted parallel calls cause races, the races leave the variable undetermined, and the undetermined value causes access to be granted when it shouldn't be. The attack may need to be sustained, depending on the likelihood of success. Memory is often reused in strange ways; the variable may not even be part of the race directly. Undefined behaviour is undefined, and that includes leaving the vault unlocked.
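A sketch of why the check and the act must form one critical section, assuming V's `shared`/`lock` model as documented. All names here are made up for illustration; the point is that without the single lock block, two parallel tasks could both pass the check and both be granted access.

```v
// Hypothetical security gate guarded by V's shared/lock.
struct Gate {
mut:
	granted int
}

fn try_enter(shared g Gate) {
	lock g {
		// check and act together: no other task can interleave here
		if g.granted == 0 {
			g.granted++
		}
	}
}

fn main() {
	shared g := Gate{}
	mut threads := []thread{}
	for _ in 0 .. 2 {
		threads << go try_enter(shared g)
	}
	threads.wait()
	rlock g {
		println(g.granted) // always 1; split the check from the act
		// into separate lock blocks and it could become 2
	}
}
```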
-
The first 250 simple tasks successfully ran on the R U S H proof of concept (inspired by Andreas Prell's PhD thesis).

```v
import strconv

// original function, adapted to report its result over a channel
fn x_sum(a int, b int, output_ch chan string) {
	x := a + b
	output_ch <- ''
	output_ch <- strconv.v_sprintf("x_sum= %04d", x)
	output_ch <- ''
}
```
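For completeness, a guess at how such a task could be driven with plain `go` and a channel. This is not the R U S H scheduler itself, just a minimal harness; the buffer capacity of 3 matches the three sends above so the task never blocks.

```v
// Minimal harness for x_sum (sketch, not the R U S H runtime).
fn main() {
	output_ch := chan string{cap: 3}
	h := go x_sum(2, 3, output_ch)
	h.wait() // wait for the task, then drain its three messages
	for _ in 0 .. 3 {
		println(<-output_ch)
	}
}
```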
-
@cristian-ilies-vasile could you put all the sources into one gist to make it easier to comment on and discuss (with the nice side effect of not polluting this thread)? Thanks 😉.
-
A recent note on the above-discussed topic of the difference between IO-bound computation and CPU-bound workloads, by the author of Weave: https://nim-lang.org/blog/2021/02/26/multithreading-flavors.html
-
@dumblob Achieving 11M IOPS & 66 GB/s IO on a Single ThreadRipper Workstation: https://tanelpoder.com/posts/11m-iops-with-10-ssds-on-amd-threadripper-pro-workstation

The new wave of thinking: one thread-per-core architecture.
-
These platforms (and many others) need to be programmed in some language, though, and that language needs to do all the heavy lifting 😉.

Hm, how is this different from what we've discussed so far (i.e. one OS thread per physical core, and then millions of fibers/tasks per OS thread, using work stealing, with some hacks providing preemptiveness to avoid starvation and give some level of fairness)?
-
It is different, a little bit.

That means once a task is put on a CPU, you cannot suspend it; you just wait for it to finish. For a lot of tasks the load on each resource is fairly distributed thanks to the work-stealing algorithm.
-
If it's really like this, then it has a serious starvation issue: imagine writing a few tasks, each with an (accidental) infinite loop. It's also "not fair", so no real-time guarantees (as required by games, audio/video calls, etc.) can be made. Weave has some countermeasures (it has "good enough" fairness, and if that's not enough it provides a shared input queue; regarding starvation I think Weave doesn't do anything, but there is an easy solution with

But if you're talking about something else than Weave, then this starvation and unfairness is a show stopper for a default concurrency backend in V.
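The accident described above is easy to write. Under a purely cooperative (non-suspending) scheduler, a hypothetical task body like this never reaches a yield point and pins its worker thread forever:

```v
// hypothetical task: no channel op, no sleep, no call that could act
// as a safe point, so a cooperative scheduler never regains control
// of the worker running it
fn busy_task() {
	mut x := u64(0)
	for {
		x++
	}
}
```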
-
@dumblob Is it clear how much overhead Weave requires in order to do more sophisticated scheduling?
-
Yes - just see the benchmarks in the Weave repo. It's actually basically fully amortized (not kidding!). So the only question is always: which Weave parameters should I choose for this particular set of tasks, to trade (tail) latency for throughput and vice versa. Weave's defaults are pretty much usable even for games, though, so this parameter tuning should only be needed in extreme scenarios (e.g. someone writing her own kernel, or an audio/video signal-processing server like JACK, etc.).
-
@dumblob I put here some non-audited figures, just to give some perspective. Weave uses the word worker; I use the word robot for the same piece of code.

A - load on each robot
B - task distribution spread by robot's state
-
On the way to an ultra low overhead equal-priority hard real time preemptive scheduler

Motivated (and surprised) by https://twitter.com/v_language/status/1388154640768851969 I've decided to finally research and write down the ideas I've had in my head for about a year.

TL;DR: it turns out V could have zero-overhead full preemption on Linux and Windows.

As described in #1868 (comment), V could (should!) leverage combining cooperative scheduling of "go routines" among a dynamic pool of system-level threads ("OS threads", 1:1 with the number of currently powered-on CPU cores) with OS-triggered preemption (bare metal has a counter/clock interrupt). This OS-triggered preemption (an "interrupt", e.g. a signal) is costly when it isn't necessary, and with cooperative scheduling the majority of preemptions are not necessary. It gets costlier the more real-time the required responses are, which is typical for low-latency scenarios like audio, video, games, etc. - coincidentally all domains which utterly seek high-performance languages to leverage the HW up to the last bit and watt, and exactly the domain where V shines or wants to shine.

The idea is that user space can write to kernel space (thanks to eBPF) without any interrupt, any context switch, any locking, or any other restriction: basically just a write at a pointer address. An eBPF app (into whose kernel-space memory the user-space app can arbitrarily write) can then be run each time a kernel timer fires (it doesn't matter who set this timer up - some kernel driver, some other user-space app, whatever - as we just care about being woken up "frequently enough").

If there is a free-running counter in the eBPF app's kernel space, incremented only by a given user-space app thread, and the eBPF app maintains a "cached" copy of this counter from the last time the eBPF app ran, together with a monotonic timestamp of the cached value, then by comparing this timestamp it can decide whether it's already been too long since the user-space app thread last incremented the counter, and can organize an interrupt of that thread. The user-space app thread increments the counter each time cooperative scheduling happens on that particular (virtual) CPU core. Note that no slow atomic operations (reads, increments, etc.) are needed here, as we don't care about the edge cases: in the worst case the interrupt just fires negligibly more often.

Of course, for thousands of CPU cores there might arise the need to somehow increase the efficiency of the eBPF app, maybe by grouping the timer interrupts (though hrtimers in Linux should support tens and hundreds of thousands of timers without performance degradation), but that's just an implementation detail.

Why should this be possible? Because eBPF nowadays supports:

Implementation note: on *nix systems, sending an interrupt signal to a given thread is a bit more work than expected (first the signal is sent to the process, and the signal handler will then most of the time need to "distribute" the signal to the appropriate thread: signal handlers are per-process but signal masks are per-thread).

Note, this is a generic solution and I'm certain readers from other projects (incl. Go, D, Nim, Python, and other languages and frameworks) might jump on this and try to implement it as low-hanging fruit (hey, an attribution would be nice, guys 😉).

@mintsuki @UweKrueger @cristian-ilies-vasile thoughts?

Additional notes
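To make the counter/timestamp mechanism concrete, here is a rough user-space-only model of it in V. This is purely illustrative: all names are hypothetical, the real staleness check would live in the eBPF program run from a kernel timer hook, and the "preempt" branch would send the signal described in the implementation note above.

```v
import time

// Hypothetical model of the heartbeat idea (not eBPF itself).
struct Heartbeat {
mut:
	counter u64 // bumped by the app thread at every cooperative switch
}

// called by the user-space scheduler on each cooperative yield
fn (mut h Heartbeat) on_yield() {
	// plain non-atomic write: a stale read only causes a negligibly
	// early or extra interrupt, never corruption
	h.counter++
}

// stands in for the eBPF program run from a kernel timer hook
fn watchdog(mut h Heartbeat, deadline_ms i64) {
	mut cached := h.counter // "cached" copy of the counter
	mut seen_at := time.ticks() // monotonic timestamp of that copy
	for {
		time.sleep(time.millisecond) // the "timer fires"
		now := time.ticks()
		if h.counter != cached {
			cached = h.counter // the thread yielded recently: refresh
			seen_at = now
		} else if now - seen_at > deadline_ms {
			// no yield for too long: the real eBPF app would now
			// arrange a signal to preempt this thread
			println('preempt thread!')
			seen_at = now
		}
	}
}
```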
-
Interesting - just 3 days after I published the above sketch, Microsoft announced support for eBPF directly in Windows - see https://github.com/Microsoft/ebpf-for-windows . And if they'll support some timer hooks (and I'm 100% sure they will), then I think there is no excuse not to write a scheduler leveraging that sooner or later 😉.
-
Hello everyone, I am super interested in trying out V to build something that uses the actor model, like Elixir.
-
Transferred from a document posted by @cristian-ilies-vasile:
V concurrency high level design
After careful consideration of the proposed concurrency models and the suggestions expressed on the Discord v-chat channel, the high-level design is based on the Go language model (message passing via channels), which in turn is a variation of communicating sequential processes.
I did read papers on the actor model, but it seems that coders do not use all the primitives provided by the language and resort to threads and queues (see the "Why Do Scala Developers Mix the Actor Model" paper).
Almost all high-level requirements are taken from "Proper support for distributed computing, parallelism and concurrency" published on GitHub.
Because there are many words used more or less interchangeably (coroutine, goroutine, fiber, green thread, etc.), I will use the term green thread.
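For illustration, a minimal sketch of the chosen model (green threads communicating over a channel) in today's V syntax, assuming current vlib channel semantics; the function names are made up:

```v
// Sketch: CSP-style message passing between green threads in V.
fn producer(ch chan int) {
	for i in 0 .. 5 {
		ch <- i * i // send: blocks when the buffer is full
	}
	ch.close()
}

fn main() {
	ch := chan int{cap: 2}
	go producer(ch)
	for {
		v := <-ch or { break } // receive until the channel is closed
		println(v)
	}
}
```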
Key points
Open points