see if per-thread hashmap is better #4

chc4 · 2022-12-18T06:44:22Z

currently we have threads just send possible garbage or definitely not garbage updates via a channel to the collector. this works, and is easy, but has a few bad things

collector is a bottleneck for high churn - if we have a bunch of mutators, they might generate traffic faster than we can process it, especially if cpu threads become oversubscribed.
when doing a cycle collection we aren't draining the channel, which means if we take too long the mutators will start blocking...even if all the traffic is just definitely not garbage updates.

instead we should probably just have a pseudo-thread_local hashmap that mutators locally update, and when doing a collection snapshot all the hashmaps from each thread and use that at the deferred map.
this has the benefit of allowing mutators to not block on the collector taking too long, and always be able to optimistically remove definitely not garbage from the deferred set. downside is just if you create objects and create+drop references on n threads it will allocate n map entries - that's sufficiently pathological that i think it's worth not worrying about.

idk if the snapshotting needs a full concurrent hashmap (evmap? flashmap?) or i can just use im::hashmap and clone the hashmap, update map, update pointer.

The text was updated successfully, but these errors were encountered:

chc4 · 2022-12-18T17:30:24Z

this is actually kinda hard because once we do a collection we need to remove all the now-dead Weak items from each hashmap, meaning we probably need an actual concurrent hashmap since we need multiple writers. luckily there's no ABA problem since the Weak's pointer value wont be reused until all existing Weaks are removed; unfortunately that also means that the allocation sticks around until we get rid of all Weaks, wasting memory.

chc4 · 2022-12-18T17:45:42Z

https://www.mikeash.com/pyblog/friday-qa-2017-09-22-swift-4-weak-references.html is maybe relevant for the Weak dead objects issue, where they have a scheme that is able to immediately free the allocations using sidetables

chc4 · 2022-12-18T18:33:39Z

another scheme would be having a per-thread arena, and when dropping put the object on its owning arena's deferred list, and when a thread exits transfer the arena ownership to another random(?) thread. then objects would only be on one deferred list at a time, and clearing freed objects is easy. would require unsafe, though, and using a custom allocator plus copying Arc from stdlib since it isnt parameterized over an Allocator

playXE · 2022-12-19T07:26:31Z

Hi! Have you thought about implementing space-time scheduling? It is used in JSC and Shenandoah, although they are tracing GCs it should work in your case as well. The design is very simple: when allocation happens you might want to suspend mutator if allocation size exceeds budget. Might be efficient to implement TLAB for this because checking for budget on every allocation is expensive.

chc4 · 2022-12-20T01:09:28Z

i think space-time scheduling as usually implemented might be a bit of a weird fit. riptide and other systems need it because mutators can infinitely prolong the mark/sweep by making the marking frontier go backwards, so it bounds the amount of time. with cycle collection we don't double visit objects ever, or add new objects from the mutator update channel, so the frontier never retreats, and don't even want to prevent mutators from allocating since it is also possible for them to free on their own as well, as long as the objects aren't cyclic.

there's probably something similar we could do, with having mutators add edges (and transition objects NONE->VISITED along with it)? where mutators timeshare with the collector via workstealing or something in order to avoid blocking everything on one thread...

chc4 · 2022-12-20T02:05:55Z

the problem with having the mutators try to cooperate with the collector for visiting objects is that they'd have to add the weak roots to somewhere the collector can visit them after, in order to clear the flags at the end. and concurrent hashmaps aren't great, especially under high mutation loads (which this would be). you could make have it transition VISITED -> (VISITED & DIRTY) recursively instead of only the single object on drop if the send on the channel would block, but not NONE -> VISITED that would require adding roots, but that only helps the liveness analysis and not the actual visitation.

chc4 · 2022-12-20T02:08:14Z

you can do an intrusive linked list in the Gc objects in order to make the entire visited list globally available instead of the deferred list globally available, i guess...then you don't have a problem of clearing flags at the end, since you can just walk the linked list

chc4 · 2022-12-20T02:23:17Z

yoooou probably could actually be smart here and have each thread do a visit traversal from the object in Drop if a global COLLECTION_IN_PROGRESS flag is true, and if they run into another object that has already been visited you merge your local set of objects and edges with the main collector. then instead of buffering roots on the channel for the next collection, you're enrolling them in the current collection cycle

chc4 · 2022-12-21T08:11:45Z

the problem with doing visit traversals in Drop is most of the time the subgraph of objects don't need a cyclic collection at all, and later Drops would be able to decrement refcount to 0 and free. you're doing extra work that is mostly useless.

i think instead we could transition flags NONE->(VISITED & DIRTY) and then put the object on an intrusive linked list, if we Drop while a collection is going on. then it would stop the collector from visiting through obiects if we know they're reachable from a live object, essentially contracting the graph from mutators, instead of having to either do a lot of work or having to block on the collector. If the refcount goes to 0 the mutator would be able to immediately unlink the object from the intrusive linked list and free the object, possibly with the mutator holding onto a Weak to it.

at the end of collection we can then walk the intrusive linked list into order to reset the flags of all the objects, and then put all the list objects on the deferred list for next collection

chc4 · 2022-12-23T05:39:18Z

if im already making a concurrent intrusive double linked list part of gc nodes then it should always be using that, instead of the channel+hashmap scheme. cactusref does it, but i discounted it as an implementation choice when first writing this because its 1) very unsafe 2) not concurrent for cactusref, but i think you can do it concurrently 3) wouldn't enable the snapshotting required for concurrent cyclic ohject detection, but i think you can by just doing an atomic exchange on the head pointer for NULL when starting a collection, which makes all future objects by mutators get queued on a new list, and then walk the broken off list of objects and make the hashmap from the Weak forward pointers. you could even have a per-thread intrusive linked list so that they aren't all fighting for the head.

there's probably some downsides to this (a bunch of atomic cmpxchgs are kinda expensive, compared to one relaxed atomic add + posting to a channel and inserting into a hashmap?) so id want to get good benchmarking setup to make sure it's actually better for the majority of cases.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

see if per-thread hashmap is better #4

see if per-thread hashmap is better #4

chc4 commented Dec 18, 2022

chc4 commented Dec 18, 2022

chc4 commented Dec 18, 2022

chc4 commented Dec 18, 2022

playXE commented Dec 19, 2022

chc4 commented Dec 20, 2022 •

edited

Loading

chc4 commented Dec 20, 2022

chc4 commented Dec 20, 2022

chc4 commented Dec 20, 2022

chc4 commented Dec 21, 2022

chc4 commented Dec 23, 2022

see if per-thread hashmap is better #4

see if per-thread hashmap is better #4

Comments

chc4 commented Dec 18, 2022

chc4 commented Dec 18, 2022

chc4 commented Dec 18, 2022

chc4 commented Dec 18, 2022

playXE commented Dec 19, 2022

chc4 commented Dec 20, 2022 • edited Loading

chc4 commented Dec 20, 2022

chc4 commented Dec 20, 2022

chc4 commented Dec 20, 2022

chc4 commented Dec 21, 2022

chc4 commented Dec 23, 2022

chc4 commented Dec 20, 2022 •

edited

Loading