Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(set-map): introduce MutativeMap to optimize for common scenarios (lots of data, little change) #73

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

thomasjahoda
Copy link

@thomasjahoda thomasjahoda commented Dec 16, 2024

I propose a Map-variant optimized for scenarios where a lot of entries exist and only a few are changed or accessed during mutations. (it already is much faster at just a hundred entries though)
However, it does not guarantee iteration order to be the same as the original map (i.e. it might not be insertion order during iteration).

Background/Details:

Mutative and especially Immer do not work well for scenarios where a lot of entries exist and only a fraction of the data is accessed/changed per mutation. It has to shallow-copy the whole Map on each mutation, which has a significant performance impact compared to directly mutating a Map.
Compared to immer, mutative is already much faster/better for such scenarios, but using MutativeMap basically reduces the cost to 0. See performance test in test/performance/mutative-set-map.ts.
This class enables mutative from needing to copy the entire map when a single value is changed. Instead, it stores the original entries separately and only copies the changed data during drafting.
E.g. if there are 50k entries and only 1 is changed or was recently changed, this class will only copy the map with that 1 changed entry.
With N=total_count, M=changed_count, this changes the complexity from O(N) to O(M) for drafting.
Other operations will become slightly more expensive due to having to lookup two Maps in many scenarios, but they have the same asymptotic complexity as a regular Map.

Benchmarks

N = 'number of entries', M = 'number of changed/accessed items during mutation'.
The total cost within a mutation is MAP_DRAFTING + M * DRAFTING_ITEM.
The cost of MAP_DRAFTING is constant for MutativeMap and O(N) for Map.
DRAFTING_ITEM is slightly more expensive for MutativeMap but asymptotically has the same complexity.

So for scenarios with a single change, the cost is basically constant:
map

For scenarios with only a fraction of the data being changed/accessed, MAP_DRAFTING becomes more and more irrelevant.
map-batch

Status

There are still a few TODOs for cleanup, refactoring and additional tests, but before I spend that effort I wanted to inquire whether there even is an interest in introducing such data-structures to optimize for such scenarios (which I suspect are the most common and most important scenarios though). Naturally, it takes away from the elegance of just being fast out-of-the-box, but I didn't want to extend Map and just transform every touched Map to MutativeMap by default. There are a few additional reasons in MutativeMap.ts against extending Map, but I'm not sure what the best approach is.
I also need to re-execute all benchmarks and apparently I accidentially deleted benchmark.jpg.
I also added some hacks regarding the __DEV__ global, because I had issues when executing the benchmarks. It's still not perfect as rollup gives warnings when generating types.

MutativeSet: A similar MutativeSet variant should be introduced to have the same asymptotic complexity as MutativeMap for the described scenarios, but I didn't bother to introduce it yet, as I don't need it in my own project. If there is an interest in this, I would be willing to implement it though.

@unadlib
Copy link
Owner

unadlib commented Dec 17, 2024

Hi @thomasjahoda ,

Thank you for your proposal and PR. I understand your idea and haven’t had a chance to thoroughly review your PR yet. However, I’m leaning towards implementing a library similar to mutative-map based on the custom shallow copy interface provided by Mutative, as it might offer more flexibility for users.

I’ll take a full look at your PR later.

Thanks again!

@thomasjahoda
Copy link
Author

@unadlib Thank you for already taking a look and the quick heads-up 🙏🏻
I have no time pressure for an answer or integrating the PR, so feel free to answer whenever.

I agree that intuitively it would be more elegant and make more "sense" to implement such data-structures around and not within the core of the library. And at first I even attempted to implement it via the custom shallow copy interface, but I failed miserably.

How to maybe do it with the custom shallow copy interface

Now that I have more knowledge about the internals of mutative I probably would have more of a chance to implement MutativeMap that way, although I am pretty certain that I would need to significantly enhance the custom shallow copy interface - especially to retain performance both within and outside the mutation.
Afaik some relevant limitations of the custom shallow copy interface are:

  • shallow copies via mark-fn are directly marked as changed (no line number but search for "Trigger a custom shallow copy to update to a new copy") for some reason. This leads to changes without having changed anything.
  • No custom proxies to efficiently handle drafting and finalization. Some probably incomprehensible brainstorming regarding that: Finalization would currently directly call set on the patchData Map within MutativeMap, which would be required to be intercepted to check whether anything changed at all compared to immutableData and to actually set the correct value. This intercept is probably possible by extending Map, but this kind of abuses internal undocumented behavior for this special-case and probably reduces performance. But I don't know how I would tell mutative that nothing changed in the end. This would need new plugin capabilities for the shallow-copy interface. Probably this map-intercept-based approach could be used for MutativeSet, too, but it just generally feels bad. I would recommend exposing some unstable internals and allow providing a separate proxy in addition to custom shallow-copies - but this would still make it hard to implement something like that.
    • Specifically, I don't even see how to prevent drafting the immutableData Map but still draft the actual items within - at least in a generic way. I didn't want to play around too much with marking immutableData as mutable to prevent drafting. It would probably all work quite easily by extending Map by abusing internal behavior again. But it was a secondary goal for me to not have to do that.

Why directly integrate it into mutative itself

I see little hope in generically supporting patches and freezing using this plugin system. Which is a shame if everything you want is just the most basic features in the most performant way. That is also a major reason why I decided that it is worth it to just build MutativeMap and MutativeSet directly into mutative itself. It's just that basic and the main selling point of mutative seems to be performance IMHO - so it makes sense to provide additional options to SIGNIFICANTLY enhance performance with the trade-off of not using the standard data-structures - even though they basically have the same signatures and cause basically no additional effort to use.

Just a heads-up from myself: Now that I put some thoughts into writing, I see that I do not think it is worth it to invest the time to attempt to use and extend the custom shallow copy interface. Instead, if you think it's not worth it to integrate MutativeMap into the core, I'd just fork mutative and maintain it just for my project to save time.

Thank you for the great library and your time!

@unadlib
Copy link
Owner

unadlib commented Dec 18, 2024

@thomasjahoda I appreciate the effort you have put into this.

In the early stages of Mutative, I considered how to comprehensively support custom immutable types. To achieve this, in addition to the current custom shallow copies, it is necessary to provide modification hooks for the custom shallow copies to facilitate locating modification marks. Moreover, as you mentioned, we need to provide custom finalize interfaces during the finalization process, which naturally includes how to check and generate patches, etc.

From a technical perspective, the aforementioned components are generally required. If refactored properly, such improvements would not introduce breaking changes.

I propose providing a base class for custom types. This base class can offer some default implementations, such as shallow copying, modification hooks, finalize interfaces, etc., so that users only need to inherit from this base class and then implement their own types.

What do you think of this approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants