Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore reimplementing HashMap as CHAMP (Compressed Hash-Array Mapped Prefix-tree) #2417

Open
pivovarit opened this issue Jun 13, 2019 · 20 comments

Comments

@pivovarit
Copy link
Member

pivovarit commented Jun 13, 2019

Related Epic: #2308

CompressedHAMP is designed to provide a smaller footprint and better cache locality than standard HAMT which features one array encoding both sub-node references and data values.

To improve things we can split the array into two dedicated ones: one for values, and one for sub-node references and use two 32-bit bitmaps.

Another idea to explore is if it would make sense to give up physical tree structure in favor of logical with one array backing the whole map. Would allow decreasing pointer chasing but would penalize modifications.

We should follow up and investigate if it makes sense to do the same move as Scala did in 2.13.

scala/collection-strawman#192
scala/scala@d5ae93e

References:

@danieldietrich
Copy link
Contributor

danieldietrich commented Jul 23, 2019

We should consider to port it from usethesource/capsule.
Here the author gave an overview over data structures.

@pivovarit
Copy link
Member Author

pivovarit commented Jul 24, 2019

Great, I was aware of the library - wasn't aware of that thread. Definitely helpful!

btw, do you think it would make sense to do a push before 1.0.0? 🤔

@danieldietrich
Copy link
Contributor

@pivovarit I plan to include it in 2.0.0, which is already in the make but in a very early alpha state. Collections aren't part of it, currently.

I also plan to get my hands on the essential data structures BMT, Red/Black Tree and CHAMP. I'm not sure if it makes sense that you invest time.

@wrandelshofer
Copy link

wrandelshofer commented Jun 19, 2022

I made an experimental (and very dirty) port of CHAMP collections in a fork of vavr:
https://github.com/wrandelshofer/vavr#readme

I decided to implement linked collections with sequence numbers. This is much simpler to do, than keeping a linked list or a vector in sync with the CHAMP trie. However performance is a mixed bag. Most operations are quite fast, except accesses to the first/last entry of the collection - which are linear in time. Also, occasionally, we need to renumber the sequence numbers, which may cause unexpected slow-downs. Memory efficiency should be quite good though.

So, maybe my fork is just a demonstration of how not to do it. 😂

@wrandelshofer
Copy link

I made some progress now.

I have now implemented four different CHAMP-based collections:

  • vavr.ChampMap
  • vavr.SequencedChampMap
  • vavr.ChampSet
  • vavr.SequencedChampSet

Here is the current README file. It shows a performance comparison with vavr.HashMap, vavr.LinkedHashMap and scala.HashMap, scala.VectorMap, scala.TreeSeqMap.

https://github.com/wrandelshofer/vavr/blob/6569c67e3d5c3b4dae681fa762c714c8d074d7ab/README.md

The performance of the sequenced collections (the ones that preserve the insertion order) is now O(1) for all operations.
There are no performance cliffs anymore.

The code is in a very rough draft state. Let me know, what you think about it.

@danieldietrich
Copy link
Contributor

Hi @wrandelshofer, the benchmarks look really impressive! From my point of view, you are welcome to create a PR that replaces the according Vavr collections (keeping their names).

@wrandelshofer
Copy link

Thank you. I am going to create a PR. 😀

@danieldietrich
Copy link
Contributor

Awesome, I am really looking forward to this great improvement!

@wrandelshofer
Copy link

Please, do not worry. I am working hard on the PR. So far I have converted 2 collections. I have 2 more to go. 😅

The code I am working on is in this branch:
https://github.com/wrandelshofer/vavr/tree/champ-dev

(No need to look at it. It is in disarray.)

What are your plans with respect to Java 21?
It would be a shame, if I downgraded all my code to Java 8, and would then have to upgrade to 21 again. 🚧 🏗️ 👷‍♂️

@wrandelshofer
Copy link

I have now made a pull request. See
#2745

@wrandelshofer
Copy link

wrandelshofer commented Oct 13, 2024

I have made now a release that is binary compatible with vavr 0.10.5, but which is implemented with CHAMP-based collections.

You can get it here: https://github.com/wrandelshofer/vavr/releases/tag/v0.10.5

@pivovarit
Copy link
Member Author

That's an amazing job you did here :) I will get back to you soon, I promise!

@wrandelshofer
Copy link

The merge request has a huge change set. It is almost not reviewable.
Having it as a Jar that can be used in place of the original vavr.jar, will make it easier to try it out.

@pivovarit
Copy link
Member Author

I would be against one big-hopefully-lucky merge, but I'd rather work on integrating it with the library gradually. Initially, by providing alternative implementations which would gradually replace existing ones

@wrandelshofer
Copy link

Ideally, new collections could be added in a separate Jar file.
The API of vavr is closed for extension. It would require some design changes.

@pivovarit
Copy link
Member Author

pivovarit commented Oct 14, 2024

What are you missing in order to fully integrate it with the existing Vavr Collections API?

I was thinking to pull off the Scala move: scala/scala@d5ae93e

The new implementations (i.e., ChampHashSet and ChampHashMap) currently exist next to the previous HashMap and HashSet.

@wrandelshofer
Copy link

There are a several places in the vavr code base, that are closed for extension.
I was not able to add a new collection implementation that would work along with the existing collections.
Unfortunately, I scrapped my earlier attempts on this.

For illustration, I am going to create a new project which only contains the Champ collections as an add on.
This will take some time. I'll let you know, when I have done it.

@wrandelshofer
Copy link

The new project is here: https://github.com/wrandelshofer/vavr-champ-collections

In this project, we have the following new collections: ChampSet, ChampMap, ChampVectorSet, ChampVectorMap.

Now, I have the following issues with these Champ collections:

. These collections can not be injected into the vavr API class. So we can not use them in existing programs, like in one of the existing Euler-Tests.

. If we perform functions on these collections using the fluent API, and any of the functions happens to produce a Vector, SortedSet, Seq or any other collection in io.vavr, then subsequent functions will use HashSet, HashMap, ... from io.vavr instead of the Champ collections.

. Default methods in the io.vavr Traversable interface, call static factory methods of io.vavr collections (like HashSet.of()). And thus we have to override them all in the Champ collections.

🤔

@pivovarit
Copy link
Member Author

If we perform functions on these collections using the fluent API, and any of the functions happens to produce a Vector, SortedSet, Seq or any other collection in io.vavr, then subsequent functions will use HashSet, HashMap, ... from io.vavr instead of the Champ collections.

This is not really a problem - API methods operating on specific implementation, also return instances of the same implementation: see Map#map(), LinkedHashMap#map() and HashMap#map() for example.

If an API methods returns instances of another class, for example: Map#collect() returning Seq, it's ok. Those methods should be extended anyway to give users an option to provide their own factory. Also, I believe Champ collections could eventually substitute existing ones. They would just coexist in preview-like transition period.

Summarizing:

  • for now, I think ok is to keep them separate to maximize backward compatibility (I'd rather avoid users ending up with a surprise implementation by accident)
  • eventually, those could become new default implementations for collections (so your concerns would not be an issue)
  • alternatively, we conclude that the experiment failed, and we remove them (unlikely, but also fine)

Note that the same philosophical problem exists now since we have more than one implementation, for example, for Map

@pivovarit pivovarit added this to the v0.11.0 milestone Oct 16, 2024
@wrandelshofer
Copy link

Nah, I would just redesign everything. 😜😂

Great. I am going to publish the vavr-champ-collections library. So that we can gather feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants