Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Paradict, a CTrie implementation #49

Closed
wants to merge 4 commits into from

Conversation

ElectreAAS
Copy link

@ElectreAAS ElectreAAS commented Dec 22, 2022

I added the library Paradict (from my repo).

The main code has been tested manually (but without dscheck yet) and seems pretty robust, but there may be a lack of documentation up to the standards of the project.

All file and variable names can be modified as my project wasn't done with the utmost professionalism in mind.

Feel free to suggest changes!

@ElectreAAS
Copy link
Author

This also adds a new dependency, Kcas

@ElectreAAS ElectreAAS marked this pull request as ready for review January 2, 2023 15:52
@bartoszmodelski bartoszmodelski self-requested a review January 5, 2023 09:42
@bartoszmodelski
Copy link
Contributor

bartoszmodelski commented Jan 11, 2023

Thanks for this PR @ElectreAAS!

Paradict looks useful to have. A ctrie-based structure will offer very different performance characteristic and API than the hashmap that @lyrm is currently cooking (which is array+list based). Having said that, lockfree maps are hard to get right and we'll need a more exhaustive testing. Here's some thoughts:

Parallel Testing

There's a few parallel tests for insertions, which are a good start, but at the minimum, we should have separate parallel tests for all core operations: insert, delete, find, update, snapshots and some combinations of them. Afterwards, we can take a look at the other functions that deal with entire data structure (e.g. size, iterators, ..), as I suspect they will either work on a snapshot or have a rather relaxed semantics.

Inside the parallel tests, it's important to ensure there's a high chance of actual parallel execution. For example, para_add_contention spawns 8 domain and each adds a single element into the map. Here, the spawn is so much more expensive than insertion, that any variability in spawn will make the insertions run alone. para_add_mem does better in that it has a loop of items but I think it could be improved further:

  • 10k items still feels like not a lot, e.g. mpmc queue uses 10^6, ws deque 10^5.
  • Personally, I try to make the loops as tight as possible - a call sprintf probably takes a bit longer than our core operations.
  • Another trick is to wait for all domains to start up before beginning the actual test. Mpmc queue does that with an atomic variable, mpsc queue with a semaphore.

All of the above refer to hammering a single instance in different ways. A different approach taken by Carine in SPMC exposed yet new issues.

Other

Furthermore, it would be nice to have dscheck tests and perhaps a rudimentary benchmark (how much are we actually faster than a locked map?).

I'm happy to help if you have any questions. In particular, happy to take the dscheck part.

A couple questions:

  • Is the kcas required for core structure or some extra functionality? For example, the basic version of ctrie on wiki does not mention it.
  • Comment for size recommends to take a snapshot to get accurate results. Perhaps we should have a snapshot type and only allow calling size on it?
  • The comment for map mentions that every pair (key, value) will be replaced. Just to be sure I'm not missing something - there can only be one such pair right?

@bartoszmodelski
Copy link
Contributor

Related: ocaml-bench/sandmark#181

@ElectreAAS
Copy link
Author

Is the kcas required for core structure or some extra functionality? For example, the basic version of ctrie on wiki does not mention it.

It is required for the functionality of the snapshot (as described in the follow-up paper by the authors), but not for the standalone data structure. I understood that the main draw of CTries was this snapshot operation so I suspect there might not be much interest in a CTrie without snapshot (and hence without KCas), but that's pure guesswork.

The comment for map mentions that every pair (key, value) will be replaced. Just to be sure I'm not missing something - there can only be one such pair right?

There can only be one (k, v) pair for each k in the dictionary, yes. However, map operates on the whole dictionary, not on a single pair.

@ElectreAAS
Copy link
Author

For testing, I'm afraid I don't have any time to dedicate to that these days. Maybe next hacking days...

I know that @OlivierNicole has done some hammering with qcheck & tsan, he might be able to give you some more info on that

@OlivierNicole
Copy link
Contributor

Yes, I have run a parallel property-based test using qcheck-stm, with ThreadSanitizer instrumentation to detect potential data races. There were no data races in the code (although my test covers only a subset of the Paradict API).

@ElectreAAS
Copy link
Author

Closing as stale.
If there's interest in the future, my repo is still up.

@ElectreAAS ElectreAAS closed this Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants