Replies: 3 comments
-
Cool! I'm surprised the performance isn't impacted. I always just assumed doing a type check would be prohibitively expensive in a benchmark. Normally I just use arrays anyway, but when I posted to […]

Keep in mind I've also been working on a (hopefully) faster hash called BlazeHash, which reads 8 bytes at a time instead of just 1. Largely the same idea as cyrb53a. There's also BlazeHash128, which in theory could be even faster, though it needs a lot of work. It reads 16 bytes at a time and produces a 128-bit hash. Problem is, 128-bit hashes are harder to test fully, so I typically just use my MurmurHash3_x86_128 port when I need large hashes, since it's quite fast already.
-
ad cyrb53a) I've read your code comments and am aware that it is still a work in progress. I'm not using it in production; I just stumbled upon it because I was interested in a hash function that makes full use of JavaScript's 52/53-bit integer range.

ad performance) As much as I appreciate that someone takes the time to develop high-performance hash functions in pure JavaScript, it might be more "fulfilling" for you to code and test them in C. If you take a look at, for example, the xxhash benchmarks, you will find that such performance might never be achievable with a JS solution. While TurboFan can theoretically optimize the JS function's bytecode to run just as fast as a C routine, in practice this is only the case for very simple functions. In that regard I must say that your cyrb53a JS function performs very well. In the benchmark I ran (100K*100K strings) the throughput on my machine was ~0.6Gb/s.

If your BlazeHash versions, for example, are as fast as you expect them to be, write them in C and check whether they can hold their own against the current "top dogs" like xxhash, murmur, city etc. as far as speed, randomness, collision resistance etc. are concerned. As you were talking about SMHasher (which is written in C afaik) you might already be doing this anyway, but at least in this repository I could only find JavaScript code.

In any case, I wish you all the best for your future work.
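On the 53-bit point: JavaScript numbers represent integers exactly only up to 2^53 - 1, so a hash of that width is usually assembled from two 32-bit halves. A minimal sketch of that packing step, assuming the upper half is masked to 21 bits (the helper name `pack53` is hypothetical and not taken from the repository):

```javascript
// Hypothetical helper: combine two 32-bit values into one 53-bit safe integer.
function pack53(hi, lo) {
  // Keep the low 21 bits of `hi` and place them above the 32 bits of `lo`.
  // Multiplication is used instead of `<<` because bitwise operators
  // truncate their result to 32 bits.
  return (hi & 0x1fffff) * 4294967296 + (lo >>> 0);
}

// All bits set in both halves gives the largest safe integer, 2^53 - 1.
console.log(pack53(-1, -1) === Number.MAX_SAFE_INTEGER); // true
```

Anything wider than 53 bits would need BigInt or a multi-part return value, which is presumably one reason 128-bit variants are more awkward to handle in plain JS.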
-
I am using it to hash URL strings in my personal scraper.
-
With minimal additional code, the string-based hash functions, like `cyrb53`, could work with any numeric array-like input, such as Uint8Array, Uint16Array, Buffer etc. Below you can find a sample script where I did this for `cyrb53a`. Running the script as `node cyrb53a.js 0` will run a ~20 sec benchmark for the original version; running `node cyrb53a.js 1` will do the same for the modified version. On my system (Ubuntu 16.04, AMD Ryzen 7 1700) with Node.js v12.19 and v14.15, both versions perform about the same, i.e. the additional code does not seem to hinder V8's TurboFan from optimizing the function the way it optimizes the original.

PS: Keep up the good work. This repository is a real treasure trove for hash enthusiasts :)
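As a rough illustration of the array-like idea (not the exact script that was benchmarked), the change amounts to one type check before the loop and a different way of reading each element. The sketch below assumes the commonly circulated `cyrb53` mixing constants, which should be checked against the repository before relying on them; the name `cyrb53Any` is hypothetical.

```javascript
// Hypothetical sketch: a cyrb53-style hash that accepts either a string or
// any numeric array-like (Uint8Array, Uint16Array, Buffer, number[]).
// The constants are assumed to match cyrb53; verify against the repository.
function cyrb53Any(input, seed = 0) {
  let h1 = 0xdeadbeef ^ seed;
  let h2 = 0x41c6ce57 ^ seed;
  // Decide once, outside the loop, how elements are read.
  const isString = typeof input === "string";
  for (let i = 0; i < input.length; i++) {
    // Strings yield UTF-16 code units; array-likes yield their raw numbers.
    const ch = isString ? input.charCodeAt(i) : input[i];
    h1 = Math.imul(h1 ^ ch, 2654435761);
    h2 = Math.imul(h2 ^ ch, 1597334677);
  }
  // Final avalanche of the two 32-bit states.
  h1 = Math.imul(h1 ^ (h1 >>> 16), 2246822507);
  h1 ^= Math.imul(h2 ^ (h2 >>> 13), 3266489909);
  h2 = Math.imul(h2 ^ (h2 >>> 16), 2246822507);
  h2 ^= Math.imul(h1 ^ (h1 >>> 13), 3266489909);
  // Pack 21 bits of h2 above 32 bits of h1 into a 53-bit integer.
  return 4294967296 * (2097151 & h2) + (h1 >>> 0);
}

// For ASCII text, the string and its UTF-8 bytes hash identically.
const bytes = new TextEncoder().encode("hello");
console.log(cyrb53Any("hello") === cyrb53Any(bytes)); // true
```

Note that a string and a Uint8Array only agree when every character code is below 256; for other text, a Uint16Array of the UTF-16 code units is the matching array form.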