Skip to content

Files

Latest commit

d47a7f3 · Jun 20, 2024

History

History
211 lines (162 loc) · 56.4 KB

README.md

File metadata and controls

211 lines (162 loc) · 56.4 KB

Benchmarking VBA-FastDictionary

This Dictionary has decent performance while being compatible across all VBA platforms and applications. However, as you will see below, in most cases it is the fastest solution when compared to what is already available.

Table of Contents

What's already available

The following will be used for comparison:

  • the built-in VBA.Collection class
  • the Scripting.Dictionary available under the Microsoft Scripting Runtime (scrrun.dll) on Windows only

The Data Structures section of @Sancarn 's Awesome VBA repo is a good source for finding alternatives. There surely are many other implementations out there but they are not discussed here.

Two of the publicly available dictionaries which are listed in the above repo will be used for speed comparison with this repository:

There are other classes listed in the Awesome VBA repo but those are just extensions of VBA.Collection or Scripting.Dictionary and so will not be used. The clsTrickHashTable has a nice assembly enumerator but it's not Mac compatible and is not suitable for VBA7 because it relies on API calls which are slow in VBA7 - this is tested and explained in this Code Review question.

Pros and Cons

Pros

  • native - works in all VBA environments
  • fast instantiation
  • decent speed for up to 100k key-item pairs (when compared to the other solutions)

Cons

  • keys can only be of String data type - any other type needs to be cast
  • can only compare keys in text compare mode
  • cannot retrieve keys unless using memory manipulation
  • cannot enumerate keys using For Each..
  • slow speed for 100k+ items (when compared to the other solutions)

Pros

  • in general, the fastest solution for up to 32k key-item pairs where keys are String or Object
  • the standard/go-to solution for most VBA users on Windows

Cons

  • not available on Mac
  • slow for 32k+ key-item pairs - its hash table size is fixed to 1201 - see Scripting.Dictionary Conclusions for more details
  • very slow for number keys especially outside the -9,999,999 to 9,999,999 range because all numbers are casted to Single before they are hashed - see this for more details
  • has speed issues when multiple instances are used - the implementation is constantly reading the compare mode and the hash size (1201) from the heap - see Scripting.Dictionary Conclusions for more details

This is a class that wraps around Scripting.Dictionary on Windows and uses a combination of Collections (for keys) and Arrays on Mac.

Pros

  • easy drop-in replacement for Scripting.Dictionary
  • works on Mac and Windows on both x32 and x64

Cons

  • slower on Windows because it uses late-binding for the internal Scripting.Dictionary
  • very slow on Mac because of the strategy used
  • has serious bugs. For example, on Mac, an integer key of value 2 is considered equal to a text key of value 2__2 while in text comparison mode. The same happens on Windows if the compiler constant UseScriptingDictionaryIfAvailable is set to False. Another example, it raises an error for a Null key. Another example is that it sees the key #3/31/2024# as different from CDbl(#3/31/2024#). Another example is that in dict_RemoveKeyValue the last line is a call to dict_RemoveObjectKey even if the key being removed is not an Object which raises an error. And the list continues
  • inherits all the cons of Scripting.Dictionary

This is a class that does its own hashing and uses only arrays and some nice logic.

Pros

  • has methods like Add, Count, Item (default), Exists, Remove which makes it consistent with Scripting.Dictionary and thus easy to use for most people
  • works on Mac and Windows on both x32 and x64. For this to be true and to avoid VBA7 API overhead issues, the 050f128 commit does the necessary changes so that comparison is fair
  • has good performance when you know the number of items in advance
  • allows duplicate keys
  • hash is fast for keys of String type when the keys are relatively short and that's because it iterates the Wide-Character codes as Integers

Cons

  • has poor performance when you don't know the number of items in advance. The default value of 16384 as in ReInit 16384 in the Class_Initialize has a huge performance impact when many instances of this class are created. Using a smaller initial number improves performance by making initialization faster but then works poorly if too many items are inserted which cause too many collisions and linear search
  • doesn't really handle all data types for keys. For example, all Variant/Decimal keys would get assigned the exact same hash value and would all go into the same hash bucket (last bucket). LongLong is not covered as a data type and would end up in the same bucket as Decimal. Same for Variant/Error
  • for object keys, it ignores the interface i.e. the same instance can be passed as 2 different interfaces and the keys are considered different
  • the hash is slow if using keys of String type when the keys are lengthy. This can be fixed by compiling in something like TwinBasic but will require a dll reference to be used in VBA

Benchmarking code

Special thanks to Guido for his excellent code modules:

Copies of both modules are available under the third_party_code folder.

The Benchmarking.xlsm Excel file contains all the code listed under benchmarking/src folder. The results are written to 8 worksheets (one for each operation measured) and can be exported as screenshots.

The actual tests are in the BenchTests.bas module and they call the main Benchmark method of the Benchmarking.bas module with various key inputs.

Classes tested

Tim Hall's VBA-Dictionary, VBA.Collection, and Scripting.Dictionary are tested as-is.

For cHashD class we test 3 approaches:

  1. the default size of 16384 for the hash table size - which as you will see does not perform very well
  2. assuming the number of key-item pairs to add is known in advance then the hash table is sized before adding items with the goal of achieving approximately 10% load
  3. assuming the number of key-item pairs to add is known in advance then the hash table is sized before adding items with the goal of achieving approximately 38.5% load

For the new Dictionary (this repo) we have 2 approaches for adding items:

  1. default rehashing - the hash table resizes when the load reaches 50%
  2. assuming the number of key-item pairs to add is known in advance then the hash table is sized before adding items with the goal of achieving approximately 50% load - this is slightly faster than the default rehashing but not by much

Results

All screenshots are saved under the result_screenshots folder. The results can be reproduced by running the speed tests in the Benchmarking.xlsm file under the BenchTests.bas module.

Tests:

  • T01 - Mixed Keys
  • T02 - Double Fractional Keys
  • T03 - Double Large Integer Keys
  • T04 - Double Small Integer Keys
  • T05 - Long Large Keys
  • T06 - Long Small Keys
  • T07 - Class1 Keys
  • T08 - Collection Keys
  • T09 - Text Keys (length 5, binary compare)
  • T10 - Text Keys (length 5, text compare)
  • T11 - Text Keys (length 8-12, binary compare)
  • T12 - Text Keys (length 8-12, text compare)
  • T13 - Text Keys (length 17-23, binary compare)
  • T14 - Text Keys (length 17-23, text compare)
  • T15 - Text Keys (length 40-60, binary compare)
  • T16 - Text Keys (length 40-60, text compare)

Win x64 VBA7

Operation T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Add T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Exists (True) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Exists (False) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Item (Get) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Item (Let) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Key (Let) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
For Each / NewEnum T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Remove T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16

Win x32 VBA7

Operation T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Add T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Exists (True) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Exists (False) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Item (Get) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Item (Let) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Key (Let) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
For Each / NewEnum T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Remove T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16

Win x32 VBA6

Operation T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Add T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Exists (True) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Exists (False) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Item (Get) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Item (Let) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Key (Let) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
For Each / NewEnum T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Remove T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16

Mac x64 VBA7

Operation T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Add T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Exists (True) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Exists (False) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Item (Get) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Item (Let) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Key (Let) T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
For Each / NewEnum T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16
Remove T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16

Conclusions

  • For adding keys of type Object, Fractional Numbers, or lengthy Strings, this Dictionary is the fastest solution for almost any number of key-item pairs. Even for shorter Strings and Integer numbers keys, this Dictionary is still the fastest for large numbers of key-item pairs, while for small numbers of pairs, the difference is so insignificant that it does not justify using any other solution. Keep in mind that this Dictionary is fast even without knowing the number of pairs in advance (rehashing) while solutions like cHashD simply cannot operate decently without knowing in advance
  • For checking if a key Exists, this Dictionary is simply the best choice. Only the Scripting.Dictionary is slightly better for small numbers of key-item pairs and the difference is insignificant. For cases when the keys do not exist, this Dictionary is even faster than for cases when keys exist. That's because it checks a whole group (8 keys on x64 and 4 keys on x32) in a few bitwise operations (on sub-hashes) without ever needing to compare the keys themselves
  • Retrieving Item(s) via Get or setting via Let/Set makes this Dictionary the best choice for any type of keys
  • Setting Keys to a different value is only possible for Scripting.Dictionary, VBA-Dictionary and this Dictionary with the latter being the fastest choice
  • Iterating keys using a For Each.. loop is only supported by Scripting.Dictionary and this Dictionary while the latter is just faster
  • This Dictionary was not optimized for Remove, so it is not the fastest in this regard, except with large numbers of key-item pairs or lengthy text keys. However, the trade-off is to have faster Add, Item and Exists operations

Final thoughts

Although it might seem that Scripting.Dictionary is faster for up to 32k items and for specific key types, the difference is usually of microseconds or a few milliseconds when compared to this Dictionary. However, the lack of compatibility with Mac and some of the other issues mentioned (reading heap is slow for multiple instances) makes this Dictionary a better choice over Scripting.Dictionary.

Although it might seem that cHashD is faster for adding keys of type Long (up to a certain number of pairs), that comes with the downside of not being compatible with Scripting.Dictionary. For example key CLng(1) is seen as different than CDbl(1) while Scripting.Dictionary and this repo's Dictionary 'see' them as the same number. Moreover, cHashD does not perform well without knowing the number of items in advance so the comparison is just for illustrative purposes.

Overall, the Dictionary presented in this repository seems the best choice for any scenario, key type, key length, platform, and number of key-item pairs added.