This document outlines the design decisions made in creating an efficient and robust dictionary, providing a comprehensive overview of its functionality. Please refer to the table of contents below for easy navigation to specific design decisions.
This Dictionary does not require any DLL references or any kind of external libraries. Works on Mac and Windows on both x32 and x64.
- Compatibility with Scripting.Dictionary
- Hashing
- Metadata
- Hash Map/Table
- Rehashing
- NewEnum
- Additional functionality
- OLE Automation
- x64 Assembly
The Dictionary presented in this repository is designed to be a drop-in replacement for Scripting.Dictionary
(Microsoft Scripting Runtime - scrrun.dll on Windows). However, there are a few differences, and their purpose is to make this Dictionary the better choice from a functionality perspective.
The Scripting.Dictionary
casts all the keys of type number to Single
and only then hashes the values. This can be easily checked with the following code snippet:
Option Explicit
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As LongPtr)
Sub TestNumberHash()
#If Mac Then
MsgBox "Scripting.Dictionary is not available on Mac"
#Else
Dim d As Object: Set d = CreateObject("Scripting.Dictionary")
Dim i As Long
Dim s As Single
'
i = 12345
s = i
'
Debug.Print d.HashVal(i) 'Prints 1196
'
Const dictHashModulo As Long = 1201
Dim l As Long
'
CopyMemory l, s, 4
'
Debug.Print l Mod dictHashModulo 'Prints 1196
#End If
End Sub
Thus, any number outside the Single
range is not hashed. The following all return 0 (zero):
Dim d As Object: Set d = CreateObject("Scripting.Dictionary")
Debug.Print d.HashVal(10000000)
Debug.Print d.HashVal(1E+300)
Debug.Print d.HashVal(215363454)
So, Scripting.Dictionary hashes any number outside the -9,999,999 to 9,999,999 range to a value of 0 which explains why it is terribly slow to add such numbers. See adding large numbers.
Because of this huge disadvantage, this repo's Dictionary hashes all numbers by casting them to Double
instead of Single
. However, this creates an incompatibility:
Sub TestHashPrecision()
Dim sd As Object: Set sd = CreateObject("Scripting.Dictionary")
Dim nd As New Dictionary
'
Dim d As Double: d = 12345.6789101112
Dim s As Single: s = d 'Approx. 12345.68
'
'The new dictionary sees keys as different
nd.Add d, 1
nd.Add s, 2
'
'The scripting dictionary sees keys as same
sd.Add d, 1
sd.Add s, 2 'Throws error 457
End Sub
Scripting.Dictionary would always downgrade a Double
to a Single
to perform the comparison. This is of course in line with how VBA behaves as seen here:
When a Single is compared to a Double, the Double is rounded to the precision of the Single
However, the new Dictionary (this repo) casts Single
to Double
. This seems more of an improvement rather than an issue, not to mention that the number of collisions is greatly reduced thus improving speed by orders of magnitude.
This Dictionary only raises errors 5, 9, 450 and 457. For example Scripting.Dictionary raises error 32811 if calling Remove
with a key that does not exist while this Dictionary raises error 9 (Subscript out of Range).
The main reason not to adhere to the same error numbers is speed. For example in the Item
method, instead of using an extra If
statement to check if the call to GetIndex
returns NOT_FOUND
, the code simply continues and if the key was indeed missing, error 9 is raised anyway when trying to access the storage array with an invalid index. Other methods like Remove
will simply return error 9 for consistency. The omission of the extra If
statement does not impact speed for a few items but for millions of items there is a difference and speed was chosen over consistency here.
When calling the Item
(Get) property with a key that does not exist, the Scripting.Dictionary
adds a new key-item pair where the key is the key that did not exist previously, and the item is Empty
. This kind of behaviour makes sense in the Let
or Set
counterparts of the Item
property - which is why this Dictionary emulates the same behaviour. However, for the Get
property this does not make much sense. On the contrary, it's misleading. Moreover, most likely no one would ever rely on this kind of functionality considering the Exists
method does not throw an error if avoiding errors is the goal.
So, this Dictionary throws error 9 if Item
(Get) is called with a key that is not part of the dictionary.
The way Scripting.Dictionary compares these 3 values is very misleading. Consider the next code snippet:
Dim d As Object: Set d = CreateObject("Scripting.Dictionary")
On Error Resume Next
d.Add Empty, Nothing
Debug.Assert Err.Number = 0
d.Add "", Nothing 'Not allowed because Empty exists
Debug.Assert Err.Number = 457: Err.Clear
d.Add 0, Nothing 'Not allowed because Empty exists
Debug.Assert Err.Number = 457: Err.Clear
'Dict contains Empty
d.RemoveAll
d.Add 0, Nothing
Debug.Assert Err.Number = 0
d.Add "", Nothing
Debug.Assert Err.Number = 0
'Dict contains 0 and "" keys
d.RemoveAll
On Error GoTo 0
We can see in the above code that:
- when comparing
Empty
to""
(orvbNullString
) they are considered equal - when comparing
Empty
to 0 (zero) they are considered equal - when comparing 0 (zero) to
""
(orvbNullString
) they are NOT considered equal
This is the standard behaviour when comparing Variants in VBA, clearly outlined here. However, for the Scripting.Dictinary this yields different results depending on the order these special values are added to the dictionary.
To avoid this unfortunate and misleading behaviour, this Dictionary distinguishes between these values and allows all 3 at the same time. The following code is valid when using this Dictionary:
dict("") = 1
dict(Empty) = 2
dict(0) = 3
debug.print dict("") '1
debug.print dict(Empty) '2
debug.print dict(0) '3
A few different hashing strategies were implemented in this Dictionary with the sole purpose that hashing is fast without having to worry about key data type or number of key-item pairs being added.
All hash values are stored (until key is removed or replaced). This requires that the hashes have a good distribution and do not rely on the hash table size. Thus, there is no rehashing in the real sense of the word - for more details see the Rehashing section.
All hash values are combined with data type metadata in the upper bits of the hash so that when comparing hash values we are comparing types in the same instruction - for more details see Metadata section.
All hash values are calculated in the GetIndex
method. This is to avoid any extra function call/stack frame required if having a separate method.
To achieve good hash distribution the following strategies were implemented:
- numbers are first cast to
Double
(8 bytes) and then 4 primes are used to get the best hash distribution - objects are first cast to
IUnknown
so that any class instance is only added once to the dictionary i.e. cannot add the same instance as different interfaces. A prime number is used for best hash distribution - in fact, it seems to outperform anything available as seen here - on Mac, all texts are hashed by iterating each wide character (Integer) in a loop using a prime
- on Windows, the Mac strategy is only applied for texts of length 6 or below and for binary compare only. All other texts are hashed using the
HashVal
method on a fake instance ofScripting.Dictionary
- with early-binding speed even though there is no dll reference
As mentioned above, numbers are first cast to Double
. See Hashing Numbers incompatibility for details as to why this was chosen.
While initially a single prime number (13) was used to hash all numbers, this was changed in 7d58829 to 4 prime numbers. The new approach cut the time in half for hashing large integer numbers and also brought small improvements for hashing smaller integers. Both strategies yield the same results for fractional numbers. The numbers are hashed as per these lines.
Quick example:
Dim d As New Dictionary
d.Add CLng(1234567890), Empty
Debug.Assert d.Exists(CDbl(1234567890))
Debug.Assert d.HashVal(CCur(123.456)) = d.HashVal(CDbl(123.456))
Debug.Assert d.HashVal(CVErr(2042)) <> d.HashVal(CInt(2042)) 'Different because Errors are seen as not-numbers
Debug.Assert d.HashVal(CDbl(CVErr(2042))) = d.HashVal(CInt(2042))
Objects are first cast to IUnknown
and then the IUnknown
interface pointer is hashed. This ensures each instance is only added once to the dictionary regardless of the interface used.
Code for Class1
:
Option Explicit
Implements Class2 'Class2 has no code
Quick example:
Dim d As New Dictionary
Dim c1 As New Class1
Dim c2 As Class2: Set c2 = c1
d.Add c1, Empty
Debug.Assert d.Exists(c1)
Debug.Assert d.Exists(c2)
d.Add c2, Null 'Throws error 457
Objects pointers are well distributed anyway because:
- each class instance takes a certain amount of space and so even if adjacent in memory the pointers for 2 instances still have some bytes in between (not consecutive numbers)
- class instances are stored in memory depending on where the memory allocator finds space
So, there is no need to split the pointer into smaller integers to hash. Instead, a modulo prime number is used for best hash distribution. The prime value of 2701 was chosen after running speed tests for all the prime numbers up to 10k. The code is basically this.
This strategy seems to yield the best results as seen here or here.
On Mac, all texts are hashed by iterating each wide character (Integer) in a loop. Each char code is added to the previous hash value and the result is multiplied with a prime number. This is repeated until all characters are iterated. A bitmask is used to avoid overflow. The code is this. The prime number value of 131 was carefully chosen after many speed tests with different prime values.
For text compare, the key is first passed to the VBA.LCase
function and then the result is hashed.
LCase
is fast enough on Mac that there is no need to build a cached map for each character code like cHashD
does.
There is an integer accessor being used (same for Windows) so that reading the char codes in a String
is done fast via a 'fake' array. More details on this in the Text Hashing on Windows section below.
The Mac strategy of iterating char codes is only applied in Windows for texts with lengths smaller than 7 and for binary compare only. All other texts are hashed using the HashVal
method on a fake instance of Scripting.Dictionary
.
The only reason the Mac strategy for short texts (<7 length) is still used is that it's simply faster - and 7 seems to be the first length that runs faster on the fake instance. Please note that for text compare the iteration strategy is not used on Windows, so no calls to LCase
are made.
As mentioned above, most texts are hashed using the HashVal
function on a fake Scripting.Dictionary instance. The reason is again speed. For lengthy strings, it is much slower to iterate char codes (in native VBA) than to call this method. See how much better this Dictionary performs on lengthy text keys here as opposed to shorter here solely because it's calling the compiled HashVal
.
This would not be needed if code could be compiled in VBA but unfortunately, it cannot. It could be compiled in something like TwinBasic but then it would require all users to reference a dll file which is a big impediment for most VBA users because of distribution problems and also because some users would have IT permission difficulties.
The following will describe how calling Scripting.Dictionary.HashVal is achieved with early binding speed without needing a reference all while avoiding the implementation issues of Scripting.Dictionary.
By inspecting the memory layout of a random instance of Scripting.Dictionary
we can conclude that it looks like this:
Private Type ScrDictLayout
vTable1 As LongPtr
vTable2 As LongPtr
vTable3 As LongPtr
vTable4 As LongPtr
unkPtr1 As LongPtr
refCount As Long
firstItemPtr As LongPtr
lastItemPtr As LongPtr
#If Win64 = 0 Then
Dummy As Long
#End If
hashTablePtr As LongPtr
hashTableSize As Long
compMode As Long
localeID As Long
unkPtr2 As LongPtr
unkPtr3 As LongPtr
End Type
If we run the following code:
Option Explicit
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As LongPtr)
Private Type ScrDictLayout
vTable(0 To 3) As LongPtr
unkPtr1 As LongPtr
refCount As Long
firstItemPtr As LongPtr
lastItemPtr As LongPtr
#If Win64 = 0 Then
Dummy As Long
#End If
hashTablePtr As LongPtr
hashTableSize As Long
compMode As Long
localeID As Long
unkPtr(0 To 1) As LongPtr
End Type
Sub TestScrDictLayout()
#If Mac Then
MsgBox "Scripting.Dictionary is not available on Mac"
#Else
Dim d As Object: Set d = CreateObject("Scripting.Dictionary")
Dim sdl As ScrDictLayout
'
CopyMemory sdl, ByVal ObjPtr(d), LenB(sdl)
Stop
#End If
End Sub
when the code stops execution on the Stop
line, we get something like this in the Locals window:
What we see is that compare mode is set to 0 or vbBinaryCompare
(by default) and the hash table size is 1201. All hashes are apparently applied a Hash Mod 1201
before HashVal
returns:
Sub TestHashValDefaultRange()
#If Mac Then
MsgBox "Scripting.Dictionary is not available on Mac"
#Else
Dim d As Object: Set d = CreateObject("Scripting.Dictionary")
Dim i As Long
Dim h As Long
'
For i = 0 To 10000000
h = d.HashVal(i)
Debug.Assert h >= 0 And h < 1201
Next i
#End If
End Sub
Via memory manipulation, we can change the value of 1201 to something else and we get different hashes:
Sub TestHashValDefaultRange()
#If Mac Then
MsgBox "Scripting.Dictionary is not available on Mac"
#Else
Dim d As Object: Set d = CreateObject("Scripting.Dictionary")
Dim sdl As ScrDictLayout
Dim sizeOffset As LongPtr
'
sizeOffset = VarPtr(sdl.hashTableSize) - VarPtr(sdl) 'For both x32 and x64
CopyMemory ByVal ObjPtr(d) + sizeOffset, &H7FFFFFFF, 4
Debug.Print d.HashVal(123)
Debug.Print d.HashVal(12345)
Debug.Print d.HashVal(1234567)
CopyMemory ByVal ObjPtr(d) + sizeOffset, 1201, 4
#End If
End Sub
We get this:
1123418112
1178657792
1234613304
However, if the value is not set back to the original 1201 then a crash will occur. That's because the hashTablePtr
probably points to a table of 1201 size and when the instance is destroyed, the wrong size is being deallocated.
Based on the above examples, we can now conclude the following:
- in case of a state loss, using a real Scripting.Dictionary instance for hashing would lead to a crash if we change the hash size to anything else than 1201. Please note
hashTablePtr
cannot be changed without leading to a crash, or at best, a memory leak. So, we use a fake instance - see Faking a Scripting.Dictionary instance below - the Scripting.Dictionary never resizes its hash table beyond 1201 which explains the poor performance for more than 32k items even for text keys as seen here. There are so many hash collisions that the linear search simply degrades performance
- the Scripting.Dictionary always applies the
Mod
operator before returning a hash value and for that, it must read thehashTableSize
(1201 by default) from the heap. This causes real speed problems when spawning many Scripting.Dictionary instances even if each instance has only a few items. See Scripting.Dictionary heap issue below for more details
Here is a standalone method that calls Scripting.Dictionary.HashVal:
Option Explicit
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As LongPtr)
Private Type ScrDictLayout
vTable(0 To 3) As LongPtr
unkPtr1 As LongPtr
refCount As Long
firstItemPtr As LongPtr
lastItemPtr As LongPtr
#If Win64 = 0 Then
Dummy As Long
#End If
hashTablePtr As LongPtr
hashTableSize As Long
compMode As Long
localeID As Long
unkPtr(0 To 1) As LongPtr
End Type
Public Function HashVal(ByRef v As Variant _
, Optional ByVal compMode As VbCompareMethod = vbBinaryCompare _
, Optional ByVal hashTableSize As Long = 1201) As Long
#If Mac Then
Err.Raise 5, , "Scripting.Dictionary not available on Mac"
#Else
Const dictVTableSize As Long = 22
Const opNumDictHashVal As Long = 21
Const opNumCollItem As Long = 7
#If Win64 Then
Const PTR_SIZE As Long = 8
Const NULL_PTR As LongLong = 0^
#Else
Const PTR_SIZE As Long = 4
Const NULL_PTR As Long = 0&
#End If
'
Static fakeDict As Collection
Static vTable(0 To dictVTableSize - 1) As LongPtr
Static sdl As ScrDictLayout
Static lcid As Long
Static isSet As Boolean
'
If Not isSet Then
'Copy entire memory layout for Scripting.Dictionary instance
CopyMemory sdl, ByVal ObjPtr(CreateObject("Scripting.Dictionary")), LenB(sdl)
'
'Copy main virtual function table
CopyMemory vTable(0), ByVal sdl.vTable(0), dictVTableSize * PTR_SIZE
'
lcid = sdl.localeID
'
'Replace main virtual table with our own
sdl.vTable(0) = VarPtr(vTable(0))
'
'Map Collection.Item to Dictionary.HashVal
vTable(opNumCollItem) = vTable(opNumDictHashVal)
'
'Set up fake instance
CopyMemory ByVal VarPtr(fakeDict), VarPtr(sdl), PTR_SIZE
'
sdl.hashTablePtr = NULL_PTR
isSet = True
End If
#If Win64 Then
If VarType(v) = vbLongLong Then 'Check for Object with Default Property
If Not IsObject(v) Then
HashVal = fakeDict.Item(CDbl(v))
Exit Function
End If
End If
#End If
If compMode < vbDatabaseCompare Then
sdl.compMode = compMode
sdl.localeID = lcid
Else
sdl.compMode = vbTextCompare
sdl.localeID = compMode
End If
sdl.hashTableSize = hashTableSize
'
HashVal = fakeDict.Item(v) 'Dict.HashVal with early-binding speed
#End If
End Function
Note that fakeDict.Item(v)
can be replaced with just fakeDict(v)
because VBA "sees" the object as a Collection
.
With the above code, calls can be made to the new HashVal
method which in turn calls Scripting.Dictionary.HashVal with early-binding speed without needing a reference. For example Debug.Print HashVal("abc")
.
Of course, the method signature for Collection.Item
:
HRESULT Item(
[in] VARIANT* Index,
[out, retval] VARIANT* pvarRet);
perfectly matches the signature for Scripting.Dictionary.HashVal
:
HRESULT HashVal(
[in] VARIANT* Key,
[out, retval] VARIANT* HashVal);
if we inspect their type libraries.
There are 2 reasons why such code was not used in this repository:
- it would require an additional .bas module - the design goal was to have a single class with zero dependencies
- it adds an extra function call (stack frame) which impacts performance, especially when dealing with millions of keys
The initial approach was to have a fake Scripting.Dictionary instance for each instance of this repo's Dictionary. However, this unveiled another Scripting.Dictionary bug which was mentioned briefly before. Luckily, while testing this Dictionary on parsing a JSON file of 12MB size which required tens of thousands of Dictionary instances, it was found that there is a serious speed impact which is only noticeable when using multiple instances. After further investigation and testing, it seems that each Scripting.Dictionary instance (real or fake) must read the hash size from the heap (and also the compare mode) for any key being hashed, and this becomes a problem for multiple instances. This was easily confirmed by moving the storage of the fake layouts into a global array of UDTs inside a standard .bas module, which immediately solved the issue and improved speed about 6-7 times.
For this Dictionary, the fix was obvious - keep a single fake instance in the default Dictionary
instance and access it from all the other instances. This was fixed in 2a39d61. Presumably if using only one instance, the same heap location is accessed each time and there must be some caching involved because the issue does not occur.
However, this cannot be fixed for Scripting.Dictionary and it now becomes clear why it is not suitable for work that involves multiple instances, like parsing a JSON.
Each instance of this Dictionary uses a fake array of Collection
type (one element) which is set to read the single fake Scripting.Dictionary instance from the default/predeclared Dictionary instance. Also, there is a fake array of type Long
(two elements) which allows each Dictionary instance to read/write compare mode and locale ID into the single fake instance. See the Private Type Hasher
struct and the InitHasher
method.
As briefly mentioned in the Hashing section, all hash values are combined with data type metadata in the upper bits of the hash with the goal of minimizing comparisons and ultimately being more efficient.
The hash + meta layout is briefly shown in the text at the top of the GetIndex method. Here is another representation:
With this chosen layout, hash values of up to 268,435,456 (0x10000000) can be stored (28 bits), while the next upper 3 bits store metadata about the type. All number data types are combined into a single type for compatibility with Scripting.Dictionary. Similarly, vbDataObject
is combined with vbObject
as per below:
Key Var Type | Meta Type |
---|---|
vbEmpty | Empty |
vbNull | Null |
vbInteger | Number |
vbLong | Number |
vbSingle | Number |
vbDouble | Number |
vbCurrency | Number |
vbDate | Number |
vbString | Text |
vbObject | Object |
vbError | Error |
vbBoolean | Number |
vbDataObject | Object |
vbDecimal | Number |
vbByte | Number |
vbLongLong | Number |
The sign bit is not used on x32 because there is separate storage available - see NewEnum section for more details. However, on x64 the sign bit is used if the Item is an object. This removes the need to have separate storage - the idea is to avoid calling IsObject
repeatedly when the item is retrieved.
For example, if a number key and a text key happen to have the same hash value of 21, when comparing the 2 hashes, they won't be the same thanks to the extra meta bits:
When searching the hash table for a key that was just hashed, we first compare the hash + meta values and only if they are equal, we then compare the actual values. This avoids unnecessary comparisons which are especially slow for texts. In fact, before we even compare the hash + meta values, we first compare a sub-hash called "control byte" but that is described in more detail in the Hash Map/Table section below.
The meta values are defined in the HashMeta enum. There is also a special value hmRemove
used to mark items/keys that have been removed.
Please note that for special float values like +Inf (positive infinity), -Inf (negative infinity), QNaN (quiet NaN) and SNaN (signaling NaN), the meta bits are all set to 0 (zero). This is to avoid comparison against these special values - the hash comparison will be enough in the same way it is for Empty
and Null
.
After watching the Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step video on YouTube, presented by Matt Kulukundis, the idea of using SSE instructions to compare multiple sub-hashes in a set of just 3 instructions sounded great. Since we cannot natively use SSE instructions in VBA, this Dictionary uses the next best thing - bitwise operations.
The structure looks like this. In short:
- the hash map/table is divided into groups (buckets) of fixed size
- each group can hold 4 (x32) or 8 (x64) elements of
Long
data type - which will store indexes into the keys/meta storage. This is an array fixed in size at compile time - each group has an integer "Control" value which is either
Long
(x32) orLongLong
(x64) to allow for bitwise operations. Each byte in the control corresponds to one element/index in the group's array. To avoid overflow problems, only the lower 7 bits in each byte are used. Each 7 bits in a control byte are in fact sub-hash values corresponding to the hash of each key pointed by the indexes in the group's array.
The following diagram illustrates the above:
When a key is hashed, the following sub-hashes are computed, in the GetIndex method:
- lower bits in the hash value are used to identify the group slot. The number of bits used depends on the number of groups (table size). A modulo operator is applied to compute this sub-hash. These are the 'pink' bits in the above diagram
- the next 7 bits in the hash value are the control byte. To compute this sub-hash, a bit mask and a right-shift is applied. These are the 'red' bits in the above diagram
If a key needs to be found, then the steps are the following:
- the full hash+meta is computed and then the 2 sub-hashes
- the group slot sub-hash indicates the group/bucket to search in
- within the group, the control sub-hash byte is checked against the group's control integer using these bitwise operations. This will find all matching positions within the group
- for each matching position, the full hash+meta value is compared
- if the full hashes match, then the actual keys are compared
- the search will continue until a match is found or until a group that was never full is encountered. Please note that during this process the group sub-hash is being incremented
Please note that by comparing sub-hashes and/or hashes minimizes the number of key comparisons, and this greatly improves performance.
Before adding a key, the steps in the Finding a key section always run first - to avoid key duplication. So, when adding a key, we already have computed:
- the full hash
- the group sub-hash. Note this will always indicate a group that has available space because this value is incremented in the finding process i.e. it might not correspond with the actual bits in the full hash
- the control byte sub-hash
When adding the Key-Item (and hash+meta) pair to the data storage, in the group indicated by the group sub-hash, the following operations are performed:
- the index for the data is added in the first available position within the group
- the corresponding byte in the control integer is updated with the value of the control sub-hash so that it can be used later for fast find
Code here.
When adding a new key, the new corresponding index is added to the hash table, so there is a chance that the group sub-hash indicates a full group. As mentioned above, this is taken care of in the finding process (by incrementing the group slot value), and we always end up adding the new index into a bucket that has available space. Since hashes are not going to have perfect distribution, a new index can be added a few groups/buckets away from the intended position. This is why we track if the group was ever full via a boolean flag. The search for a key will always be done until the first group that was never empty is found. The nice benefit is that we can achieve a higher load factor without needing to resize individual groups/buckets. The current max load is set to 50% which seems to provide a good balance between storage and performance.
As briefly mentioned in the Hashing section, there is no rehashing in the real sense of the word. Instead, all hash values are stored along with the key. Still, the method called when the hash table needs to grow is named Rehash
because most people are familiar with the concept.
By avoiding rehashing the actual keys, this Dictionary can adapt efficiently the hash table size to any number of key-item pairs.
Of course, sub-hash values are re-computed based on the full hash and the new hash table size, when the hash table needs to grow in size. However, these operations are fast (modulo and bit-shifts) and we end up having a hash map resize with a small penalty in performance.
If key-item pairs are removed from the dictionary, then some of the groups in the hash table might remain with unused positions because the Add
process only checks the WasEverFull
flag. Please note this is not a problem for the Key
method which always checks for .Count = GROUP_SIZE
. The idea is that a call to Rehash
will actually fix the unused positions and obviously, adding items can easily lead to a resize. Meanwhile, Key
(Let) will not lead to a resize and so the logic is different to avoid endlessly searching for a group slot.
To enable a For Each..
loop on a class instance in VBA, the instance class must have a defined public method with the special attribute Attribute NewEnum.VB_UserMemId = -4
where NewEnum
is the method name. That method must return an instance that implements the IEnumVariant
interface and VBA will make calls to IEnumVARIANT::Next
until all items are iterated. -4
is simply the dispIdMember
that IDispatch::Invoke
uses to call the method (late-bind call).
Unfortunately, we cannot create a class enumerator (IEnumVariant
) with native code in VBA. Other developers have tried using assembly injected at runtime - for example clsTrickHashTable. However, there is no solution available for all platforms and even the ones available often lead to crashes.
A few alternatives were explored in this Code Review article. In short:
- we can hijack calls to
IEnumVARIANT::Next
on a fake instance and use our own method inside a standard .bas module. This can be done in multiple ways; however, it is too slow (extra stack frames) compared to what a Scripting.Dictionary or a VBA.Collection can do. Moreover, this approach adds a module dependency - we can use instances of
IEnumVARIANT
as returned by a temporaryCollection
. If we can make our data structure mimic the items in a Collection's linked list, then the same code that iterates collection items can iterate our own data
This Dictionary uses approach #2 mentioned above but implemented differently for 32-bit and 64-bit versions of VBA.
A Collection item looks like this:
Private Type VbCollectionItem
Data As Variant
KeyPtr As LongPtr
pPrevIndexedItem As LongPtr
pNextIndexedItem As LongPtr
pParentItem As LongPtr
pRightBranch As LongPtr
pLeftBranch As LongPtr
bFlag As Boolean
End Type
This is well detailed on VB Forums or here.
As described in the Code Review article linked above, we only really need the following structure:
Variant | Unused Pointer | Unused pointer | Next Item Pointer |
---|
In short, just the first 4 members of a real Collection item.
So, we could create an array of such type instead of an array of Variant
and we then make sure the Next Pointer always links to the next Variant
. In fact this is the exact approach used for this Dictionary but for x32 only. For x64 there is a better approach which is less wasteful.
Many thanks to sancarn for his help on figuring out VT_RECORD (vbUserDefined).
As mentioned above, we need the following structure:
Variant | Unused Pointer | Unused pointer | Next Item Pointer |
---|
Here is a more useful diagram:
We can see on the left side of the diagram that:
- a
Variant
uses 24 Bytes on x64. The last 8 Bytes are only used for User Defined Types (UDTs). Keys in the Dictionary cannot be UDTs and so the 8 Bytes are not used - we are wasting 16 Bytes with unused pointers just to have the Next Pointer at the correct offset
In total, 24 Bytes are not used - the exact size of a Variant
.
We can see on the right side of the diagram that:
- we can make use of the 8 unused Bytes in a
Variant
and use them for the Next Pointer - the Next Pointer for Variant at position n would sit in the last Bytes of the Variant at position n+1
In total, we are using all Bytes. Moreover, we don't need a custom structure - we can just continue using an array of Variant
type.
The code that updates the Next Pointers looks like this
There is an additional benefit - we can still return the keys array via the Keys
method, without having to iterate the keys, if there are no gaps in the array. See code.
Unfortunately, we cannot apply the same strategy used for x64. This is because pointers are only 4 Bytes (Long
data type) on x32, and the Next Pointer would not have the correct offset. Moreover, the last bytes in a Variant
are used - for example, a Double
would use 8 Bytes.
So, we must use the following custom structure:
Variant | Unused Pointer | Unused pointer | Next Item Pointer |
---|
However, we can make use of the space between the Variant and the Next Pointer (Extra information):
Here is how the custom Variant looks like on x32 and here is the main data storage structure for keys and items, and this is the code that updates the Next Pointers.
Although, the Keys
method has to iterate the keys in order to return the keys array, this is not a big drawback because:
Items
method is not affectedFor Each
can be used on the Dictionary instance directly (For Each v In dict
) which is anyway faster than iterating the keys array (For Each v In dict.Keys
)
Using the EnumerableVariant
structure for storing keys brings additional benefits:
- there is no need to have a
Meta
array, like we have on x64 to store the hash+meta values - there is no need to use bitmasks to check if item or key is an object, like we have on x64
Compared to not implementing this functionality, there are only 8 extra Bytes being used per Variant: 4 for the Next Pointer and 4 for the 2 Boolean flags (is Item/Key Object). That's because the hash+meta uses 4 Bytes regardless. While the 2 Boolean flags are slightly faster than using bitmasks, the speed difference is negligeable. Overall, having the iterator functionality trumps the need for the additional space.
When using the class iterator described above, there are a few scenarios that we need to consider:
- a
For Each
loop can be called inside an existingFor Each
loop - items/keys can be added/removed while inside of a
For Each
loop. This can lead to a re-allocation of the data NewEnum
can be called and stored to be used at a later stage
To account for all the scenarios above, this Dictionary has additional management in the RemoveUnusedEnums
and ShiftEnumPointer
methods.
Compared to a Scripting.Dictionary, this Dictionary has a few extra methods that can be useful:
AllowDuplicateKeys
Factory
- returns a new Dictionary instanceIndex
- returns the index for a specified KeyItemAtIndex
- returns or replaces the Item at the specified indexKeyAtIndex
- returns the Key at the specified indexKeysItems2D
- returns a 2D array of all the Keys and ItemsPredictCount
- if the number of Key-Item pairs is known upfront or if a good guess is possible, then a call toPredictCount
with the expected number of pairs will prepare the internal size of the hash map so that there are no calls made toRehash
. This results in better performanceSelf
- this method is useful inWith New Dictionary
code blocks
Although this dictionary does not rely on other libraries or references, it still requires that the basic OLE Automation
reference in enabled. This is because the IUnknown
interface is needed to properly hash object keys - for more details see this discussion.
This is such a basic reference that even VBA itself uses it - if we look at the type library for VBA, we can see:
library VBA
{
// TLib : OLE Automation : {00020430-0000-0000-C000-000000000046}
importlib("stdole2.tlb");
This class implements fixes by overwriting x64 assembly bytes for the following known VBA x64 bugs:
- Bug with For Each enumeration on x64 Custom Classes
- VBA takes wrong branch at If-statement - severe compiler bug? which is same as issue #10
First, let's understand how x64 methods are called.
The first 8 bytes (4 bytes on x32) pointed by a class instance pointer hold the address of the class virtual table.
Each VBA class is derived from IDispatch which in turn is derived from IUnknown. In other words, the virtual table for a VBA class looks like this:
IUnknown::QueryInterface 'Position 0
IUnknown::AddRef
IUnknown::Release
IDispatch::GetTypeInfoCount
IDispatch::GetTypeInfo
IDispatch::GetIDsOfNames
IDispatch::Invoke 'Position 6
PublicMethod1 'Position 7
...
PublicMethodN
PrivateMethod1
...
PrivateMethodN
We can read the pointer to the first function in a VBA class by using the following code. In an empty project add a Class1
with the following code:
Option Explicit
Public Function Test() As Variant
Debug.Print "Test"
End Function
Public Sub Test2()
Dim i As Long
For i = 1 To 3
Debug.Print i
Next i
End Sub
Now add the following code in a standard .bas module and run the TestFuncPointer
method. This will print the pointer for the Test
method of Class1
to the Immediate window.
Option Explicit
#If Win64 Then
Const PTR_SIZE = 8
#Else
Const PTR_SIZE = 4
#End If
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As LongPtr)
Private Function MemLongPtr(ByVal addr As LongLong) As LongPtr
CopyMemory MemLongPtr, ByVal addr, PTR_SIZE
End Function
Sub TestFuncPointer()
Dim c As New Class1
Dim vTable As LongPtr
Dim testPtr As LongPtr
Dim b() As Byte
'
vTable = MemLongPtr(ObjPtr(c))
testPtr = MemLongPtr(vTable + PTR_SIZE * 7)
'
Debug.Print testPtr
End Sub
For those familiar with COM, there will be a virtual table for each interface that a class implements.
A class has many default interfaces (e.g. IMarshall
, IConnectionPointCointainer
, _DClass
etc.) but IClassModuleEvt
in particular is useful for finding the pointers for _Initialize
and _Terminate
, which in turn call, if present, the Class_Initialize
and Class_Terminate
(more details on class footprint / layout can be found here).
In addition to the default interfaces, a class will have a virtual table for each implemented interface via the Implements
keyword. It will also have an interface for each use of the WithEvents
keyword.
If we inspect the bytes pointed by the testPtr
in the above example, it seems there are some assembly instructions. These are not the actual method instructions. After testing, it seems this is simply code that is being called via late-binding (e.g. when the method is called on a variable declared As Object
) - a call via IDispatch::Invoke
. It is not called at all via early-binding (e.g. when the method is called on a variable declared As Class1
).
The following diagram shows how the bytes are organized:
Let's see the assembly for a Function
and a Sub
on x64.
Add the following code in a standard .bas module and run the TestClassMethodASM
method:
Option Explicit
#If Win64 Then
Const PTR_SIZE = 8
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As LongPtr)
Private Function MemLongPtr(ByVal addr As LongLong) As LongPtr
CopyMemory MemLongPtr, ByVal addr, PTR_SIZE
End Function
Sub TestClassMethodASM()
Dim c As New Class1
Dim vTable As LongPtr
Dim testPtr As LongPtr
Dim b() As Byte
'
vTable = MemLongPtr(ObjPtr(c))
testPtr = MemLongPtr(vTable + PTR_SIZE * 7) '7 because IDispatch has methods 0 to 6
'
ReDim b(1 To 69)
CopyMemory b(1), ByVal testPtr, UBound(b)
Debug.Print StringToHex(CStr(b))
End Sub
Public Function StringToHex(ByRef s As String) As String
Static map(0 To 255) As String
Dim b() As Byte: b = s
Dim i As Long
If LenB(map(0)) = 0 Then
For i = 0 To 255
map(i) = Right$("0" & Hex$(i), 2)
Next i
End If
StringToHex = Space$(LenB(s) * 2 + 2)
Mid$(StringToHex, 1, 2) = "0x"
For i = LBound(b) To UBound(b)
Mid$(StringToHex, (i + 1) * 2 + 1, 2) = map(b(i))
Next i
End Function
#End If
The resulting hex can be translated to this:
; Typical assembly for a class Function with no arguments
;-------------------------------------------------------------------------------
66490F6EEC ; MOVQ XMM5,R12 ; Saves R12 value in XMM5
48B8 987307FAFA7F0000 ; MOV RAX,00007FFAFA077398 ; Copies literal value into RAX - this value seems to always be the same
488B00 ; MOV RAX,QWORD PTR [RAX] ; Reads memory value pointed by RAX into RAX (Dereference)
4C8B20 ; MOV R12,QWORD PTR [RAX] ; Reads memory value pointed by RAX into R12 (Dereference)
4981EC 10000000 ; SUB R12,0000000000000010 ; Adds space (negative) for 2 pointers to the stack, one for instance pointer and one for function return
498BC4 ; MOV RAX,R12 ; Saves R12 value into RAX - seems useless because RAX gets overwritten just below in the second last instruction
49898C24 00000000 ; MOV QWORD PTR [R12+00000000],RCX ; Saves RCX value at the address pointed by R12
49899424 08000000 ; MOV QWORD PTR [R12+00000008],RDX ; Saves RDX value at the address pointed by R12 + 0x8
48BA F0D1254C0E020000 ; MOV RDX,0000020E4C25D1F0 ; Pushes literal value to RDX. The value matches the value at -8 offset from function pointer
48B8 8A1800FAFA7F0000 ; MOV RAX,00007FFAFA00188A ; Pushes literal value to RAX. This is jumped to in the next instruction and is the same value for all methods in the class i.e. a wrapper
FFE0 ; JMP RAX ; Jumps to asm pointer by RAX as mentioned above. This wrapper will use the address passed in RDX to call the required method
; ...
; Typical assembly for a class Sub with no aruments
;-------------------------------------------------------------------------------
66490F6EEC ; MOVQ XMM5,R12
48B8 987307FAFA7F0000 ; MOV RAX,00007FFAFA077398
488B00 ; MOV RAX,QWORD PTR [RAX]
4C8B20 ; MOV R12,QWORD PTR [RAX]
4981EC 08000000 ; SUB R12,0000000000000008 ; Adds space (negative) for instance pointer to the stack
498BC4 ; MOV RAX,R12
49898C24 00000000 ; MOV QWORD PTR [R12+00000000],RCX
; RDX value is no longer saved at the address pointed by R12 + 0x8 (compared to a Function)
48BA 302820570E020000 ; MOV RDX,0000020E4C25D1F0 ; The literal value is changed every time code is compiled
48B8 8A1800FAFA7F0000 ; MOV RAX,00007FFAFA00188A
FFE0 ; JMP RAX
; ...
The asm code pointed by the entries in the class virtual table is only called via late-binding and in turn calls a wrapper code (pointed by the RAX
register) that eventually calls the PCode (pointed by the RDX
register).
The RDX
value always matches the value at function pointer address less 8 bytes e.g.[testPtr - 8]
. For early-binded calls, this pointer at -8 offset is used to access the PCode directly.
To be continued..