When PHP arrays meet C#: an all-rounded reinterpretation of Lists.
Consider a task that uses a List<T>
:
- Receive/append many items
- Remove many elements in place: to conserve memory, in case your list is large
We quickly notice problems when removing many items in-place from List<T>
:
- This is actually
O(n^2)
! (where n = number of items in the list) - Can't keep old indexes (perhaps we need them for later)
- Can't do that in a
foreach
loop
We also notice the problems of the alternatives:
Array<T>
does not automatically resize: tedious memory management- It is very slow to insert many data into
Dictionary<int,T>
- By the way,
HashSet<T>
is essentially equivalent toDictionary<T,void>
- By the way,
- LINQ expressions are usually the fastest, but will require extra memory allocation when building the solution
List<T>
- Or, perhaps the problem asks for a
foreach
solution with side effects, such that LINQ is not suitable
This is a bad situation between a rock and a hard place.
Inspired by PHP's highly unconventional approach to implement arrays, lists and maps,
this C# package introduces a hybrid between the List<T>
type and the Dictionary<TKey,TValue>
type: the DictionaryList<T>
type.
Do not be afraid of the PHP origin.
While there are some trade-offs when migrating from List<T>
to DictionaryList<T>
(see the Characteristics and the Benchmarking sections),
rest assured that DictionaryList<T>
really works well.
via NuGet:
dotnet add package Vectorial1024.DictionaryList
Please see CHANGELOG.md
.
Some todos of this package:
- Implement the various list interfaces
- Performance optimization
Consider the following primer table:
List Behavior | Dictionary Behavior | |
---|---|---|
List Structure | List<T> |
ListDictionary<int, T> |
Dictionary Structure | π DictionaryList<T> π |
Dictionary<int, T> |
A DictionaryList<T>
is a List<T>
that has Dictionary
-like structure.
The theme of a DictionaryList<T>
is "deferred reindexing".
Compared to a List<T>
, there are some characteristics/restrictions:
- No
Insert()
; useAppend()
instead - No
RemoveAt()
; useUnsetAt()
instead, which leaves behind "memory gaps" - Memory gaps may be reused if you know their indexes; also see
ContainsIndex()
- Enumeration yields
KeyValuePair<int,T>
and skips over unset rows - No
TrimExcess()
; useCompactAndTrimExcess()
instead UnsetAt()
duringforeach
is allowed!
Perhaps this can be better demonstrated with some code sample.
The API is similar to a List<T>
(see the docstring or your IDE for the latest info).
Still, you may consider the following sample code:
// basic usage sample code
// create a new DictionaryList
var dictList = new DictionaryList<int>;
// add some items to it
dictList.Add(1); // index 0
dictList.Add(2); // index 1
dictList.Add(3); // index 2
dictList.Add(4); // index 3
dictList.Add(5); // index 4
// how many items?
_ = dictList.Count; // 5
// read from it
_ = dictList.ContainsIndex(3); // true
_ = dictList[3]; // 4
// write to it
dictList[3] = 12;
_ = dictList[3]; // 12
// dictList[99] = 99999; // out of bounds; not allowed!
// dictList[-4] = 42; // out of bounds; not allowed!
// remove from it
// note: removed items can be GC-ed if that was the last reference to it
dictList.UnsetAt(1);
_ = dictList.ContainsIndex(1); // false
_ = dictList.Count; // 4
// _ = dictList[1]; // out of bounds; not allowed!
// "revive" indexes, but we recommend Add() if you do not care about the value of indexes.
dictList[1] = 11;
_ = dictList[1]; // 11;
// let's delete it again...
dictList.UnsetAt(1);
// ...to demonstrate traversal
foreach (var kv in dictList)
{
_ = (kv.Key, kv.Value);
// yields in order: (0, 1), (2, 3), (3, 12), (4, 5); 4 items!
}
// we have a gap at index=1; reclaim gaps by doing:
dictList.CompactAndTrimExcess();
// we are done with the data; clear everything by doing:
dictList.Clear();
_ = dictList.Count; // 0
DictionaryList<T>
is NOT thread safe!
DictionaryList<T>
is compatible with Native AOT.
This package is benchmarked with the BenchmarkDotNet package.
To run the benchmark:
# BenchmarkDotNet strongly recommends benchmarking in Release mode
dotnet run -c=Release --project=Benchmarking
A benchmark was run with version 0.1.2
of this library. Its detailed results can be found in BENCHMARK.md
.
But still, here is a table outlining the performances when N = 100000
:
Task | List | DictionaryList | Dictionary | SortedDictionary |
---|---|---|---|---|
Append Many Items (speed) | 155 ms β | 317 ms π | 1073 ms | 10507 ms |
Append Many Items (memory) | 1048976 B β | 2097536 B π | 6037640 B | 4800112 B |
Full Iteration (speed) | 50 ms β | 176 ms π | 162 ms | 587 ms |
Full Iteration (memory) | 0 B | 48 B | 0 B | 312 B |
Read Many Items (speed) | 17 ms β | 65 ms π | 147 ms | 1322 ms |
Read Many Items (memory) | 0 B | 0 B | 0 B | 0 B |
Remove Many Items In-place (speed) | 557307 ms | 789 ms π | 316 ms β | 810 ms |
Remove Many Items In-place (memory) | 0 B | 48 B | 0 B | 312 B |
Remove Many Items w/ LINQ (speed) | 104 ms β | 793 ms π | 885 ms | 1693 ms |
Remove Many Items w/ LINQ (memory) | 198072 B β | 529280 B π | 673168 B | 1473296 B |
Add Items During foreach |
β | β new key | β new key | β |
Emit Key/Index During foreach |
π€· [1] | βοΈ | βοΈ | βοΈ |
Replace Items During foreach |
π€· [1] | βοΈ | βοΈ | β if key exists |
Remove Items During foreach |
β | βοΈ | βοΈ | β |
You may see that DictionaryList<T>
is an all-rounded, midway solution between a List<T>
and a Dictionary<TKey,TValue>
.
As part of the benchmark, you may also see that SortedDictionary<TKey,TValue>
is generally a bad type to use compared to other similar types.
The following collection types are excluded from the benchmarking:
HashSet<T>
: too similar toDictionary<T, void>
, probably has similar performanceOrderedDictionary
: ambiguity between intended<int, T>
usage andthis[int]
syntaxLinkedList<T>
: unfair/impossible comparison due to lack of index accessSortedList<int,T>
: too similar toSortedDictionary<int, T>
, probably has similar performance
This package uses NUnit for testing.
Run the tests with:
dotnet test
[1] Technically, this can be done with Enumerable LINQ's Index
method, but using LINQ with foreach
is perhaps an antipattern.