Below is the output from various benchee runs with descriptions of what we did for each.
This experiment compared the current approach of XML -> xmerl -> data_schema (by way of SweetXML queries) against going straight from Saxy to data_schema structs.

There are actually a few ways we can approach the latter; this first attempt took the simplest idea and created a SaxyDataAccessor. The upshot is that we "Saxy" the entire XML document once per field in the schema. Each pass looks for one very specific value (the value the schema field points to), which means we stop as soon as we find it AND we ignore the vast majority of events completely. This approach would get faster still if we could skip a whole subtree as soon as we know it can't contain our path.
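As a rough sketch of the idea (illustrative only, not the actual SaxyDataAccessor code, and assuming Saxy's documented `{:stop, state}` handler return for stopping a parse early), a per-field handler can track the current element path and bail the moment it sees the characters at the target path:

```elixir
defmodule SaxyFieldSketch do
  @moduledoc "Illustrative only: fetch one field's value per full Saxy pass."
  @behaviour Saxy.Handler

  def get(xml, path) when is_list(path) do
    case Saxy.parse_string(xml, __MODULE__, {path, [], nil}) do
      {:ok, {_path, _stack, value}} -> value
      {:error, _} = error -> error
    end
  end

  @impl true
  # Push each opened element onto a stack so we always know where we are.
  def handle_event(:start_element, {name, _attrs}, {path, stack, value}) do
    {:ok, {path, [name | stack], value}}
  end

  # If the current location matches the target path, grab the text and
  # stop parsing immediately rather than walking the rest of the document.
  def handle_event(:characters, chars, {path, stack, _value}) do
    if Enum.reverse(stack) == path do
      {:stop, {path, stack, chars}}
    else
      {:ok, {path, stack, nil}}
    end
  end

  def handle_event(:end_element, _name, {path, [_ | rest], value}) do
    {:ok, {path, rest, value}}
  end

  # Every other event (the vast majority) is ignored completely.
  def handle_event(_event, _data, state), do: {:ok, state}
end
```

So something like `SaxyFieldSketch.get(xml, ["SteamedHam", "Type"])` would parse only until it hits that path, then stop.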
The big problem with this is has_many fields. They become very tricky because you need to be able to iterate through a subset of the XML, which is harder than it sounds. Either you "parse" the list of things by reconstructing the string from the Saxy events (which seems mental), or you parse the has_manys into a different representation and give every accessor function a clause that handles that representation by querying it in some way.

For this experiment we hacked together something that isn't completely accurate, but it gives us enough information to know whether we should continue the effort to make has_many work.
(Note this one is harder to bench with large XML because it requires a lot more work).
```
❯ mix run bench.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.1
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 18 s

Benchmarking experiment parse many times ...
Benchmarking xmerl -> data_schema ...

Name                                  ips        average  deviation         median         99th %
xmerl -> data_schema              12.93 K       77.35 μs    ±23.90%          74 μs         165 μs
experiment parse many times       11.53 K       86.71 μs    ±15.81%          85 μs         148 μs

Comparison:
xmerl -> data_schema              12.93 K
experiment parse many times       11.53 K - 1.12x slower +9.37 μs

Memory usage statistics:

Name                         Memory usage
xmerl -> data_schema            107.41 KB
experiment parse many times      60.83 KB - 0.57x memory usage -46.58594 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                       Reduction count
xmerl -> data_schema               60.43 K
experiment parse many times         6.27 K - 0.10x reduction count -54.15800 K

**All measurements for reduction count were the same**
```
What's super interesting is that although it's slower, it uses about half the memory, on a very small input. We should really try a larger input to explore that more... but that requires solving the "has_many" problem first.

We chose a Jetstar seat map as the example because it's fairly easy to copy the schemas over and the XML is a decent size: the file is 1.8 MB.
This is a benchmark for just the XMERL -> DataSchema approach:
```
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.1
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 9 s

Benchmarking xmerl -> data_schema ...

Name                           ips        average  deviation         median         99th %
xmerl -> data_schema          3.59      278.83 ms    ±15.18%      299.79 ms      347.19 ms

Memory usage statistics:

Name                   Memory usage
xmerl -> data_schema      153.61 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                       average  deviation         median         99th %
xmerl -> data_schema       27.45 M     ±0.03%        27.46 M        27.46 M
```
What's nuts is look at that memory: 153 MB! That's like 10x what our original is! This is probably because there is a lot of nesting, meaning a lot of parents; really, a lot of tags inside tags. IN FACT, I think I know why easyJet is so fast: they use attributes for everything, so there are basically no parents! (They also return way fewer options, but still.)
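To illustrate that point (hypothetical XML, just to show the shape): attribute-heavy markup produces a single element node where element-heavy markup produces many, and under xmerl every one of those nodes carries its full list of parents:

```xml
<!-- attribute-heavy: one element node, no nested parents -->
<Seat Row="12" Column="A" Available="true"/>

<!-- element-heavy: four element nodes, each storing its parents -->
<Seat>
  <Row>12</Row>
  <Column>A</Column>
  <Available>true</Available>
</Seat>
```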
This approach is almost exactly the same as the current one, but uses maps for the XML elements rather than xmerl records. This used the large response fixture of request_id FuOOEnVBFfAVGy8AeRED.
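For a sense of the shape (the field names here are an assumption for illustration, not necessarily the real Saxy.XmerlMap output), each element becomes a plain map instead of an `:xmlElement` record:

```elixir
# One XML element as a map; `parents` mirrors what xmerl stores on every node.
element = %{
  name: :Leaf,
  parents: [Salad: 2, Salads: 1, SteamedHam: 1],
  attributes: [type: "lambs lettuce"],
  content: ["washed"]
}
```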
```
❯ mix run bench.exs
Compiling 1 file (.ex)
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.1
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 18 s

Benchmarking current to map ...
Benchmarking current to xmerl (records) ...

Name                                 ips        average  deviation         median         99th %
current to map                      1.13         0.88 s    ±10.04%         0.85 s         1.06 s
current to xmerl (records)          0.98         1.02 s     ±8.22%         0.98 s         1.17 s

Comparison:
current to map                      1.13
current to xmerl (records)          0.98 - 1.15x slower +0.133 s

Memory usage statistics:

Name                        Memory usage
current to map                 253.50 MB
current to xmerl (records)     290.61 MB - 1.15x memory usage +37.10 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                     Reduction count
current to map                  388.46 M
current to xmerl (records)      389.00 M - 1.00x reduction count +0.54 M

**All measurements for reduction count were the same**
```
What's crazy about this is the memory used in both cases: 253 MB for a ~9.1 MB file!
This next run uses the same map approach on the same FuOOEnVBFfAVGy8AeRED fixture, BUT we keep the element names as strings rather than converting them to atoms...
```
❯ mix run bench.exs
Compiling 1 file (.ex)
warning: variable "atom_fun" is unused (if the variable is not meant to be used, prefix it with an underscore)
  lib/saxy/experiment/xmerl_map.ex:101: Saxy.XmerlMap.make_name/2

Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.1
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 18 s

Benchmarking current to map ...
Benchmarking current to xmerl (records) ...

Name                                 ips        average  deviation         median         99th %
current to map                      1.08         0.93 s     ±6.77%         0.90 s         1.05 s
current to xmerl (records)          0.93         1.08 s    ±12.15%         1.03 s         1.31 s

Comparison:
current to map                      1.08
current to xmerl (records)          0.93 - 1.16x slower +0.148 s

Memory usage statistics:

Name                        Memory usage
current to map                 253.16 MB
current to xmerl (records)     290.61 MB - 1.15x memory usage +37.45 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                     Reduction count
current to map                  388.73 M
current to xmerl (records)      389.19 M - 1.00x reduction count +0.46 M

**All measurements for reduction count were the same**
```
What's crazy is that the memory is basically the same as with atoms... which I guess makes sense: the real saving would only show up if we parsed a second XML document with the same names, where we'd be able to re-use the atoms but not, I would guess, the binaries...
This attempt slims down the map that gets created, to make it simpler to query. We will still need to bench query performance, because it might be that SweetXML is just way quicker than Access anyway. But for example, the current xmerl approach saves all the parents on every node. That's a HUGE amount of data, and I cannot think why we'd need it (though I am a simpleton).

Similarly we save the namespace separately; not sure why we can't just put the namespace in the element name. That's actually what we want really.

And finally, the "count" seems absolutely useless, because the elements are already ordered by how they appear in the XML, so surely we can just index off of that. IN FACT that's better, because I'm pretty sure xpath is 1-indexed, which is confusing.
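A sketch of the slimmed-down shape (assumed, for illustration): no parents, no count, namespace folded into the element name, and position recovered from list order:

```elixir
salads = %{
  name: "ns:Salads",
  attributes: [],
  content: [
    %{name: "ns:Salad", attributes: [{"Name", "ceasar"}], content: []},
    %{name: "ns:Salad", attributes: [{"Name", "cob"}], content: []}
  ]
}

# xpath's Salad[2] is 1-indexed; with an ordered content list we can just
# 0-index off position instead of storing a count on every node.
second_salad =
  salads.content
  |> Enum.filter(&(&1.name == "ns:Salad"))
  |> Enum.at(1)
```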
```
❯ mix run bench.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.1
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 18 s

Benchmarking current to map ...
Benchmarking current to xmerl (records) ...

Name                                 ips        average  deviation         median         99th %
current to map                      2.68         0.37 s     ±3.34%         0.37 s         0.40 s
current to xmerl (records)          0.96         1.04 s     ±9.10%         0.99 s         1.20 s

Comparison:
current to map                      2.68
current to xmerl (records)          0.96 - 2.78x slower +0.66 s

Memory usage statistics:

Name                        Memory usage
current to map                 131.41 MB
current to xmerl (records)     290.61 MB - 2.21x memory usage +159.20 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                         average  deviation         median         99th %
current to map               16.36 M     ±0.03%        16.36 M        16.36 M
current to xmerl (records)  389.22 M     ±0.00%       389.22 M       389.22 M

Comparison:
current to map               16.36 M
current to xmerl (records)  389.22 M - 23.80x reduction count +372.86 M
```
We have halved the memory usage here, though it is still suspiciously high!
Here we do the same but remove the "count" from the accumulator as we don't use it:
```
❯ mix run bench.exs
Compiling 1 file (.ex)
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.1
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 18 s

Benchmarking current to map ...
Benchmarking current to xmerl (records) ...

Name                                 ips        average  deviation         median         99th %
current to map                      2.51         0.40 s     ±2.71%         0.40 s         0.42 s
current to xmerl (records)          0.95         1.06 s     ±7.47%         1.03 s         1.19 s

Comparison:
current to map                      2.51
current to xmerl (records)          0.95 - 2.65x slower +0.66 s

Memory usage statistics:

Name                        Memory usage
current to map                 127.25 MB
current to xmerl (records)     290.61 MB - 2.28x memory usage +163.36 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                         average  deviation         median         99th %
current to map               16.43 M     ±0.07%        16.43 M        16.43 M
current to xmerl (records)  389.29 M     ±0.00%       389.29 M       389.29 M

Comparison:
current to map               16.43 M
current to xmerl (records)  389.29 M - 23.70x reduction count +372.87 M
```
Even better memory impact, which is nice. Still high, but whatcha gonna do; I'll take a halving for sure!
But now we need to see if we can query efficiently...
This run compares the current xmerl handler, the newer "slimmed down" map, and a new version of the slimmed-down map that aims to make the data inside it really easy to query. Check out the Saxy.XmerlMapDynamic module: it makes the map keys the names of the nodes in the XML. The hope is that we can then take a normal xpath and translate it into what is essentially a lens into the data. We can now test which is quicker / uses less memory overall, including querying the result of Saxy.
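Roughly the idea (a hypothetical shape for illustration, not the actual Saxy.XmerlMapDynamic output): with node names as keys, an xpath like `/SteamedHam/Salads/Salad` translates almost directly into an Access path:

```elixir
doc = %{
  "SteamedHam" => %{
    "Salads" => %{
      # Repeated elements collect into a list under their shared name.
      "Salad" => [
        %{"Cheese" => "Blue"},
        %{"Leaf" => "washed"}
      ]
    }
  }
}

# The xpath /SteamedHam/Salads/Salad becomes a plain get_in/2 path:
get_in(doc, ["SteamedHam", "Salads", "Salad"])
```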
The run below is just the Saxy bit, not the querying for data inside the result. It uses request_id FuOOEnVBFfAVGy8AeRED.
```
❯ mix run bench.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.4
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 27 s

Benchmarking XML to PATHED UP map ...
Benchmarking XML to map (trimmed) ...
Benchmarking current to xmerl (records) ...

Name                                 ips        average  deviation         median         99th %
XML to map (trimmed)                2.69      371.50 ms     ±4.59%      368.88 ms      417.18 ms
XML to PATHED UP map                1.95      512.49 ms     ±5.76%      521.68 ms      554.67 ms
current to xmerl (records)          0.71     1413.57 ms    ±14.84%     1492.59 ms     1558.00 ms

Comparison:
XML to map (trimmed)                2.69
XML to PATHED UP map                1.95 - 1.38x slower +140.99 ms
current to xmerl (records)          0.71 - 3.80x slower +1042.06 ms

Memory usage statistics:

Name                        Memory usage
XML to map (trimmed)           127.25 MB
XML to PATHED UP map           190.24 MB - 1.50x memory usage +62.99 MB
current to xmerl (records)     290.61 MB - 2.28x memory usage +163.36 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                         average  deviation         median         99th %
XML to map (trimmed)         16.02 M     ±0.04%        16.02 M        16.02 M
XML to PATHED UP map         20.64 M     ±0.06%        20.64 M        20.65 M
current to xmerl (records)  389.36 M     ±0.00%       389.36 M       389.36 M

Comparison:
XML to map (trimmed)         16.02 M
XML to PATHED UP map         20.64 M - 1.29x reduction count +4.62 M
current to xmerl (records)  389.36 M - 24.31x reduction count +373.34 M
```
That. Is. Dope. We use less memory AND it's faster, even for a large input, so either map approach is promising; we just need to see which fares better when it comes to querying data. The memory is still huge though.
Interestingly the Saxy.XmerlMap uses less memory... I think it would use even less if we used a struct, which we can do for that approach because it doesn't require dynamic keys.

Let's add 2 more approaches into the mix:

- XML to map (trimmed) with a struct instead of a map
- using tuples for XmerlMapDynamic (the shape is sketched below)
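The tuple shape would be something like Saxy's simple form, `{name, attributes, content}` (a sketch; the real XmerlMapDynamicTuple representation may differ):

```elixir
# One element: {name, attributes, content}, with children nested in content.
{"Salad", [{"Name", "cob"}],
 [{"Leaf", [{"type", "lambs lettuce"}], ["washed"]}]}
```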
This run compares current with the "dynamic map tuple" approach:
```
❯ mix run bench.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.4
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 18 s

Benchmarking XmerlMapDynamicTuple ...
Benchmarking current to xmerl (records) ...

Name                                 ips        average  deviation         median         99th %
XmerlMapDynamicTuple                2.81         0.36 s     ±2.35%         0.36 s         0.37 s
current to xmerl (records)          0.98         1.02 s     ±9.58%         0.98 s         1.20 s

Comparison:
XmerlMapDynamicTuple                2.81
current to xmerl (records)          0.98 - 2.88x slower +0.67 s

Memory usage statistics:

Name                        Memory usage
XmerlMapDynamicTuple           122.48 MB
current to xmerl (records)     290.61 MB - 2.37x memory usage +168.13 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                         average  deviation         median         99th %
XmerlMapDynamicTuple         14.89 M     ±0.04%        14.89 M        14.89 M
current to xmerl (records)  389.15 M     ±0.00%       389.15 M       389.15 M

Comparison:
XmerlMapDynamicTuple         14.89 M
current to xmerl (records)  389.15 M - 26.14x reduction count +374.26 M
```
This compares all of them tried so far.
```
❯ mix run bench.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.4
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 36 s

Benchmarking XML to PATHED UP map ...
Benchmarking XML to map (trimmed) ...
Benchmarking XmerlMapDynamicTuple ...
Benchmarking current to xmerl (records) ...

Name                                 ips        average  deviation         median         99th %
XML to map (trimmed)                2.82      354.16 ms     ±1.63%      353.17 ms      365.30 ms
XmerlMapDynamicTuple                2.80      357.30 ms     ±2.14%      360.02 ms      370.90 ms
XML to PATHED UP map                1.99      501.74 ms     ±3.58%      504.36 ms      528.71 ms
current to xmerl (records)          0.97     1033.55 ms     ±8.28%      995.71 ms     1183.84 ms

Comparison:
XML to map (trimmed)                2.82
XmerlMapDynamicTuple                2.80 - 1.01x slower +3.14 ms
XML to PATHED UP map                1.99 - 1.42x slower +147.57 ms
current to xmerl (records)          0.97 - 2.92x slower +679.39 ms

Memory usage statistics:

Name                        Memory usage
XML to map (trimmed)           127.25 MB
XmerlMapDynamicTuple           122.48 MB - 0.96x memory usage -4.77155 MB
XML to PATHED UP map           190.24 MB - 1.50x memory usage +62.99 MB
current to xmerl (records)     290.61 MB - 2.28x memory usage +163.36 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                         average  deviation         median         99th %
XML to map (trimmed)         16.26 M     ±0.08%        16.27 M        16.27 M
XmerlMapDynamicTuple         14.89 M     ±0.05%        14.89 M        14.89 M
XML to PATHED UP map         20.22 M     ±0.03%        20.22 M        20.22 M
current to xmerl (records)  389.12 M     ±0.00%       389.12 M       389.12 M

Comparison:
XML to map (trimmed)         16.27 M
XmerlMapDynamicTuple         14.89 M - 0.92x reduction count -1.37824 M
XML to PATHED UP map         20.22 M - 1.24x reduction count +3.95 M
current to xmerl (records)  389.12 M - 23.92x reduction count +372.86 M
```
This is with DynamicTuple using an atom for the node names:
```
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.4
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 36 s

Benchmarking XML to PATHED UP map ...
Benchmarking XML to map (trimmed) ...
Benchmarking XmerlMapDynamicTuple ...
Benchmarking current to xmerl (records) ...

Name                                 ips        average  deviation         median         99th %
XmerlMapDynamicTuple                2.71      368.36 ms     ±5.70%      364.50 ms      427.84 ms
XML to map (trimmed)                2.66      375.69 ms     ±5.08%      372.45 ms      413.53 ms
XML to PATHED UP map                1.95      512.60 ms     ±5.92%      504.64 ms      582.14 ms
current to xmerl (records)          0.90     1116.80 ms    ±10.40%     1071.30 ms     1318.31 ms

Comparison:
XmerlMapDynamicTuple                2.71
XML to map (trimmed)                2.66 - 1.02x slower +7.34 ms
XML to PATHED UP map                1.95 - 1.39x slower +144.24 ms
current to xmerl (records)          0.90 - 3.03x slower +748.45 ms

Memory usage statistics:

Name                        Memory usage
XmerlMapDynamicTuple           122.48 MB
XML to map (trimmed)           127.25 MB - 1.04x memory usage +4.77 MB
XML to PATHED UP map           190.24 MB - 1.55x memory usage +67.77 MB
current to xmerl (records)     290.61 MB - 2.37x memory usage +168.13 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                         average  deviation         median         99th %
XmerlMapDynamicTuple         15.13 M     ±0.07%        15.13 M        15.14 M
XML to map (trimmed)         16.34 M     ±0.30%        16.36 M        16.37 M
XML to PATHED UP map         20.24 M     ±0.11%        20.24 M        20.26 M
current to xmerl (records)  389.40 M     ±0.00%       389.40 M       389.40 M

Comparison:
XmerlMapDynamicTuple         15.13 M
XML to map (trimmed)         16.34 M - 1.08x reduction count +1.20 M
XML to PATHED UP map         20.24 M - 1.34x reduction count +5.11 M
current to xmerl (records)  389.40 M - 25.73x reduction count +374.27 M
```
Interesting that the tuple approach is faster. NOW LET'S ADD QUERYING.
This test run was done on this very small XML sample:
```xml
<SteamedHam price="1">
  <ReadyDate>2021-09-11</ReadyDate>
  <ReadyTime>15:50:07.123Z</ReadyTime>
  <Sauce Name="burger sauce">spicy</Sauce>
  <Type>medium rare</Type>
  <Salads>
    <Salad Name="ceasar">
      <Cheese Mouldy="true">Blue</Cheese>
    </Salad>
    <Salad Name="cob">
      <Leaf type="lambs lettuce">washed</Leaf>
    </Salad>
  </Salads>
</SteamedHam>
```
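For a sense of what querying the tuple representation looks like (a hedged sketch: `get_path/2` is a hypothetical helper, not the real query API, and it assumes the `{name, attributes, content}` shape from earlier):

```elixir
defmodule TupleQuery do
  # Walk {name, attributes, content} tuples following a list of element names,
  # taking the first matching child at each level.
  def get_path(node, []), do: node

  def get_path({_name, _attrs, content}, [key | rest]) do
    case Enum.find(content, &match?({^key, _, _}, &1)) do
      nil -> nil
      child -> get_path(child, rest)
    end
  end
end

doc =
  {"SteamedHam", [{"price", "1"}],
   [
     {"Salads", [],
      [
        {"Salad", [{"Name", "cob"}],
         [{"Leaf", [{"type", "lambs lettuce"}], ["washed"]}]}
      ]}
   ]}

# => {"Leaf", [{"type", "lambs lettuce"}], ["washed"]}
TupleQuery.get_path(doc, ["Salads", "Salad", "Leaf"])
```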
The results are pretty astonishing!
```
❯ mix run bench.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.13.4
Erlang 24.1.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 1 s
reduction time: 1 s
parallel: 1
inputs: none specified

Estimated total run time: 18 s

Benchmarking XmerlMapDynamicTuple -> DataSchema ...
Benchmarking xmerl -> data_schema ...

Name                                         ips        average  deviation         median         99th %
XmerlMapDynamicTuple -> DataSchema       59.57 K       16.79 μs    ±45.60%          15 μs          51 μs
xmerl -> data_schema                     13.19 K       75.80 μs    ±19.63%          74 μs         149 μs

Comparison:
XmerlMapDynamicTuple -> DataSchema       59.57 K
xmerl -> data_schema                     13.19 K - 4.52x slower +59.01 μs

Memory usage statistics:

Name                                Memory usage
XmerlMapDynamicTuple -> DataSchema      19.59 KB
xmerl -> data_schema                   107.33 KB - 5.48x memory usage +87.73 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                              Reduction count
XmerlMapDynamicTuple -> DataSchema         1.43 K
xmerl -> data_schema                      60.43 K - 42.14x reduction count +58.99 K

**All measurements for reduction count were the same**
```
Now, there might not be complete feature parity between the two approaches here... but currently my approach is about 4.5 times faster and uses about 5.5 times less memory. WILD.

It is still an insane amount of memory for the size of the XML... but yea.