Profiling & optimization #11

florian-huber · 2022-10-30T19:44:51Z

Currently, the key data part is implemented using numpy structured arrays.
An alternative approach I tried used pandas multi-index DataFrames (#4).
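
For readers unfamiliar with the two options, here is a minimal sketch of how the same data might look in each. The field names (`row`, `col`, `score_a`) and the values are assumptions made up for illustration, not the project's actual layout.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one record per (row, col) pair, one field per score layer.
dtype = np.dtype([("row", np.int64), ("col", np.int64), ("score_a", np.float64)])
records = np.array([(0, 1, 0.9), (0, 3, 0.4), (2, 1, 0.7)], dtype=dtype)

# The same data as a pandas DataFrame indexed by a (row, col) MultiIndex.
frame = pd.DataFrame(records).set_index(["row", "col"])

# Field access on the structured array.
print(records["score_a"])
# Label-based access on the MultiIndex.
print(frame.loc[(0, 1), "score_a"])
```

Both hold the same values; the difference is in how lookups and joins are expressed and how fast they run.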

This issue is to explore/discuss different implementations and their performance.

florian-huber · 2022-12-02T14:16:31Z

I tried several implementations, including Numpy (multiple variants), Pandas, Polars, and Numba (multiple variants).

Numpy

Pro:

Cons:

Requires several fairly complex utility functions.
Currently very slow when stacking more layers: the join-data step is a serious bottleneck!
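
To illustrate what a join step on structured arrays can look like, here is a minimal sketch using `numpy.lib.recfunctions.join_by`. This is not necessarily how the project joins its layers; the arrays, field names, and the choice of an inner join are assumptions for illustration.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Two hypothetical score layers keyed by (row, col).
dtype_a = np.dtype([("row", np.int64), ("col", np.int64), ("score_a", np.float64)])
dtype_b = np.dtype([("row", np.int64), ("col", np.int64), ("score_b", np.float64)])
layer_a = np.array([(0, 1, 0.9), (0, 3, 0.4), (2, 1, 0.7)], dtype=dtype_a)
layer_b = np.array([(0, 1, 0.5), (2, 1, 0.2), (3, 3, 0.8)], dtype=dtype_b)

# Inner join: keep only the (row, col) pairs present in both layers.
joined = rfn.join_by(("row", "col"), layer_a, layer_b,
                     jointype="inner", usemask=False)

print(joined.dtype.names)
print(joined)
```

Every additional layer needs another pass like this one, which is where repeated joins can start to dominate the runtime.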

Pandas

Pro:

Fast addition of additional data (joins/merges)
Rather accessible implementation under the hood (when compared to the structured arrays)
Allows storing different datatypes.

Cons:

Slow at slicing. Since slicing is a very common operation, I consider this critical.
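
As a rough sketch of the kind of slicing meant here, assuming it refers to selecting all entries for one row: a boolean mask on a structured array next to a label lookup on a multi-index DataFrame as in #4. The data and field names are hypothetical.

```python
import numpy as np
import pandas as pd

dtype = np.dtype([("row", np.int64), ("col", np.int64), ("score", np.float64)])
records = np.array(
    [(0, 1, 0.9), (0, 3, 0.4), (2, 1, 0.7), (2, 2, 0.1)], dtype=dtype
)
frame = pd.DataFrame(records).set_index(["row", "col"]).sort_index()

# NumPy: boolean mask over the "row" field.
numpy_slice = records[records["row"] == 2]

# pandas: label lookup on the first level of the MultiIndex.
pandas_slice = frame.loc[2]

print(numpy_slice["score"])
print(pandas_slice["score"].to_numpy())
```

Both return the same scores; the concern in this thread is only about how long each lookup takes.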

Polars

Pro:

Very fast addition of additional data (joins/merges)
Rather accessible implementation under the hood (when compared to the structured arrays)
Allows storing different datatypes.

Cons:

Notably slower than Numpy at slicing. Since slicing is a very common operation, I consider this critical.

Numba

Pro:

Cons:

florian-huber · 2022-12-02T14:18:06Z

Here are the results from a profiling script I made:
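
For anyone who wants to run a comparable measurement, here is a minimal timing sketch. It is not the profiling script mentioned above: the helper `best_time`, the data sizes, and the two candidates are assumptions, and it covers only slicing.

```python
import timeit

import numpy as np
import pandas as pd


def best_time(func, repeat=3, number=20):
    """Return the best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


# Hypothetical random test data: 50,000 entries spread over 500 rows.
rng = np.random.default_rng(0)
n = 50_000
dtype = np.dtype([("row", np.int64), ("col", np.int64), ("score", np.float64)])
records = np.empty(n, dtype=dtype)
records["row"] = rng.integers(0, 500, size=n)
records["col"] = rng.integers(0, 500, size=n)
records["score"] = rng.random(n)
frame = pd.DataFrame(records).set_index(["row", "col"]).sort_index()

# Candidate implementations of the same operation: select one row.
candidates = {
    "numpy mask slice": lambda: records[records["row"] == 42],
    "pandas .loc slice": lambda: frame.loc[42],
}

timings = {name: best_time(func) for name, func in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

Taking the minimum over repeats reduces noise from other processes; the absolute numbers depend on machine, library versions, and data shape.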

This was referenced Dec 2, 2022

Drastic bottleneck: array joining #13

Closed

Improve merging #12

Merged

florian-huber mentioned this issue Dec 2, 2022

Pandafy data #4

Closed
