Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: fix enum performance #1307

Merged
merged 2 commits into from
Nov 26, 2024
Merged

perf: fix enum performance #1307

merged 2 commits into from
Nov 26, 2024

Conversation

bonjourmauko
Copy link
Member

@bonjourmauko bonjourmauko commented Nov 21, 2024

Fixes #1306

Performance

  • Fix enum's module performance issues
    • 43.0.0 fixed impending bugs in indexed_enums and improved EnumArray
      performance
    • However, Enun.__eq__ and Enum.encode suffered from performance
      degradation on large datasets
    • This changeset aims at correcting these while keeping the bugfixes provided
      by the aforesaid published version

Note to reviewers

Some of the spectacular performances of Enum.encode came from the fact that
it didn't actually work, leaving buggy behaviour unseen (see for example
openfisca/openfisca-france@84e41a5).

This PR introduces O(n) and O(1) use of fancy indexing, vector masking, and
numpy.searchsorted, that scales nicely with large datasets (10k+).

However, as we need to validate data at enum encoding time, the encoding of
int and str sequences can't be faster than the pre-43.0.0 just because
data has to be copied over.

If ever this becomes problematic for very large datasets (50M+), we can workout
a feature flag to disable fancy indexing and trusting data has been properly
validated priorly by the user disabling run-time data validation, and so to
gain from the performance of using a memory view instead of copying data over
(that is, not using neither fancy indexing nor binary search).

However, it seems the least surprising for every user that the data be
validated before encoding (out of bounds indices and wrong str values not
present in an Enum).

Benchmarks

Against 42.0.0
------------------------------------------------------------------------------------ benchmark 'Enum.__eq__': 2 tests ------------------------------------------------------------------------------------
Name (time in us)                            Min               Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_eq (0001_e8b0acf)     6.1164 (1.0)      6.2392 (1.0)      6.1712 (1.0)      0.0236 (1.0)      6.1702 (1.0)      0.0284 (1.13)         30;2      162.0429 (1.0)         100       10000
test_benchmark_enum_eq (NOW)              6.1806 (1.01)     6.3476 (1.02)     6.2902 (1.02)     0.0286 (1.21)     6.2958 (1.02)     0.0252 (1.0)          14;7      158.9779 (0.98)        100       10000
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------- benchmark 'Enum.encode (Enum)': 2 tests -----------------------------------------------------------------------------------
Name (time in ms)                                     Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_enum (NOW)              1.8063 (1.0)      2.0201 (1.0)      1.8426 (1.0)      0.0409 (1.0)      1.8268 (1.0)      0.0383 (1.0)          13;7  542.7000 (1.0)         100          10
test_benchmark_enum_encode_enum (0001_e8b0acf)     1.9466 (1.08)     2.4946 (1.23)     2.0038 (1.09)     0.0812 (1.98)     1.9726 (1.08)     0.0592 (1.55)        15;11  499.0517 (0.92)        100          10
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------- benchmark 'Enum.encode (int)': 2 tests --------------------------------------------------------------------------------------------------
Name (time in ns)                                         Min                    Max                   Mean                StdDev                 Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_int (0001_e8b0acf)        600.0000 (1.0)       4,333.4001 (1.0)         660.3360 (1.0)        371.1875 (1.0)         620.9000 (1.0)       12.4500 (1.0)           1;3    1,514.3805 (1.0)         100          10
test_benchmark_enum_encode_int (NOW)              36,487.4999 (60.81)    51,941.5989 (11.99)    36,839.4600 (55.79)    1,554.0287 (4.19)     36,554.2001 (58.87)    229.0995 (18.40)        1;12       27.1448 (0.02)        100          10
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------- benchmark 'Enum.encode (str)': 2 tests -----------------------------------------------------------------------------------
Name (time in ms)                                    Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_str (0001_e8b0acf)     1.1514 (1.0)      5.4571 (2.05)     1.3338 (1.0)      0.5570 (4.93)     1.1804 (1.0)      0.0383 (1.15)         6;16  749.7253 (1.0)         100          10
test_benchmark_enum_encode_str (NOW)              1.7690 (1.54)     2.6589 (1.0)      1.8262 (1.37)     0.1130 (1.0)      1.7950 (1.52)     0.0332 (1.0)           7;9  547.5705 (0.73)        100          10
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------ benchmark 'EnumArray.__eq__': 2 tests -------------------------------------------------------------------------------------
Name (time in us)                                  Min               Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_eq (NOW)              6.4008 (1.0)      7.0462 (1.0)      6.4869 (1.0)      0.0953 (1.0)      6.4546 (1.0)      0.0511 (1.05)        12;15      154.1559 (1.0)         100         100
test_benchmark_enum_array_eq (0001_e8b0acf)     8.8654 (1.39)     9.3842 (1.33)     8.9354 (1.38)     0.1009 (1.06)     8.8969 (1.38)     0.0487 (1.0)         10;15      111.9138 (0.73)        100         100
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------- benchmark 'EnumArray.decode': 2 tests --------------------------------------------------------------------------------------------
Name (time in us)                                        Min                   Max                Mean             StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_decode (NOW)              241.8246 (1.0)        254.4829 (1.0)      243.7532 (1.0)       2.0537 (1.0)      243.1275 (1.0)      1.4817 (1.0)           9;8        4.1025 (1.0)         100         100
test_benchmark_enum_array_decode (0001_e8b0acf)     752.3354 (3.11)     1,501.3621 (5.90)     765.7738 (3.14)     74.5790 (36.31)    757.6515 (3.12)     4.3040 (2.90)          1;3        1.3059 (0.32)        100         100
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'EnumArray.decode_to_str': 2 tests ------------------------------------------------------------------------------------------
Name (time in us)                                               Min                 Max                Mean            StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_decode_to_str (NOW)              313.1738 (1.0)      321.3071 (1.0)      317.8362 (1.0)      0.9421 (1.0)      317.7827 (1.0)      0.9496 (1.0)          12;5        3.1463 (1.0)         100         100
test_benchmark_enum_array_decode_to_str (0001_e8b0acf)     861.6308 (2.75)     916.3904 (2.85)     871.6658 (2.74)     9.9436 (10.56)    868.3410 (2.73)     6.7525 (7.11)        12;11        1.1472 (0.36)        100         100
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Against 43.0.0
------------------------------------------------------------------------------------ benchmark 'Enum.__eq__': 2 tests ------------------------------------------------------------------------------------
Name (time in us)                            Min               Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_eq (NOW)              6.2050 (1.0)      7.1716 (1.02)     6.3081 (1.0)      0.1370 (2.70)     6.2640 (1.0)      0.0587 (1.0)         10;14      158.5255 (1.0)         100       10000
test_benchmark_enum_eq (0002_d95a1ce)     6.8115 (1.10)     7.0431 (1.0)      6.9327 (1.10)     0.0507 (1.0)      6.9421 (1.11)     0.0646 (1.10)         28;0      144.2448 (0.91)        100       10000
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------- benchmark 'Enum.encode (Enum)': 2 tests -----------------------------------------------------------------------------------
Name (time in ms)                                     Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_enum (NOW)              1.7899 (1.0)      1.8529 (1.01)     1.8080 (1.0)      0.0109 (1.04)     1.8053 (1.0)      0.0165 (1.37)         25;1  553.0894 (1.0)         100          10
test_benchmark_enum_encode_enum (0002_d95a1ce)     1.7940 (1.00)     1.8348 (1.0)      1.8099 (1.00)     0.0105 (1.0)      1.8086 (1.00)     0.0120 (1.0)          40;1  552.5250 (1.00)        100          10
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------ benchmark 'Enum.encode (int)': 2 tests ------------------------------------------------------------------------------------------------
Name (time in us)                                         Min                    Max                   Mean              StdDev                 Median                IQR            Outliers          OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_int (NOW)                  36.4666 (1.0)          72.2875 (1.0)          37.3550 (1.0)        3.8081 (1.0)          36.5750 (1.0)       0.6438 (1.0)           2;5  26,770.1773 (1.0)         100          10
test_benchmark_enum_encode_int (0002_d95a1ce)     12,735.8166 (349.25)   13,852.8666 (191.64)   12,821.3985 (343.23)   180.2659 (47.34)    12,763.6396 (348.97)   37.4478 (58.17)        8;15      77.9946 (0.00)        100          10
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------- benchmark 'Enum.encode (str)': 2 tests -------------------------------------------------------------------------------------
Name (time in ms)                                     Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_encode_str (NOW)               1.7953 (1.0)       2.0568 (1.0)       1.8196 (1.0)      0.0292 (1.0)       1.8132 (1.0)      0.0118 (1.0)           3;9  549.5770 (1.0)         100          10
test_benchmark_enum_encode_str (0002_d95a1ce)     21.2889 (11.86)    21.9644 (10.68)    21.4329 (11.78)    0.1193 (4.08)     21.4100 (11.81)    0.1202 (10.14)        18;6   46.6572 (0.08)        100          10
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------ benchmark 'EnumArray.__eq__': 2 tests -------------------------------------------------------------------------------------
Name (time in us)                                  Min               Max              Mean            StdDev            Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_eq (0002_d95a1ce)     6.3300 (1.0)      6.9117 (1.0)      6.4014 (1.0)      0.0804 (1.0)      6.3788 (1.0)      0.0373 (1.0)         10;12      156.2161 (1.0)         100         100
test_benchmark_enum_array_eq (NOW)              6.3796 (1.01)     7.7629 (1.12)     6.4775 (1.01)     0.1607 (2.00)     6.4352 (1.01)     0.0594 (1.59)         6;14      154.3805 (0.99)        100         100
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------ benchmark 'EnumArray.decode': 2 tests -------------------------------------------------------------------------------------------
Name (time in us)                                        Min                 Max                Mean            StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_decode (NOW)              239.6783 (1.0)      252.8412 (1.0)      241.4663 (1.00)     1.4732 (1.0)      241.2848 (1.00)     0.7644 (1.0)          12;8        4.1414 (1.00)        100         100
test_benchmark_enum_array_decode (0002_d95a1ce)     240.0513 (1.00)     259.0704 (1.02)     241.1311 (1.0)      1.9358 (1.31)     240.7379 (1.0)      0.9602 (1.26)          1;4        4.1471 (1.0)         100         100
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------- benchmark 'EnumArray.decode_to_str': 2 tests -------------------------------------------------------------------------------------------
Name (time in us)                                               Min                 Max                Mean             StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_enum_array_decode_to_str (NOW)              316.5517 (1.0)      473.0125 (1.46)     322.0825 (1.01)     17.4004 (21.00)    317.8633 (1.0)      4.1794 (5.62)          4;6        3.1048 (0.99)        100         100
test_benchmark_enum_array_decode_to_str (0002_d95a1ce)     318.1996 (1.01)     323.5029 (1.0)      319.1073 (1.0)       0.8285 (1.0)      318.9919 (1.00)     0.7440 (1.0)          10;7        3.1337 (1.0)         100         100
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@bonjourmauko bonjourmauko requested review from benoit-cty, clallemand and a team November 21, 2024 02:58
@bonjourmauko bonjourmauko added the kind:perf A code change that improves performance label Nov 21, 2024
@bonjourmauko
Copy link
Member Author

BTW, when assuming data is correct, performance could be also improved with numba and the like.

Copy link
Contributor

@clallemand clallemand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this fix on one of our (LexImpact) simulations and it seems to works !

@bonjourmauko bonjourmauko merged commit 435ee78 into master Nov 26, 2024
29 checks passed
@bonjourmauko bonjourmauko deleted the perf/fix-enum-perf branch November 26, 2024 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:perf A code change that improves performance
Projects
Development

Successfully merging this pull request may close these issues.

Fix enums performance
2 participants