Optimize DTOs json serialization #3

Open
AndrewKostousov opened this issue Feb 6, 2022 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@AndrewKostousov
Member

A test_perf_serialize run shows that VektonnBaseModel.json() is a huge bottleneck, accounting for 21.6 of the 25.3 seconds profiled:

----------------------------- Captured stdout call -----------------------------
         28285893 function calls in 25.275 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   118351   15.430    0.000   21.607    0.000 dtos.py:17(json)
   118351    3.108    0.000    3.522    0.000 test_dtos_perf.py:55(to_idp_fast)
 12781908    2.509    0.000    2.509    0.000 {built-in method _abc._abc_instancecheck}
 12781908    2.122    0.000    4.630    0.000 abc.py:96(__instancecheck__)
   118351    0.691    0.000    0.691    0.000 {orjson.dumps}
   118351    0.414    0.000    0.414    0.000 {method 'tolist' of 'numpy.ndarray' objects}
   118351    0.189    0.000    0.552    0.000 typing.py:802(__getitem__)
   118351    0.130    0.000    0.317    0.000 typing.py:255(inner)
   236702    0.113    0.000    0.165    0.000 <frozen importlib._bootstrap>:389(parent)
   118351    0.076    0.000    0.815    0.000 utils.py:13(orjson_dumps)
   236702    0.075    0.000    0.104    0.000 typing.py:329(__hash__)
        1    0.071    0.071    3.605    3.605 test_dtos_perf.py:42(construct)
        1    0.063    0.063   21.671   21.671 test_dtos_perf.py:37(serialize)
   118351    0.059    0.000    0.083    0.000 typing.py:720(__hash__)
   355053    0.052    0.000    0.052    0.000 {built-in method builtins.hash}
   236702    0.051    0.000    0.051    0.000 {method 'rpartition' of 'str' objects}
   118351    0.047    0.000    0.047    0.000 {method 'decode' of 'bytes' objects}
   236702    0.027    0.000    0.027    0.000 {built-in method builtins.isinstance}
   118351    0.019    0.000    0.019    0.000 {built-in method builtins.len}
   118351    0.016    0.000    0.016    0.000 typing.py:1149(cast)
   118351    0.012    0.000    0.012    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:133(__exit__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
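
For readers who want to produce a comparable report, here is a minimal sketch using only the standard library. The `serialize` function and the payload are hypothetical stand-ins for the workload in test_dtos_perf.py, which is not shown in this thread. Sorting by `tottime` is what yields the "Ordered by: internal time" header.

```python
import cProfile
import io
import json
import pstats


def serialize(items):
    # Hypothetical stand-in for the real serialization workload.
    return [json.dumps(item) for item in items]


def profile_call(func, *args):
    """Run func under cProfile; return (result, report sorted by internal time)."""
    with cProfile.Profile() as profiler:  # context manager needs Python 3.8+
        result = func(*args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(10)  # keep only the top 10 rows
    return result, buffer.getvalue()


payload = [{"coordinates": [0.1 * i for i in range(32)]} for _ in range(1000)]
result, report = profile_call(serialize, payload)
print(report)
```

In such a report, `tottime` excludes time spent in callees while `cumtime` includes it, so a function with a high `tottime`, like json() above, is spending the time in its own body.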
AndrewKostousov added the "help wanted" label on Feb 6, 2022
@BrandesDenis

The main reason is that pydantic handles each value in a nested collection separately. Here, the significant overhead comes from the "coordinates" field of the "Vector" model: pydantic processes each element of that list individually.

To solve this problem, you can write your own dict converter. For example, this mixin adds a custom dict conversion to the "Vector" model:

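class ToDictMixin:
    # Shallow replacement for pydantic's dict(): field values are returned
    # as-is instead of being converted recursively, so list items are not
    # visited one by one. Other dict() options arrive in **kwargs and are ignored.
    def dict(
        self,
        *,
        by_alias: bool = False,
        exclude_none: bool = False,
        **kwargs,
    ) -> dict:
        return {
            # Use the field's alias as the key when by_alias is set.
            self.__fields__[field_name].alias if by_alias else field_name: value
            for field_name, value in self
            if value is not None or not exclude_none
        }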

class Vector(ToDictMixin, VektonnBaseModel):
    ...
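
The difference between recursive and shallow conversion can be illustrated without pydantic. The sketch below is only an analogy under stated assumptions: it uses standard-library dataclasses, where `dataclasses.asdict` walks into lists and rebuilds them element by element, while a dict comprehension over the fields passes the list through untouched. `VectorStandIn` and both helper functions are hypothetical names, not part of the Vektonn client.

```python
import dataclasses
import timeit
from typing import List


@dataclasses.dataclass
class VectorStandIn:
    # Hypothetical stand-in for the real DTO; not a pydantic model.
    tag: str
    coordinates: List[float]


def to_dict_recursive(vector: VectorStandIn) -> dict:
    # asdict() recurses into the list and builds a new one item by item.
    return dataclasses.asdict(vector)


def to_dict_shallow(vector: VectorStandIn) -> dict:
    # Iterates over the fields only; the list is returned by reference.
    return {f.name: getattr(vector, f.name) for f in dataclasses.fields(vector)}


vector = VectorStandIn(tag="v", coordinates=[float(i) for i in range(1000)])

# Both conversions produce equal dicts, but only the shallow one reuses the list.
assert to_dict_recursive(vector) == to_dict_shallow(vector)
assert to_dict_shallow(vector)["coordinates"] is vector.coordinates
assert to_dict_recursive(vector)["coordinates"] is not vector.coordinates

print("recursive:", timeit.timeit(lambda: to_dict_recursive(vector), number=200))
print("shallow:  ", timeit.timeit(lambda: to_dict_shallow(vector), number=200))
```

The trade-off is that a shallow dict shares its values with the model, so mutating the returned list also mutates the model. That is usually acceptable when the dict is handed straight to the JSON encoder.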

@AndrewKostousov
Member Author

AndrewKostousov commented Feb 25, 2022

With such a trick we get:

----------------------------- Captured stdout call -----------------------------
         4260640 function calls in 7.623 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   118351    2.693    0.000    3.057    0.000 test_dtos_perf.py:55(to_idp_fast)
   118351    2.360    0.000    4.423    0.000 dtos.py:16(json)
   118351    0.734    0.000    0.734    0.000 {orjson.dumps}
   118351    0.364    0.000    0.364    0.000 {method 'tolist' of 'numpy.ndarray' objects}
   710106    0.191    0.000    0.191    0.000 {built-in method _abc._abc_instancecheck}
   118351    0.183    0.000    0.542    0.000 typing.py:802(__getitem__)
   710106    0.164    0.000    0.355    0.000 abc.py:96(__instancecheck__)
   118351    0.134    0.000    0.312    0.000 typing.py:255(inner)
   118351    0.110    0.000    0.204    0.000 dtos.py:48(dict)
   118351    0.077    0.000    0.077    0.000 dtos.py:55(<dictcomp>)
        1    0.074    0.074    3.144    3.144 test_dtos_perf.py:42(construct)
   118351    0.074    0.000    0.859    0.000 utils.py:25(orjson_dumps)
   236702    0.069    0.000    0.098    0.000 typing.py:329(__hash__)
   118351    0.061    0.000    0.085    0.000 <frozen importlib._bootstrap>:389(parent)
   118351    0.057    0.000    0.079    0.000 typing.py:720(__hash__)
        1    0.057    0.057    4.479    4.479 test_dtos_perf.py:37(serialize)
   355053    0.051    0.000    0.051    0.000 {built-in method builtins.hash}
   118351    0.051    0.000    0.051    0.000 {method 'decode' of 'bytes' objects}
   236702    0.029    0.000    0.029    0.000 {built-in method builtins.isinstance}
   118351    0.024    0.000    0.024    0.000 {method 'rpartition' of 'str' objects}
   118351    0.019    0.000    0.019    0.000 {built-in method builtins.len}
   118351    0.017    0.000    0.017    0.000 typing.py:1149(cast)
   118351    0.016    0.000    0.016    0.000 {method 'items' of 'dict' objects}
   118351    0.012    0.000    0.012    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:133(__exit__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
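
To put the two reports side by side, the short calculation below uses only figures copied from the profiler output in this thread. The variable names are illustrative.

```python
# Figures copied from the two profiler reports in this thread.
calls = 118351  # number of json() calls, identical in both runs

before = {"total_s": 25.275, "json_cumtime_s": 21.607, "instancechecks": 12781908}
after = {"total_s": 7.623, "json_cumtime_s": 4.423, "instancechecks": 710106}

# __instancecheck__ invocations per serialized object.
checks_per_call_before = before["instancechecks"] // calls
checks_per_call_after = after["instancechecks"] // calls

json_speedup = before["json_cumtime_s"] / after["json_cumtime_s"]
total_speedup = before["total_s"] / after["total_s"]

print(checks_per_call_before, checks_per_call_after)  # 108 6
print(f"json() cumulative time: {json_speedup:.1f}x faster")  # 4.9x
print(f"whole profiled run: {total_speedup:.1f}x faster")  # 3.3x
```

The whole-run figure is lower than the json() figure because the run also includes to_idp_fast and construct, whose cost is roughly the same in both reports.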
