Optimize DTOs json serialization #3

AndrewKostousov · 2022-02-06T10:26:20Z

test_perf_serialize run shows that VektonnBaseModel.json() is a huge bottleneck:

----------------------------- Captured stdout call -----------------------------
         28285893 function calls in 25.275 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   118351   15.430    0.000   21.607    0.000 dtos.py:17(json)
   118351    3.108    0.000    3.522    0.000 test_dtos_perf.py:55(to_idp_fast)
 12781908    2.509    0.000    2.509    0.000 {built-in method _abc._abc_instancecheck}
 12781908    2.122    0.000    4.630    0.000 abc.py:96(__instancecheck__)
   118351    0.691    0.000    0.691    0.000 {orjson.dumps}
   118351    0.414    0.000    0.414    0.000 {method 'tolist' of 'numpy.ndarray' objects}
   118351    0.189    0.000    0.552    0.000 typing.py:802(__getitem__)
   118351    0.130    0.000    0.317    0.000 typing.py:255(inner)
   236702    0.113    0.000    0.165    0.000 <frozen importlib._bootstrap>:389(parent)
   118351    0.076    0.000    0.815    0.000 utils.py:13(orjson_dumps)
   236702    0.075    0.000    0.104    0.000 typing.py:329(__hash__)
        1    0.071    0.071    3.605    3.605 test_dtos_perf.py:42(construct)
        1    0.063    0.063   21.671   21.671 test_dtos_perf.py:37(serialize)
   118351    0.059    0.000    0.083    0.000 typing.py:720(__hash__)
   355053    0.052    0.000    0.052    0.000 {built-in method builtins.hash}
   236702    0.051    0.000    0.051    0.000 {method 'rpartition' of 'str' objects}
   118351    0.047    0.000    0.047    0.000 {method 'decode' of 'bytes' objects}
   236702    0.027    0.000    0.027    0.000 {built-in method builtins.isinstance}
   118351    0.019    0.000    0.019    0.000 {built-in method builtins.len}
   118351    0.016    0.000    0.016    0.000 typing.py:1149(cast)
   118351    0.012    0.000    0.012    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:133(__exit__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


BrandesDenis · 2022-02-24T15:04:10Z

The main reason is that pydantic handles each value in nested collections separately. Here the significant overhead comes from the "coordinates" field of the "Vector" model: pydantic processes each element of that list individually.

To solve this problem, you can write your own dict converter. For example, this mixin adds a custom dict conversion to the "Vector" model:

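class ToDictMixin:
    def dict(
        self,
        *,
        by_alias: bool = False,
        exclude_none: bool = False,
        **kwargs,
    ) -> dict:
        # Shallow conversion: field values are returned as-is, so nested
        # collections are not walked element by element.
        # Other keyword arguments (include, exclude, ...) are accepted but ignored.
        return {
            self.__fields__[field_name].alias if by_alias else field_name: value
            for field_name, value in self  # iterating a model yields (name, value) pairs
            if value is not None or not exclude_none
        }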

class Vector(ToDictMixin, VektonnBaseModel):
    ...
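
To make the overhead concrete, here is a minimal stdlib-only sketch of why a recursive converter costs so much more than a shallow one. It is a simplified stand-in, not pydantic's actual implementation: the functions `deep_to_dict` and `shallow_to_dict`, the visit counter, and the payload (including the `is_sparse` field and the 100-element list) are all hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List


def deep_to_dict(value: Any, counter: List[int]) -> Any:
    """Recursively copy containers, visiting every nested element.

    Simplified stand-in for a generic recursive converter; this is
    NOT pydantic's real code.
    """
    counter[0] += 1  # one visit (with type checks) per value
    if isinstance(value, dict):
        return {k: deep_to_dict(v, counter) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [deep_to_dict(v, counter) for v in value]
    return value


def shallow_to_dict(fields: Dict[str, Any], counter: List[int]) -> Dict[str, Any]:
    """Copy only the top-level mapping; nested lists are passed through as-is."""
    result = {}
    for name, value in fields.items():
        counter[0] += 1  # one visit per top-level field only
        result[name] = value
    return result


# Hypothetical payload shaped like a vector with 100 coordinates.
vector_fields = {
    "is_sparse": False,
    "coordinates": [float(i) for i in range(100)],
}

deep_visits, shallow_visits = [0], [0]
deep_result = deep_to_dict(vector_fields, deep_visits)
shallow_result = shallow_to_dict(vector_fields, shallow_visits)

# Both converters produce equal dicts, but the visit counts differ a lot.
print(deep_visits[0], shallow_visits[0])  # 103 2
```

The deep walk touches the dict, both field values, and every list element; the shallow one touches only the two fields. The trade-off is that a shallow converter is only safe when the field values are already JSON-serializable by the encoder in use.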

AndrewKostousov · 2022-02-25T08:28:04Z

With such a trick we get:

----------------------------- Captured stdout call -----------------------------
         4260640 function calls in 7.623 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   118351    2.693    0.000    3.057    0.000 test_dtos_perf.py:55(to_idp_fast)
   118351    2.360    0.000    4.423    0.000 dtos.py:16(json)
   118351    0.734    0.000    0.734    0.000 {orjson.dumps}
   118351    0.364    0.000    0.364    0.000 {method 'tolist' of 'numpy.ndarray' objects}
   710106    0.191    0.000    0.191    0.000 {built-in method _abc._abc_instancecheck}
   118351    0.183    0.000    0.542    0.000 typing.py:802(__getitem__)
   710106    0.164    0.000    0.355    0.000 abc.py:96(__instancecheck__)
   118351    0.134    0.000    0.312    0.000 typing.py:255(inner)
   118351    0.110    0.000    0.204    0.000 dtos.py:48(dict)
   118351    0.077    0.000    0.077    0.000 dtos.py:55(<dictcomp>)
        1    0.074    0.074    3.144    3.144 test_dtos_perf.py:42(construct)
   118351    0.074    0.000    0.859    0.000 utils.py:25(orjson_dumps)
   236702    0.069    0.000    0.098    0.000 typing.py:329(__hash__)
   118351    0.061    0.000    0.085    0.000 <frozen importlib._bootstrap>:389(parent)
   118351    0.057    0.000    0.079    0.000 typing.py:720(__hash__)
        1    0.057    0.057    4.479    4.479 test_dtos_perf.py:37(serialize)
   355053    0.051    0.000    0.051    0.000 {built-in method builtins.hash}
   118351    0.051    0.000    0.051    0.000 {method 'decode' of 'bytes' objects}
   236702    0.029    0.000    0.029    0.000 {built-in method builtins.isinstance}
   118351    0.024    0.000    0.024    0.000 {method 'rpartition' of 'str' objects}
   118351    0.019    0.000    0.019    0.000 {built-in method builtins.len}
   118351    0.017    0.000    0.017    0.000 typing.py:1149(cast)
   118351    0.016    0.000    0.016    0.000 {method 'items' of 'dict' objects}
   118351    0.012    0.000    0.012    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:133(__exit__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
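
For a quick comparison of the two profile runs above, the arithmetic can be sketched as below. The figures are copied from the two profiles (total run time, cumulative time of `json`, and its call count); the variable names are only illustrative.

```python
# Figures taken from the two cProfile outputs in this thread.
calls = 118351                                # json() calls in both runs
before_total, after_total = 25.275, 7.623     # seconds, whole profiled run
before_json, after_json = 21.607, 4.423       # seconds, cumtime of json()

total_speedup = before_total / after_total
json_speedup = before_json / after_json

# Average cost of a single json() call, in microseconds.
before_us = before_json / calls * 1e6
after_us = after_json / calls * 1e6

print(f"whole run: {total_speedup:.2f}x faster")
print(f"json():    {json_speedup:.2f}x faster")
print(f"per call:  {before_us:.1f} us -> {after_us:.1f} us")
```

The whole-run figure also includes `to_idp_fast` and test setup, so the `json()` cumulative time is the more direct measure of the change.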

AndrewKostousov added the "help wanted" label on Feb 6, 2022

AndrewKostousov added a commit that referenced this issue Feb 25, 2022

speed up VektonnBaseModel.json() for Vector model (see #3)

59571a3

AndrewKostousov added a commit that referenced this issue Feb 25, 2022

speed up VektonnBaseModel.json() for Vector model (see #3)

28a9922
