Description
Overview
From research into KDP deserialization performance:
I think this graph of the KDP prod database IO is interesting. Currently, a sync is deserializing. The units in the graph are seconds of IO Wait per second. So I interpret that anything over 1 s/s essentially means the IO Wait is compounding. It's almost entirely reads, and so it seems this is likely caused by sequential reads of the morango store (without index). The query that I see filters on several columns, and so a full compound index would be gigantic, but a targeted index on the same columns with the constraint on dirty_bit = True should theoretically be plenty small with frequent deserializations. And it should speed it up a lot if the planner decides it's able to use the index
Since we dropped support for anything older than Django 3, we can now add a conditional index through Django
Description and outcomes
- Add a conditional index on
profile
,model_name
,partition
, anddirty_bit
with the condition thatdirty_bit = True
- Ensure this works with SQLite with the test suite we added to test older versions of SQLite
SELECT "morango_store"."profile", "morango_store"."serialized", "morango_store"."deleted", "morango_store"."hard_deleted", "morango_store"."last_saved_instance", "morango_store"."last_saved_counter", "morango_store"."partition", "morango_store"."source_id", "morango_store"."model_name", "morango_store"."conflicting_serialized_data", "morango_store"."_self_ref_fk", "morango_store"."id", "morango_store"."dirty_bit", "morango_store"."deserialization_error", "morango_store"."last_transfer_session_id" FROM "morango_store" WHERE ("morango_store"."profile" = 'facilitydata' AND "morango_store"."model_name" = 'attemptlog' AND "morango_store"."partition"::text LIKE '<REDACTED>%' AND "morango_store"."dirty_bit" = true)