
WDB end of day merge performance issues in partbyenum mode #693

Closed
@jonathonmcmurray

Description


It has been reported that, when using partbyenum mode in the WDB, the end of day merge can be quite slow. Specifically, it seems to be slow when we read multiple partitions at once and much faster when we read only one. For example, in the log below, reading parts 1185 and 638 together takes ~44 s (from the "reading partition" line to the following "resort" line), compared with ~0.001 s for part 869 alone, even though both merges upsert a similar number of rows (8566540 vs 8576974).

```
2025.02.07D00:01:02.022613000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1185/mt4quote, :f:/kdb/wdbhdb/2025.02.06/638/mt4quote
2025.02.07D00:01:46.351574000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:46.351617000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:01:46.399757000|host|sort|sort1|INF|merge|upserting 8566540 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:01:52.031659000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/307/mt4quote
2025.02.07D00:01:52.032956000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:52.032992000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:01:55.242509000|host|sort|sort1|INF|merge|upserting 4213054 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:06.699865000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/869/mt4quote
2025.02.07D00:02:06.701207000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:02:06.701237000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:02:12.820915000|host|sort|sort1|INF|merge|upserting 8576974 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:31.768208000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1195/mt4quote, :f:/kdb/wdbhdb/2025.02.06/156/mt4quote
```
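The read timings quoted above can be pulled out of the log mechanically, which may help when comparing batching approaches across a full end of day run. This is a minimal Python sketch, assuming the pipe-delimited layout shown (field 6 is the function name, field 7 the message) and treating the gap from each `reading partition` line to the next `resort` line as the read time; `parse_kdb_timestamp_ns` and `summarise_merges` are hypothetical helpers, not part of TorQ.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

# Log lines copied from the issue (one merge per "reading partition" line).
LOG_LINES = [
    "2025.02.07D00:01:02.022613000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1185/mt4quote, :f:/kdb/wdbhdb/2025.02.06/638/mt4quote",
    "2025.02.07D00:01:46.351574000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform",
    "2025.02.07D00:01:46.399757000|host|sort|sort1|INF|merge|upserting 8566540 rows to :F:/KDB/hdb/2025.02.06/mt4quote/",
    "2025.02.07D00:01:52.031659000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/307/mt4quote",
    "2025.02.07D00:01:52.032956000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform",
    "2025.02.07D00:01:55.242509000|host|sort|sort1|INF|merge|upserting 4213054 rows to :F:/KDB/hdb/2025.02.06/mt4quote/",
    "2025.02.07D00:02:06.699865000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/869/mt4quote",
    "2025.02.07D00:02:06.701207000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform",
    "2025.02.07D00:02:12.820915000|host|sort|sort1|INF|merge|upserting 8576974 rows to :F:/KDB/hdb/2025.02.06/mt4quote/",
    "2025.02.07D00:02:31.768208000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1195/mt4quote, :f:/kdb/wdbhdb/2025.02.06/156/mt4quote",
]


def parse_kdb_timestamp_ns(ts: str) -> int:
    """Convert a kdb+ timestamp like 2025.02.07D00:01:02.022613000 to integer nanoseconds."""
    date_part, time_part = ts.split("D")
    hms, frac = time_part.split(".")
    dt = datetime.strptime(f"{date_part} {hms}", "%Y.%m.%d %H:%M:%S")
    whole_seconds = int((dt - EPOCH).total_seconds())
    # Integer arithmetic keeps the full nanosecond precision of the log.
    return whole_seconds * 1_000_000_000 + int(frac.ljust(9, "0"))


def summarise_merges(lines):
    """One dict per completed read: partition ids, read time in seconds, rows upserted."""
    merges = []
    pending = None  # (start_ns, partition ids) of a read still waiting for its 'resort' line
    for line in lines:
        fields = line.strip().split("|", 6)
        if len(fields) < 7:
            continue
        ts, func, msg = fields[0], fields[5], fields[6]
        if func == "merge" and msg.startswith("reading partition"):
            # Message is "reading partition/partitions <path>, <path>, ..."
            paths = msg.split(" ", 2)[2].split(", ")
            ids = [p.rstrip("/").split("/")[-2] for p in paths]
            pending = (parse_kdb_timestamp_ns(ts), ids)
        elif func == "resort" and pending is not None:
            start_ns, ids = pending
            elapsed = (parse_kdb_timestamp_ns(ts) - start_ns) / 1e9
            merges.append({"partitions": ids, "read_seconds": elapsed, "rows": None})
            pending = None
        elif func == "merge" and msg.startswith("upserting") and merges:
            merges[-1]["rows"] = int(msg.split()[1])
    return merges


for m in summarise_merges(LOG_LINES):
    print(f"{'+'.join(m['partitions']):>9}  read {m['read_seconds']:.6f}s  rows {m['rows']}")
```

The last read (1195+156) has no following `resort` line in this excerpt, so it is left out rather than reported with a guessed duration.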

It seems it might be better (in this case, at least) to read each partition separately. We should do some performance testing here and potentially rethink our approach to batching in `.merge.getpartchunks`.
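To compare the two approaches in testing, it may help to make the batching policy switchable. The Python sketch below is not TorQ's `.merge.getpartchunks`; the function name `get_part_chunks`, the `max_rows` limit, the `one_per_chunk` switch and the greedy grouping rule are all hypothetical, chosen only to contrast "batch partitions up to a row limit" with "one partition per read".

```python
from typing import Dict, List


def get_part_chunks(part_rows: Dict[str, int], max_rows: int,
                    one_per_chunk: bool = False) -> List[List[str]]:
    """Hypothetical batching policy for the merge.

    Greedily groups partitions (in input order) until adding the next one
    would exceed max_rows. A single partition larger than max_rows still
    gets its own chunk. With one_per_chunk=True every partition is read
    on its own, which is the alternative suggested above.
    """
    if one_per_chunk:
        return [[part] for part in part_rows]
    chunks: List[List[str]] = []
    current: List[str] = []
    current_rows = 0
    for part, rows in part_rows.items():
        if current and current_rows + rows > max_rows:
            chunks.append(current)
            current, current_rows = [], 0
        current.append(part)
        current_rows += rows
    if current:
        chunks.append(current)
    return chunks


# Toy row counts (made up for illustration, not taken from the log).
toy = {"p1": 3, "p2": 4, "p3": 5}
print(get_part_chunks(toy, max_rows=8))
print(get_part_chunks(toy, max_rows=8, one_per_chunk=True))
```

Timing each policy over the same set of partitions (for example with the log summary approach) would show whether the multi-partition reads are the bottleneck in general or only in cases like 1185+638.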

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests