Closed
Description
It has been reported that the end-of-day merge can be quite slow when the WDB runs in partbyenum mode. The slowdown appears when a batch reads multiple partitions at once; reading a single partition is much faster. For example, in the log below the read step for parts 1185+638 takes ~44 secs, while the read step for part 869 takes ~0.001s, even though the two batches have similar record counts (8566540 vs 8576974 rows):
2025.02.07D00:01:02.022613000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1185/mt4quote, :f:/kdb/wdbhdb/2025.02.06/638/mt4quote
2025.02.07D00:01:46.351574000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:46.351617000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:01:46.399757000|host|sort|sort1|INF|merge|upserting 8566540 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:01:52.031659000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/307/mt4quote
2025.02.07D00:01:52.032956000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:52.032992000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:01:55.242509000|host|sort|sort1|INF|merge|upserting 4213054 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:06.699865000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/869/mt4quote
2025.02.07D00:02:06.701207000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:02:06.701237000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:02:12.820915000|host|sort|sort1|INF|merge|upserting 8576974 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:31.768208000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1195/mt4quote, :f:/kdb/wdbhdb/2025.02.06/156/mt4quote
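The gaps quoted above can be recomputed from the log timestamps. This is a minimal Python sketch, not part of the merge code. It assumes that the time between a "reading partition/partitions" line and the next log line approximates the read step, and that the time up to the "upserting" line approximates the total preparation time for the batch. The names `parse_ts` and `batch_timings` are hypothetical helpers, and the `getextraparttype` lines are left out of the sample for brevity.

```python
from datetime import datetime

# Log lines copied from the output above (getextraparttype lines omitted).
LOG = """\
2025.02.07D00:01:02.022613000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1185/mt4quote, :f:/kdb/wdbhdb/2025.02.06/638/mt4quote
2025.02.07D00:01:46.351574000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:46.399757000|host|sort|sort1|INF|merge|upserting 8566540 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:01:52.031659000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/307/mt4quote
2025.02.07D00:01:52.032956000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:55.242509000|host|sort|sort1|INF|merge|upserting 4213054 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:06.699865000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/869/mt4quote
2025.02.07D00:02:06.701207000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:02:12.820915000|host|sort|sort1|INF|merge|upserting 8576974 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
"""


def parse_ts(ts):
    # Timestamps carry nanoseconds; strptime's %f accepts at most
    # microseconds, so drop the last three digits.
    return datetime.strptime(ts[:-3], "%Y.%m.%dD%H:%M:%S.%f")


def batch_timings(log_text):
    """Return one dict per 'reading partition/partitions' line."""
    batches = []
    for line in log_text.strip().splitlines():
        ts, *_, msg = line.split("|", 6)
        t = parse_ts(ts)
        if msg.startswith("reading partition"):
            paths = msg.split("partitions ", 1)[1].split(", ")
            # The partition id is the directory just above the table name.
            parts = [p.split("/")[-2] for p in paths]
            batches.append({"parts": parts, "start": t,
                            "read": None, "to_upsert": None})
        elif batches:
            b = batches[-1]
            if b["read"] is None:
                # First line after the read message: read step is done.
                b["read"] = (t - b["start"]).total_seconds()
            if msg.startswith("upserting") and b["to_upsert"] is None:
                b["to_upsert"] = (t - b["start"]).total_seconds()
    return batches


for b in batch_timings(LOG):
    print("+".join(b["parts"]), f"read={b['read']:.3f}s",
          f"read_to_upsert={b['to_upsert']:.3f}s")
```

One thing worth checking when testing: in this log the single-partition batches spend about 3.2s (part 307) and 6.1s (part 869) between the read message and the upsert message, against about 44.4s for 1185+638. The multi-partition batch is still much slower, but part of the single-partition read cost may be deferred to a later step, so read-to-upsert is probably the fairer figure to compare.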
It seems it might be better (in this case, at least) to read every partition separately. We should test the performance of both approaches and potentially rethink how batches are built in .merge.getpartchunks
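To make the two options concrete for testing, here is a small Python sketch of the batching choice. It does not reproduce the real .merge.getpartchunks logic, which is not shown in this issue; `chunk_partitions`, `max_rows`, and the partition names and row counts are all hypothetical and only illustrate "group up to a row limit" versus "one partition per batch".

```python
def chunk_partitions(row_counts, max_rows=None):
    """Group partitions into batches.

    row_counts: dict of partition name -> row count (hypothetical input).
    max_rows:   greedy row limit per batch; None means one partition per batch.
    """
    batches, current, total = [], [], 0
    for part, rows in row_counts.items():
        # Close the current batch if adding this partition would exceed
        # the limit, or always when batching is disabled.
        if current and (max_rows is None or total + rows > max_rows):
            batches.append(current)
            current, total = [], 0
        current.append(part)
        total += rows
    if current:
        batches.append(current)
    return batches


# Placeholder partitions and counts, not taken from the log above.
counts = {"p1": 5, "p2": 3, "p3": 4}
grouped = chunk_partitions(counts, max_rows=8)   # batches of up to 8 rows
separate = chunk_partitions(counts)              # one partition per batch
```

Timing the merge with each batching output over the same day's data would show whether grouping ever pays off for this table.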