WDB end of day merge performance issues in partbyenum mode #693

Open
jonathonmcmurray opened this issue Feb 7, 2025 · 0 comments
jonathonmcmurray commented Feb 7, 2025

It has been reported that when using partbyenum mode in the WDB, the end-of-day merge can be quite slow. Specifically, it seems to be slow when a batch reads multiple partitions at once, and much faster when it reads only one. For example, in the log below, reading parts 1185 and 638 together takes ~44 seconds, whereas reading part 869 alone takes ~0.001 seconds, even though the two batches have similar record counts (8,566,540 vs 8,576,974 rows upserted).

2025.02.07D00:01:02.022613000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1185/mt4quote, :f:/kdb/wdbhdb/2025.02.06/638/mt4quote
2025.02.07D00:01:46.351574000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:46.351617000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:01:46.399757000|host|sort|sort1|INF|merge|upserting 8566540 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:01:52.031659000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/307/mt4quote
2025.02.07D00:01:52.032956000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:52.032992000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:01:55.242509000|host|sort|sort1|INF|merge|upserting 4213054 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:06.699865000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/869/mt4quote
2025.02.07D00:02:06.701207000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:02:06.701237000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:02:12.820915000|host|sort|sort1|INF|merge|upserting 8576974 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:31.768208000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1195/mt4quote, :f:/kdb/wdbhdb/2025.02.06/156/mt4quote
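
To make the comparison reproducible, here is a minimal Python sketch that pulls the timings out of the log above. It assumes the pipe-delimited layout shown (timestamp first, message last); the function names are illustrative and not part of TorQ. For each "reading partition/partitions" line it reports the gap to the next log line and the gap to the following "upserting" line. Both are only proxies for where the time is actually spent.

```python
from datetime import datetime

# Log lines copied verbatim from the issue.
LOG = """\
2025.02.07D00:01:02.022613000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1185/mt4quote, :f:/kdb/wdbhdb/2025.02.06/638/mt4quote
2025.02.07D00:01:46.351574000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:46.351617000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:01:46.399757000|host|sort|sort1|INF|merge|upserting 8566540 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:01:52.031659000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/307/mt4quote
2025.02.07D00:01:52.032956000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:01:52.032992000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:01:55.242509000|host|sort|sort1|INF|merge|upserting 4213054 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:06.699865000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/869/mt4quote
2025.02.07D00:02:06.701207000|host|sort|sort1|INF|resort|Checking that the contents of this subpartition conform
2025.02.07D00:02:06.701237000|host|sort|sort1|INF|getextraparttype|parted attribute p not found in sort.csv for mt4quote table, using default instead
2025.02.07D00:02:12.820915000|host|sort|sort1|INF|merge|upserting 8576974 rows to :F:/KDB/hdb/2025.02.06/mt4quote/
2025.02.07D00:02:31.768208000|host|sort|sort1|INF|merge|reading partition/partitions :f:/kdb/wdbhdb/2025.02.06/1195/mt4quote, :f:/kdb/wdbhdb/2025.02.06/156/mt4quote
"""

PREFIX = "reading partition/partitions "


def parse_ts(ts):
    # The timestamps carry nanoseconds; datetime only holds microseconds,
    # so the last three digits are dropped.
    return datetime.strptime(ts[:-3], "%Y.%m.%dD%H:%M:%S.%f")


def parse(log_text):
    """Return a list of (timestamp, message) pairs, one per well-formed line."""
    rows = []
    for line in log_text.strip().splitlines():
        fields = line.split("|", 6)
        if len(fields) == 7:
            rows.append((parse_ts(fields[0]), fields[6].strip()))
    return rows


def read_timings(log_text):
    """For each read line: (partition ids, secs to next line, secs to upsert)."""
    rows = parse(log_text)
    out = []
    for i, (t0, msg) in enumerate(rows):
        if not msg.startswith(PREFIX):
            continue
        # The partition id is the second-to-last path component.
        parts = [p.split("/")[-2] for p in msg[len(PREFIX):].split(", ")]
        later = rows[i + 1:]
        to_next = (later[0][0] - t0).total_seconds() if later else None
        to_upsert = None
        for t1, m in later:
            if m.startswith(PREFIX):
                break  # reached the next batch without seeing an upsert
            if m.startswith("upserting"):
                to_upsert = (t1 - t0).total_seconds()
                break
        out.append((parts, to_next, to_upsert))
    return out


for parts, to_next, to_upsert in read_timings(LOG):
    print(parts, to_next, to_upsert)
```

On this excerpt, the 1185+638 batch shows about 44.33 s to the next log line and about 44.38 s to the upsert. The single-partition batches 307 and 869 each show about 0.0013 s to the next line, but about 3.21 s and 6.12 s respectively to the upsert. So for single partitions most of the time appears between the read line and the upsert line rather than in the read itself, though the total is still far lower than for the two-partition batch.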

It seems it might be better (in this case, at least) to read every partition separately. We should test the performance here and potentially rethink our approach to batching in .merge.getpartchunks
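
To make the two options concrete, here is a rough Python sketch of the batching strategies being compared. It is not the TorQ implementation: it assumes batching is driven by a row-count limit (the issue does not say how .merge.getpartchunks decides), and the function names, partition names and row counts are all hypothetical.

```python
def chunk_greedy(row_counts, max_rows):
    """Group partitions in order, starting a new chunk whenever adding the
    next partition would push the chunk past max_rows (assumed policy)."""
    chunks, current, total = [], [], 0
    for part, rows in row_counts.items():
        if current and total + rows > max_rows:
            chunks.append(current)
            current, total = [], 0
        current.append(part)
        total += rows
    if current:
        chunks.append(current)
    return chunks


def chunk_singly(row_counts):
    """Alternative raised in this issue: one partition per chunk."""
    return [[part] for part in row_counts]


# Hypothetical partitions and row counts, for illustration only.
counts = {"p1": 4_000_000, "p2": 4_500_000, "p3": 4_200_000, "p4": 8_500_000}

print(chunk_greedy(counts, max_rows=9_000_000))  # [['p1', 'p2'], ['p3'], ['p4']]
print(chunk_singly(counts))                      # [['p1'], ['p2'], ['p3'], ['p4']]
```

Timing the merge under both strategies on the same day of data would show whether the multi-partition read is the cost, or whether the per-chunk overhead of reading singly outweighs it.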
