TODO
x Map large pages
  x Already done (check via tracefs)
x Copy keys to DRAM first
  x Huge speedup
x Use stdio to read/write records
  x also speedup
x Or PMDK with non-temporal writes of 100B size?
  x Made it worse
* Replace qsort/memcmp with something else?
  * AVX-512 sort? Didn't find one that can treat arbitrary keys (only
    uint32_t, uint64_t).
x Split keys from pointers in DRAM
  x Made it slightly slower
x Reduce pointer size
  x Small speedup
x Allow non-power-of-two number of mappers
* qsort file-by-file (as we read them) and then merge-sort the lot?
  * To overlap with the read phase. Not sure if useful (read phase is very short).
x Look at membw_stdio to see if we're conducting IO efficiently
  x We are. 7.2 GB/s read & write throughput.
x qsort on NVM (to overlap compute with IO)
  x Slower. Probably latency bound.
x partition with fread didn't make a perf difference versus mmap
x Use quickselect instead of sort to find lowest elements as quickly
  as possible. Use to overlap with writing (current bottleneck)
  x Made it slightly slower
* Maybe start another thread to write the data asynchronously?
* Use async IO? (not sure if that's supported with NVM)
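
Sketch for the open "Replace qsort/memcmp" item. One option that stays with
qsort but avoids most memcmp calls is to cache the first 8 key bytes as a
big-endian integer and compare integers first. This is only an illustration:
the 10-byte key length, struct entry, key_prefix, entry_init and entry_cmp are
all assumed names and sizes, not taken from the existing code, and whether it
is actually faster here is untested.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 10 /* assumption: fixed-size keys inside 100-byte records */

/* Hypothetical DRAM entry: cached key prefix plus the full key. */
struct entry {
    uint64_t prefix;             /* first 8 key bytes packed big-endian */
    unsigned char key[KEY_LEN];
};

/* Pack the first 8 key bytes so that integer order equals memcmp order. */
static uint64_t key_prefix(const unsigned char *key)
{
    uint64_t p = 0;
    for (int i = 0; i < 8; i++)
        p = (p << 8) | key[i];
    return p;
}

static void entry_init(struct entry *e, const unsigned char *key)
{
    memcpy(e->key, key, KEY_LEN);
    e->prefix = key_prefix(key);
}

/* qsort comparator: integer compare first, memcmp only on prefix ties. */
static int entry_cmp(const void *a, const void *b)
{
    const struct entry *x = a;
    const struct entry *y = b;

    if (x->prefix != y->prefix)
        return x->prefix < y->prefix ? -1 : 1;
    return memcmp(x->key + 8, y->key + 8, KEY_LEN - 8);
}
```

Usage would be qsort(entries, n, sizeof entries[0], entry_cmp). The same
prefix could also feed a uint64_t-only sort, with ties resolved afterwards.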
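
Sketch for the open "another thread to write the data asynchronously" item.
A minimal single-slot handoff to a writer thread, assuming pthreads and stdio
output. struct async_writer and the writer_* functions are hypothetical names,
not existing code. The caller alternates between two buffers: once
writer_submit returns for buffer B, the previously submitted buffer A has been
written and can be refilled. Whether this helps on NVM is untested.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical single-slot handoff between the sorter and a writer thread. */
struct async_writer {
    FILE *out;
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    const void *buf; /* pending buffer, NULL when the slot is free */
    size_t len;
    bool done;       /* no more buffers will be submitted */
    bool failed;     /* a short write occurred */
};

static void *writer_main(void *arg)
{
    struct async_writer *w = arg;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        while (w->buf == NULL && !w->done)
            pthread_cond_wait(&w->cond, &w->lock);
        if (w->buf == NULL) /* done and nothing pending */
            break;
        const void *buf = w->buf;
        size_t len = w->len;
        pthread_mutex_unlock(&w->lock);
        bool ok = fwrite(buf, 1, len, w->out) == len; /* lock not held */
        pthread_mutex_lock(&w->lock);
        if (!ok)
            w->failed = true;
        w->buf = NULL; /* slot is free again */
        pthread_cond_broadcast(&w->cond);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Returns 0 on success, like pthread_create. */
static int writer_start(struct async_writer *w, FILE *out)
{
    w->out = out;
    w->buf = NULL;
    w->len = 0;
    w->done = false;
    w->failed = false;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    return pthread_create(&w->thread, NULL, writer_main, w);
}

/* Blocks until the previously submitted buffer has been written. */
static void writer_submit(struct async_writer *w, const void *buf, size_t len)
{
    pthread_mutex_lock(&w->lock);
    while (w->buf != NULL)
        pthread_cond_wait(&w->cond, &w->lock);
    w->buf = buf;
    w->len = len;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Drains the pending buffer, joins the thread, reports write success. */
static bool writer_finish(struct async_writer *w)
{
    pthread_mutex_lock(&w->lock);
    w->done = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_mutex_destroy(&w->lock);
    pthread_cond_destroy(&w->cond);
    return !w->failed;
}
```

Build with -pthread. The sorter would fill one buffer while the other is in
flight, call writer_submit per buffer, and call writer_finish at the end.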