Narek/ezra work mem #205

Merged: 11 commits merged into main from narek/ezra_work_mem on Oct 15, 2023

Conversation

@Ngalstyan4 Ngalstyan4 (Contributor) commented Oct 15, 2023

Copied from Ezra's PR (#200):

This adds checks for the work_mem and maintenance_work_mem GUC variables. Checks for both are fairly uncommon in the upstream extensions. It seems like maintenance_work_mem is checked more often than work_mem; this may be because its greater size makes it less prohibitive to respect. Neither is actually enforced by Postgres itself. I'll list some upstream examples below.
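First, though, a rough sketch of the kind of check being added here (not the actual diff in this PR). The helper name check_mem_limit, the message text, and the WARNING level are all made up; the real pieces are the work_mem and maintenance_work_mem GUCs from miscadmin.h, which are expressed in kilobytes, hence the * 1024L conversion that shows up throughout the examples below.

#include "postgres.h"
#include "miscadmin.h"          /* work_mem, maintenance_work_mem (kilobytes) */

/* Hypothetical helper: complain when an allocation has grown past a
 * memory GUC. Postgres never enforces these limits itself, so whether
 * to WARN, ERROR, or spill to disk is left to the extension. */
static void
check_mem_limit(Size allocated_bytes, int limit_kb, const char *guc_name)
{
    if (allocated_bytes >= (Size) limit_kb * 1024L)
        ereport(WARNING,
                (errmsg("using %zu bytes of memory, which exceeds %s (%d kB)",
                        allocated_bytes, guc_name, limit_kb)));
}

/* e.g. check_mem_limit(state->allocated, maintenance_work_mem, "maintenance_work_mem"); */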

maintenance_work_mem is referenced less frequently, but its use is pretty straightforward when it is. In src/backend/access/gin/gininsert.c, the build callback function checks maintenance_work_mem:

if (buildstate->accum.allocatedMemory >= (Size) maintenance_work_mem * 1024L)

Similarly, nbtree checks it in its parallel build functions in src/backend/access/nbtree/nbtsort.c:

sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
                           sharedsort2, sortmem, false);

It is also checked in src/backend/commands/vacuumparallel.c. It is never checked in contrib; I think the bloom in the earlier listing refers to an internal bloom filter, not the extension. Notably, though, pgvector does have a check.
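The gin pattern above translates pretty directly to an extension's build callback. A hedged sketch follows: BuildState and flush_to_disk() are hypothetical, and only the comparison against maintenance_work_mem mirrors the upstream code.

#include "postgres.h"
#include "miscadmin.h"          /* maintenance_work_mem (kilobytes) */

/* Hypothetical build state and spill routine, modeled on gininsert.c. */
typedef struct BuildState
{
    Size    allocatedMemory;    /* bytes accumulated in memory so far */
} BuildState;

static void flush_to_disk(BuildState *buildstate);   /* hypothetical spill routine */

static void
build_callback(BuildState *buildstate)
{
    /* maintenance_work_mem is in kB, the counter is in bytes */
    if (buildstate->allocatedMemory >= (Size) maintenance_work_mem * 1024L)
    {
        flush_to_disk(buildstate);
        buildstate->allocatedMemory = 0;
    }
}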

work_mem is also checked fairly infrequently, although within the optimizer/executor there are a number of checks. In src/backend/access/gin/ginfast.c it gets checked:

workMemory = work_mem;
...
/*
* Is it time to flush memory to disk?  Flush if we are at the end of
* the pending list, or if we have a full row and memory is getting
* full.
*/
if (GinPageGetOpaque(page)->rightlink == InvalidBlockNumber ||
    (GinPageHasFullRow(page) &&
    (accum.allocatedMemory >= workMemory * 1024L)))

It gets checked in src/backend/access/nbtree/nbtpage.c as well, albeit in a function that is only called during vacuums:

maxbufsize = (work_mem * 1024L) / sizeof(BTPendingFSM);
maxbufsize = Min(maxbufsize, INT_MAX);
maxbufsize = Min(maxbufsize, MaxAllocSize / sizeof(BTPendingFSM));
/* Stay sane with small work_mem */
maxbufsize = Max(maxbufsize, vstate->bufsize);
vstate->maxbufsize = maxbufsize;

I have, however, found the following calling pattern in several places, including contrib/tablefunc/tablefunc.c, contrib/dblink/dblink.c, contrib/adminpack/adminpack.c, and also pgvector (albeit only in IVF scans):

tuplestore_begin_heap(random_access, false, work_mem);

This seems to be a data structure that holds tuples to be returned by a scan, though it doesn't account for memory allocated elsewhere. Overall, the lack of enforcement seems to make checking these values somewhat uncommon. I think it makes sense to enforce maintenance_work_mem because building an index is relatively infrequent, but enforcing runtime checks for work_mem may be overkill.
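For context, the contrib pattern looks roughly like this sketch of a materializing set-returning function (scaffolding abbreviated, my_srf is a hypothetical name); the relevant detail is just that work_mem is passed as the tuplestore's spill threshold in kilobytes.

#include "postgres.h"
#include "funcapi.h"
#include "miscadmin.h"          /* work_mem (kilobytes) */
#include "utils/tuplestore.h"

/* Sketch of the contrib-style pattern: a set-returning function that
 * materializes its result rows into a tuplestore bounded by work_mem. */
PG_FUNCTION_INFO_V1(my_srf);

Datum
my_srf(PG_FUNCTION_ARGS)
{
    ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
    bool             random_access;
    Tuplestorestate *tupstore;

    random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;

    /* third argument is the spill threshold in kilobytes: the tuplestore
     * moves to disk once it grows past work_mem, but memory allocated
     * elsewhere in the scan is not counted against it */
    tupstore = tuplestore_begin_heap(random_access, false, work_mem);

    /* ... build a tuple descriptor, append rows with tuplestore_puttuple() ... */

    rsinfo->returnMode = SFRM_Materialize;
    rsinfo->setResult = tupstore;
    return (Datum) 0;
}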

codecov bot commented Oct 15, 2023

Codecov Report

Merging #205 (39f34ca) into main (3474f82) will increase coverage by 0.00%.
Report is 6 commits behind head on main.
The diff coverage is 85.71%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #205   +/-   ##
=======================================
  Coverage   83.05%   83.06%           
=======================================
  Files          18       18           
  Lines        1216     1234   +18     
  Branches      261      264    +3     
=======================================
+ Hits         1010     1025   +15     
- Misses         81       83    +2     
- Partials      125      126    +1     
| Files | Coverage Δ |
|---|---|
| src/hnsw/external_index.c | 91.12% <100.00%> (+0.03%) ⬆️ |
| src/hnsw/insert.c | 82.53% <100.00%> (+0.28%) ⬆️ |
| src/hnsw/scan.c | 81.37% <100.00%> (+0.37%) ⬆️ |
| src/hnsw/utils.h | 40.00% <ø> (ø) |
| src/hnsw/utils.c | 88.00% <90.90%> (+2.28%) ⬆️ |
| src/hnsw/build.c | 80.32% <60.00%> (-0.79%) ⬇️ |

@github-actions

Benchmarks

| metric | old | new | pct change |
|---|---|---|---|
| recall (after create) | 0.740 | 0.740 | - |
| recall (after insert) | 0.750 | 0.764 | +1.87% |
| select bulk tps | 523.505 | 486.206 | -7.12% |
| select bulk latency (ms) | 14.463 | 15.562 | +7.60% |
| select bulk latency (stddev ms) | 2.168 | 2.963 | +36.67% |
| create latency (ms) | 1212.732 | 1212.992 | +0.02% |
| insert bulk tps | 12.198 | 11.131 | -8.75% |
| insert bulk latency (ms) | 81.974 | 89.830 | +9.58% |
| insert bulk latency (stddev ms) | 3.409 | 3.710 | +8.83% |
| disk usage (bytes) | 6348800.000 | 6348800.000 | - |

@Ngalstyan4 Ngalstyan4 force-pushed the narek/ezra_work_mem branch 3 times, most recently from c3c856b to ae27f6f on October 15, 2023 at 23:14
@Ngalstyan4 Ngalstyan4 merged commit 00c5b41 into main Oct 15, 2023
26 of 27 checks passed
@Ngalstyan4 Ngalstyan4 deleted the narek/ezra_work_mem branch October 15, 2023 23:39