faster pool allocation scheme #9106

vtjnash · 2014-11-22T08:03:55Z

this may be a faster gc allocation scheme, on the premise that it is better not to touch a large number of memory pages if you don't really need to

@JeffBezanson ping. thoughts?

carnaval · 2014-11-22T14:57:13Z

Nice. I have what looks like a similar change in #8699 (in fact I just changed a few things making it more like this patch since it simplifies things and may very well be faster).
Am I understanding correctly that in this branch some pages will be half linearly allocated (starting right before the first live object) while the other half is in the regular freelist?

vtjnash · 2014-11-22T17:14:57Z

yes. in particular, this patch remembers the location of the first live object on a page, and uses the regular freelist to remember the location of the holes. it then dynamically sorts pages into two lists: those with only holes (the first live object is at location 0), and those with free space for linear allocation. in returning allocation requests, this patch prefers filling a hole to expanding the allocated region. the previous strategy touched every page to store a pointer chain when you call add_pool. this strategy fills the page from highest address to lowest, so that pages only need to get mapped when they are used.
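To make the description above concrete, here is a minimal sketch of a single page that combines a hole freelist with a downward-growing linear region. It is not the code from this branch: `page_t`, `page_init`, `page_alloc`, `page_free`, `page_only_holes` and the `PAGE_SIZE` value are all hypothetical names chosen for illustration, and `page_free` stands in for whatever the sweep phase does when it finds a dead object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096 /* hypothetical page size, for illustration only */

/* A freed slot stores the link to the next hole inside itself. */
typedef struct hole { struct hole *next; } hole_t;

typedef struct {
    char   *data;       /* start of the page's storage */
    size_t  osize;      /* fixed object size for this pool */
    size_t  first_live; /* byte offset of the lowest object handed out so far */
    hole_t *holes;      /* freed slots located at or above first_live */
} page_t;

static void page_init(page_t *p, char *data, size_t osize)
{
    /* Assumption: slots are pointer-aligned and large enough to hold a link. */
    assert(osize >= sizeof(hole_t) && osize % sizeof(void *) == 0);
    assert(osize <= PAGE_SIZE);
    p->data = data;
    p->osize = osize;
    p->first_live = (PAGE_SIZE / osize) * osize; /* nothing allocated yet */
    p->holes = NULL;
    /* Note: no slot of the page is written here, so nothing gets touched. */
}

/* True when the linear region is exhausted and only holes can be reused. */
static int page_only_holes(const page_t *p)
{
    return p->first_live == 0;
}

static void *page_alloc(page_t *p)
{
    if (p->holes != NULL) {           /* prefer filling a hole */
        hole_t *h = p->holes;
        p->holes = h->next;
        return h;
    }
    if (p->first_live >= p->osize) {  /* otherwise grow downward */
        p->first_live -= p->osize;
        return p->data + p->first_live;
    }
    return NULL;                      /* page is full */
}

/* Stand-in for the sweep: record a dead object as a hole. */
static void page_free(page_t *p, void *obj)
{
    hole_t *h = (hole_t *)obj;
    h->next = p->holes;
    p->holes = h;
}
```

With a 16-byte object size, the first two allocations in this sketch return the two highest slots of the page, and freeing the first one makes the next allocation reuse that hole instead of moving `first_live` further down. A real pool would also keep the two page lists mentioned above (pages with only holes, and pages with linear space left), which this sketch leaves out.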

JeffBezanson · 2014-11-22T19:55:39Z

GC changes like this typically need measurement. It's good if you can at least come up with an artificial benchmark that benefits. Then we can check that the new GC does equally well, so we know we're not losing anything by merging that instead of this.

vtjnash · 2014-11-22T23:36:42Z

agreed. however, as best as I can tell, this performs equivalently on `time make test`

perhaps the generalization of this, storing each entry of the freelist pointer chain as a (next, len) pair, would be better?
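One possible reading of the (next, len) idea, sketched under assumptions: each freelist entry describes a run of `len` contiguous free slots rather than a single slot, so a fresh region can be added by writing one header instead of chaining every slot. The names `run_t`, `runlist_t`, `runlist_add` and `runlist_alloc` are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header stored in the first slot of a run of contiguous free slots. */
typedef struct run {
    struct run *next; /* next run in the freelist */
    size_t      len;  /* number of free slots in this run, at least 1 */
} run_t;

typedef struct {
    run_t  *head;  /* first run, or NULL when nothing is free */
    size_t  osize; /* fixed slot size; must be able to hold a run_t */
} runlist_t;

/* Register nslots contiguous free slots starting at start.
 * Only the first slot is written, however long the run is. */
static void runlist_add(runlist_t *fl, void *start, size_t nslots)
{
    assert(nslots >= 1 && fl->osize >= sizeof(run_t));
    run_t *r = (run_t *)start;
    r->next = fl->head;
    r->len = nslots;
    fl->head = r;
}

/* Take one slot. Slots come off the high end of the run, so the header
 * stays where it is until the run shrinks to a single slot. */
static void *runlist_alloc(runlist_t *fl)
{
    run_t *r = fl->head;
    if (r == NULL)
        return NULL;
    if (r->len > 1) {
        r->len--;
        return (char *)r + r->len * fl->osize;
    }
    fl->head = r->next; /* last slot: the header's own slot is handed out */
    return r;
}
```

In this sketch an isolated hole is simply a run with `len == 1`, so holes and untouched regions share one list. The cost is a second word per entry, which raises the minimum slot size to `sizeof(run_t)`.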

i expect that this has a much greater benefit when the pool is large relative to the elements in the pool (e.g. when virtual memory is cheap), since this delays the actual mapping of each page until it is required.

this was inspired by startup profiling blaming that initial pointer chain creation for a large portion of the startup cost. however, that total cost didn't decrease much, so I suspect that the actual cost was due to mapping the memory page, and not significantly due to writing to it.

i really expected this to be faster in a case where the user is creating and destroying a large number of small boxes: `@time for i = 1:1e3; Float64[ i==0 ? 0 : 1.01+i for i = 1:10^4]; end`
however, it instead seems to add about 5% to the gc time of the above snippet.

vtjnash added 2 commits November 22, 2014 01:52

WIP: faster pool allocation

33e3821

manage newpools/fullpools distinction more efficiently after a gc-cycle

4067cf3

fix memory tracking error in new pool allocation scheme

9eb2b6c

fix bug in newpools where we wouldn't reuse full pages that became completely empty/new

0be546d

vtjnash closed this Dec 13, 2014

vtjnash deleted the jn/faster_pool branch August 27, 2015 04:19
