faster pool allocation scheme #9106

Closed
wants to merge 4 commits into from

vtjnash
Member

@vtjnash vtjnash commented Nov 22, 2014

this may be a faster gc allocation scheme, on the premise that it is better not to touch a large number of memory pages if you don't really need to

@JeffBezanson ping. thoughts?

@carnaval
Contributor

Nice. I have what looks like a similar change in #8699 (in fact I just changed a few things making it more like this patch since it simplifies things and may very well be faster).
Am I understanding correctly that in this branch some pages will be partly linearly allocated (starting right before the first live object), while the rest of the page is tracked in the regular freelist?

@vtjnash
Member Author

vtjnash commented Nov 22, 2014

yes. in particular, this patch remembers the location of the first live object on each page, and uses the regular freelist to remember the locations of the holes. it then dynamically sorts pages into two lists: those with only holes (the first live object is at location 0), and those that still have free space for linear allocation. when servicing an allocation request, this patch prefers filling a hole to expanding the allocated region. the previous strategy touched every page to store a pointer chain whenever add_pool was called. this strategy fills each page from the highest address to the lowest, so that pages only need to get mapped when they are actually used.
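The scheme described above can be illustrated with a minimal single-page sketch. This is not the code from the patch: the names `slot_t`, `page_t`, `page_init`, `page_alloc`, `page_free`, and `page_only_holes` are hypothetical, the object and page sizes are assumed, and holes are created by an explicit free call here, whereas in a collector they would presumably be discovered during a sweep.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_SIZE 64        /* assumed object size for this pool */
#define SLOTS_PER_PAGE 64  /* assumed: one 4 KiB page of 64-byte slots */

typedef union slot slot_t;
union slot {
    slot_t *next_hole;              /* meaningful only while the slot is a hole */
    unsigned char bytes[OBJ_SIZE];  /* payload while the slot is live */
};

typedef struct {
    slot_t slots[SLOTS_PER_PAGE];
    size_t first_live;  /* index of the lowest slot handed out so far */
    slot_t *holes;      /* freed slots at or above first_live */
} page_t;

/* Initialization writes only the bookkeeping fields, never the slots,
 * so an untouched page would not need to be mapped yet. */
void page_init(page_t *p) {
    p->first_live = SLOTS_PER_PAGE;  /* nothing allocated yet */
    p->holes = NULL;
}

slot_t *page_alloc(page_t *p) {
    if (p->holes != NULL) {          /* prefer filling a hole */
        slot_t *s = p->holes;
        p->holes = s->next_hole;
        return s;
    }
    if (p->first_live > 0) {         /* otherwise grow downward */
        p->first_live -= 1;
        return p->slots + p->first_live;
    }
    return NULL;                     /* page is full */
}

void page_free(page_t *p, slot_t *s) {
    /* Only slots inside the already-allocated region can become holes. */
    assert(s >= p->slots + p->first_live && s < p->slots + SLOTS_PER_PAGE);
    s->next_hole = p->holes;
    p->holes = s;
}

/* Classifies the page for the two lists: 1 means "only holes remain",
 * 0 means "still has room for linear allocation". */
int page_only_holes(const page_t *p) {
    return p->first_live == 0;
}
```

In this sketch a fresh page hands out its highest slot first, and a freed slot is reused before `first_live` moves any lower.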

@JeffBezanson
Member

GC changes like this typically need measurement. It's good if you can at least come up with an artificial benchmark that benefits. Then we can check that the new GC does equally well, so we know we're not losing anything by merging that instead of this.

@vtjnash
Member Author

vtjnash commented Nov 22, 2014

agreed. however, as best I can tell, this performs equivalently on `time make test`

perhaps the generalization of this, storing the freelist pointer chain as (next, len) pairs, would be better?
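One way to read the (next, len) idea is as a freelist of runs, where each entry describes `len` contiguous free slots instead of a single slot. The sketch below is only an interpretation under that assumption: `run_init`, `run_alloc`, and `run_free` are hypothetical names, the slot size is assumed, and freed slots are pushed back as runs of length 1 without any coalescing.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_SIZE 64  /* assumed object size for this pool */

typedef union slot slot_t;
union slot {
    struct {
        slot_t *next;  /* first slot of the next free run, or NULL */
        size_t len;    /* number of contiguous free slots in this run, >= 1 */
    } run;             /* meaningful only in the first slot of a free run */
    unsigned char bytes[OBJ_SIZE];
};

/* Describe n untouched slots as one run: only base[0] is written. */
void run_init(slot_t **freelist, slot_t *base, size_t n) {
    assert(n >= 1);
    base->run.next = *freelist;
    base->run.len = n;
    *freelist = base;
}

slot_t *run_alloc(slot_t **freelist) {
    slot_t *head = *freelist;
    if (head == NULL)
        return NULL;                  /* pool exhausted */
    if (head->run.len > 1) {
        head->run.len -= 1;
        return head + head->run.len;  /* hand out the highest slot of the run */
    }
    *freelist = head->run.next;       /* last slot of the run: unlink it */
    return head;
}

/* Simplification: a freed slot becomes a run of length 1 at the list head. */
void run_free(slot_t **freelist, slot_t *s) {
    s->run.next = *freelist;
    s->run.len = 1;
    *freelist = s;
}
```

With this representation a fresh region costs a single header write, and a run of length 1 behaves like an ordinary freelist entry, so the plain pointer chain is the special case where every `len` is 1.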

i expect that this has a much greater benefit when the pool is large relative to the elements in the pool (e.g. when virtual memory is cheap), since this delays the actual mapping of each page until it is required.
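The expected benefit can be made concrete with a little arithmetic. The sketch below assumes that threading a pointer chain through a new pool writes to every page of it, while linear allocation writes only to the pages that back objects actually handed out. The function names and the pool, page, and object sizes in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages spanned by a pool of pool_bytes bytes. */
size_t pages_in_pool(size_t pool_bytes, size_t page_size) {
    assert(page_size > 0);
    return (pool_bytes + page_size - 1) / page_size;
}

/* Eager chain: every slot receives a "next" pointer at pool creation,
 * so every page of the pool is written before any allocation happens. */
size_t pages_touched_eager(size_t pool_bytes, size_t page_size) {
    return pages_in_pool(pool_bytes, page_size);
}

/* Linear allocation: only pages backing the first n_allocs objects are
 * written, assuming objects are packed contiguously and no holes exist. */
size_t pages_touched_lazy(size_t n_allocs, size_t obj_size,
                          size_t pool_bytes, size_t page_size) {
    size_t total = pages_in_pool(pool_bytes, page_size);
    size_t used = (n_allocs * obj_size + page_size - 1) / page_size;
    return used < total ? used : total;
}
```

For an assumed 1 MiB pool with 4 KiB pages, the eager chain touches all 256 pages up front, while 100 allocations of 64-byte objects under linear allocation touch only 2.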

this was inspired by startup profiling blaming that initial pointer chain creation for a large portion of the startup cost. however, that total cost didn't decrease much, so I suspect that the actual cost was due to mapping the memory page, and not significantly due to writing to it.

i had really expected this to be faster in a case where the user is creating and destroying a large number of small boxes: `@time for i = 1:1e3; Float64[ i==0 ? 0 : 1.01+i for i = 1:10^4]; end`
however, it instead seems to add about 5% to the gc time of the above snippet.

@vtjnash vtjnash closed this Dec 13, 2014
@vtjnash vtjnash deleted the jn/faster_pool branch August 27, 2015 04:19