
how to specify parameters to allocate distributed memory #293

Open
guowentian opened this issue Oct 14, 2016 · 4 comments

@guowentian

Hi, I ran the PageRank app in applications/graphlab/pagerank. It finished suspiciously fast, which makes me suspect that the memory was allocated locally and there was only one worker.
I ran PageRank on the same graph twice, first with 1 node and then with 10 nodes, but the shared memory breakdown in the output of the two runs is identical. For example, "node total" is 31 GB in both runs, "locale shared heap total" is 18 GB, and so on.
How come the total shared memory is still the same when I run it with 10 nodes?

@guowentian changed the title from "how to allocate distributed memory" to "how to specify parameters to allocate distributed memory" on Oct 14, 2016
@guowentian (Author)

These are the commands I used in the two runs:
```
mpirun --n 1 --npernode 1 --hostfile hosts -- applications/graphlab/pagerank.exe --path=graph.txt --format=tsv --locale_shared_fraction 0.6 --global_heap_fraction 0.7
mpirun --n 10 --npernode 1 --hostfile hosts -- applications/graphlab/pagerank.exe --path=graph.txt --format=tsv --locale_shared_fraction 0.6 --global_heap_fraction 0.7
```

@mengke-mk

Huh... actually I have the same problem when I try to run a scale-27 Graph500 on 10 nodes with 15 GB of memory each. It seems that Grappa just allocates all the memory on node 0. Here is a hint:

https://github.com/uwsampa/grappa/blob/master/system/GlobalAllocator.hpp#L86

that is odd...

@nelsonje (Member) commented Dec 8, 2016

That piece of code is just saying that core 0 runs the code to allocate a region of the shared address space. The actual storage backing this address space is allocated when the Grappa program starts, and it is always striped across all the cores in the cluster. If this is not happening, it's most likely a problem with your MPI installation keeping your cores from talking to each other.
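To illustrate (a minimal sketch from memory, not verbatim from the Grappa source; the array size N here is made up), the allocation call is issued on core 0, but the returned GlobalAddress refers to storage striped across every core, and a forall over it runs wherever each element lives:

```cpp
#include <Grappa.hpp>

using namespace Grappa;

int main(int argc, char* argv[]) {
  init(&argc, &argv);
  run([]{
    // This lambda runs on core 0, which is where the allocation
    // *request* is issued...
    size_t N = 1 << 20;  // hypothetical array size
    GlobalAddress<int64_t> A = global_alloc<int64_t>(N);

    // ...but the backing storage is striped across the shared heaps
    // of all cores, so this loop executes on whichever core owns
    // each element, not just on core 0.
    forall(A, N, [](int64_t i, int64_t& e){
      e = i;
    });

    global_free(A);
  });
  finalize();
}
```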

Usually the best way to debug this is to run the Grappa hello_world program. If it prints "hello from core 0 of 1" over and over again, there's an MPI problem unrelated to Grappa. If it prints "hello from core n of m" where n and m make sense, then there's some other problem.
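For reference, a hello_world along these lines (a sketch, not necessarily the exact file shipped in the repo) looks roughly like:

```cpp
#include <Grappa.hpp>

int main(int argc, char* argv[]) {
  Grappa::init(&argc, &argv);
  Grappa::run([]{
    // Runs once on every core; with a healthy MPI setup on 10 nodes
    // you should see distinct core numbers, not "core 0 of 1" repeated.
    Grappa::on_all_cores([]{
      LOG(INFO) << "hello from core " << Grappa::mycore()
                << " of " << Grappa::cores();
    });
  });
  Grappa::finalize();
}
```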

And to respond to the question from October: the stats printed when Grappa starts are all per-node stats we use for debugging internal data structures, so you shouldn't expect them to change significantly as you add more nodes. I don't believe Grappa prints the total amount of shared memory across all the nodes unless you request it.
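As a rough sanity check (assuming the 31 GB "node total" and the fractions from the commands above apply uniformly on each node), the aggregate works out to roughly:

```
per-node locale shared heap:  31 GB × 0.6  ≈ 18.6 GB   (matches the ~18 GB printed)
per-node global heap:         18.6 GB × 0.7 ≈ 13 GB
aggregate global heap:        10 nodes × 13 GB ≈ 130 GB
```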

@mengke-mk

Yeah, nelsonje is right. After taking a closer look at that piece of code, I now understand the magic of your PGAS...
