Output of distributed Galois is confusing #389
Comments
Please post your output here.
You can use |
I suspect that you may not be running your application on 2 nodes. Please check that your cluster scheduler (e.g. SLURM) is being used correctly. If you want to run this on 1 node with 2 hosts, you should specify |
Here is my output
PE 1 seems to have called PageRank::go without printing any detailed information. Thanks! |
Thank you. This looks slightly weird to me. Let me try to reproduce your problem. Regarding Gluon's communication volume, you can enable this flag through CMake: |
Yes, same command. Also, I am using Oak Ridge's Summit, but it should work the same way as Slurm. I am not sure whether that causes the problem, but Summit's MPI has been working well for my other applications. Here is the result of nvidia-smi:
|
Your output actually looks sane to me: both hosts ([0] and [1]) are picking up the GPU local to that host. Do you have the stats files? (You can save them to disk with -statFile=; otherwise they are output to stdout, and |
Here is the stats file, but it has no stats information for PE 1:
Particularly, I am interested in the load balancing on different processes and the communication volume between processes. From the stats results, I got no information for PE 1. Besides the |
If you want per-host timers in the stats file, set GALOIS_PRINT_PER_HOST_STATS=1 when you run the program. The local workload is captured by the InitializeGraph_, PageRank_, etc. timers (one for each run); you can get the timer names by |
I couldn't find GALOIS_PRINT_PER_HOST_STATS as a macro, CMake flag, environment variable, or variable in your master branch code. Could you explain more about setting GALOIS_PRINT_PER_HOST_STATS=1 when I run the program? Thanks in advance! |
Now this flag is
For example, as you can see |
Slight correction: the order of appearance of the HostValues does not correspond to the host; e.g. the first 27 isn't necessarily host 0. Unfortunately, the stats have no way to distinguish which timer belongs to which host at the moment. |
Thanks a lot for your help! I am able to print out information including workload and time per host. This is convenient, great work! |
It is hard to answer with only this information, but generally it should be scalable. Please check the Gluon paper; it includes GPU scalability results. Please also check and understand the time breakdowns in the stats file. It is possible that the communication overhead outweighs the benefit of distributing the computation. |
Are you using |
No, the command I use is: I am converting the twitter40 graph from this website: https://snap.stanford.edu/data/twitter-2010.html. I will try this dataset and see whether it gives better strong-scaling performance. If not, I may be doing something wrong, and I hope I can get help from you. Thanks! |
Could you please run with |
I tried the flag
I think the time is this: Here is the single GPU run:
The average time is 1066.3 ms. |
I ran twitter40: on a single GPU the runtime is 11238.800000, and on 2 GPUs it is 7248.000000 with the oec partition. This scaling makes more sense to me. Do you get similar performance? |
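For readers comparing their own runs, the two runtimes quoted above can be turned into a strong-scaling speedup and a parallel efficiency. The sketch below is only an illustration and is not part of Galois: the function names `speedup` and `parallel_efficiency` are hypothetical, and it assumes both runtimes are in the same unit and measure the same phase of the run.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Strong-scaling speedup: single-device runtime divided by multi-device runtime."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("runtimes must be positive")
    return t_single / t_parallel


def parallel_efficiency(t_single: float, t_parallel: float, num_devices: int) -> float:
    """Fraction of ideal (linear) speedup achieved on num_devices devices."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    return speedup(t_single, t_parallel) / num_devices


if __name__ == "__main__":
    # Runtimes reported in the thread for twitter40 with the oec partition.
    t_one_gpu = 11238.8
    t_two_gpus = 7248.0
    print(f"speedup on 2 GPUs: {speedup(t_one_gpu, t_two_gpus):.2f}x")
    print(f"parallel efficiency: {parallel_efficiency(t_one_gpu, t_two_gpus, 2):.1%}")
```

With the numbers above this works out to a speedup of about 1.55x on 2 GPUs, i.e. a bit under 80% of ideal linear scaling.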
Hi Galois Team,
I am trying to run pagerank-push-dist with 2 nodes:
mpirun -n 2 $ROOT/lonestar/analytics/distributed/pagerank/pagerank-push-dist mygraph.gr --num_nodes=2 --partition=oec --pset=g
The output is confusing. It looks to me like only process 0 is running.
Besides, for the input, if I would like to use the ginger-o partition, how can I get a transposed .tgr file?
Thanks!