Fix ParMETIS call so it uses all workers in cluster. #877

Merged (2 commits) on Jun 18, 2024

@thvasilo commented Jun 14, 2024

Issue #, if available:

Description of changes:

  • Previously, we called the pm_dglpart executable through mpirun with only a single process/worker. This PR pulls in a required file that parmetis_preprocess.py creates in the wrong path, and sets the number of workers to match the number of partitions.

Detailed motivation for the change

The ParMETIS program (pm_dglpart) implicitly expects a file named <graph-name>_stats.txt in the current working directory, at least given how the call is implemented. The reference DGL implementation is in fact wrong: it passes graph_name, whereas pm_dglpart expects a path prefix to which it appends _stats.txt. It just so happens that if a file named <graph-name>_stats.txt sits in the cwd, passing the bare graph name as the prefix makes pm_dglpart pick it up and use it.
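The workaround described above amounts to making sure the stats file is present in the directory from which pm_dglpart is launched. A minimal sketch of that step, assuming the file was written somewhere else by the preprocessing step; `stage_stats_file`, `source_dir`, and `work_dir` are hypothetical names for illustration and are not taken from this PR's code:

```python
import os
import shutil


def stage_stats_file(graph_name, source_dir, work_dir):
    """Copy <graph_name>_stats.txt from source_dir into work_dir.

    Hypothetical helper: work_dir stands for the directory pm_dglpart
    will be launched from (its current working directory).
    """
    stats_name = f"{graph_name}_stats.txt"
    src = os.path.join(source_dir, stats_name)
    dst = os.path.join(work_dir, stats_name)

    if not os.path.isfile(src):
        raise FileNotFoundError(f"Expected stats file at {src}")

    # Skip the copy when the file is already where it needs to be.
    if os.path.abspath(src) != os.path.abspath(dst):
        shutil.copyfile(src, dst)
    return dst
```

Copying rather than moving leaves the original output of the preprocessing step untouched, which keeps the step safe to re-run.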

A more correct solution would be to pass the absolute path prefix of <graph-name>_stats.txt (that is, the absolute path without the _stats.txt suffix) as the first argument to pm_dglpart, which pm_dglpart refers to as fstem.
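As a rough sketch of that alternative, the command line could be assembled with an absolute prefix and one MPI process per partition. `build_pm_dglpart_cmd` and `stats_dir` are hypothetical names, and the trailing `"1"` (partitions per process) is an assumption about pm_dglpart's second argument rather than something this PR states:

```python
import os


def build_pm_dglpart_cmd(graph_name, stats_dir, num_parts):
    """Build an mpirun command list for pm_dglpart (illustrative only)."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")

    # fstem is a path prefix; pm_dglpart appends "_stats.txt" to it.
    fstem = os.path.join(os.path.abspath(stats_dir), graph_name)

    return [
        "mpirun",
        "-np", str(num_parts),  # one worker process per partition
        "pm_dglpart",
        fstem,
        "1",  # assumed: number of partitions handled by each process
    ]
```

The list form can be handed to `subprocess.run` directly, which avoids shell quoting problems when the path contains spaces.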

You can check out the actual call here

The fstem var is used downstream here

Testing

Tested using Docker compose on ml-25M with 2 partitions.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@thvasilo added the bug (Something isn't working), 0.3, and ready (able to trigger the CI) labels Jun 14, 2024
@thvasilo added this to the 0.3 release milestone Jun 14, 2024
@thvasilo self-assigned this Jun 14, 2024
@jalencato left a comment

LGTM

@thvasilo merged commit 6e7f25b into awslabs:main Jun 18, 2024
6 checks passed