
Enhancement: Avoid unnecessary gathering of distributed operand #1216

Merged (38 commits) on Feb 5, 2024
Conversation

@samadpls (Contributor) commented Sep 8, 2023

Due Diligence

  • General:
    • base branch must be main for new features, latest release branch (e.g. release/1.3.x) for bug fixes
    • title of the PR is suitable to appear in the Release Notes
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • documentation updated where needed

Description

Updated the __sanitize_close_input logic to resolve issue #1064 regarding unnecessary gathering of distributed operands.

Issue/s resolved: #1064

Changes proposed:

  • optimized the logic in the __sanitize_close_input function to avoid unnecessary gathering of distributed operands.

Type of change

New feature (non-breaking change which adds functionality)

Memory requirements

Memory requirements for this change have not been profiled at this time.

Performance

Performance metrics for this change have not been measured at this time.

Does this change modify the behaviour of other functions? If so, which?

Yes: allclose and isclose, which share __sanitize_close_input, no longer gather a distributed operand when it is compared against a non-distributed one.

@github-actions (bot)

Thank you for the PR!

@mtar (Collaborator) left a comment

Thank you @samadpls. However, your contribution is contrary to the intended behavior of the function and does not address the issue. Perhaps the issue wasn't written clearly and concisely enough, so let me rephrase it:

What we want for allclose/isclose: If we have a distributed array and a non-distributed array as inputs, slice the undistributed array such that we can compare it with the corresponding local part of the distributed array on the process.

Hope this helps

@mtar (Collaborator) left a comment

This is a small step in the right direction 🙂

@ClaudiaComito (Contributor) left a comment

@samadpls here's what we're trying to achieve with this fix.

First of all, all Heat operations are implemented to operate on memory-distributed arrays (DNDarrays). In practice:

  • operations are performed across many MPI processes, which could be many CPUs or many GPUs on many nodes of a supercomputer;
  • if a DNDarray is distributed, each MPI process stores only a slice of that DNDarray (along the DNDarray.split dimension);
  • users can choose which DNDarrays to distribute (DNDarray.split = None means each process has a copy of the entire DNDarray in memory).

In this PR, you are addressing the case in which logical operations (all, allclose, etc.) compare a distributed DNDarray to a non-distributed one.

Example with small arrays for better understanding:

import numpy as np
import heat as ht

x = ht.arange(30, split=0).reshape(-1, 2)  # distributed along dimension 0
y = ht.arange(30).reshape(-1, 2)           # split=None: replicated on every process

print(ht.all(x == y))

This will return True no matter how many MPI processes you run it on.

Try printing out the local arrays (DNDarray.larray) to get a better understanding of what the underlying data on each process really are (run this code on more than one process):

rank = x.comm.rank 
print(f"On rank {rank}: local x data: {x.larray}")
print(f"On rank {rank}: local y data: {y.larray}")

In the current implementation, when a distributed x is compared to non-distributed y, x gets "gathered" so it is no longer distributed. But "gathering" all the slices of a distributed array onto each process is communication- and memory-intensive and, in this case, unnecessary. Each process can simply compare the local slice of x (x.larray) to the corresponding slice of y and, at the end, only communicate whether the slices on each process match or not.
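To make this concrete, here is a minimal sketch of that strategy (for illustration only, not the merged Heat code; it assumes x is distributed, y is replicated, and that Heat's communicator exposes chunk and an mpi4py handle, as in heat.core.communication):

from mpi4py import MPI
import torch

def allclose_without_gather(x, y, rtol=1e-05, atol=1e-08):
    # Which slice of the global index space does this process hold?
    _, _, slices = x.comm.chunk(x.shape, x.split)
    # Compare the local chunk of x with the matching slice of the
    # replicated y -- no gathering of x required.
    local_match = torch.allclose(x.larray, y.larray[slices], rtol=rtol, atol=atol)
    # One cheap collective: a logical AND over one boolean per process.
    return x.comm.handle.allreduce(local_match, op=MPI.LAND)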

Feel free to ask if anything is unclear.

@samadpls (Contributor, author) commented Oct 9, 2023

I apologize for the delay in responding. I've been on vacation and traveling for the past 22 days, and I just landed a few hours ago. I'll definitely get back to work and address the PR comments. Thanks for your understanding!

@ClaudiaComito (Contributor)

> I apologize for the delay in responding. I've been on vacation and traveling for the past 22 days, and I just landed a few hours ago. I'll definitely get back to work and address the PR comments. Thanks for your understanding!

No problem at all @samadpls, thanks for your time!


@ClaudiaComito (Contributor) left a comment

Thanks @samadpls, two more small changes and we're good to go.

In heat/core/logical.py:
if x.split is not None and y.split is None and y.ndim > 0:
    # y is replicated on every process: distribute it to match x
    # instead of gathering x.
    t2 = factories.array(y.larray, device=x.device, split=x.split)
    x, t2 = sanitation.sanitize_distribution(x, t2, target=x)
    return x, t2

A Contributor left a comment

We're missing the same check for y distributed, x non-distributed, I think. Otherwise we're good to go.
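For illustration, the symmetric branch could mirror the merged snippet above (a sketch, not necessarily the exact code that was merged):

if y.split is not None and x.split is None and x.ndim > 0:
    # x is replicated on every process: distribute it to match y
    # instead of gathering y.
    t1 = factories.array(x.larray, device=y.device, split=y.split)
    t1, y = sanitation.sanitize_distribution(t1, y, target=y)
    return t1, y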


@ClaudiaComito (Contributor) left a comment

We made it, thanks a lot @samadpls!

@samadpls (Contributor, author)

Thank you so much @ClaudiaComito, I couldn't have done it without your assistance.

@ClaudiaComito (Contributor)

> Thank you so much @ClaudiaComito, I couldn't have done it without your assistance.

My pleasure.

The CI is failing, but this has nothing to do with your PR; we've been working on it in #1313 and #1314. There's nothing you can do about it: #1314 needs to be merged before everything else.

We're about to start the Christmas break so things will be grinding to a halt until early January. Happy holidays! 🎄


@ClaudiaComito merged commit 42733ab into helmholtz-analytics:main on Feb 5, 2024
50 checks passed
@github-actions (bot) commented Feb 5, 2024

Backport failed for release/1.3.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

# Fetch the release branch and check it out in a detached worktree
git fetch origin release/1.3.x
git worktree add -d .worktree/backport-1216-to-release/1.3.x origin/release/1.3.x
cd .worktree/backport-1216-to-release/1.3.x
# Create the backport branch and replay the PR's commits onto it
git switch --create backport-1216-to-release/1.3.x
git cherry-pick -x 9d093b2b1b971320b6dbfa8d32af450cf414e22b 7b7a19a16069af11e54e6eabb94ed41ec68eb1e2 f8133118af2e4898f25cd7846034acc750484301 bdf62df2ef726fa9e97c5d54846ba2aee4b0a1bb bde9d0e5b34243c75c64ed0840cf053963e01f07 0e8060188f6c8c258badb691c272671f153f3574 b5bf608e0b34167d93e924ae35cd1b97ed8e4507 12c67ecb93c76ae457ae544845a354db4d8430a7 0ebe1ced2fb22bbbe6ab14999cc7f46f44fabf52 e5dc1213e878c0a1927365be8eff30979ff1d1ff e8d45d4a01d22c90a45c1a0089bbf0cad789b4b1 ae3f969828f1b9514ddbbe82dbca8ee37f3e6e54 43e77d81d8b41c488237566f1d340aff60cf5aaa 38f4314a9b721720453b00cf3927ab1384fae8a5 aa555d5a6747f5c9e6d97fda12c0e68f2443b835 52cb055e3dab9518a5c84a91a36070b028c461a7 d0bf3f3e5899688f28cbfbb96bddbba5a2c420ef 60fb1cddc6a414a8d9382b1c54d6366758b52bb6 3d1fe15cd2539b36beb4610bc08296843f765ee7

Successfully merging this pull request may close these issues.

Avoid unnecessary gathering of distributed operand in mixed distributed/non-distributed logical functions