Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gpu): general fixes on indexes used in multi-gpu context #1997

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pdroalves
Copy link
Contributor

@pdroalves pdroalves commented Jan 21, 2025

  • fix a bug in which the wrong GPU may be queried for the max shared memory
  • If multiple streams are running split through multiple GPUs, operations happening on a stream in GPU i should query GPU i about its max shared memory.
  • also fixes wrong indexing at rust side

closes: https://github.com/zama-ai/tfhe-rs-internal/issues/901

PR content/description

Check-list:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Relevant issues are marked as resolved/closed, related issues are linked in the description
  • Check for breaking changes (including serialization changes) and add them to commit message following the conventional commit specification

@pdroalves pdroalves requested a review from agnesLeroy January 21, 2025 13:10
@cla-bot cla-bot bot added the cla-signed label Jan 21, 2025
@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch from a6407ec to 1b30c36 Compare January 21, 2025 13:34
@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch from 1b30c36 to 52a24ef Compare January 21, 2025 14:09
@zama-bot zama-bot removed the approved label Jan 21, 2025
@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch from 52a24ef to 8c4d6a6 Compare January 21, 2025 14:46
Copy link

⚠️ This PR contains unsigned commits. To get your PR merged, please sign those commits (git rebase --exec 'git commit -S --amend --no-edit -n' @{upstream}) and force push them to this branch (git push --force-with-lease).

If you're new to commit signing, there are different ways to set it up:

Sign commits with gpg

Follow the steps below to set up commit signing with gpg:

  1. Generate a GPG key
  2. Add the GPG key to your GitHub account
  3. Configure git to use your GPG key for commit signing
Sign commits with ssh-agent

Follow the steps below to set up commit signing with ssh-agent:

  1. Generate an SSH key and add it to ssh-agent
  2. Add the SSH key to your GitHub account
  3. Configure git to use your SSH key for commit signing
Sign commits with 1Password

You can also sign commits using 1Password, which lets you sign commits with biometrics without the signing key leaving the local 1Password process.

Learn how to use 1Password to sign your commits.

Watch the demo

@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch 4 times, most recently from d012f88 to 89a8733 Compare January 21, 2025 19:07
@pdroalves
Copy link
Contributor Author

@agnesLertá maybe we could introduce a new test to cover this scenario. Wdyt?

Copy link
Contributor

@agnesLeroy agnesLeroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pdroalves! Looks good to me thanks for the fix! Yes, we should definitely add a test for it. The thing is, for now only the long run tests are testing the use of a single GPU other than 0. What we could do is add long run tests with a set of classic PBS parameters maybe? Otherwise we could add signed & unsigned tests with a different FnExecutor, in the same fashion as I did for the long run tests. Wdyt? (see the executor here: tfhe/src/integer/gpu/server_key/radix/tests_long_run/mod.rs:34)

@zama-bot zama-bot removed the approved label Jan 24, 2025
@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch 3 times, most recently from b12602f to 6951ebd Compare January 24, 2025 13:59
- fix a bug in which the wrong GPU may be queried for the max shared memory
- If multiple streams are running split through multiple GPUs,
  operations happening on a stream in GPU i should query GPU i about its
max shared memory,
- also fixes wrong indexing at rust side.
…er operations

- multiple operations are issued in parallel running independently on
  different devices,
- tests will only run when more than 1 GPU is available,
- we only test ERC20-related operators: (overflow_) add/sub, cmp, and if_then_else.
@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch from 6951ebd to 8eb4d86 Compare January 27, 2025 11:34
@zama-bot zama-bot removed the approved label Jan 27, 2025
@zama-bot zama-bot removed the approved label Jan 27, 2025
@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch from 9c23e5d to 65b35f0 Compare January 28, 2025 12:43
@zama-bot zama-bot removed the approved label Jan 28, 2025
@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch from 1c68c50 to d1623e7 Compare January 29, 2025 11:34
@zama-bot zama-bot removed the approved label Jan 29, 2025
@agnesLeroy
Copy link
Contributor

@pdroalves I saw the multi-gpu tests passed: maybe you want to merge the cudaSetDevice encapsulation first and then rebase this one on main before merging?

@zama-bot zama-bot removed the approved label Jan 30, 2025
@pdroalves
Copy link
Contributor Author

@agnesLeroy Actually that PR was based on this one.

@pdroalves pdroalves force-pushed the pa/fix/independent_streams branch from 21a0e94 to 082ac40 Compare January 30, 2025 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants