Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(hlapi): add gpu selection #1990

Merged
merged 1 commit into from
Jan 29, 2025
Merged

feat(hlapi): add gpu selection #1990

merged 1 commit into from
Jan 29, 2025

Conversation

tmontaigu
Copy link
Contributor

@tmontaigu tmontaigu commented Jan 17, 2025

closes: please link all relevant issues

PR content/description

Check-list:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Relevant issues are marked as resolved/closed, related issues are linked in the description
  • Check for breaking changes (including serialization changes) and add them to commit message following the conventional commit specification

This change is Reviewable

@cla-bot cla-bot bot added the cla-signed label Jan 17, 2025
Copy link
Member

@IceTDrinker IceTDrinker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, code looks fine, don't know about the general logic and per thread streams, I'm not familiar enough with the GPU I believe

tfhe/src/core_crypto/gpu/vec.rs Outdated Show resolved Hide resolved
tfhe/src/high_level_api/integers/unsigned/inner.rs Outdated Show resolved Hide resolved
tfhe/src/high_level_api/integers/unsigned/base.rs Outdated Show resolved Hide resolved
Comment on lines +151 to +155
if ct.gpu_indexes() == streams.gpu_indexes() {
MaybeCloned::Borrowed(ct.as_ref())
} else {
MaybeCloned::Cloned(ct.duplicate(streams).0)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seen this pattern a few times, could it be a function without it being annoying to write ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is indeed a bit annoying to rewrite as in each case ct is a different type, so some kind of trait would be needed, so its not that interesting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok no worries

tfhe/src/high_level_api/booleans/inner.rs Outdated Show resolved Hide resolved
tfhe/src/high_level_api/global_state.rs Show resolved Hide resolved
tfhe/src/core_crypto/gpu/vec.rs Outdated Show resolved Hide resolved
tfhe/src/core_crypto/gpu/vec.rs Outdated Show resolved Hide resolved
tfhe/src/core_crypto/gpu/vec.rs Outdated Show resolved Hide resolved
@@ -334,6 +344,22 @@ impl CudaServerKey {
pub(crate) fn message_modulus(&self) -> crate::shortint::MessageModulus {
self.key.key.message_modulus
}

pub(crate) fn pbs_key(&self) -> &crate::integer::gpu::CudaServerKey {
Copy link
Contributor

@mayeul-zama mayeul-zama Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the name we could use here
Maybe key or server_key instead

Just noticed the name pbs_key is used in the cpu code.
It's not public so we could also rename it

Btw it seems that this function is used for code in HL array but not in other places
There are still many cuda_key.key.key which could use the same pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key feels too simple, computation_key could add more meaning, but its longer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about server_key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that could work yes

use std::cell::LazyCell;

thread_local! {
pub(crate) static CUDA_STREAMS: RefCell<CudaStreams> = RefCell::new(CudaStreams::new_multi_gpu());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed when there's already POOL which has a field multi: LazyCell<CudaStreams>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I hesitated, this can be removed, it would mean the with_thread_local_cuda_stream would then need to look at the key's gpu_indexes and return the corresponding stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But as it is now in the PR, with_thread_local_cuda_stream uses the stream returned by CudaStreams::new_multi_gpu()
The equivalent would be

    pub(in crate::high_level_api) fn with_thread_local_cuda_streams<
        R,
        F: for<'a> FnOnce(&'a CudaStreams) -> R,
    >(
        func: F,
    ) -> R {
        POOL.with_borrow(|stream_pool| func(&stream_pool.multi))
    }

Or am I mistaken?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, with_thread_local_cuda_streams use the stream in CUDA_STREAMS, which is set when calling set_server_key/replace_server_key and so this stream is not always a multi-gpu, it can also be a single GPU

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is something better to be done in another PR, as another that can be done with it would be to replace the need to do with_internal_keys and with_thread_local_cuda_streams with just 1 function all by making with_internal_keys also give a handle to the streams as well, but that requires some more changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the with_thing APIs are pub(crate) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either pub(crate) or pub(in crate::hlapi)

Copy link
Contributor Author

@tmontaigu tmontaigu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 25 files reviewed, 11 unresolved discussions (waiting on @agnesLeroy, @IceTDrinker, and @mayeul-zama)

tfhe/src/core_crypto/gpu/vec.rs Outdated Show resolved Hide resolved
@tmontaigu tmontaigu force-pushed the tm/hlapi-context branch 3 times, most recently from e66991e to 59064a8 Compare January 27, 2025 14:36
@IceTDrinker
Copy link
Member

is it in a state where it needs a re review ?

Copy link
Contributor Author

@tmontaigu tmontaigu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Reviewable status: 0 of 25 files reviewed, 11 unresolved discussions (waiting on @IceTDrinker and @mayeul-zama)

Copy link
Member

@IceTDrinker IceTDrinker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly questions on whether move_to_stream is usable in some places (don't know if you checked them all)

Otherwise looks generally good to me

Reviewed 14 of 25 files at r1, 7 of 10 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @mayeul-zama and @tmontaigu)


tfhe/src/high_level_api/integers/signed/inner.rs line 183 at r3 (raw file):

            }
            Self::Cuda(cuda_ct) => {
                if cuda_ct.gpu_indexes() != streams.gpu_indexes() {

move_to_stream does not apply ?


tfhe/src/high_level_api/integers/signed/inner.rs line 224 at r3 (raw file):

                // We are on a GPU, but it may not be the correct one
                let new = with_thread_local_cuda_streams(|streams| {
                    if cuda_ct.gpu_indexes() == streams.gpu_indexes() {

move_to_stream does not apply ?


tfhe/src/high_level_api/booleans/inner.rs line 224 at r3 (raw file):

                // We are on a GPU, but it may not be the correct one
                let new = with_thread_local_cuda_streams(|streams| {
                    if cuda_ct.gpu_indexes() == streams.gpu_indexes() {

move_to_stream does not apply here ?


tfhe/src/high_level_api/keys/server.rs line 385 at r3 (raw file):

    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::Cpu(_) => f.debug_tuple("Cpu").finish(),

perhaps InteralServerKey::Cpu to match enum debug ? Don't recall what the format usually is


tfhe/src/high_level_api/integers/unsigned/inner.rs line 185 at r3 (raw file):

            }
            Self::Cuda(cuda_ct) => {
                if cuda_ct.gpu_indexes() != streams.gpu_indexes() {

move_to_stream does not apply ?


tfhe/src/high_level_api/integers/unsigned/inner.rs line 225 at r3 (raw file):

                // We are on a GPU, but it may not be the correct one
                let new = with_thread_local_cuda_streams(|streams| {
                    if cuda_ct.gpu_indexes() == streams.gpu_indexes() {

move_to_stream does not apply ?

use std::cell::LazyCell;

thread_local! {
pub(crate) static CUDA_STREAMS: RefCell<CudaStreams> = RefCell::new(CudaStreams::new_multi_gpu());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the with_thing APIs are pub(crate) ?

Comment on lines +151 to +155
if ct.gpu_indexes() == streams.gpu_indexes() {
MaybeCloned::Borrowed(ct.as_ref())
} else {
MaybeCloned::Cloned(ct.duplicate(streams).0)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok no worries

Copy link
Contributor

@mayeul-zama mayeul-zama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 15 of 25 files at r1, 10 of 10 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @tmontaigu)

Copy link
Contributor Author

@tmontaigu tmontaigu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @IceTDrinker)


tfhe/src/high_level_api/integers/signed/inner.rs line 183 at r3 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

move_to_stream does not apply ?

hum, we would need to add a move_to_stream_inplace(&mut self)


tfhe/src/high_level_api/integers/signed/inner.rs line 224 at r3 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

move_to_stream does not apply ?

Same as above, we would need to add a move_to_stream_inplace(&mut self)


tfhe/src/high_level_api/keys/server.rs line 385 at r3 (raw file):

Previously, IceTDrinker (Arthur Meyre) wrote…

perhaps InteralServerKey::Cpu to match enum debug ? Don't recall what the format usually is

Yes InteralServerKey::Cpu would probably be better

use std::cell::LazyCell;

thread_local! {
pub(crate) static CUDA_STREAMS: RefCell<CudaStreams> = RefCell::new(CudaStreams::new_multi_gpu());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either pub(crate) or pub(in crate::hlapi)

@@ -334,6 +344,22 @@ impl CudaServerKey {
pub(crate) fn message_modulus(&self) -> crate::shortint::MessageModulus {
self.key.key.message_modulus
}

pub(crate) fn pbs_key(&self) -> &crate::integer::gpu::CudaServerKey {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that could work yes

@tmontaigu tmontaigu merged commit 57386a1 into main Jan 29, 2025
151 checks passed
@tmontaigu tmontaigu deleted the tm/hlapi-context branch January 29, 2025 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants