-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(hlapi): add gpu selection #1990
Conversation
2dc9b44
to
67e3ed8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, code looks fine, don't know about the general logic and per thread streams, I'm not familiar enough with the GPU I believe
if ct.gpu_indexes() == streams.gpu_indexes() { | ||
MaybeCloned::Borrowed(ct.as_ref()) | ||
} else { | ||
MaybeCloned::Cloned(ct.duplicate(streams).0) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seen this pattern a few times, could it be a function without it being annoying to write ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is indeed a bit annoying to rewrite as in each case ct
is a different type, so some kind of trait would be needed, so its not that interesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok no worries
@@ -334,6 +344,22 @@ impl CudaServerKey { | |||
pub(crate) fn message_modulus(&self) -> crate::shortint::MessageModulus { | |||
self.key.key.message_modulus | |||
} | |||
|
|||
pub(crate) fn pbs_key(&self) -> &crate::integer::gpu::CudaServerKey { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the name we could use here
Maybe key
or server_key
instead
Just noticed the name pbs_key
is used in the cpu code.
It's not public so we could also rename it
Btw it seems that this function is used for code in HL array but not in other places
There are still many cuda_key.key.key which could use the same pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key
feels too simple, computation_key
could add more meaning, but its longer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about server_key
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that could work yes
use std::cell::LazyCell; | ||
|
||
thread_local! { | ||
pub(crate) static CUDA_STREAMS: RefCell<CudaStreams> = RefCell::new(CudaStreams::new_multi_gpu()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it needed when there's already POOL
which has a field multi: LazyCell<CudaStreams>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I hesitated, this can be removed, it would mean the with_thread_local_cuda_stream
would then need to look at the key's gpu_indexes and return the corresponding stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But as it is now in the PR, with_thread_local_cuda_stream
uses the stream returned by CudaStreams::new_multi_gpu()
The equivalent would be
pub(in crate::high_level_api) fn with_thread_local_cuda_streams<
R,
F: for<'a> FnOnce(&'a CudaStreams) -> R,
>(
func: F,
) -> R {
POOL.with_borrow(|stream_pool| func(&stream_pool.multi))
}
Or am I mistaken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, with_thread_local_cuda_streams use the stream in CUDA_STREAMS, which is set when calling set_server_key/replace_server_key and so this stream is not always a multi-gpu, it can also be a single GPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is something better to be done in another PR, as another that can be done with it would be to replace the need to do with_internal_keys and with_thread_local_cuda_streams
with just 1 function all by making with_internal_keys
also give a handle to the streams as well, but that requires some more changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the with_thing APIs are pub(crate) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either pub(crate) or pub(in crate::hlapi)
67e3ed8
to
522a5b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 25 files reviewed, 11 unresolved discussions (waiting on @agnesLeroy, @IceTDrinker, and @mayeul-zama)
e66991e
to
59064a8
Compare
59064a8
to
94d3e01
Compare
is it in a state where it needs a re review ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
Reviewable status: 0 of 25 files reviewed, 11 unresolved discussions (waiting on @IceTDrinker and @mayeul-zama)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly questions on whether move_to_stream is usable in some places (don't know if you checked them all)
Otherwise looks generally good to me
Reviewed 14 of 25 files at r1, 7 of 10 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @mayeul-zama and @tmontaigu)
tfhe/src/high_level_api/integers/signed/inner.rs
line 183 at r3 (raw file):
} Self::Cuda(cuda_ct) => { if cuda_ct.gpu_indexes() != streams.gpu_indexes() {
move_to_stream does not apply ?
tfhe/src/high_level_api/integers/signed/inner.rs
line 224 at r3 (raw file):
// We are on a GPU, but it may not be the correct one let new = with_thread_local_cuda_streams(|streams| { if cuda_ct.gpu_indexes() == streams.gpu_indexes() {
move_to_stream does not apply ?
tfhe/src/high_level_api/booleans/inner.rs
line 224 at r3 (raw file):
// We are on a GPU, but it may not be the correct one let new = with_thread_local_cuda_streams(|streams| { if cuda_ct.gpu_indexes() == streams.gpu_indexes() {
move_to_stream does not apply here ?
tfhe/src/high_level_api/keys/server.rs
line 385 at r3 (raw file):
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { match self { Self::Cpu(_) => f.debug_tuple("Cpu").finish(),
perhaps InteralServerKey::Cpu to match enum debug ? Don't recall what the format usually is
tfhe/src/high_level_api/integers/unsigned/inner.rs
line 185 at r3 (raw file):
} Self::Cuda(cuda_ct) => { if cuda_ct.gpu_indexes() != streams.gpu_indexes() {
move_to_stream does not apply ?
tfhe/src/high_level_api/integers/unsigned/inner.rs
line 225 at r3 (raw file):
// We are on a GPU, but it may not be the correct one let new = with_thread_local_cuda_streams(|streams| { if cuda_ct.gpu_indexes() == streams.gpu_indexes() {
move_to_stream does not apply ?
use std::cell::LazyCell; | ||
|
||
thread_local! { | ||
pub(crate) static CUDA_STREAMS: RefCell<CudaStreams> = RefCell::new(CudaStreams::new_multi_gpu()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the with_thing APIs are pub(crate) ?
if ct.gpu_indexes() == streams.gpu_indexes() { | ||
MaybeCloned::Borrowed(ct.as_ref()) | ||
} else { | ||
MaybeCloned::Cloned(ct.duplicate(streams).0) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok no worries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 15 of 25 files at r1, 10 of 10 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @tmontaigu)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @IceTDrinker)
tfhe/src/high_level_api/integers/signed/inner.rs
line 183 at r3 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
move_to_stream does not apply ?
hum, we would need to add a move_to_stream_inplace(&mut self)
tfhe/src/high_level_api/integers/signed/inner.rs
line 224 at r3 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
move_to_stream does not apply ?
Same as above, we would need to add a move_to_stream_inplace(&mut self)
tfhe/src/high_level_api/keys/server.rs
line 385 at r3 (raw file):
Previously, IceTDrinker (Arthur Meyre) wrote…
perhaps InteralServerKey::Cpu to match enum debug ? Don't recall what the format usually is
Yes InteralServerKey::Cpu
would probably be better
use std::cell::LazyCell; | ||
|
||
thread_local! { | ||
pub(crate) static CUDA_STREAMS: RefCell<CudaStreams> = RefCell::new(CudaStreams::new_multi_gpu()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either pub(crate) or pub(in crate::hlapi)
@@ -334,6 +344,22 @@ impl CudaServerKey { | |||
pub(crate) fn message_modulus(&self) -> crate::shortint::MessageModulus { | |||
self.key.key.message_modulus | |||
} | |||
|
|||
pub(crate) fn pbs_key(&self) -> &crate::integer::gpu::CudaServerKey { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that could work yes
closes: please link all relevant issues
PR content/description
Check-list:
This change is