[FEA]: Design operator specialization for cuda.parallel / C Parallel #3574

gevtushenko · 2025-01-28T22:07:53Z

Is this a duplicate?

I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

cuda.parallel (Python)

Is your feature request related to a problem? Please describe.

Currently, cuda.parallel only provides a generic version of parallel reduction:

cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py

Lines 167 to 172 in 83b10c2

    
    def reduce_into(
        d_in: DeviceArrayLike | IteratorBase,
        d_out: DeviceArrayLike,
        op: Callable,
        h_init: np.ndarray,
    ):

On the C++ end, we have specialized code paths for some operators:

cccl/cub/cub/warp/specializations/warp_reduce_shfl.cuh

Lines 610 to 625 in 83b10c2

    
    template <class U = T>
    _CCCL_DEVICE _CCCL_FORCEINLINE
    typename ::cuda::std::enable_if<(::cuda::std::is_same<int, U>::value || ::cuda::std::is_same<unsigned int, U>::value)
                                      && detail::reduce_add_exists<>::value,
                                    T>::type
    ReduceImpl(Int2Type<1> /* all_lanes_valid */, T input, int /* valid_items */, ::cuda::std::plus<> /* reduction_op */)
    {
      T output = input;
      NV_IF_TARGET(
        NV_PROVIDES_SM_80,
        (output = __reduce_add_sync(member_mask, input);),
        (output = ReduceImpl<::cuda::std::plus<>>(Int2Type<1>{}, input, LOGICAL_WARP_THREADS, ::cuda::std::plus<>{});));
      return output;
    }

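cuda.parallel doesn't use these optimizations, because the operator it passes down is always treated as a generic one and is never classified as, say, cuda::std::plus, so specialized overloads like the one above are not selected.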

Describe the solution you'd like

cuda.parallel should have a way of recognizing standard operators and mapping them to underlying C++ concepts.

Describe alternatives you've considered

This problem can be split into interface and implementation components.
On the interface end, one option is to provide different overloads (#2542).
We could also consider introspecting the operator, or recognizing built-in functions like sum on the Python end.
Regardless, before addressing the interface question, we need machinery to support it on the implementation end.
This issue can be closed by a prototype of operator specialization machinery that allows cuda.parallel to request standard operators like cuda::std::plus, cuda::minimum, cuda::maximum, etc.
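As a rough illustration of what the implementation end could look like, here is a minimal sketch that maps a well-known operator kind to the name of the C++ functor it should request. Everything in it is hypothetical: `OpKind`, `cpp_functor_for`, and the idea of passing functor names as strings are assumptions made for the example, not existing cuda.parallel or C Parallel APIs.

```python
from enum import Enum
from typing import Optional


class OpKind(Enum):
    """Hypothetical tag describing which operator a reduction uses."""

    GENERIC = "generic"  # arbitrary user callable, no specialization
    PLUS = "plus"
    MINIMUM = "minimum"
    MAXIMUM = "maximum"


# Hypothetical table: well-known kind -> C++ functor named in this issue.
_CPP_FUNCTORS = {
    OpKind.PLUS: "cuda::std::plus",
    OpKind.MINIMUM: "cuda::minimum",
    OpKind.MAXIMUM: "cuda::maximum",
}


def cpp_functor_for(kind: OpKind) -> Optional[str]:
    """Return the C++ functor name to request, or None for the generic path."""
    return _CPP_FUNCTORS.get(kind)
```

With a tag like this, the generic path stays the default (`None`), and a specialized code path is requested only when the operator is known to correspond to a standard C++ functor. How the tag reaches the C library is left open here.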

Additional context

No response


shwina · 2025-01-28T22:56:15Z

I know we want to punt on the interface question, but it's worth noting the operator module from the Python standard library: https://docs.python.org/3/library/operator.html
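To illustrate why the operator module could help here, below is a small sketch of recognizing standard operators by object identity. The `recognize` helper and its string labels are hypothetical and only meant to show the idea; the inclusion of the built-ins `min` and `max` is likewise an assumption for the example.

```python
import operator

# Hypothetical lookup table: functions from the standard library are stable,
# hashable objects, so they can be recognized with a plain dict lookup.
_KNOWN_OPS = {
    operator.add: "plus",
    min: "minimum",
    max: "maximum",
}


def recognize(op):
    """Return a label for a well-known operator, or "generic" otherwise."""
    try:
        return _KNOWN_OPS.get(op, "generic")
    except TypeError:
        # Unhashable callables cannot be looked up; treat them as generic.
        return "generic"
```

A user-written `lambda a, b: a + b` would fall through to `"generic"` in this sketch, since recognizing it would require introspecting the function body rather than comparing objects.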

gevtushenko added the feature request New feature or request. label Jan 28, 2025

github-project-automation bot added this to CCCL Jan 28, 2025

github-project-automation bot moved this to Todo in CCCL Jan 28, 2025

This was referenced Jan 28, 2025

[FEA]: Implement cuda.parallel.{sum,min,max) algorithms #2542

Open

Investigate performance delta between cuda.parallel and CuPy reduction #3213

Open

gevtushenko assigned griwes Jan 29, 2025
