
[FEA]: Expose CUDA streams in cuda.parallel APIs #3080

Open
shwina opened this issue Dec 6, 2024 · 0 comments
Labels: feature request

shwina (Contributor) commented Dec 6, 2024

Is this a duplicate?

Area

cuda.parallel (Python)

Is your feature request related to a problem? Please describe.

Consider the reduce algorithm. The underlying C library exposes a stream argument. On the Python side, however, we don't accept a corresponding stream argument and always pass None (nullptr, i.e., the default stream) to the C API. As a result, Python users are unable to take advantage of concurrency via CUDA streams.

Describe the solution you'd like

We should change the Python API to accept a stream argument and pass it through to the underlying C library.

What should the argument type be?

There are a few options here, ordered from least preferred to most preferred (in my opinion):

  1. Accept an int (least preferred)
  2. Accept a concrete Numba stream object.
  3. Accept a concrete cuda.core.Stream object.
  4. Accept any object implementing the __cuda_stream__ protocol; this would automatically cover (3) without an explicit dependency on cuda.core.

Describe alternatives you've considered

No response

Additional context

No response

Projects
Status: Todo