
[FEA]: Expose CUDA streams in cuda.parallel APIs #3080

Open
shwina opened this issue Dec 6, 2024 · 0 comments
Labels: feature request

shwina (Contributor) commented Dec 6, 2024

Is this a duplicate?

Area

cuda.parallel (Python)

Is your feature request related to a problem? Please describe.

Consider the reduce algorithm. The underlying C library exposes a stream argument. On the Python side, however, we don't accept a corresponding stream argument and always pass None (nullptr, i.e., the default stream) to the C API. As a result, Python users are unable to take advantage of concurrency via CUDA streams.

Describe the solution you'd like

We should change the Python API to accept a stream argument and pass it through to the underlying C library.

What should the argument type be?

There are a few options here, ordered from least preferred to most preferred (in my opinion):

  1. Accept an int (least preferred)
  2. Accept a concrete Numba stream object.
  3. Accept a concrete cuda.core.Stream object.
  4. Accept any object implementing the __cuda_stream__ protocol; this would automatically cover (3) without an explicit dependency on cuda.core.

Describe alternatives you've considered

No response

Additional context

No response

Projects
Status: Todo