
[BUG] [STF] parallel_for incorrectly captures variable in a __host__ __device__ lambda with a host execution place #3269

Closed
1 task done
caugonnet opened this issue Jan 8, 2025 · 1 comment
Assignees
Labels
bug Something isn't working right. stf Sequential Task Flow programming model

Comments

@caugonnet
Contributor

Is this a duplicate?

Type of Bug

Runtime Error

Component

Not sure

Describe the bug

When using STF's parallel_for construct on a host execution place with a __host__ __device__ lambda that has a capture list, the captured variables are corrupted. This breaks the miniWeather example and can be reduced to the following:

#include <cuda/experimental/stf.cuh>

#include <stdio.h>
#include <stdlib.h>

using namespace cuda::experimental::stf;

int main(int argc, char** argv) {
    context ctx;

    int nqpoints = 3;
    auto ltoken = ctx.logical_token();

    ctx.parallel_for(exec_place::host, box(40), ltoken.read())
            ->*[nqpoints] __host__ __device__(size_t i, void_interface) {
                assert(nqpoints == 3);
            };

    ctx.finalize();
}

This seems to happen only with a __host__ __device__ lambda; a plain host lambda is not affected.

How to Reproduce

Run the example above; the assertion will fail.

Expected behavior

The example should run to completion, and nqpoints should be 3 inside the lambda as well.

Reproduction link

No response

Operating System

Ubuntu 23.04

nvidia-smi output

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti   Off   |   00000000:04:00.0 Off |                  N/A |
|  0%   48C    P8             19W / 350W  |     10MiB / 12288MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro P620                  Off   |   00000000:65:00.0  On |                  N/A |
| 44%   58C    P0             N/A / N/A   |     497MiB / 2048MiB   |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

NVCC version

/usr/local/cuda-12.6/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

@caugonnet caugonnet added bug Something isn't working right. stf Sequential Task Flow programming model labels Jan 8, 2025
@caugonnet caugonnet self-assigned this Jan 8, 2025
@caugonnet caugonnet added this to CCCL Jan 8, 2025
caugonnet (Contributor, Author) commented:

The bug was fixed by #3270. It was caused by a host callback that stored a reference to an extended lambda that had already gone out of scope, rather than storing a copy of the lambda alongside the callback.
