
[BUG] [STF] parallel_for incorrectly captures variable in a __host__ __device__ lambda with a host execution place #3269

Closed
1 task done
caugonnet opened this issue Jan 8, 2025 · 1 comment
Assignees
Labels
bug Something isn't working right. stf Sequential Task Flow programming model

Comments

@caugonnet
Contributor

Is this a duplicate?

Type of Bug

Runtime Error

Component

Not sure

Describe the bug

When using STF's parallel_for construct on a host execution place with a __host__ __device__ lambda that has a capture list, the captured variables are corrupted. This breaks the miniWeather example and can be reduced to the following:

#include <cuda/experimental/stf.cuh>

#include <stdio.h>
#include <stdlib.h>

using namespace cuda::experimental::stf;

int main(int argc, char** argv) {
    context ctx;

    int nqpoints = 3;
    auto ltoken = ctx.logical_token();

    ctx.parallel_for(exec_place::host, box(40), ltoken.read())
            ->*[nqpoints] __host__ __device__(size_t i, void_interface) {
                assert(nqpoints == 3);
            };

    ctx.finalize();
}

This seems to happen only with a __host__ __device__ lambda; a plain host lambda is not affected.

How to Reproduce

Run the example above; the assertion will fail.

Expected behavior

The example should run to completion, and nqpoints should be 3 inside the lambda as well.

Reproduction link

No response

Operating System

Ubuntu 23.04

nvidia-smi output

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti   Off   |   00000000:04:00.0 Off |                  N/A |
|  0%   48C    P8             19W / 350W  |     10MiB / 12288MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro P620                  Off   |   00000000:65:00.0  On |                  N/A |
| 44%   58C    P0             N/A / N/A   |     497MiB / 2048MiB   |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

NVCC version

/usr/local/cuda-12.6/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

@caugonnet caugonnet added bug Something isn't working right. stf Sequential Task Flow programming model labels Jan 8, 2025
@caugonnet caugonnet self-assigned this Jan 8, 2025
@caugonnet caugonnet added this to CCCL Jan 8, 2025
caugonnet (Contributor, Author) commented:

The bug was fixed by #3270. It was caused by a host callback that stored a reference to an extended lambda that had already gone out of scope, rather than storing a copy of the lambda alongside the callback.
