When STF's parallel_for construct runs on the host with a __host__ __device__ lambda that has a capture list, the captured variables are corrupted. This breaks the miniWeather example and can be reduced to the following:
#include <cuda/experimental/stf.cuh>
#include <cassert> // for assert()
#include <stdio.h>
#include <stdlib.h>

using namespace cuda::experimental::stf;

int main(int argc, char** argv) {
    context ctx;
    int nqpoints = 3;
    auto ltoken = ctx.logical_token();

    // Run on the host, with a __host__ __device__ lambda that captures nqpoints by value.
    ctx.parallel_for(exec_place::host, box(40), ltoken.read())
        ->*[nqpoints] __host__ __device__ (size_t i, void_interface) {
            // Fails when the bug is present: the captured value is corrupted.
            assert(nqpoints == 3);
        };

    ctx.finalize();
}
This seems to happen only with a __host__ __device__ lambda function.
How to Reproduce
Run the example above; the assertion fails.
Expected behavior
The example should run to completion, and nqpoints should also be 3 inside the lambda.
The bug was fixed by #3270. It was caused by a host callback that stored a reference to an extended lambda function that had already gone out of scope, rather than storing the lambda function itself along with the host callback.
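The lifetime issue described above can be sketched in plain C++, without CUDA or STF. This is only an illustration of the general pattern, not the code changed in #3270: `deferred_queue`, `submit`, and `run_example` are hypothetical names invented for this sketch. The queue stores each callable by value, so the callback stays valid after the caller's lambda has been destroyed.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queue of deferred host callbacks.
// This is NOT the STF implementation, only a sketch of the pattern.
class deferred_queue {
public:
    // Store the callable BY VALUE so it outlives the caller's scope.
    template <typename F>
    void enqueue(F f) {
        callbacks_.emplace_back(std::move(f));
    }

    // Buggy alternative (do not use): wrapping a reference to the
    // caller's callable, e.g.
    //   callbacks_.emplace_back([&f] { f(); });
    // leaves a dangling reference once that callable is destroyed,
    // so its captured variables may read as garbage later.

    // Run and discard every pending callback.
    void run_all() {
        for (auto& cb : callbacks_) {
            cb();
        }
        callbacks_.clear();
    }

private:
    std::vector<std::function<void()>> callbacks_;
};

// Enqueues a lambda that captures nqpoints by value, then returns
// before the callback has run.
void submit(deferred_queue& q, int& observed) {
    int nqpoints = 3;
    auto body = [nqpoints, &observed] { observed = nqpoints; };
    q.enqueue(body); // the queue keeps its own copy of `body`
} // `body` is destroyed here; the queued copy is still valid

int run_example() {
    deferred_queue q;
    int observed = 0;
    submit(q, observed);
    q.run_all(); // runs after `body` went out of scope
    assert(observed == 3);
    return observed;
}
```

Calling `run_example()` exercises the same ordering as the report: the lambda's original object is gone by the time the callback executes, and the captured value survives because the queue owns a copy.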
Type of Bug
Runtime Error
Component
Not sure
Reproduction link
No response
Operating System
Ubuntu 23.04
nvidia-smi output
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3080 Ti Off | 00000000:04:00.0 Off | N/A |
| 0% 48C P8 19W / 350W | 10MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Quadro P620 Off | 00000000:65:00.0 On | N/A |
| 44% 58C P0 N/A / N/A | 497MiB / 2048MiB | 6% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
NVCC version
/usr/local/cuda-12.6/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0