cudaErrorCooperativeLaunchTooLarge (720)
cudaErrorCooperativeLaunchTooLarge occurs when a cooperative kernel is launched with more blocks than the GPU can keep resident at the same time.
Error message: CUDA error: cooperative launch too large (cudaErrorCooperativeLaunchTooLarge)
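To reproduce the error deliberately, here is a minimal sketch. It computes the occupancy-based limit for an illustrative do-nothing kernel and requests one block more. The names noopKernel and launchOneBlockTooMany are made up for this example, and the sketch assumes device 0 supports cooperative launch. On such a device the launch is rejected up front rather than run, so the status to check is the value returned by cudaLaunchCooperativeKernel itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (hypothetical name) with one pointer parameter; it does no work.
__global__ void noopKernel(int* p) { (void)p; }

// Requests one block more than can be co-resident on the current device and
// returns the launch status. On a device that supports cooperative launch this
// is expected to be cudaErrorCooperativeLaunchTooLarge.
cudaError_t launchOneBlockTooMany() {
    const int blockSize = 256;
    int blocksPerSM = 0, numSMs = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, noopKernel, blockSize, 0);
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, 0);

    int* p = nullptr;
    void* args[] = {&p};  // one entry per kernel parameter
    int tooMany = blocksPerSM * numSMs + 1;
    cudaError_t err = cudaLaunchCooperativeKernel((void*)noopKernel, dim3(tooMany),
                                                  dim3(blockSize), args, 0, 0);
    cudaGetLastError();  // clear the recorded launch error so later calls start clean
    return err;
}

int main() {
    cudaError_t err = launchOneBlockTooMany();
    if (err == cudaErrorCooperativeLaunchTooLarge) {
        printf("%s (%d): %s\n", cudaGetErrorName(err), (int)err, cudaGetErrorString(err));
    } else {
        printf("unexpected status: %s\n", cudaGetErrorName(err));
    }
    return 0;
}
```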
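Check the device limit. The largest grid a cooperative launch accepts is the kernel's maximum number of active blocks per SM (for the given block size and dynamic shared memory) multiplied by the number of SMs:
// Max resident blocks of `kernel` per SM at this block size, with 0 bytes of dynamic shared memory
int numBlocks;
cudaOccupancyMaxActiveBlocksPerMultiprocessor(&numBlocks, kernel, blockSize, 0);
// Number of SMs on device 0
int deviceNumSMs;
cudaDeviceGetAttribute(&deviceNumSMs, cudaDevAttrMultiProcessorCount, 0);
// Upper bound on the grid size passed to cudaLaunchCooperativeKernel
int maxCoopBlocks = numBlocks * deviceNumSMs;
Stay within limits.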
cudaLaunchCooperativeKernel((void*)kernel, maxCoopBlocks, blockSize, args);
Too many blocks for a cooperative launch:
cudaLaunchCooperativeKernel(kernel, 10000, 256, args);  // 10000 x 256 = 2,560,000 threads, far more than current GPUs can keep resident
Query and respect the limit:
int maxBlocks = getMaxCooperativeBlocks(kernel, 256);  // user-defined helper (not a CUDA API) wrapping the occupancy query above
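cudaLaunchCooperativeKernel(kernel, maxBlocks, 256, args);
A cooperative launch allows grid-wide synchronization via grid_group::sync(). Because that barrier waits for every block, all blocks of the grid must be resident on the GPU simultaneously; a larger grid could deadlock, so the runtime rejects it with this error. If the clamped grid has fewer threads than the data needs, have the kernel loop over its elements (for example with a grid-stride loop).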
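Putting the pieces together, here is a self-contained sketch of the fix. It assumes CUDA 11 or newer (older toolkits may additionally need relocatable device code, -rdc=true, for grid.sync()), a build with -arch set to the target GPU, and a device that reports cudaDevAttrCooperativeLaunch. getMaxCooperativeBlocks is a hypothetical implementation of the helper named in the snippet, and twoPhaseKernel and runTwoPhase are illustrative names. The kernel writes in one phase, synchronizes the whole grid, then reads values that other blocks wrote. Its grid-stride loops let a grid clamped to the limit still cover all n elements.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Phase 1 writes data[i] = i; after a grid-wide barrier, phase 2 reads a
// value that may have been written by a different block.
__global__ void twoPhaseKernel(int* data, int* out, int n) {
    cg::grid_group grid = cg::this_grid();
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;  // grid-stride loop: works for any grid size
    for (int i = tid; i < n; i += stride) data[i] = i;
    grid.sync();  // safe only because every block is resident at once
    for (int i = tid; i < n; i += stride) out[i] = data[n - 1 - i];
}

// Hypothetical helper (not a CUDA API): the largest grid cudaLaunchCooperativeKernel
// accepts for `kernel` at this block size, or 0 if the current device does not
// support cooperative launch (or the kernel cannot fit on an SM at all).
template <typename Kernel>
int getMaxCooperativeBlocks(Kernel kernel, int blockSize, size_t dynSmem = 0) {
    int device = 0, supportsCoop = 0, blocksPerSM = 0, numSMs = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&supportsCoop, cudaDevAttrCooperativeLaunch, device);
    if (!supportsCoop) return 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, kernel, blockSize, dynSmem);
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device);
    return blocksPerSM * numSMs;
}

// Launches twoPhaseKernel with a grid clamped to the cooperative limit and
// returns true if the launch succeeded and out[i] == n - 1 - i for all i.
bool runTwoPhase(int n, int blockSize) {
    int maxBlocks = getMaxCooperativeBlocks(twoPhaseKernel, blockSize);
    if (maxBlocks == 0) return false;  // unsupported device or zero occupancy
    int wanted = (n + blockSize - 1) / blockSize;
    int gridSize = std::min(wanted, maxBlocks);  // never exceed the limit

    int *data = nullptr, *out = nullptr;
    cudaMalloc(&data, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));
    void* args[] = {&data, &out, &n};  // one pointer per kernel parameter
    cudaError_t err = cudaLaunchCooperativeKernel((void*)twoPhaseKernel, dim3(gridSize),
                                                  dim3(blockSize), args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    std::vector<int> host(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(data);
    cudaFree(out);
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (int i = 0; i < n; ++i)
        if (host[i] != n - 1 - i) return false;
    printf("grid=%d blocks (limit %d): results verified\n", gridSize, maxBlocks);
    return true;
}

int main() { return runTwoPhase(1 << 20, 256) ? 0 : 1; }
```

Clamping with std::min keeps small problems from launching more blocks than they need, while large problems get exactly the co-resident maximum and rely on the grid-stride loops for the remaining elements.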