╭────────────╮
│  EMULATOR  │
├────────────┤
│  ┌──────┐  │
│  │ vGPU │  │
│  │  ○   │  │
│  └──────┘  │
╰────────────╯
Develop and test CUDA, Numba, Mojo, and CUDA Tile kernels without owning physical GPU hardware. The emulator simulates real GPU execution on your CPU, letting you target any architecture from your laptop, CI pipeline, or cloud VM.
Don't have an H100? Test on it anyway. The emulator runs on any x86_64 machine, so you can develop on your laptop and target datacenter GPUs.
Architecture-specific bugs are hard to find. Test your kernel on sm_70, sm_80, and sm_90 in one run to catch compatibility issues before deployment.
GPU CI runners are expensive. Run your CUDA test suite on standard runners and save the real hardware for production benchmarks.
The emulator intercepts CUDA runtime calls and simulates execution on the CPU. Your code compiles normally with nvcc, then runs through our virtualized GPU that models the exact behavior of your target architecture.
  Your Code              Compile                 Emulate
┌────────────┐       ┌────────────┐       ┌────────────────────┐
│ kernel.cu  │ ────▶ │    nvcc    │ ────▶ │ A100 │ H100 │ ...  │
└────────────┘       └────────────┘       └─────────┬──────────┘
                                                    │
                                                    ▼
                                          ┌────────────────────┐
                                          │  Execution Report  │
                                          │  cycles, memory,   │
                                          │  occupancy, issues │
                                          └────────────────────┘
From legacy Kepler to the latest Hopper, emulate any NVIDIA GPU. Each architecture is modeled with accurate SM counts, memory sizes, and instruction latencies.
Consumer
├─ GTX 1080 Ti   Pascal   sm_61
├─ RTX 3090      Ampere   sm_86
└─ RTX 4090      Ada      sm_89

Datacenter
├─ V100          Volta    sm_70
├─ A100          Ampere   sm_80
└─ H100          Hopper   sm_90

+ 80 more configurations
The emulator executes your kernel with the same hierarchy as real hardware: grids, blocks, warps, and threads. Track divergence, synchronization, and per-thread state.
Grid (your kernel launch)
├─ Block(0,0)
│  ├─ Warp 0 [t0-t31]
│  ├─ Warp 1 [t32-t63]
│  └─ ...
├─ Block(0,1)
└─ ...

Per-thread: registers, PC, predicates
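To make the hierarchy concrete, here is a minimal Python sketch of how threads in a block group into warps and how divergence within a warp can be spotted. It is an illustration only, not the emulator's implementation: it assumes a one-dimensional block and the standard NVIDIA warp size of 32, and `warp_of`, `warps_in_block`, and `divergent_warps` are hypothetical helper names.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_of(thread_idx):
    """Return (warp index, lane index) for a thread in a 1-D block."""
    return thread_idx // WARP_SIZE, thread_idx % WARP_SIZE


def warps_in_block(block_dim):
    """Number of warps in a 1-D block; a partially filled last warp still counts."""
    return (block_dim + WARP_SIZE - 1) // WARP_SIZE


def divergent_warps(block_dim, predicate):
    """Return the indices of warps whose threads disagree on a branch.

    `predicate` maps a thread index to True/False (the branch condition).
    A warp diverges when some of its threads take the branch and others do not.
    """
    diverged = []
    for warp in range(warps_in_block(block_dim)):
        first = warp * WARP_SIZE
        last = min(first + WARP_SIZE, block_dim)
        outcomes = {bool(predicate(t)) for t in range(first, last)}
        if len(outcomes) > 1:
            diverged.append(warp)
    return diverged
```

For a 64-thread block, a branch on `t < 40` splits only warp 1 (threads 32 to 63), while a branch on `t % 2 == 0` splits both warps. Aligning branch conditions to multiples of 32 avoids divergence in this model.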
Memory bugs are among the hardest to track down. The emulator models the full GPU memory hierarchy and flags issues such as uncoalesced global accesses, shared memory bank conflicts, and register spills before you deploy to real hardware.
                   ┌───────────────────────┐
                   │     Global Memory     │  ← coalescing analysis
                   └───────────┬───────────┘
                               │
                   ┌───────────┴───────────┐
                   │       L2 Cache        │
                   └───────────┬───────────┘
      ┌────────────────────────┼────────────────────────┐
      ▼                        ▼                        ▼
┌───────────┐            ┌───────────┐            ┌───────────┐
│   SM 0    │            │   SM 1    │            │   SM N    │
│ L1/Shared │            │ L1/Shared │            │ L1/Shared │
│ Registers │            │ Registers │            │ Registers │
└───────────┘            └───────────┘            └───────────┘
Global memory: coalescing analysis, transaction counting
Shared memory: bank conflict detection across 32 banks
Registers: spill detection, pressure analysis
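The global and shared memory checks can be illustrated with a short Python sketch that inspects the byte addresses one warp accesses in a single instruction. This is a simplified model, not the emulator's implementation. The 32 banks come from the text above; the 4-byte bank width and the 32-byte global-memory sector size are assumptions, and `global_transactions` and `bank_conflict_degree` are hypothetical helper names.

```python
from collections import defaultdict

NUM_BANKS = 32      # shared memory banks, as stated above
BANK_WIDTH = 4      # bytes per bank word (assumption)
SECTOR_BYTES = 32   # global-memory transaction granularity (assumption)


def global_transactions(addresses, access_bytes=4, sector_bytes=SECTOR_BYTES):
    """Count the distinct memory sectors touched by one warp-wide access.

    Fewer sectors means better coalescing.
    """
    sectors = set()
    for addr in addresses:
        first = addr // sector_bytes
        last = (addr + access_bytes - 1) // sector_bytes
        sectors.update(range(first, last + 1))
    return len(sectors)


def bank_conflict_degree(addresses):
    """Return the largest number of distinct words that map to a single bank.

    1 means conflict-free and n means an n-way conflict. Threads that access
    the same word are counted once, modelling a broadcast.
    """
    words_per_bank = defaultdict(set)
    for addr in addresses:
        word = addr // BANK_WIDTH
        words_per_bank[word % NUM_BANKS].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)
```

Under these assumptions, a warp reading consecutive 4-byte values (`[4 * t for t in range(32)]`) touches 4 sectors and has a conflict degree of 1. The same warp with a 128-byte stride (`[128 * t for t in range(32)]`) touches 32 sectors and, in shared memory, produces a 32-way bank conflict.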
Run your CUDA test suite on any CI provider without GPU runners. The emulator works with GitHub Actions, GitLab CI, Jenkins, and any other system that runs on x86_64.
name: CUDA Tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest  # No GPU needed
    steps:
      - uses: actions/checkout@v4
      # Assumes the rightnow CLI is already installed on the runner.
      - name: Test on multiple architectures
        run: |
          rightnow test --emulator --arch=sm_80  # A100
          rightnow test --emulator --arch=sm_90  # H100
GPU Emulator is included in RightNow Pro. Download and start testing on any architecture today.