GPU Emulation
Test CUDA code on different GPUs without owning the hardware
Quick Start

Getting Started with GPU Emulation
- Open GPU Emulation: Click the GPU Emulation icon in the sidebar (circuit board icon)
- Choose a GPU: Select the GPU you want to test from the list. Click the checkbox next to it
- Run Your Code: Open your .cu file and click the Build button. Your code runs on the selected GPU
Available GPUs
The emulator includes all major NVIDIA GPUs across different generations:
Latest Generation
- H100, H200
- RTX 4090, RTX 4080
- L40S
Data Center
- A100, A10
- V100
- T4
Gaming & Workstation
- RTX 3090, RTX 3080
- RTX 2080 Ti
- GTX 1080 Ti
Legacy Systems
- Tesla K80
- GTX 1060
- Earlier architectures
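Each of these GPUs corresponds to a CUDA compute capability (CC), which determines the features available to your kernels. As a rough reference when choosing targets, a lookup like the following sketch can help. The values below are NVIDIA's published compute capabilities, not data from the emulator itself:

```python
# Published NVIDIA compute capabilities for the GPUs listed above.
# (Reference values only; the emulator's own device table is authoritative.)
COMPUTE_CAPABILITY = {
    "H100": (9, 0), "H200": (9, 0),            # Hopper
    "RTX 4090": (8, 9), "RTX 4080": (8, 9),    # Ada Lovelace
    "L40S": (8, 9),                            # Ada Lovelace
    "A100": (8, 0), "A10": (8, 6),             # Ampere
    "RTX 3090": (8, 6), "RTX 3080": (8, 6),    # Ampere
    "V100": (7, 0),                            # Volta
    "T4": (7, 5), "RTX 2080 Ti": (7, 5),       # Turing
    "GTX 1080 Ti": (6, 1), "GTX 1060": (6, 1), # Pascal
    "Tesla K80": (3, 7),                       # Kepler
}

def supports(gpu: str, required_cc: tuple) -> bool:
    """True if the GPU's compute capability meets the requirement."""
    return COMPUTE_CAPABILITY[gpu] >= required_cc

# Example: WMMA tensor-core intrinsics require CC 7.0 or higher.
print(supports("V100", (7, 0)))         # True
print(supports("GTX 1080 Ti", (7, 0)))  # False
```

Tuples compare element-wise in Python, so `(8, 6) >= (7, 0)` behaves exactly like a major/minor version comparison.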
How to Select GPUs

Selection Methods
- Single GPU: Check one box to test on that GPU
- Multiple GPUs: Check multiple boxes to compare performance across different architectures
- Search: Type a GPU name in the search box to find it quickly
- Clear: Click "Clear Selection" to deselect all GPUs
What You'll See

When emulation is active:
- Status bar shows the selected GPU name
- "Emulation Active" indicator appears
- Performance results show estimated metrics
- Architecture-specific warnings appear if your code uses unsupported features
Best Uses
Development & Testing
- Test code compatibility on different GPUs
- Develop without expensive hardware
- Compare performance across architectures
- Learn CUDA on any computer
Architecture Exploration
- Test Hopper features without H100 access
- Evaluate tensor core performance
- Understand compute capability differences
- Plan for future GPU upgrades
Educational
- Learn CUDA without GPU hardware
- Experiment with different configurations
- Understand architectural differences
- Practice optimization techniques
Performance Analysis
- Estimate performance on target hardware
- Compare workloads across GPU generations
- Identify architecture-specific bottlenecks
- Validate optimization strategies
Emulation Features
Accurate Hardware Modeling
- Compute Capability: Emulates correct CC version for each GPU
- Memory Specs: Realistic memory bandwidth and capacity constraints
- SM Count: Accurate streaming multiprocessor configuration
- Register Limits: Per-thread and per-block register constraints
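To illustrate the kind of constraints being modeled, the sketch below checks a kernel launch against per-thread and per-block resource limits. The figures are published specs for an A100-class device, used here as an assumption for illustration; they are not the emulator's internal tables:

```python
# Illustrative hardware model for one device (A100-like published figures,
# assumed for this example; not the emulator's actual data).
GPU_MODEL = {
    "name": "A100",
    "sm_count": 108,
    "regs_per_sm": 65536,             # 32-bit registers per SM
    "max_regs_per_thread": 255,
    "max_threads_per_block": 1024,
    "shared_mem_per_sm": 164 * 1024,  # bytes
}

def kernel_fits(regs_per_thread, threads_per_block, shared_mem_per_block,
                gpu=GPU_MODEL):
    """Check a kernel launch against the modeled GPU's resource limits."""
    if regs_per_thread > gpu["max_regs_per_thread"]:
        return False, "too many registers per thread"
    if threads_per_block > gpu["max_threads_per_block"]:
        return False, "block too large"
    if regs_per_thread * threads_per_block > gpu["regs_per_sm"]:
        return False, "register file exhausted by a single block"
    if shared_mem_per_block > gpu["shared_mem_per_sm"]:
        return False, "shared memory exceeds SM capacity"
    return True, "ok"

print(kernel_fits(32, 256, 48 * 1024))  # (True, 'ok')
print(kernel_fits(255, 1024, 0))        # register file exhausted
```

In practice, an oversubscribed launch like the second one fails at launch time on real hardware; modeling the limits lets the emulator surface the error before you ever run on a physical GPU.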
Performance Estimation
- Execution Time: Estimated kernel runtime based on architecture
- Occupancy: Calculate theoretical occupancy limits
- Memory Throughput: Bandwidth and latency modeling
- Bottleneck Analysis: Identify compute-bound vs. memory-bound kernels
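Two of these estimates can be sketched in a few lines: theoretical occupancy limited by register pressure, and a roofline-style compute-vs-memory classification. The SM and throughput figures are A100-class published numbers, assumed for illustration:

```python
# Sketch of a theoretical-occupancy estimate limited by register use.
# SM figures are A100-class (assumed for illustration).
WARP_SIZE = 32
MAX_WARPS_PER_SM = 64     # 2048 resident threads / 32
REGS_PER_SM = 65536

def theoretical_occupancy(regs_per_thread):
    """Fraction of the SM's warp slots usable given register pressure."""
    if regs_per_thread == 0:
        return 1.0
    warps_by_regs = REGS_PER_SM // (regs_per_thread * WARP_SIZE)
    return min(warps_by_regs, MAX_WARPS_PER_SM) / MAX_WARPS_PER_SM

def bound_by(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline-style check: is a kernel compute-bound or memory-bound?"""
    intensity = flops / bytes_moved  # FLOP per byte moved
    ridge = peak_flops / peak_bw     # machine balance point
    return "compute-bound" if intensity > ridge else "memory-bound"

print(theoretical_occupancy(32))   # 1.0  (all 64 warps fit)
print(theoretical_occupancy(128))  # 0.25 (registers cap at 16 warps)

# A100-ish peaks: ~19.5 TFLOP/s FP32, ~1555 GB/s HBM2e.
# A streaming kernel doing 2 FLOPs per 12 bytes moved:
print(bound_by(flops=2, bytes_moved=12, peak_flops=19.5e12, peak_bw=1.555e12))
```

A full occupancy calculation also accounts for shared-memory use and block-size granularity; the register limit shown here is just one of the constraints, and the binding one for many kernels.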
Compatibility Checking
- Feature Detection: Warns about unsupported CUDA features
- Architecture Warnings: Alerts for architecture-specific code paths
- Compilation Validation: Ensures code compiles for target GPU
- API Compatibility: Checks CUDA API version requirements
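The feature checks work by comparing what your code uses against the target's compute capability. The sketch below shows the idea; the minimum CC values are standard CUDA requirements, but the feature list and warning format are illustrative, not the emulator's actual rule set:

```python
# Illustrative feature-vs-architecture check (not the emulator's rule set).
# Minimum compute capabilities are standard CUDA requirements.
MIN_CC = {
    "fp16 arithmetic": (5, 3),
    "tensor cores (wmma)": (7, 0),
    "async copy (cp.async)": (8, 0),
    "thread block clusters": (9, 0),
}

def check_features(used, target_cc):
    """Return a warning for each feature the target GPU cannot run."""
    warnings = []
    for feat in used:
        need = MIN_CC[feat]
        if need > target_cc:
            warnings.append(
                f"{feat} requires CC {need[0]}.{need[1]}, "
                f"but target is CC {target_cc[0]}.{target_cc[1]}"
            )
    return warnings

# Targeting a T4 (CC 7.5): tensor cores are fine, clusters are not.
for warning in check_features(
        ["tensor cores (wmma)", "thread block clusters"], (7, 5)):
    print(warning)
```

This mirrors what nvcc does when you compile for a specific `-arch` target: features gated on a newer architecture either fail to compile or fall back to slower paths, which is why the emulator surfaces the warning early.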
Tips and Limitations
Best Practices
- Use emulation for compatibility testing and initial development
- Compare performance trends across GPU generations
- Validate architectural assumptions before hardware purchase
- Test multiple GPUs to ensure broad compatibility
Important Limitations
- Estimates Only: Emulation provides estimates. For production benchmarks, use real hardware
- Performance Accuracy: Timing estimates are approximate and may vary from actual hardware
- Real Hardware Preferred: Always validate critical optimizations on actual target GPUs
- Architecture-Specific Features: Some advanced features may not be fully emulated
Next Steps: For real hardware testing, explore Remote GPU Execution to connect to cloud GPUs or your own remote servers.