Benchmarking
Measure and compare CUDA kernel performance
Quick Start
Getting Started with Benchmarking
- Open Benchmark Panel: Click "Benchmark" in the bottom panel or use command palette: "CUDA: Open Benchmark View"
- Configure Benchmark: Set iterations, warmup runs, and data sizes in the configuration view
- Run Benchmark: Click the "Run Benchmark" button and watch results appear in real time
- Compare Kernels: Select two benchmarked kernels to see side-by-side performance comparison
Configuration
Benchmark Parameters
Configure these parameters for accurate performance measurement:
- Iterations: Number of times to run the kernel (e.g., 100 for reliable statistics)
- Warmup Runs: Initial untimed runs that stabilize GPU clocks and warm caches (e.g., 10 runs)
- Data Sizes: Small, medium, large test configurations for different workloads
- Timing Method: CUDA Events (elapsed time measured on the GPU) or CPU timers (host wall-clock time, which includes launch overhead)
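As a rough illustration of how Iterations and Warmup Runs interact, the sketch below times a callable with a CPU timer. It is not the tool's implementation: `run_benchmark` and `kernel_call` are hypothetical names, and the stand-in workload replaces a real kernel launch, which would have to synchronize with the GPU before the timer stops.

```python
import time


def run_benchmark(kernel_call, iterations=100, warmup_runs=10):
    """Time `kernel_call` and return per-iteration times in milliseconds.

    `kernel_call` is a hypothetical callable; for real GPU work it must
    block until the kernel has finished, otherwise only the launch is timed.
    """
    # Warmup runs execute the workload but are not recorded.
    for _ in range(warmup_runs):
        kernel_call()

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        kernel_call()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return samples_ms


# Stand-in workload so the sketch runs without a GPU.
calls = []
samples = run_benchmark(lambda: calls.append(sum(range(1000))),
                        iterations=100, warmup_runs=10)
print(len(samples), "timed runs,", len(calls), "total calls")
```

Only the 100 timed runs produce samples; the 10 warmup calls run the workload but are discarded.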
Running a Benchmark
- Configure your benchmark settings
- Click "Run Benchmark" button
- Watch the progress bar as results stream in
- See live metrics update in real time
- Use the Stop button to cancel a long-running benchmark
Results and Analysis
Live Results
During benchmark execution:
- Running Status: Shows current iteration (e.g., "Running 45/100")
- Time Graph: Real-time performance visualization
- Statistics: Live mean, min, max updates
- Progress Bar: Visual completion indicator
Final Results
After benchmark completion:
- Execution Time: Average kernel runtime in milliseconds
- Statistical Analysis: Mean, median, standard deviation, variance
- Performance Metrics: Throughput (GB/s), efficiency percentages
- Distribution Graph: Histogram showing timing distribution
- Export Button: Save results to CSV or JSON format
Key Metrics
Timing Statistics:
- Mean: Average execution time across all iterations
- Median: Middle value, less affected by outliers
- Standard Deviation: Measure of timing consistency
- Min/Max: Best and worst case performance
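The timing statistics above can be reproduced from raw per-iteration times with Python's standard `statistics` module. This is a minimal sketch with made-up sample values; `timing_summary` is a hypothetical helper, and the tool may compute its figures differently (for example, population rather than sample standard deviation).

```python
import statistics


def timing_summary(samples_ms):
    """Summarize per-iteration kernel times given in milliseconds."""
    return {
        "mean": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        "stdev": statistics.stdev(samples_ms),  # sample standard deviation
        "min": min(samples_ms),
        "max": max(samples_ms),
    }


# Made-up timings: four steady runs and one slow outlier.
samples_ms = [1.0, 1.0, 1.0, 1.0, 6.0]
summary = timing_summary(samples_ms)
print(summary)
```

With these values the single 6.0 ms run pulls the mean up to 2.0 ms while the median stays at 1.0 ms, which is why the median is the more robust figure when outliers are present.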
Performance Indicators:
- Throughput: Data processed per second (GB/s)
- Occupancy: Ratio of active warps to the maximum number of warps each SM can support
- Efficiency: Comparison to theoretical peak performance
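Throughput and efficiency follow from the measured time and the amount of data moved. The sketch below assumes 1 GB = 10^9 bytes and uses an invented peak bandwidth; both function names and all numbers are illustrative, not values reported by the tool.

```python
def throughput_gb_per_s(bytes_moved, time_ms):
    """Effective throughput in GB/s, taking 1 GB as 1e9 bytes (assumption)."""
    return bytes_moved / (time_ms * 1e-3) / 1e9


def efficiency_percent(achieved, peak):
    """Achieved value as a percentage of the theoretical peak."""
    return 100.0 * achieved / peak


# Illustrative numbers: a kernel that reads and writes 4 GB in 10 ms
# on a device with an assumed peak bandwidth of 800 GB/s.
bytes_moved = 4_000_000_000
time_ms = 10.0
peak_gb_per_s = 800.0

achieved = throughput_gb_per_s(bytes_moved, time_ms)
efficiency = efficiency_percent(achieved, peak_gb_per_s)
print(f"{achieved:.1f} GB/s, {efficiency:.1f}% of peak")
```

`bytes_moved` should count every byte the kernel reads plus every byte it writes; counting only the input size understates throughput.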
Comparing Kernels
How to Compare
- Benchmark your original kernel implementation
- Make optimizations to your code
- Benchmark your optimized version
- Click "Compare" button
- View side-by-side comparison
Comparison View Shows:
- Side-by-side metrics: Direct comparison of all performance indicators
- Speedup: How much faster or slower the second kernel is, expressed as a factor (e.g., "2.3x faster")
- Winner highlighted: Better performing kernel shown in green
- Regression detection: Automatic warning if performance decreased
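The speedup figure and the regression warning can be sketched as follows. The 5% tolerance is an assumption chosen for illustration; the threshold the tool actually uses is not documented here, and `compare` is a hypothetical helper.

```python
def compare(baseline_ms, candidate_ms, tolerance=0.05):
    """Compare two kernel times; `tolerance` is a hypothetical noise margin."""
    speedup = baseline_ms / candidate_ms
    percent_change = 100.0 * (candidate_ms - baseline_ms) / baseline_ms
    return {
        "speedup": speedup,                # > 1 means the candidate is faster
        "percent_change": percent_change,  # negative means less time taken
        "regression": candidate_ms > baseline_ms * (1.0 + tolerance),
    }


faster = compare(baseline_ms=4.6, candidate_ms=2.0)
slower = compare(baseline_ms=2.0, candidate_ms=2.5)
print(f"{faster['speedup']:.1f}x faster, regression={faster['regression']}")
print(f"{slower['speedup']:.1f}x, regression={slower['regression']}")
```

A speedup factor and a percentage change in time are different quantities: a 2.3x speedup corresponds to roughly 57% less execution time, not 230%.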
Multi-Kernel Comparison
Compare multiple optimization approaches:
- Benchmark baseline implementation
- Test different optimization strategies
- Compare all versions simultaneously
- Identify best performing approach
- Export comparison data for reports
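One way to rank several variants outside the tool is to sort them by median time and express each as a speedup over the baseline. The kernel names and timings below are invented for illustration, and `rank_kernels` is a hypothetical helper.

```python
import statistics


def rank_kernels(results, baseline="baseline"):
    """Return (name, median_ms, speedup_vs_baseline) tuples, fastest first."""
    medians = {name: statistics.median(times) for name, times in results.items()}
    base = medians[baseline]
    ranking = [(name, med, base / med) for name, med in medians.items()]
    return sorted(ranking, key=lambda row: row[1])


# Invented per-iteration timings in milliseconds.
results = {
    "baseline": [4.0, 4.2, 4.1],
    "shared_memory": [2.0, 2.1, 2.05],
    "vectorized_loads": [3.0, 3.1, 2.9],
}
ranking = rank_kernels(results)
for name, median_ms, speedup in ranking:
    print(f"{name}: {median_ms:.2f} ms ({speedup:.2f}x vs baseline)")
```

Ranking on the median rather than the mean keeps a single slow iteration from changing the order.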
Advanced Features
Data Size Scaling
Test performance across different input sizes:
- Small: Cache-friendly workloads
- Medium: Typical production sizes
- Large: Memory-bandwidth-bound scenarios
- Custom: Define specific test configurations
Statistical Analysis
Advanced statistical tools for reliable results:
- Outlier Detection: Identifies and filters anomalous measurements
- Confidence Intervals: Range of plausible values for the mean, used to judge whether an improvement is statistically significant
- Variance Analysis: Consistency and stability metrics
- Percentiles: P50, P95, P99 for latency analysis
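The sketch below shows one common way to implement these analyses: Tukey's 1.5 x IQR rule for outliers, nearest-rank percentiles, and a normal-approximation 95% confidence interval for the mean. These are assumptions about reasonable methods, not a description of what the tool does internally; all function names and sample values are hypothetical.

```python
import math
import statistics


def filter_outliers(samples):
    """Drop samples more than 1.5 * IQR outside the quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in samples if low <= s <= high]


def percentile(samples, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def mean_confidence_interval(samples, z=1.96):
    """Approximate 95% interval for the mean (normal approximation)."""
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width


# Invented timings in ms: ten steady runs (1.0 to 1.9) and one 9.0 ms spike.
samples_ms = [x / 10 for x in range(10, 20)] + [9.0]
filtered = filter_outliers(samples_ms)
ci_low, ci_high = mean_confidence_interval(filtered)

print("kept", len(filtered), "of", len(samples_ms), "samples")
print("P50/P95/P99:", [percentile(filtered, p) for p in (50, 95, 99)])
print(f"95% CI for the mean: {ci_low:.3f} to {ci_high:.3f} ms")
```

When the confidence intervals of two kernels do not overlap, the difference between them is unlikely to be measurement noise.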
Export and Reporting
Share benchmark results:
- CSV Export: Raw data for further analysis
- JSON Format: Structured data with metadata
- Charts: Export visualizations as PNG/SVG
- Reports: Generate formatted performance reports
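For comparison with the built-in export, here is a minimal sketch of writing raw timings to CSV and a structured record to JSON using Python's standard library. The column names, JSON fields, and the kernel name `vector_add` are hypothetical and do not describe the tool's actual export schema.

```python
import csv
import io
import json


def to_csv(samples_ms):
    """Serialize raw timings as CSV text, one row per iteration."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["iteration", "time_ms"])  # hypothetical column names
    for index, time_ms in enumerate(samples_ms, start=1):
        writer.writerow([index, time_ms])
    return buffer.getvalue()


def to_json(kernel_name, config, samples_ms):
    """Serialize timings plus benchmark metadata as JSON text."""
    record = {"kernel": kernel_name, "config": config, "samples_ms": samples_ms}
    return json.dumps(record, indent=2)


csv_text = to_csv([1.5, 2.0])
json_text = to_json("vector_add", {"iterations": 2, "warmup_runs": 10}, [1.5, 2.0])
print(csv_text)
print(json_text)
```

Storing the benchmark configuration next to the timings makes it possible to check later that two result files were produced under the same settings.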
Best Practices
For Accurate Results
- Use at least 100 iterations for statistical reliability
- Always include warmup runs to stabilize GPU clocks and caches
- Keep data sizes consistent when comparing kernels
- Benchmark before and after each optimization
- Close other GPU applications for isolated measurements
- Run benchmarks multiple times and average results
Important Considerations
- Let GPU stabilize between tests (thermal throttling can affect results)
- Use same benchmark configuration when comparing kernels
- Monitor GPU temperature and clock speeds during benchmarking
- Consider power and thermal limits for production scenarios
Pro Tip: Combine benchmarking with Real-Time Profiling to understand why performance changed, not just that it changed.