Benchmarking

Measure and compare CUDA kernel performance

Quick Start

Getting Started with Benchmarking

  1. Open Benchmark Panel: Click "Benchmark" in the bottom panel, or run "CUDA: Open Benchmark View" from the command palette
  2. Configure Benchmark: Set iterations, warmup runs, and data sizes in the configuration view
  3. Run Benchmark: Click the "Run Benchmark" button and watch results appear in real time
  4. Compare Kernels: Select two benchmarked kernels to see a side-by-side performance comparison

Configuration

Benchmark Parameters

Configure these parameters for accurate performance measurement:

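  • Iterations: Number of timed kernel runs (e.g., 100 for reliable statistics)
  • Warmup Runs: Initial runs, excluded from the results, that absorb one-time startup costs and bring GPU clocks and caches to a steady state (e.g., 10 runs)
  • Data Sizes: Small, medium, and large input configurations that represent different workloads
  • Timing Method: CUDA Events (GPU-side timing of the kernel on its stream) or CPU timers (host wall-clock time, which also includes launch overhead and requires synchronizing with the device before the timer is read)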
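The parameters above can be sketched as a simple host-side loop. This minimal C++ sketch assumes the kernel launch is wrapped in a callable that blocks until the kernel finishes; `BenchmarkConfig` and `runBenchmark` are hypothetical names, not the extension's API, and the timing uses a CPU timer (`std::chrono::steady_clock`). With the CUDA Events method you would instead record a start and stop event around the launch with `cudaEventRecord`, wait with `cudaEventSynchronize`, and read the difference with `cudaEventElapsedTime`.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical configuration mirroring the panel's parameters.
struct BenchmarkConfig {
    std::size_t warmupRuns = 10;   // untimed runs, results discarded
    std::size_t iterations = 100;  // timed runs
};

// Hypothetical harness (not the extension's API). Runs `launch` warmupRuns
// times without timing, then `iterations` times with a CPU wall-clock timer,
// returning one duration in milliseconds per timed iteration.
// For a real CUDA kernel, `launch` must block until the kernel completes
// (e.g. by calling cudaDeviceSynchronize()); otherwise the timer only
// captures the asynchronous launch.
std::vector<double> runBenchmark(const BenchmarkConfig& cfg,
                                 const std::function<void()>& launch) {
    for (std::size_t i = 0; i < cfg.warmupRuns; ++i) {
        launch();  // warmup: not timed
    }
    std::vector<double> timesMs;
    timesMs.reserve(cfg.iterations);
    for (std::size_t i = 0; i < cfg.iterations; ++i) {
        auto start = std::chrono::steady_clock::now();
        launch();
        auto stop = std::chrono::steady_clock::now();
        timesMs.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return timesMs;
}
```

Warmup results are discarded, so the returned vector always holds exactly `iterations` samples.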

Running a Benchmark

  1. Configure your benchmark settings
  2. Click the "Run Benchmark" button
  3. Watch the progress bar as results stream in
  4. Follow the live metrics as they update in real time
  5. Click the Stop button to cancel a long-running benchmark
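Progress reporting and the Stop button can be modeled as a loop that checks a cancellation flag between iterations. The sketch below is an assumption about how such a loop might work, not the extension's implementation; `progressLabel` and `runCancellable` are hypothetical names, and the flag is a `std::atomic<bool>` so that a UI thread can set it safely while the loop runs.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: status text such as "Running 45/100".
std::string progressLabel(std::size_t done, std::size_t total) {
    return "Running " + std::to_string(done) + "/" + std::to_string(total);
}

// Hypothetical loop (not the extension's code). Runs up to `total`
// iterations, reports progress after each one, and checks the cancel flag
// (set, for example, by a Stop button) before starting the next.
// Returns the number of iterations that completed.
std::size_t runCancellable(std::size_t total,
                           const std::function<void()>& iteration,
                           const std::function<void(const std::string&)>& report,
                           const std::atomic<bool>& cancelRequested) {
    std::size_t done = 0;
    while (done < total && !cancelRequested.load()) {
        iteration();
        ++done;
        report(progressLabel(done, total));
    }
    return done;
}
```

Because the flag is only checked between iterations, an iteration that is already running always finishes before the benchmark stops.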

Results and Analysis

Live Results

During benchmark execution:

  • Running Status: Shows current iteration (e.g., "Running 45/100")
  • Time Graph: Real-time performance visualization
  • Statistics: Live updates of mean, min, and max
  • Progress Bar: Visual completion indicator
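Live mean, min, and max values can be maintained without storing every sample. This sketch uses Welford's online algorithm, one common choice for a numerically stable running mean and variance; the `RunningStats` class is hypothetical and only illustrates the idea.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical running-statistics accumulator (not the extension's code).
// Updated once per iteration; Welford's algorithm keeps the mean and the sum
// of squared deviations (m2_) without the cancellation error of summing squares.
class RunningStats {
public:
    void add(double x) {
        ++n_;
        double delta = x - mean_;
        mean_ += delta / static_cast<double>(n_);
        m2_ += delta * (x - mean_);
        min_ = std::min(min_, x);
        max_ = std::max(max_, x);
    }
    std::size_t count() const { return n_; }
    double mean() const { return mean_; }
    double min() const { return min_; }
    double max() const { return max_; }
    // Sample variance (n - 1 denominator); 0 until two samples exist.
    double variance() const {
        return n_ > 1 ? m2_ / static_cast<double>(n_ - 1) : 0.0;
    }
    double stddev() const { return std::sqrt(variance()); }

private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;
    double min_ = std::numeric_limits<double>::infinity();
    double max_ = -std::numeric_limits<double>::infinity();
};
```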

Final Results

After benchmark completion:

  • Execution Time: Average kernel runtime in milliseconds
  • Statistical Analysis: Mean, median, standard deviation, variance
  • Performance Metrics: Throughput (GB/s), efficiency percentages
  • Distribution Graph: Histogram showing timing distribution
  • Export Button: Save results to CSV or JSON format
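The distribution graph is a histogram of per-iteration times. A minimal sketch of the binning follows, assuming equal-width bins between the fastest and slowest samples (the panel's actual binning is not documented here); `histogram` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts per-iteration times into `bins` equal-width
// buckets spanning [min, max] of the samples. The slowest sample lands in
// the last bucket; if all samples are equal, everything goes in bucket 0.
std::vector<std::size_t> histogram(const std::vector<double>& timesMs,
                                   std::size_t bins) {
    std::vector<std::size_t> counts(bins, 0);
    if (timesMs.empty() || bins == 0) return counts;
    auto mm = std::minmax_element(timesMs.begin(), timesMs.end());
    double minV = *mm.first;
    double maxV = *mm.second;
    double width = (maxV - minV) / static_cast<double>(bins);
    for (double t : timesMs) {
        std::size_t idx = width > 0.0
            ? static_cast<std::size_t>((t - minV) / width)
            : 0;
        counts[std::min(idx, bins - 1)] += 1;  // clamp the max sample
    }
    return counts;
}
```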

Key Metrics

Timing Statistics:

  • Mean: Average execution time across all iterations
  • Median: Middle value, less affected by outliers
  • Standard Deviation: Spread of timings around the mean; lower values indicate more consistent performance
  • Min/Max: Best and worst case performance
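The timing statistics can be computed from the per-iteration samples as follows. This is a sketch, not the extension's code: `summarize` and `TimingSummary` are hypothetical, and using the sample (n - 1) form of the standard deviation is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical summary of per-iteration times in milliseconds.
struct TimingSummary {
    double mean, median, stddev, min, max;
};

// Median averages the two middle values for an even count;
// stddev is the sample standard deviation (n - 1 denominator).
TimingSummary summarize(std::vector<double> timesMs) {
    if (timesMs.empty()) throw std::invalid_argument("no samples");
    std::sort(timesMs.begin(), timesMs.end());
    const double n = static_cast<double>(timesMs.size());
    const double mean =
        std::accumulate(timesMs.begin(), timesMs.end(), 0.0) / n;
    const std::size_t mid = timesMs.size() / 2;
    const double median = timesMs.size() % 2 == 1
        ? timesMs[mid]
        : (timesMs[mid - 1] + timesMs[mid]) / 2.0;
    double sq = 0.0;
    for (double t : timesMs) sq += (t - mean) * (t - mean);
    const double stddev =
        timesMs.size() > 1 ? std::sqrt(sq / (n - 1.0)) : 0.0;
    return {mean, median, stddev, timesMs.front(), timesMs.back()};
}
```

With samples of 1, 2, 3, and 10 ms, the single slow iteration pulls the mean up to 4 ms while the median stays at 2.5 ms, which is why the median is the more outlier-resistant summary.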

Performance Indicators:

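  • Throughput: Data processed per second (GB/s), typically computed from the bytes the kernel reads and writes
  • Occupancy: Ratio of active warps on a streaming multiprocessor (SM) to the maximum number of warps the SM supports, as a percentage
  • Efficiency: Achieved performance as a percentage of the GPU's theoretical peak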
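These indicators reduce to simple ratios. The sketch below assumes you know how many bytes the kernel reads and writes, the device's theoretical peak, and the active and maximum warps per SM; the function names are hypothetical, and GB is taken as 10^9 bytes.

```cpp
// Hypothetical helpers (not the extension's API).

// Effective throughput in GB/s (1 GB = 1e9 bytes) from the bytes a kernel
// reads and writes and its execution time in milliseconds.
double throughputGBs(double bytesRead, double bytesWritten, double timeMs) {
    return (bytesRead + bytesWritten) / 1e9 / (timeMs / 1e3);
}

// Achieved value as a percentage of a theoretical peak in the same units.
double efficiencyPercent(double achieved, double peak) {
    return 100.0 * achieved / peak;
}

// Occupancy: active warps per SM as a percentage of the SM's maximum.
double occupancyPercent(double activeWarpsPerSM, double maxWarpsPerSM) {
    return 100.0 * activeWarpsPerSM / maxWarpsPerSM;
}
```

For example, a vector addition `c = a + b` over 1,000,000 `float` elements reads 8,000,000 bytes and writes 4,000,000 bytes; if it took 0.024 ms, its throughput would be 500 GB/s.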

Comparing Kernels

How to Compare

  1. Benchmark your original kernel implementation
  2. Make optimizations to your code
  3. Benchmark your optimized version
  4. Click the "Compare" button
  5. Review the side-by-side comparison

Comparison View Shows:

  • Side-by-side metrics: Direct comparison of all performance indicators
  • Speedup: How much faster or slower the new version runs, expressed as a ratio (e.g., "2.3x faster")
  • Winner highlighted: The better-performing kernel is shown in green
  • Regression detection: An automatic warning appears if performance decreased
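The speedup label and regression check can be expressed as below. This sketch assumes speedup is the baseline time divided by the new time, and that a regression is reported only when the new kernel is slower by more than a noise tolerance; the 5% default and the function names are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helpers (not the extension's code).

// Speedup of a candidate over a baseline: > 1 means the candidate is faster.
double speedup(double baselineMs, double candidateMs) {
    return baselineMs / candidateMs;
}

// Flags a regression when the candidate is slower than the baseline by more
// than `tolerance` (0.05 = 5%, an assumed default), so that run-to-run
// noise is not reported as a regression.
bool isRegression(double baselineMs, double candidateMs,
                  double tolerance = 0.05) {
    return candidateMs > baselineMs * (1.0 + tolerance);
}

// Label such as "2.3x faster" or "1.5x slower".
std::string speedupLabel(double baselineMs, double candidateMs) {
    double s = speedup(baselineMs, candidateMs);
    char buf[32];
    if (s >= 1.0) {
        std::snprintf(buf, sizeof buf, "%.1fx faster", s);
    } else {
        std::snprintf(buf, sizeof buf, "%.1fx slower", 1.0 / s);
    }
    return buf;
}
```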

Multi-Kernel Comparison

Compare multiple optimization approaches:

  • Benchmark baseline implementation
  • Test different optimization strategies
  • Compare all versions simultaneously
  • Identify best performing approach
  • Export comparison data for reports
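Choosing the best of several variants amounts to ranking them by a robust statistic. This sketch ranks by median time and reports each variant's speedup relative to the first entry, which it treats as the baseline; `VariantResult`, `fastestVariant`, and `speedupsVsBaseline` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result record for one kernel variant.
struct VariantResult {
    std::string name;
    double medianMs;
};

// Index of the fastest variant by median time.
std::size_t fastestVariant(const std::vector<VariantResult>& results) {
    if (results.empty()) throw std::invalid_argument("no results");
    std::size_t best = 0;
    for (std::size_t i = 1; i < results.size(); ++i) {
        if (results[i].medianMs < results[best].medianMs) best = i;
    }
    return best;
}

// Speedup of every variant relative to results[0] (assumed to be the baseline).
std::vector<double> speedupsVsBaseline(const std::vector<VariantResult>& results) {
    std::vector<double> out;
    for (const auto& r : results) {
        out.push_back(results.front().medianMs / r.medianMs);
    }
    return out;
}
```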

Advanced Features

Data Size Scaling

Test performance across different input sizes:

  • Small: Cache-friendly workloads
  • Medium: Typical production sizes
  • Large: Memory-bandwidth bound scenarios
  • Custom: Define specific test configurations
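A size sweep is often generated by doubling the element count between two bounds, so that it spans cache-resident and bandwidth-bound inputs. In this sketch, `sizeSweep` and `fitsInL2` are hypothetical helpers, and the L2 size should be queried from the device (for example from `cudaDeviceProp::l2CacheSize`) rather than hard-coded.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: element counts doubling from minElems up to and
// including maxElems.
std::vector<std::size_t> sizeSweep(std::size_t minElems, std::size_t maxElems) {
    std::vector<std::size_t> sizes;
    if (minElems == 0) return sizes;
    for (std::size_t n = minElems; n <= maxElems; n *= 2) {
        sizes.push_back(n);
        if (n > maxElems / 2) break;  // next doubling would exceed maxElems (or overflow)
    }
    return sizes;
}

// Hypothetical helper: true if the working set fits in an L2 cache of
// l2Bytes, a rough indicator of a "small", cache-friendly configuration.
bool fitsInL2(std::size_t elems, std::size_t bytesPerElem, std::size_t l2Bytes) {
    return elems * bytesPerElem <= l2Bytes;
}
```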

Statistical Analysis

Advanced statistical tools for reliable results:

  • Outlier Detection: Identifies and filters anomalous measurements
  • Confidence Intervals: Range likely to contain the true mean, used to judge whether an improvement is statistically significant
  • Variance Analysis: Consistency and stability metrics
  • Percentiles: P50, P95, P99 for latency analysis
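One common way to implement these tools is sketched below: nearest-rank percentiles, Tukey's fences (1.5 × IQR) for outlier filtering, and a normal-approximation 95% confidence interval for the mean. These are standard techniques chosen for illustration; the extension's actual methods may differ, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helpers illustrating common statistical techniques.

// Nearest-rank percentile of non-empty samples sorted ascending, p in (0, 100].
double percentile(const std::vector<double>& sorted, double p) {
    std::size_t rank =
        static_cast<std::size_t>(std::ceil(p * sorted.size() / 100.0));
    rank = std::max<std::size_t>(1, std::min(rank, sorted.size()));
    return sorted[rank - 1];
}

// Keeps samples inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR];
// k = 1.5 is the conventional choice. Returns the kept samples sorted.
std::vector<double> removeOutliers(std::vector<double> samples, double k = 1.5) {
    std::sort(samples.begin(), samples.end());
    if (samples.size() < 4) return samples;  // too few points for quartiles
    double q1 = percentile(samples, 25.0);
    double q3 = percentile(samples, 75.0);
    double iqr = q3 - q1;
    double lo = q1 - k * iqr;
    double hi = q3 + k * iqr;
    std::vector<double> kept;
    for (double x : samples) {
        if (x >= lo && x <= hi) kept.push_back(x);
    }
    return kept;
}

// Approximate 95% confidence interval for the mean: mean +/- 1.96 standard
// errors. The normal approximation is reasonable for roughly 30+ samples.
std::pair<double, double> meanCI95(const std::vector<double>& samples) {
    double n = static_cast<double>(samples.size());
    double mean = 0.0;
    for (double x : samples) mean += x;
    mean /= n;
    if (samples.size() < 2) return {mean, mean};  // no spread estimate
    double sq = 0.0;
    for (double x : samples) sq += (x - mean) * (x - mean);
    double se = std::sqrt(sq / (n - 1.0)) / std::sqrt(n);
    return {mean - 1.96 * se, mean + 1.96 * se};
}
```

If the confidence intervals of two kernels do not overlap, the difference between their means is unlikely to be noise.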

Export and Reporting

Share benchmark results:

  • CSV Export: Raw data for further analysis
  • JSON Format: Structured data with metadata
  • Charts: Export visualizations as PNG/SVG
  • Reports: Generate formatted performance reports
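A minimal sketch of the two export formats follows, assuming a hypothetical schema with one CSV row per iteration and a JSON object holding the kernel name, warmup count, and raw samples; the real export files may contain different columns and metadata, and `toCsv`, `toJson`, and `jsonEscape` are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical export helpers (schema assumed, not the extension's format).

// Escapes quotes and backslashes only; control characters are not handled.
std::string jsonEscape(const std::string& s) {
    std::string r;
    for (char c : s) {
        if (c == '"' || c == '\\') r += '\\';
        r += c;
    }
    return r;
}

// CSV with a header and one row per timed iteration: "iteration,time_ms".
std::string toCsv(const std::vector<double>& timesMs) {
    std::ostringstream out;
    out << "iteration,time_ms\n";
    for (std::size_t i = 0; i < timesMs.size(); ++i) {
        out << i << ',' << timesMs[i] << '\n';
    }
    return out.str();
}

// JSON object with metadata plus the raw samples.
std::string toJson(const std::string& kernel, std::size_t warmupRuns,
                   const std::vector<double>& timesMs) {
    std::ostringstream out;
    out << "{\"kernel\":\"" << jsonEscape(kernel) << "\",\"warmup_runs\":"
        << warmupRuns << ",\"times_ms\":[";
    for (std::size_t i = 0; i < timesMs.size(); ++i) {
        out << (i ? "," : "") << timesMs[i];
    }
    out << "]}";
    return out.str();
}
```

CSV keeps the raw samples easy to load into a spreadsheet, while JSON can carry metadata such as the warmup count alongside the data.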

Best Practices

For Accurate Results

  • Use at least 100 iterations for statistical reliability
  • Always include warmup runs to stabilize GPU clocks and caches
  • Keep data sizes consistent when comparing kernels
  • Benchmark before and after each optimization
  • Close other GPU applications for isolated measurements
  • Run benchmarks multiple times and average results
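Repeated runs can be combined and checked for consistency with the coefficient of variation (standard deviation divided by mean). In this sketch, `RepeatSummary`, `combineRuns`, and `runsAreStable` are hypothetical, and the 2% stability threshold is an arbitrary assumption to tune for your hardware.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical summary of several repeated benchmark runs.
struct RepeatSummary {
    double meanOfRuns;  // average of the per-run mean times (ms)
    double cvPercent;   // coefficient of variation across runs, in %
};

// Combines the mean times of repeated runs. A high coefficient of variation
// suggests interference (other GPU work, thermal throttling), so the runs
// should be repeated before comparing kernels.
RepeatSummary combineRuns(const std::vector<double>& runMeansMs) {
    if (runMeansMs.size() < 2) throw std::invalid_argument("need at least two runs");
    double n = static_cast<double>(runMeansMs.size());
    double mean = 0.0;
    for (double m : runMeansMs) mean += m;
    mean /= n;
    double sq = 0.0;
    for (double m : runMeansMs) sq += (m - mean) * (m - mean);
    double stddev = std::sqrt(sq / (n - 1.0));
    return {mean, 100.0 * stddev / mean};
}

// Hypothetical stability check; the 2% default is an assumption.
bool runsAreStable(const RepeatSummary& s, double maxCvPercent = 2.0) {
    return s.cvPercent <= maxCvPercent;
}
```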

Important Considerations

  • Let the GPU cool down between tests, since thermal throttling lowers clock speeds and can skew results
  • Use the same benchmark configuration when comparing kernels
  • Monitor GPU temperature and clock speeds during benchmarking
  • Consider power and thermal limits for production scenarios

Pro Tip: Combine benchmarking with Real-Time Profiling to understand why performance changes occur, not just measure that they do.