RightNow AI is a research lab and software company working on GPU programming tools, CUDA development workflows, model-hardware co-design, and inference infrastructure.

Which NVIDIA GPUs are supported by RightNow AI?

RightNow AI supports all NVIDIA GPUs with CUDA Toolkit 11.0-12.5, including GeForce RTX 40/30/20 series, GTX 16/10 series, Quadro RTX, Tesla, A100, and H100.
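
The toolkit range above can be checked mechanically. The sketch below is illustrative only: `parse_cuda_version` and `is_supported` are hypothetical helper names, not part of RightNow AI, and the version string is assumed to look like `12.4` or `12.4.1`.

```python
# Supported CUDA Toolkit range as stated in the answer above (inclusive).
SUPPORTED_MIN = (11, 0)
SUPPORTED_MAX = (12, 5)


def parse_cuda_version(text):
    """Turn '12.4' or '12.4.1' into (12, 4); patch level is ignored."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def is_supported(version_text):
    """True if the toolkit version falls inside the supported range."""
    return SUPPORTED_MIN <= parse_cuda_version(version_text) <= SUPPORTED_MAX


for version in ("10.2", "11.0", "12.4.1", "12.6"):
    print(version, is_supported(version))
```

Tuples compare element by element, so the check stays correct across major versions without any string comparison.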

How much does RightNow AI cost?

RightNow AI is free to use with unlimited profiling and benchmarking. RightNow Pro costs $29 per month and adds GPU emulator access (50+ GPUs), multi-GPU comparison, and 1,000 AI credits per month.

What CUDA development workflow does RightNow AI support?

RightNow AI supports CUDA development workflows that combine editing, profiling, emulation, remote GPU execution, and benchmarked performance analysis.

Can I use RightNow AI on macOS?

Yes, RightNow AI is fully available on macOS (Apple Silicon and Intel). Mac users can use remote GPUs for free or our built-in GPU emulator for CUDA profiling.

╭────────────╮
│ MULTI-GPU  │
├────────────┤
│  [0]··[1]  │
│   │····│   │
│  [2]··[3]  │
│  ▸ scaling │
╰────────────╯

Multi-GPU

Pro

Profile and optimize across multiple GPUs. Compare kernel performance side-by-side, analyze NVLink communication, and identify load balancing issues before they become production problems.

6 GPUs (Pro)

100+ GPUs (Enterprise)

NVLink aware

Why Multi-GPU Profiling?

Compare Hardware

Run the same kernel on A100 and H100 simultaneously. See exactly which hardware is better for your workload.

Find Imbalance

Multi-GPU code often has load imbalance. See per-GPU utilization and identify which device is the bottleneck.

Optimize Comms

NVLink topology matters. Understand inter-GPU bandwidth and optimize data placement for your specific system.

Side-by-Side Comparison

Run benchmarks across multiple GPUs and see results side-by-side. Make informed hardware decisions based on your actual workload.

matmul_kernel [2048x2048]

GPU 0: RTX 4090
├─ Time:        4.2ms
├─ Bandwidth:   892 GB/s
└─ SM Util:     94%

GPU 1: H100
├─ Time:        1.8ms  (2.3x faster)
├─ Bandwidth:   2.1 TB/s
└─ SM Util:     89%
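
The "2.3x faster" figure in the sample output is the ratio of the two kernel times. A minimal sketch of that arithmetic follows; `compare_runs` and its dictionary layout are assumptions made for illustration, not the RightNow AI API.

```python
def compare_runs(times_ms):
    """Return (fastest_gpu, ratios) for a set of kernel timings.

    `times_ms` maps a GPU label to its measured kernel time in milliseconds.
    Each ratio is that GPU's time divided by the fastest time, so the
    fastest GPU has ratio 1.0.
    """
    if not times_ms:
        raise ValueError("need at least one measurement")
    fastest = min(times_ms, key=times_ms.get)
    best = times_ms[fastest]
    return fastest, {gpu: t / best for gpu, t in times_ms.items()}


# Timings copied from the sample output above.
fastest, ratios = compare_runs({"RTX 4090": 4.2, "H100": 1.8})
print(f"{fastest} is {ratios['RTX 4090']:.1f}x faster than RTX 4090")
```

A ratio from a single run is only a rough indication; averaging several timed runs per GPU before comparing gives a steadier number.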

NVLink Analysis

Coming Soon

Visualize your NVLink topology and measure actual peer-to-peer bandwidth. Understand where data placement matters for your multi-GPU workloads.

NVLink Topology

┌─────────┐  NVLink 4  ┌─────────┐
│  GPU 0  │◀──────────▶│  GPU 1  │
│   H100  │  900 GB/s  │   H100  │
└────┬────┘            └────┬────┘
     │                      │
     │       NVLink 4       │
     └──────────────────────┘
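
The bandwidth figure in the diagram can be turned into a rough lower bound on peer-to-peer transfer time, and a timed copy can be turned back into an effective bandwidth. The sketch below assumes decimal gigabytes (1 GB = 1e9 bytes) and ignores latency and protocol overhead; both function names are hypothetical. Note that 900 GB/s is the total bidirectional NVLink bandwidth NVIDIA quotes for an H100, so a one-way copy should be compared against roughly half of that.

```python
def min_transfer_ms(num_bytes, bandwidth_gb_s):
    """Lower bound on transfer time in ms at a given peak bandwidth."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3


def measured_bandwidth_gb_s(num_bytes, elapsed_ms):
    """Effective bandwidth in GB/s from a timed copy."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / (elapsed_ms * 1e-3) / 1e9


# One 2048x2048 float32 matrix, the size used in the matmul example.
payload = 2048 * 2048 * 4
print(f"bidirectional peak: {min_transfer_ms(payload, 900):.4f} ms")
print(f"one direction:      {min_transfer_ms(payload, 450):.4f} ms")
```

If the measured bandwidth for a copy sits far below the peak, the transfer is probably too small to amortize latency or is not taking the NVLink path at all.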

Load Balancing

Coming Soon

Monitor per-GPU utilization and identify imbalances. See which GPU is waiting and get suggestions for better work distribution.

Monitors

• Per-GPU utilization
• Memory pressure
• Kernel overlap
• Sync points

Detects

• Work imbalance
• Memory bottlenecks
• Sync overhead
• Idle GPUs
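
Work imbalance and idle GPUs can both be derived from per-GPU busy time. The following is a minimal sketch under that assumption, not a description of how RightNow AI computes them; `imbalance_report` and the sample numbers are hypothetical.

```python
def imbalance_report(busy_ms):
    """Summarize load balance from per-GPU busy time in milliseconds."""
    if not busy_ms:
        raise ValueError("need at least one GPU")
    mean = sum(busy_ms.values()) / len(busy_ms)
    if mean == 0:
        raise ValueError("all GPUs report zero busy time")
    slowest = max(busy_ms, key=busy_ms.get)
    # 0.0 means perfectly even work; 0.5 means the busiest GPU works
    # 50% longer than the average GPU.
    imbalance = busy_ms[slowest] / mean - 1.0
    # Time each GPU sits idle waiting for the slowest one at a sync point.
    idle = {gpu: busy_ms[slowest] - t for gpu, t in busy_ms.items()}
    return {"bottleneck": slowest, "imbalance": imbalance, "idle_ms": idle}


# Hypothetical busy times for one step on four GPUs.
report = imbalance_report(
    {"gpu0": 10.0, "gpu1": 10.0, "gpu2": 10.0, "gpu3": 18.0}
)
print(report["bottleneck"], f"{report['imbalance']:.2f}", report["idle_ms"])
```

Here the three lighter GPUs each wait 8 ms per step, which suggests moving work off `gpu3` until the busy times converge.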

Scale Your Development

Multi-GPU profiling for up to 6 GPUs in Pro. Enterprise supports 100+.

Related Features

GPU Profiler

Single-GPU performance analysis

GPU Emulator

Test without hardware

AI Features

Chat assistant + auto optimizer