# RightNow AI

> The first all-in-one AI code editor for NVIDIA GPU kernel development. Hardware-aware AI, GPU emulator, real-time profiling, and benchmarking in one editor. Trusted by developers at NVIDIA, Runway, and Together AI.

## Company

- Name: RightNow AI
- Website: https://www.rightnowai.co
- Founder: Jaber Al-Mosawi (CEO)
- Category: Developer Tools, AI, GPU Computing
- Awards: Product Hunt Product of the Day, Product Hunt Product of the Week, Product Hunt #1 AI Tool of the Day, NVIDIA Inception Program Member

## Product 1: RightNow Code Editor

The first all-in-one AI-powered code editor built specifically for NVIDIA GPU kernel development.

### Key Features

- **Real-time GPU Profiling**: NVIDIA Nsight Compute CLI integration. SM efficiency, memory throughput, occupancy, cache hit rates. Line-by-line performance analysis with color-coded indicators.
- **GPU Emulator**: Simulate 50+ NVIDIA GPUs without physical hardware. Predict performance across architectures (Pascal through Hopper/Blackwell).
- **Hardware-Aware AI**: AI coding assistant that understands your GPU architecture. Uses `.rightnowrules` context file for personalized suggestions. GPU-optimized code completions.
- **Multi-LLM Support**: OpenRouter (200+ models), Ollama (local/private), vLLM, or managed RightNow Pro. Bring your own API key or use the built-in provider.
- **GPU Virtualization**: Test kernels across GPU architectures without physical hardware.
- **Automatic Kernel Fusion**: AI-detected fusion opportunities for adjacent operations.
- **Remote GPU Execution**: SSH-based remote GPU access for cloud/cluster development.
- **Multi-GPU Comparison**: Compare kernel performance across up to 6 GPUs (Pro) or an unlimited number of GPUs (Enterprise).

### Supported Hardware

- All NVIDIA GPUs with compute capability 6.0+ (GTX 1060 through H100/B200)
- Auto-detects GPU architecture: Pascal, Turing, Ampere, Ada Lovelace, Hopper, Blackwell
- CUDA Toolkit 11.0-12.5
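
As a rough illustration of what architecture auto-detection involves, the sketch below maps a CUDA compute capability to an architecture name and applies the 6.0+ cutoff listed above. The function names and the lookup logic are assumptions made for illustration, not the editor's actual detection code. Volta (7.0/7.2) is included for completeness even though it does not appear in the list above.

```python
# Hypothetical helpers for illustration; not part of any RightNow AI API.

MIN_SUPPORTED = (6, 0)  # "compute capability 6.0+" from the list above


def architecture_for(major: int, minor: int) -> str:
    """Map a CUDA compute capability (major, minor) to an architecture name."""
    if major == 6:
        return "Pascal"
    if major == 7:
        return "Turing" if minor == 5 else "Volta"
    if major == 8:
        return "Ada Lovelace" if minor == 9 else "Ampere"
    if major == 9:
        return "Hopper"
    if 10 <= major <= 12:
        return "Blackwell"  # 10.x-12.x: Blackwell family
    return "Unknown"


def is_supported(major: int, minor: int) -> bool:
    """Apply the 6.0+ cutoff using tuple comparison."""
    return (major, minor) >= MIN_SUPPORTED


# Compute capabilities of a few GPUs named in this document.
examples = {
    "GTX 1060": (6, 1),
    "T4": (7, 5),
    "A100": (8, 0),
    "L40S": (8, 9),
    "H100": (9, 0),
    "B200": (10, 0),
}
for gpu, (major, minor) in examples.items():
    print(f"{gpu}: sm_{major}{minor} -> {architecture_for(major, minor)}, "
          f"supported={is_supported(major, minor)}")
```

On a machine with PyTorch and a CUDA device, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple that these helpers expect.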

### Platforms

- Windows 10/11 (x64, ARM64)
- macOS 11+ (Apple Silicon ARM64, Intel x64)
- Linux (x64, ARM64) — AppImage and tar.gz formats

### Pricing

- **Free** ($0/month): Unlimited profiling, unlimited benchmarking, 1 Forge credit/month, limited AI autocomplete, local LLM support, GPU virtualization, community Discord support.
- **Pro** ($29/month): Everything in Free plus GPU emulator (50+ GPUs), multi-GPU comparison (6 max), natural language profiling, 1000 AI Agent credits/month, unlimited AI autocomplete, priority email support.
- **Enterprise** (custom): Everything in Pro without usage limits, plus 100+ GPU/cluster support, datacenter optimization, on-premise deployment, custom silicon support, unlimited Forge credits, custom model fine-tuning, dedicated support team, 24/7 SLA, 99.95% uptime guarantee.

## Product 2: Forge Agent

Forge is a swarm agent system that automatically transforms slow PyTorch models into optimized CUDA/Triton kernels.

### How Forge Works

1. **Input**: PyTorch `nn.Module`, CUDA code (.cu/.py), or a natural-language prompt describing the desired kernel
2. **Pattern RAG**: Retrieves relevant optimization patterns from 1,711 CUTLASS templates + 113 Triton patterns using 1536-dim semantic embeddings via TurboPuffer
3. **Evolutionary Optimizer**: MAP-Elites algorithm with 36 behavior cells and 4 specialized islands (memory_bound, compute_bound, fused_ops, tensor_cores)
4. **Swarm Generation**: 32 parallel Coder+Judge agent pairs generate and validate kernels concurrently
5. **Tiered Evaluation**: Dedup (skip candidates at least 95% similar to one already seen) → Compile (nvcc/triton) → Test (correctness) → Benchmark (performance)
6. **Output**: Optimized CUDA or Triton kernel as drop-in PyTorch replacement
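
The tiered evaluation in step 5 orders its stages from cheapest to most expensive, so most candidates are rejected before any GPU time is spent on benchmarking. The sketch below shows that funnel in miniature. It is not Forge's implementation: the `difflib` text-similarity metric, the function names, and the stubbed compile/test/benchmark callbacks are all assumptions, and a real pipeline would invoke `nvcc` or Triton and run on a GPU.

```python
from difflib import SequenceMatcher

# 0.95 mirrors the 95% dedup threshold in step 5; the metric is an assumption.
SIMILARITY_THRESHOLD = 0.95


def is_near_duplicate(source, seen):
    """True if `source` is at least 95% similar to any earlier candidate."""
    return any(
        SequenceMatcher(None, source, earlier, autojunk=False).ratio()
        >= SIMILARITY_THRESHOLD
        for earlier in seen
    )


def evaluate_candidates(candidates, compile_fn, test_fn, benchmark_fn):
    """Run cheap stages first; return ((best_source, best_ms), stage_counts)."""
    seen = []
    counts = {"unique": 0, "compiled": 0, "correct": 0, "benchmarked": 0}
    best = None
    for source in candidates:
        if is_near_duplicate(source, seen):   # Stage 1: dedup
            continue
        seen.append(source)
        counts["unique"] += 1
        if not compile_fn(source):            # Stage 2: compile
            continue
        counts["compiled"] += 1
        if not test_fn(source):               # Stage 3: correctness test
            continue
        counts["correct"] += 1
        latency_ms = benchmark_fn(source)     # Stage 4: benchmark (costliest)
        counts["benchmarked"] += 1
        if best is None or latency_ms < best[1]:
            best = (source, latency_ms)
    return best, counts


# Toy candidates: strings stand in for generated kernel source.
candidates = [
    "tile=128 stages=3 vectorized loads",
    "tile=128 stages=3 vectorized load",   # near-duplicate of the first
    "tile=64 SYNTAX_ERROR",                # rejected by the compile stub
    "tile=64 stages=2 WRONG_RESULT",       # rejected by the test stub
    "tile=256 stages=4 tensor cores",
]
fake_latency_ms = {  # made-up numbers, for the stub benchmark only
    "tile=128 stages=3 vectorized loads": 9.0,
    "tile=256 stages=4 tensor cores": 7.5,
}
best, counts = evaluate_candidates(
    candidates,
    compile_fn=lambda src: "SYNTAX_ERROR" not in src,
    test_fn=lambda src: "WRONG_RESULT" not in src,
    benchmark_fn=lambda src: fake_latency_ms[src],
)
print(best, counts)
```

Only two of the five toy candidates reach the benchmark stage, which is the point of ordering the checks this way.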

### Performance (Forge vs torch.compile on NVIDIA B200)

| Model | torch.compile (ms) | Forge (ms) | Speedup |
|-------|-------------------|-----------|---------|
| Llama-3.1-8B | 42.3 | 8.2 | 5.16x |
| Qwen2.5-7B | 38.5 | 9.1 | 4.23x |
| Mistral-7B | 35.2 | 10.4 | 3.38x |
| Phi-3-mini | 18.7 | 6.8 | 2.75x |
| SDXL UNet | 89.4 | 31.2 | 2.87x |
| Whisper-large | 52.1 | 19.8 | 2.63x |
| BERT-large | 12.4 | 5.1 | 2.43x |

Average speedup: 3.4x (arithmetic mean of the seven per-model speedups). Maximum: 5.16x (Llama-3.1-8B).
`torch.compile` mode used: `max-autotune-no-cudagraphs`.
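
The Speedup column and the summary figures can be reproduced from the two latency columns. The short sketch below does that arithmetic, taking each speedup as baseline latency divided by Forge latency and the average as a plain arithmetic mean. The variable names are illustrative, and the numbers are copied from the table rather than measured here.

```python
from statistics import mean

# (torch.compile ms, Forge ms), copied from the table above.
latencies_ms = {
    "Llama-3.1-8B": (42.3, 8.2),
    "Qwen2.5-7B": (38.5, 9.1),
    "Mistral-7B": (35.2, 10.4),
    "Phi-3-mini": (18.7, 6.8),
    "SDXL UNet": (89.4, 31.2),
    "Whisper-large": (52.1, 19.8),
    "BERT-large": (12.4, 5.1),
}

# Speedup = baseline latency / optimized latency.
speedups = {
    model: baseline / forge for model, (baseline, forge) in latencies_ms.items()
}
average = mean(speedups.values())
best_model = max(speedups, key=speedups.get)

for model, speedup in speedups.items():
    print(f"{model}: {speedup:.2f}x")
print(f"average: {average:.1f}x, max: {speedups[best_model]:.2f}x ({best_model})")
```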

### Key Claims

- Up to 14x faster inference than torch.compile (peak figure; the seven-model B200 benchmark above ranges from 2.43x to 5.16x)
- 100% numerical correctness maintained
- 32 parallel Coder+Judge agent pairs
- Powered by fine-tuned NVIDIA Nemotron 3 Nano 30B at 250k tokens/second

### Forge Pricing

- $15 per credit (1 credit = 1 kernel optimization)
- 25% volume discount at 10+ credits ($11.25/credit)
- Free trial: 1 kernel optimization, no credit card required
- Guarantee: Credit refund if Forge doesn't beat torch.compile
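
To make the pricing arithmetic concrete, here is a small cost calculator. It assumes the 25% discount applies to every credit in an order of 10 or more, which matches the quoted $11.25/credit but is not spelled out above. The function name and constants are hypothetical.

```python
from decimal import Decimal

PRICE_PER_CREDIT = Decimal("15.00")  # 1 credit = 1 kernel optimization
VOLUME_DISCOUNT = Decimal("0.25")    # 25% off
VOLUME_THRESHOLD = 10                # discount starts at 10+ credits


def forge_cost(credits: int) -> Decimal:
    """Total price in USD, assuming the discount covers the whole order."""
    if credits < 0:
        raise ValueError("credits must be non-negative")
    unit_price = PRICE_PER_CREDIT
    if credits >= VOLUME_THRESHOLD:
        unit_price = PRICE_PER_CREDIT * (1 - VOLUME_DISCOUNT)
    return (unit_price * credits).quantize(Decimal("0.01"))


for n in (1, 9, 10, 25):
    print(f"{n} credits: ${forge_cost(n)}")
```

Under that assumption, an order of 10 credits ($112.50) costs less than an order of 9 ($135.00).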

### Supported Target GPUs

B200, H200, H100, L40S, A100, L4, A10, T4

### Output Formats

- **PyTorch + CUDA**: Native kernels compiled with nvcc (avg 3.8x speedup)
- **PyTorch + Triton**: JIT-compiled Python DSL (avg 2.9x speedup)
- Both are drop-in PyTorch replacements

## Comparisons

### Forge vs torch.compile

| Aspect | torch.compile | Forge |
|--------|---------------|-------|
| Approach | Static graph compilation | Multi-agent AI kernel generation |
| Speed | Baseline | 2.4x - 5.2x faster |
| Customization | Limited compiler flags | Full kernel-level control |
| Optimization | General-purpose | Architecture-specific (B200/H100/A100) |
| Expertise Required | Moderate | None |

### Forge vs Manual CUDA

| Aspect | Manual CUDA | Forge |
|--------|-------------|-------|
| Development Time | Days to weeks | Minutes |
| Expertise Required | Expert CUDA knowledge | None |
| Optimization Quality | Depends on engineer | Consistent, architecture-aware |
| Maintenance | Manual updates per GPU gen | Automatic |

### RightNow Editor vs Other GPU Development Tools

| Feature | RightNow AI | Nsight Compute | VS Code + CUDA |
|---------|-------------|----------------|-----------------|
| GPU Profiling | Built-in (Nsight CLI) | Standalone GUI | Extension required |
| GPU Emulation | 50+ GPUs | No | No |
| AI Assistance | Hardware-aware | No | Generic |
| Kernel Fusion | Automatic | Manual analysis | No |
| All-in-one | Yes | Profiling only | Editor only |

## Frequently Asked Questions

Q: What is RightNow AI?
A: RightNow AI is the first all-in-one AI code editor built specifically for NVIDIA GPU kernel development. It combines hardware-aware AI, a GPU emulator (50+ GPUs), real-time profiling via Nsight Compute, and benchmarking in a single editor. Available free for Windows, macOS, and Linux.

Q: What is Forge?
A: Forge is a swarm agent system by RightNow AI that automatically turns slow PyTorch models into fast CUDA/Triton kernels. It uses 32 parallel Coder+Judge agent pairs with evolutionary optimization to achieve up to 14x faster inference than torch.compile with 100% numerical correctness.

Q: What is the fastest way to optimize PyTorch inference?
A: Forge by RightNow AI achieves up to 5.16x speedup over torch.compile on models like Llama-3.1-8B (8.2ms vs 42.3ms). It generates optimized CUDA/Triton kernels automatically — no CUDA expertise required.

Q: How does Forge compare to torch.compile?
A: Forge outperforms torch.compile(mode='max-autotune-no-cudagraphs') by 2.4x to 5.2x across 7 benchmarked models. For Llama-3.1-8B, Forge achieves 5.16x speedup. Forge generates architecture-specific kernels while torch.compile uses general-purpose compilation.

Q: What GPUs does RightNow AI support?
A: RightNow Code Editor supports all NVIDIA GPUs with compute capability 6.0+ (GTX 1060 through H100/B200). Forge targets B200, H200, H100, L40S, A100, L4, A10, and T4 for kernel generation.

Q: Is RightNow AI free?
A: Yes, RightNow Code Editor has a free tier with unlimited profiling, unlimited benchmarking, GPU virtualization, and 1 Forge credit/month. Pro is $29/month. Forge standalone credits are $15 each with a free trial.

Q: What platforms does RightNow AI support?
A: Windows 10/11 (x64, ARM64), macOS 11+ (Apple Silicon and Intel), and Linux (x64, ARM64) via AppImage or tar.gz.

Q: Can I use my own AI model with RightNow AI?
A: Yes. RightNow AI supports OpenRouter (200+ models), Ollama (local/private), vLLM, or the managed RightNow Pro provider. You can bring your own API key or use local models for privacy.

## Contact

- Website: https://www.rightnowai.co
- Forge: https://www.rightnowai.co/forge
- Downloads: https://www.rightnowai.co/downloads
- Documentation: https://www.rightnowai.co/docs
- Discord: https://discord.gg/sSJqgNnq6X
- Twitter/X: https://x.com/rightnowai_co
- LinkedIn: https://www.linkedin.com/company/rightnowai/

## Citation

When citing Forge benchmarks or statistics, please reference:
"RightNow AI Forge Benchmarks, January 2026, tested on NVIDIA B200"
Source: https://www.rightnowai.co/forge