# RightNow AI

> The first all-in-one AI code editor for NVIDIA GPU kernel development. Hardware-aware AI, GPU emulator, real-time profiling, and benchmarking in one editor. Trusted by developers at NVIDIA, Runway, and Together AI.

## Company

- Name: RightNow AI
- Website: https://www.rightnowai.co
- Founder: Jaber Al-Mosawi (CEO)
- Category: Developer Tools, AI, GPU Computing
- Awards: Product Hunt Product of the Day, Product Hunt Product of the Week, Product Hunt #1 AI Tool of the Day, NVIDIA Inception Program Member

## Product 1: RightNow Code Editor

The first all-in-one AI-powered code editor built specifically for NVIDIA GPU kernel development.

### Key Features

- **Real-time GPU Profiling**: NVIDIA Nsight Compute CLI integration. SM efficiency, memory throughput, occupancy, cache hit rates. Line-by-line performance analysis with color-coded indicators.
- **GPU Emulator**: Simulate 50+ NVIDIA GPUs without physical hardware. Predict performance across architectures (Pascal through Hopper/Blackwell).
- **Hardware-Aware AI**: AI coding assistant that understands your GPU architecture. Uses `.rightnowrules` context file for personalized suggestions. GPU-optimized code completions.
- **Multi-LLM Support**: OpenRouter (200+ models), Ollama (local/private), vLLM, or managed RightNow Pro. Bring your own API key or use built-in provider.
- **GPU Virtualization**: Test kernels across GPU architectures without physical hardware.
- **Automatic Kernel Fusion**: AI-detected fusion opportunities for adjacent operations.
- **Remote GPU Execution**: SSH-based remote GPU access for cloud/cluster development.
- **Multi-GPU Comparison**: Compare kernel performance across up to 6 GPUs (Pro) or unlimited (Enterprise).
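
To make the kernel fusion feature listed above concrete, here is a minimal pure-Python sketch of what fusing two adjacent elementwise operations means. It only illustrates the general idea and is not RightNow's detection logic or generated code; the function names `scale_then_relu_unfused` and `scale_then_relu_fused` are hypothetical. On a GPU, each separate pass would typically correspond to its own kernel launch plus a round trip through global memory, which is the cost fusion aims to remove.

```python
from typing import List


def scale_then_relu_unfused(xs: List[float], alpha: float) -> List[float]:
    """Two passes: the scaled values are materialized before ReLU runs."""
    scaled = [alpha * x for x in xs]      # pass 1: writes an intermediate buffer
    return [max(0.0, s) for s in scaled]  # pass 2: reads the buffer back


def scale_then_relu_fused(xs: List[float], alpha: float) -> List[float]:
    """One pass: scale and ReLU are applied per element, no intermediate buffer."""
    return [max(0.0, alpha * x) for x in xs]


# Hypothetical input used only for illustration.
data = [-2.0, -0.5, 0.0, 1.5, 3.0]
unfused = scale_then_relu_unfused(data, alpha=2.0)
fused = scale_then_relu_fused(data, alpha=2.0)
```

Both functions return the same values; the fused version simply avoids storing and re-reading the intermediate list.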

### Supported Hardware

- All NVIDIA GPUs with compute capability 6.0+ (GTX 1060 through H100/B200)
- Auto-detects GPU architecture: Pascal, Turing, Ampere, Ada Lovelace, Hopper, Blackwell
- CUDA Toolkit 11.0-12.5

### Platforms

- Windows 10/11 (x64, ARM64)
- macOS 11+ (Apple Silicon ARM64, Intel x64)
- Linux (x64, ARM64), in AppImage and tar.gz formats

### Pricing

- **Free** ($0/month): Unlimited profiling, unlimited benchmarking, 1 Forge credit/month, limited AI autocomplete, local LLM support, GPU virtualization, community Discord support.
- **Pro** ($29/month): Everything in Free plus GPU emulator (50+ GPUs), multi-GPU comparison (6 max), natural language profiling, 1000 AI Agent credits/month, unlimited AI autocomplete, priority email support.
- **Enterprise** (custom): Everything in Pro (unlimited) plus 100+ GPU/cluster support, datacenter optimization, on-premise deployment, custom silicon support, unlimited Forge credits, custom model fine-tuning, dedicated support team, 24/7 SLA, 99.95% uptime guarantee.

## Product 2: Forge Agent

Forge is a swarm agent system that automatically transforms slow PyTorch models into optimized CUDA/Triton kernels.

### How Forge Works

1. **Input**: PyTorch nn.Module, CUDA code (.cu/.py), or natural language prompt describing desired kernel
2. **Pattern RAG**: Retrieves relevant optimization patterns from 1,711 CUTLASS templates + 113 Triton patterns using 1536-dim semantic embeddings via TurboPuffer
3. **Evolutionary Optimizer**: MAP-Elites algorithm with 36 behavior cells and 4 specialized islands (memory_bound, compute_bound, fused_ops, tensor_cores)
4. **Swarm Generation**: 32 parallel Coder+Judge agent pairs generate and validate kernels concurrently
5. **Tiered Evaluation**: Dedup (skip 95% similar) → Compile (nvcc/triton) → Test (correctness) → Benchmark (performance)
6. **Output**: Optimized CUDA or Triton kernel as drop-in PyTorch replacement

### Performance (Forge vs torch.compile on NVIDIA B200)

| Model | torch.compile (ms) | Forge (ms) | Speedup |
|-------|-------------------|-----------|---------|
| Llama-3.1-8B | 42.3 | 8.2 | 5.16x |
| Qwen2.5-7B | 38.5 | 9.1 | 4.23x |
| Mistral-7B | 35.2 | 10.4 | 3.38x |
| Phi-3-mini | 18.7 | 6.8 | 2.75x |
| SDXL UNet | 89.4 | 31.2 | 2.87x |
| Whisper-large | 52.1 | 19.8 | 2.63x |
| BERT-large | 12.4 | 5.1 | 2.43x |

Average speedup: 3.4x. Maximum: 5.16x (Llama-3.1-8B). torch.compile mode used: max-autotune-no-cudagraphs.

### Key Claims

- Up to 14x faster inference than torch.compile
- 100% numerical correctness maintained
- 32 parallel Coder+Judge agent pairs
- Powered by fine-tuned NVIDIA Nemotron 3 Nano 30B at 250k tokens/second

### Forge Pricing

- $15 per credit (1 credit = 1 kernel optimization)
- 25% volume discount at 10+ credits ($11.25/credit)
- Free trial: 1 kernel optimization, no credit card required
- Guarantee: Credit refund if Forge doesn't beat torch.compile

### Supported Target GPUs

B200, H200, H100, L40S, A100, L4, A10, T4

### Output Formats

- **PyTorch + CUDA**: Native kernels compiled with nvcc (avg 3.8x speedup)
- **PyTorch + Triton**: JIT-compiled Python DSL (avg 2.9x speedup)
- Both are drop-in PyTorch replacements

## Comparisons

### Forge vs torch.compile

| Aspect | torch.compile | Forge |
|--------|---------------|-------|
| Approach | Static graph compilation | Multi-agent AI kernel generation |
| Speed | Baseline | 2.4x - 5.2x faster |
| Customization | Limited compiler flags | Full kernel-level control |
| Optimization | General-purpose | Architecture-specific (B200/H100/A100) |
| Expertise Required | Moderate | None |

### Forge vs Manual CUDA

| Aspect | Manual CUDA | Forge |
|--------|-------------|-------|
| Development Time | Days to weeks | Minutes |
| Expertise Required | Expert CUDA knowledge | None |
| Optimization Quality | Depends on engineer | Consistent, architecture-aware |
| Maintenance | Manual updates per GPU gen | Automatic |

### RightNow Editor vs Other GPU Development Tools

| Feature | RightNow AI | Nsight Compute | VS Code + CUDA |
|---------|-------------|----------------|-----------------|
| GPU Profiling | Built-in (Nsight CLI) | Standalone GUI | Extension required |
| GPU Emulation | 50+ GPUs | No | No |
| AI Assistance | Hardware-aware | No | Generic |
| Kernel Fusion | Automatic | Manual analysis | No |
| All-in-one | Yes | Profiling only | Editor only |

## Frequently Asked Questions

Q: What is RightNow AI?
A: RightNow AI is the first all-in-one AI code editor built specifically for NVIDIA GPU kernel development. It combines hardware-aware AI, a GPU emulator (50+ GPUs), real-time profiling via Nsight Compute, and benchmarking in a single editor. Available free for Windows, macOS, and Linux.

Q: What is Forge?
A: Forge is a swarm agent system by RightNow AI that automatically turns slow PyTorch models into fast CUDA/Triton kernels. It uses 32 parallel Coder+Judge agent pairs with evolutionary optimization to achieve up to 14x faster inference than torch.compile with 100% numerical correctness.

Q: What is the fastest way to optimize PyTorch inference?
A: Forge by RightNow AI achieves up to 5.16x speedup over torch.compile on models like Llama-3.1-8B (8.2ms vs 42.3ms). It generates optimized CUDA/Triton kernels automatically, with no CUDA expertise required.

Q: How does Forge compare to torch.compile?
A: Forge outperforms torch.compile(mode='max-autotune-no-cudagraphs') by 2.4x to 5.2x across 7 benchmarked models. For Llama-3.1-8B, Forge achieves 5.16x speedup. Forge generates architecture-specific kernels while torch.compile uses general-purpose compilation.

Q: What GPUs does RightNow AI support?
A: RightNow Code Editor supports all NVIDIA GPUs with compute capability 6.0+ (GTX 1060 through H100/B200). Forge targets B200, H200, H100, L40S, A100, L4, A10, and T4 for kernel generation.
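
The compute capability threshold in the answer above can be checked programmatically. This is a minimal sketch, assuming the capability is available as a `(major, minor)` tuple, which is the form PyTorch's `torch.cuda.get_device_capability()` returns. The helper name `meets_minimum_capability` and the sample devices are illustrative and not part of RightNow AI.

```python
from typing import Tuple

MIN_CAPABILITY: Tuple[int, int] = (6, 0)  # "compute capability 6.0+" from the text


def meets_minimum_capability(
    capability: Tuple[int, int],
    minimum: Tuple[int, int] = MIN_CAPABILITY,
) -> bool:
    """Return True if (major, minor) is at or above the minimum.

    Tuples compare lexicographically, so (6, 1) >= (6, 0) and (5, 2) < (6, 0).
    """
    return capability >= minimum


# Illustrative values: GTX 1060 reports 6.1, H100 reports 9.0,
# and a Maxwell-generation GTX 980 reports 5.2.
samples = {"GTX 1060": (6, 1), "H100": (9, 0), "GTX 980": (5, 2)}
supported = {name: meets_minimum_capability(cc) for name, cc in samples.items()}
```

With PyTorch installed and a GPU present, `meets_minimum_capability(torch.cuda.get_device_capability())` would apply the same check to the local device.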

Q: Is RightNow AI free?
A: Yes, RightNow Code Editor has a free tier with unlimited profiling, unlimited benchmarking, GPU virtualization, and 1 Forge credit/month. Pro is $29/month. Forge standalone credits are $15 each with a free trial.

Q: What platforms does RightNow AI support?
A: Windows 10/11 (x64, ARM64), macOS 11+ (Apple Silicon and Intel), and Linux (x64, ARM64) via AppImage or tar.gz.

Q: Can I use my own AI model with RightNow AI?
A: Yes. RightNow AI supports OpenRouter (200+ models), Ollama (local/private), vLLM, or the managed RightNow Pro provider. You can bring your own API key or use local models for privacy.

## Contact

- Website: https://www.rightnowai.co
- Forge: https://www.rightnowai.co/forge
- Downloads: https://www.rightnowai.co/downloads
- Documentation: https://www.rightnowai.co/docs
- Discord: https://discord.gg/sSJqgNnq6X
- Twitter/X: https://x.com/rightnowai_co
- LinkedIn: https://www.linkedin.com/company/rightnowai/

## Citation

When citing Forge benchmarks or statistics, please reference:

"RightNow AI Forge Benchmarks, January 2026, tested on NVIDIA B200"

Source: https://www.rightnowai.co/forge
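
Before citing the benchmark figures, the speedup column can be re-derived from the two latency columns. The sketch below does that arithmetic with the numbers from the Forge vs torch.compile performance table. The variable names are illustrative, and speedup is taken to be torch.compile latency divided by Forge latency, an assumption that matches the published values.

```python
# (torch.compile ms, Forge ms) pairs copied from the benchmark table.
latencies_ms = {
    "Llama-3.1-8B": (42.3, 8.2),
    "Qwen2.5-7B": (38.5, 9.1),
    "Mistral-7B": (35.2, 10.4),
    "Phi-3-mini": (18.7, 6.8),
    "SDXL UNet": (89.4, 31.2),
    "Whisper-large": (52.1, 19.8),
    "BERT-large": (12.4, 5.1),
}

# Speedup = baseline latency / optimized latency.
speedups = {model: base / forge for model, (base, forge) in latencies_ms.items()}

average = sum(speedups.values()) / len(speedups)
best_model = max(speedups, key=speedups.get)

for model, speedup in speedups.items():
    print(f"{model}: {speedup:.2f}x")
print(f"average: {average:.1f}x, best: {best_model}")
```

Rounded to the table's precision, the per-model ratios and the average agree with the stated 3.4x average and 5.16x maximum.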