RightNow AI is a research lab and software company working on GPU programming tools, CUDA development workflows, model-hardware co-design, and inference infrastructure.

Which NVIDIA GPUs are supported by RightNow AI?

RightNow AI supports all NVIDIA GPUs with CUDA Toolkit 11.0-12.5, including GeForce RTX 40/30/20 series, GTX 16/10 series, Quadro RTX, Tesla, A100, and H100.

How much does RightNow AI cost?

RightNow AI is free to use with unlimited profiling and benchmarking. RightNow Pro costs $29 per month and adds GPU emulator access (50+ GPUs), multi-GPU comparison, and 1,000 AI credits per month.

What CUDA development workflow does RightNow AI support?

RightNow AI supports CUDA development workflows that combine editing, profiling, emulation, remote GPU execution, and benchmarked performance analysis.

Core Features | RightNow AI Docs

Q: Can I use RightNow AI on macOS?

Yes, RightNow AI is fully available on macOS (Apple Silicon and Intel). Mac users can use remote GPUs for free or our built-in GPU emulator for CUDA profiling.

AI-Powered CUDA Coding

Smart Autocomplete

Context-aware CUDA completions with Fill-in-the-Middle (FIM) optimization. Supports 20+ FIM-capable models including DeepSeek R1, Codestral 2501, and StarCoder2.

Ctrl+K Editing

Select any code and press Ctrl+K to describe changes in natural language:

"Optimize this kernel for memory bandwidth"
"Add error checking to this CUDA call"
"Convert this to use shared memory"

Chat Integration

Full project context with CUDA-specific knowledge:

Ask questions about GPU architecture optimization
Get recommendations for specific hardware (Ampere, Ada Lovelace, Hopper)
Troubleshoot CUDA compilation and runtime issues

Real-Time CUDA Profiling

NVIDIA Nsight Compute Integration

Production-grade profiling using nv-nsight-cu-cli with comprehensive hardware metrics:

Core Performance Metrics:

SM Efficiency: Streaming Multiprocessor utilization percentage
Memory Throughput: Achieved vs theoretical memory bandwidth (GB/s)
Occupancy: Active warps vs maximum theoretical warps
Warp Efficiency: Percentage of active threads in executed warps
Instruction Replay Overhead: Pipeline stall analysis
Global Memory Efficiency: Coalesced memory access patterns
Shared Memory Efficiency: Bank conflict analysis
Branch Efficiency: Divergent execution measurement

Advanced Metrics:

L1/L2 Cache Hit Rates: Memory hierarchy performance
Register Usage: Per-thread register consumption
Power Draw: Real-time GPU power consumption (watts)
Temperature: GPU thermal monitoring
Roofline Analysis: Compute vs memory-bound classification

Multi-Level Profiling Support

🧊 Kernel Profiling

Profile specific __global__ functions with targeted analysis

▶️ Application Profiling

Full executable profiling with complete call graphs

💻 CLI Integration

Direct nv-nsight-cu-cli integration with custom metrics

Visual Profiling Interface

CodeLens Integration:

Inline performance metrics displayed above CUDA kernels
Real-time execution time, SM efficiency, memory throughput
Color-coded performance indicators

🟢 Green

>80% efficiency (optimized kernels)

🟡 Orange

40-80% efficiency (moderate performance)

🔴 Red

<40% efficiency (needs optimization)

Interactive Profiling Controls:

Gutter Play Buttons: One-click profiling from editor margins
Dedicated Profiling Panel: Comprehensive results view with historical data
Multi-GPU Support: Device switching and cross-GPU analysis
Elevated Profiling: Windows UAC support for performance counter access

Hardware Detection & Monitoring

GPU Hardware Integration

Multi-Vendor Detection: NVIDIA, AMD, Intel, Apple Silicon
Real-Time Monitoring: nvidia-smi integration for live metrics
Hardware Specifications: Automatic detection of compute capability, SM count, memory specs
Architecture Support: Turing, Ampere, Ada Lovelace, Hopper optimizations

CUDA Environment Detection

Toolkit Detection: Automatic CUDA 11.0-12.5 detection
Registry Integration: Windows performance counter access
Multi-Version Support: Compatible with various NCU versions
Diagnostic Capabilities: Comprehensive environment validation

AI-Powered Performance Analysis

Bottleneck Classification: Memory-bound vs compute-bound identification
Architecture-Specific Suggestions: Tailored for detected GPU architecture
Performance Trend Analysis: Historical optimization tracking
Automated Code Suggestions: AI-generated kernel optimizations based on profiling data

AI Provider Architecture

BYOK (Bring Your Own Key) - Free Tier

OpenRouter API Key: The only supported BYOK option
Get unlimited usage with your own OpenRouter key
Access to 200+ models through OpenRouter's unified API
Support for providers: OpenAI, Anthropic, DeepSeek, Mistral, Google, and more

RightNow Proxy - Pro Tier

Managed Service: Pre-configured OpenRouter integration
Curated Models: Optimized model selection for CUDA development
Usage Tracking: Comprehensive analytics and billing
Priority Access: Faster response times and premium models

Local Models (Privacy-First)

Complete offline capability with no data leaving your machine:

Ollama: Easy local model management with CUDA acceleration
vLLM: High-performance inference server for CUDA GPUs
LM Studio: User-friendly local deployment with GPU support

macOS Support: RightNow AI is available on macOS (Apple Silicon and Intel). Mac users can use remote GPUs for free or our GPU emulator for CUDA profiling.