╭────────────╮ │ AI │ ├────────────┤ │ ◦──◦ │ │ ╲ ╱ │ │ ◦ │ │ ╱ ╲ │ │ ◦──◦ │ ╰────────────╯
Two ways to use AI: chat interactively to write and debug CUDA code, or let the optimizer automatically improve your kernels with one click. Both know your GPU and support custom agents with skills and MCP integrations.
Interactive chat for writing kernels, debugging issues, and asking questions. GPU-aware responses tailored to your hardware.
One-click optimization that reads profiling data, rewrites your code, verifies correctness, and shows the speedup.
Bring your own API key or run completely offline with local models. The model list shows cloud providers and local GPU-backed models in one place.
├─ OpenAI GPT-4o, GPT-4 Turbo ├─ Anthropic Claude 3.5 Sonnet ├─ Google Gemini 1.5 Pro ├─ Deepseek Deepseek V3 └─ OpenRouter 100+ models
├─ Ollama Qwen2.5, Llama 3 ├─ LM Studio Any GGUF model └─ vLLM Self-hosted
Both the chat assistant and optimizer know your hardware. No generic suggestions.
// Detected hardware: GPU: RTX 4090 (sm_89, 128 SMs, 24GB) CUDA: 12.4 Features: Tensor cores, async copy, L2 persistence // AI uses this for all suggestions and optimizations
Chat assistant and auto optimizer included in Pro plan.