RightNow AI ships with 4 carefully curated models selected for CUDA development performance, served through RightNow Proxy, with local provider support. Because every model is accessed via the proxy, you never need to manage multiple API keys.
- Best overall coding model
- Ultra-fast model (73% SWE-bench)
- Agentic coding specialist
- Free-tier model with excellent tool calling
Model Selection: These 4 models were chosen after extensive testing on CUDA code generation, kernel optimization, and debugging tasks. No additional models can be added.
All cloud models are accessed through RightNow Proxy, which routes requests to Anthropic and OpenAI via OpenRouter. No API key configuration required.
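To illustrate the routing layer, requests that reach OpenRouter follow the standard OpenAI-style chat-completion shape. The sketch below only assembles that request body; the model slug is a hypothetical placeholder, not a real RightNow model ID, and RightNow Proxy handles this for you automatically:

```python
import json

def build_chat_request(model: str, user_message: str) -> str:
    """Assemble an OpenAI-compatible chat-completion payload.

    OpenRouter (and proxies that front it) accept this shape.
    The model slug passed in is a placeholder for illustration.
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a CUDA coding assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return json.dumps(payload)

# The proxy forwards a body like this to the upstream provider.
body = build_chat_request("anthropic/claude-sonnet", "Optimize this kernel")
```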
No BYOK: RightNow AI does not support bringing your own API keys. All models are accessed through RightNow Proxy for simplified setup and consistent performance.
Fill-in-the-Middle (FIM) models are specialized for code completion and autocomplete features. These run separately from chat models.
- codestral-2501 - Mistral's code-specialized model
- deepseek-r1-distill-qwen-7b - DeepSeek R1 7B distilled model
- deepseek-r1-distill-qwen-14b - DeepSeek R1 14B distilled model
- deepseek-r1-distill-qwen-32b - DeepSeek R1 32B distilled model

Autocomplete Configuration: Select your preferred FIM model in Settings → AI Providers → Autocomplete Model. FIM models optimize for inline code suggestions.
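Conceptually, a FIM request gives the model the code before and after the cursor and asks it to predict the span in between. The sentinel tokens in this sketch are illustrative placeholders; each model family defines its own special tokens, and the editor assembles the real prompt for you:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt.

    <PRE>, <SUF>, and <MID> are placeholder sentinels for illustration;
    real FIM models each define their own special tokens.
    """
    return f"<PRE>{prefix}<SUF>{suffix}<MID>"

# The model generates the text that belongs between prefix and suffix,
# e.g. "a + b" for the gap in this snippet.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```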
Complete offline capability with no data leaving your machine.
Setup:
```shell
ollama pull codellama
```

Then point RightNow AI at the local endpoint: http://localhost:11434

Benefits:
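Once the model is pulled, you can sanity-check the local endpoint directly. The sketch below only builds the JSON body for Ollama's `/api/generate` route without sending it, so it runs offline:

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_ollama_request(prompt: str, model: str = "codellama") -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of chunks
    }
    return json.dumps(payload).encode("utf-8")

body = build_ollama_request("// CUDA kernel for vector addition\n")
# Send with e.g. urllib.request.Request(OLLAMA_URL, data=body)
# once the Ollama server is running.
```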
Setup:
```shell
pip install vllm
python -m vllm.entrypoints.api_server --model codellama/CodeLlama-7b-Instruct-hf
```

Benefits:
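With the server running, requests go to its `/generate` route. This sketch only assembles the request body; the sampling parameter names follow vLLM's SamplingParams and should be treated as assumptions for your installed version:

```python
import json

VLLM_URL = "http://localhost:8000/generate"  # default port for the demo server

def build_vllm_request(prompt: str, max_tokens: int = 128) -> bytes:
    """Build the JSON body for vLLM's demo /generate endpoint (assumed shape)."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.2,  # low temperature for more deterministic code
    }
    return json.dumps(payload).encode("utf-8")

body = build_vllm_request("__global__ void vecAdd(float *a, float *b, float *c) {")
```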
Setup:
Benefits:
Use local models for privacy-sensitive projects where code cannot leave your machine.