AI Providers
Configure OpenRouter BYOK and local AI models
OpenRouter BYOK (Free Tier)
OpenRouter is the only supported BYOK (bring-your-own-key) provider; all other cloud AI providers route through OpenRouter's unified API.
Setup Steps
- Create an account: sign up at openrouter.ai
- Get an API key: visit openrouter.ai/settings/keys
- Configure RightNow AI:
  - Go to Settings → AI Providers → OpenRouter
  - Enter your OpenRouter API key
  - Test the connection (or verify the key with curl, as shown below)
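To confirm the key works outside the editor, you can call OpenRouter's OpenAI-compatible API directly. A minimal sketch with curl, assuming your key is in the OPENROUTER_API_KEY environment variable (the model ID is just an example from the free tier):

```bash
# Request a short completion to verify the API key is accepted.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/mistral-small-3.1-24b-instruct:free",
    "messages": [{"role": "user", "content": "Reply with OK if you can read this."}]
  }'
```

A 200 response with a chat completion means the key is valid; a 401 means it was rejected.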
Available Models
Access 200+ models through OpenRouter's unified API:
Free Models (no usage charges, still require your API key):
- google/gemini-2.0-flash-exp:free
- mistralai/mistral-small-3.1-24b-instruct:free
Premium Models (with your API key):
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
- DeepSeek: R1 series, Chat models
- Mistral: Large, Codestral 2501
- Google: Gemini 2.0 Flash
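The catalog changes frequently, so the current list is best pulled from OpenRouter's models endpoint. A quick sketch (jq is optional; drop the pipe to see the raw JSON):

```bash
# List every model ID currently available through OpenRouter.
curl -s https://openrouter.ai/api/v1/models | jq -r '.data[].id'
```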
Provider Routing
All cloud providers automatically route through OpenRouter:
- OpenAI → OpenRouter → OpenAI
- Anthropic → OpenRouter → Anthropic
- DeepSeek → OpenRouter → DeepSeek
- Mistral → OpenRouter → Mistral
- Google → OpenRouter → Google
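In practice, routing is transparent: every provider is reached through the same endpoint and key, and only the model ID (prefixed with the provider name) changes. A sketch, reusing the key from the setup above:

```bash
# Same endpoint, same key; switching from one upstream provider
# to another is just a different "provider/model" ID in the body.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Explain warp divergence in one sentence."}]
  }'
```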
RightNow Pro (Managed Service)
No API key setup required - fully managed OpenRouter integration.
Benefits
- Curated Models: Optimized selection for CUDA development
- Usage Tracking: Comprehensive analytics and billing
- Priority Access: Faster response times and premium models
- Seamless Experience: No API key management needed
Available Models
Chat Models:
- anthropic/claude-sonnet-4
- google/gemini-2.5-flash
- deepseek/deepseek-chat-v3-0324
FIM Models (fill-in-the-middle, used for autocomplete):
- codestral-2501
- deepseek-r1-distill-qwen-7b
Ready to upgrade? Visit rightnowai.co/pricing to get started with RightNow Pro.
Local Models (Privacy-First)
Complete offline capability with no data leaving your machine.
Ollama
Setup:
- Install Ollama on your system
- Pull a model: ollama pull codellama
- Configure RightNow AI:
  - Settings → AI Providers → Ollama
  - Set the endpoint: http://localhost:11434
  - Select your model and test the connection (see the sketch below)
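Before pointing the editor at the endpoint, it can help to confirm Ollama is serving. A minimal sketch against Ollama's HTTP API on its default port:

```bash
# List the models this Ollama instance has pulled.
curl http://localhost:11434/api/tags

# Request a short non-streaming completion to confirm the model loads.
curl http://localhost:11434/api/generate \
  -d '{"model": "codellama", "prompt": "// CUDA vector add kernel\n", "stream": false}'
```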
Benefits:
- Easy local model management
- CUDA acceleration support
- Simple model updates (re-running ollama pull fetches the latest version)
vLLM
Setup:
- Install vLLM: pip install vllm
- Start the server:
python -m vllm.entrypoints.api_server --model codellama/CodeLlama-7b-Instruct-hf
- Configure RightNow AI:
  - Settings → AI Providers → vLLM
  - Set the endpoint and model
  - Test the connection (see the sketch below)
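To sanity-check the server before configuring the editor, you can hit it directly. A sketch assuming vLLM's default port 8000 and the simple /generate route exposed by the entrypoint above (newer vLLM releases replace it with an OpenAI-compatible server under /v1):

```bash
# Ask the vLLM demo server for a short completion; sampling
# parameters are passed inline in the JSON body.
curl http://localhost:8000/generate \
  -d '{"prompt": "// CUDA kernel that adds two vectors\n", "max_tokens": 64}'
```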
Benefits:
- High-performance inference server
- Optimized for CUDA GPUs
- Excellent throughput for large models
LM Studio
Setup:
- Download and install LM Studio
- Download a CUDA-compatible model
- Start the local server in LM Studio
- Configure RightNow AI:
  - Settings → AI Providers → LM Studio
  - Configure the endpoint and test the connection (see the sketch below)
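LM Studio's local server exposes an OpenAI-compatible API, so the endpoint can be checked the same way. A sketch assuming LM Studio's default port 1234:

```bash
# List the models LM Studio's local server is currently serving.
curl http://localhost:1234/v1/models
```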
Benefits:
- User-friendly interface
- GPU acceleration support
- Easy model management
Use local models for privacy-sensitive projects where code cannot leave your machine.