AI Providers

Configure OpenRouter BYOK and local AI models

OpenRouter BYOK (Free Tier)

OpenRouter is the only supported bring-your-own-key (BYOK) provider - all other cloud providers route through OpenRouter's unified API.

Setup Steps

  1. Create Account: Sign up at openrouter.ai
  2. Get API Key: Visit openrouter.ai/settings/keys
  3. Configure RightNow AI:
    • Go to Settings → AI Providers → OpenRouter
    • Enter your OpenRouter API key
    • Test connection (or verify from the command line, as sketched below)
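
To verify the key outside the editor, you can call OpenRouter's chat completions endpoint directly. A minimal curl sketch, assuming the key is exported as an environment variable (the model here is one of the free-tier IDs listed below; any model you have access to works):

  # Export the key from openrouter.ai/settings/keys
  export OPENROUTER_API_KEY="<your key>"

  # Send a one-line prompt; a JSON completion back means the key is valid
  curl https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "mistralai/mistral-small-3.1-24b-instruct:free",
      "messages": [{"role": "user", "content": "Say hello"}]
    }'

A 401 response means the key was rejected; re-check it on the settings page.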

Available Models

Access 200+ models through OpenRouter's unified API:

Free Models (with your API key):

  • google/gemini-2.0-flash-exp:free
  • mistralai/mistral-small-3.1-24b-instruct:free

Premium Models (with your API key):

  • OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
  • DeepSeek: R1 series, Chat models
  • Mistral: Large, Codestral 2501
  • Google: Gemini 2.0 Flash
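
OpenRouter also publishes the full catalog at a public endpoint, which is handy for checking exact model IDs before pasting them into the editor. A sketch (the endpoint needs no API key; jq is assumed to be installed):

  # Print every model ID available through OpenRouter
  curl -s https://openrouter.ai/api/v1/models | jq -r '.data[].id'

  # Keep only the free-tier variants
  curl -s https://openrouter.ai/api/v1/models | jq -r '.data[].id' | grep ':free'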

Provider Routing

All cloud providers automatically route through OpenRouter:

  • OpenAI → OpenRouter → OpenAI
  • Anthropic → OpenRouter → Anthropic
  • DeepSeek → OpenRouter → DeepSeek
  • Mistral → OpenRouter → Mistral
  • Google → OpenRouter → Google
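
In practice, the upstream provider is selected purely by the model ID's prefix; the endpoint and key never change. A sketch of two requests that differ only in the model field (both IDs are examples from OpenRouter's catalog):

  # Routed through OpenRouter to Anthropic
  curl https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "anthropic/claude-3.5-sonnet", "messages": [{"role": "user", "content": "hi"}]}'

  # Identical request, routed to DeepSeek instead
  curl https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "deepseek/deepseek-chat", "messages": [{"role": "user", "content": "hi"}]}'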

RightNow Pro (Managed Service)

No API key setup required - fully managed OpenRouter integration.

Benefits

  • Curated Models: Optimized selection for CUDA development
  • Usage Tracking: Comprehensive analytics and billing
  • Priority Access: Faster response times and premium models
  • Seamless Experience: No API key management needed

Available Models

Chat Models:

  • anthropic/claude-sonnet-4
  • google/gemini-2.5-flash
  • deepseek/deepseek-chat-v3-0324

FIM Models (Fill-in-the-Middle, used for autocomplete):

  • codestral-2501
  • deepseek-r1-distill-qwen-7b

Ready to upgrade? Visit rightnowai.co/pricing to get started with RightNow Pro.

Local Models (Privacy-First)

Complete offline capability with no data leaving your machine.

Ollama

Setup:

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull codellama
  3. Configure RightNow AI:
    • Settings → AI Providers → Ollama
    • Set endpoint: http://localhost:11434
    • Select your model and test connection (the curl checks below confirm the server is up)
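
Before pointing the editor at the endpoint, you can confirm the Ollama server is reachable and the model loads. A sketch against Ollama's local HTTP API (codellama matches the model pulled in step 2):

  # List the models the local server has pulled
  curl http://localhost:11434/api/tags

  # One-off generation to confirm the model actually loads
  curl http://localhost:11434/api/generate \
    -d '{"model": "codellama", "prompt": "// CUDA kernel that adds two vectors", "stream": false}'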

Benefits:

  • Easy local model management
  • CUDA acceleration support
  • Simple model updates via ollama pull

vLLM

Setup:

  1. Install vLLM: pip install vllm
  2. Start server: python -m vllm.entrypoints.openai.api_server --model codellama/CodeLlama-7b-Instruct-hf
  3. Configure RightNow AI:
    • Settings → AI Providers → vLLM
    • Set endpoint and model
    • Test connection (see the smoke test below)
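
The OpenAI-compatible server listens on port 8000 by default, so a quick smoke test looks like this (the model name must match what was passed to --model):

  # Confirm the server is up and serving the model
  curl http://localhost:8000/v1/models

  # One-off completion against the OpenAI-compatible endpoint
  curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "codellama/CodeLlama-7b-Instruct-hf",
      "prompt": "__global__ void add(",
      "max_tokens": 32
    }'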

Benefits:

  • High-performance inference server
  • Optimized for CUDA GPUs
  • Excellent throughput for large models

LM Studio

Setup:

  1. Download and install LM Studio from lmstudio.ai
  2. Download a CUDA-compatible model
  3. Start local server in LM Studio
  4. Configure RightNow AI:
    • Settings → AI Providers → LM Studio
    • Configure endpoint and test connection (see the endpoint check below)
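
LM Studio's local server is also OpenAI-compatible and defaults to port 1234, so the endpoint check mirrors the vLLM one ("local-model" is a placeholder; LM Studio serves whichever model is currently loaded):

  # List the model(s) currently loaded in LM Studio
  curl http://localhost:1234/v1/models

  # Quick chat completion to confirm the loaded model responds
  curl http://localhost:1234/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "local-model", "messages": [{"role": "user", "content": "hello"}]}'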

Benefits:

  • User-friendly interface
  • GPU acceleration support
  • Easy model management

Use local models for privacy-sensitive projects where code cannot leave your machine.