Advanced Features

Personalized AI rules and profiling persistence for power users

.rightnowrules - Personalized AI Guidelines

RightNow AI's .rightnowrules feature provides hardware-aware, personalized AI assistance tailored to your CUDA development workflow and GPU architecture.

Overview

The .rightnowrules file contains custom AI guidelines that enhance every interaction, from chat conversations to code completions. It combines:

  • Hardware Detection: Automatic GPU architecture analysis
  • User Preferences: Your experience level and optimization focus
  • Project Context: Workspace-specific CUDA development guidelines

Creating .rightnowrules

Automatic Generation

Command Palette Method:

  1. Open Command Palette (Ctrl+Shift+P)
  2. Run: RightNow: Generate .rightnowrules File
  3. Confirm hardware detection and preferences
  4. File created in workspace root

Auto-Suggestion:

  • RightNow AI detects CUDA files (.cu, .cuh, .cuf) in your workspace
  • Shows notification when no .rightnowrules exists
  • Click "Generate" for instant setup

Manual Creation

Create .rightnowrules in your workspace root:

```text
# CUDA Development Guidelines
# Hardware: RTX 4090 (Ada Lovelace) | CUDA 12.3
# Profile: Intermediate | Machine Learning | Balanced

**Key Guidelines:**
- Optimize for tensor core utilization on Ada Lovelace
- Balance memory coalescing with occupancy
- Prioritize mixed precision for AI workloads
- Target 70%+ GPU utilization for optimal performance

## My Preferences
- Use explicit memory management over automatic
- Optimize for inference latency < 10ms
- Prefer readable code over micro-optimizations
- Focus on FP16/BF16 tensor operations
```

Hardware-Aware Generation

RightNow AI automatically detects and optimizes for your specific GPU:

GPU Architecture Detection

Supported Architectures:

  • Pascal (GTX 10 series): Focus on memory optimization
  • Turing (RTX 20 series): RT cores and tensor cores
  • Ampere (RTX 30 series): 3rd-gen tensor cores and structured sparsity
  • Ada Lovelace (RTX 40 series): 4th-gen tensor cores, 3rd-gen RT cores
  • Hopper (H100): Transformer engine, thread block clusters

Detection Includes:

  • Compute capability and SM count
  • Tensor core availability and generation
  • Memory bandwidth and capacity
  • Multi-GPU configurations
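
As a rough illustration of how compute capability maps onto these architecture families, here is a hypothetical helper (this is not RightNow AI's actual detection code; the mapping uses NVIDIA's published compute capability ranges):

```python
# Hypothetical sketch: map a CUDA compute capability to the architecture
# families listed above, using well-known compute capability ranges.

def detect_architecture(major: int, minor: int) -> str:
    """Return the GPU architecture family for a compute capability."""
    cc = major * 10 + minor
    if 60 <= cc < 70:
        return "Pascal"        # GTX 10 series: 6.0, 6.1, 6.2
    if cc == 75:
        return "Turing"        # RTX 20 series
    if 80 <= cc < 89:
        return "Ampere"        # RTX 30 series: 8.0, 8.6, 8.7
    if cc == 89:
        return "Ada Lovelace"  # RTX 40 series
    if cc == 90:
        return "Hopper"        # H100
    return "Unknown"

print(detect_architecture(8, 9))  # RTX 4090 reports compute capability 8.9
```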

User Preference Integration

Experience Level:

  • beginner: Focus on correctness and learning
  • intermediate: Balance performance and readability
  • expert: Aggressive optimizations and advanced features

Primary Use Case:

  • machine_learning: Tensor operations, mixed precision
  • scientific_computing: Double precision, memory bandwidth
  • graphics: RT cores, rasterization pipelines
  • general: Balanced approach across domains

Optimization Focus:

  • performance: Maximum throughput and speed
  • memory: Minimize memory usage and bandwidth
  • power: Energy-efficient implementations
  • balanced: Compromise across all factors

Example Generated Content

```text
# CUDA Development Guidelines
# Auto-generated for Windows on 1/4/2025

**Hardware:** RTX 4090 (Ada Lovelace) | CUDA 12.3

**User Profile:** Intermediate developer | machine_learning focus | balanced optimization

**Key Guidelines:**
- Optimize memory coalescing and shared memory usage
- Balance occupancy with register pressure
- Prioritize tensor operations and mixed precision when available
- Your Ada Lovelace GPU supports 4th-gen tensor cores for AI workloads

## My Preferences
# Add your specific guidelines here:
# - Coding style (e.g., "prefer explicit memory management")
# - Performance targets (e.g., "optimize for latency < 10ms")
# - Project constraints (e.g., "limited to 4GB GPU memory")
# - Architecture preferences (e.g., "focus on Ampere features")
```

Integration with AI Features

Chat Assistant

System messages include your .rightnowrules as context for all conversations

Code Completion

Autocomplete respects your guidelines and coding preferences

Quick Edit

Ctrl+K editing follows your optimization priorities and style

Multi-Workspace Support

  • Per-Workspace Rules: Each workspace folder can have its own .rightnowrules
  • Rule Combination: Rules from multiple workspace folders are merged into a single shared context
  • Context Switching: AI automatically adapts when switching between projects
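
The merge logic is internal to RightNow AI, but as a mental model, per-folder rules could be concatenated under labeled headers. This is a hypothetical sketch, not the editor's implementation:

```python
from pathlib import Path

def combine_rules(workspace_folders: list[str]) -> str:
    """Concatenate each folder's .rightnowrules under a labeled header,
    skipping folders that have no rules file."""
    sections = []
    for folder in workspace_folders:
        rules_file = Path(folder) / ".rightnowrules"
        if rules_file.is_file():
            header = f"# Rules from {Path(folder).name}"
            sections.append(f"{header}\n{rules_file.read_text()}")
    return "\n\n".join(sections)
```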

Profiling Data Persistence

RightNow AI maintains comprehensive profiling history in .rightnow/profiling/kernels.json for tracking optimization progress across sessions.

File Structure

```text
.rightnow/
└── profiling/
    └── kernels.json  # Persistent profiling database
```

What Gets Stored

Comprehensive Metrics

Core Performance Data:

  • Execution time and GPU utilization
  • Memory throughput and occupancy
  • SM efficiency and warp efficiency
  • Cache hit rates (L1/L2) and register usage

Advanced Analytics:

  • Branch efficiency and instruction replay overhead
  • Global/shared memory efficiency
  • Temperature and power consumption
  • Roofline analysis (compute vs memory bound)
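
Roofline analysis compares a kernel's arithmetic intensity (FLOPs per byte of memory traffic) to the GPU's ridge point, the intensity at which peak compute and peak bandwidth meet. A minimal sketch of the classification, using assumed approximate RTX 4090 peak numbers (the constants below are assumptions, not values reported by the profiler):

```python
# Roofline classification sketch. Peak numbers are approximate,
# assumed specs for an RTX 4090.
PEAK_FLOPS = 82.6e12       # FP32 throughput, FLOP/s (approximate)
PEAK_BANDWIDTH = 1.008e12  # memory bandwidth, bytes/s (approximate)

def classify_kernel(flops: float, bytes_moved: float) -> str:
    """Compare arithmetic intensity to the ridge point of the roofline."""
    intensity = flops / bytes_moved      # FLOP per byte
    ridge = PEAK_FLOPS / PEAK_BANDWIDTH  # ~82 FLOP/byte for these peaks
    return "compute-bound" if intensity >= ridge else "memory-bound"

# A kernel doing only 10 FLOPs per byte sits far below the ridge point:
print(classify_kernel(flops=1e9, bytes_moved=1e8))  # memory-bound
```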

Historical Tracking

Session Management:

  • Multiple profiling sessions per kernel
  • Timestamps for optimization timeline
  • Performance trend analysis
  • Before/after comparison data

Content-Based Keys:

  • Stable kernel identification across code changes
  • Preserves history when line numbers change
  • Detects real kernel modifications vs. cosmetic edits
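
The key format in the example data structure (file URI, kernel name, short content hash, line number) suggests how such stability is achieved. A hypothetical sketch of deriving a content-based key (the hashing and normalization scheme here is an assumption, not RightNow AI's actual algorithm): whitespace and comments are stripped before hashing, so cosmetic edits produce the same hash while real code changes do not.

```python
import hashlib
import re

def kernel_key(file_uri: str, kernel_name: str, source: str, line: int) -> str:
    """Build a kernel key whose hash ignores comments and whitespace."""
    normalized = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    normalized = re.sub(r"//[^\n]*", "", normalized)                # line comments
    normalized = re.sub(r"\s+", "", normalized)                     # all whitespace
    digest = hashlib.sha1(normalized.encode()).hexdigest()[:6]
    return f"{file_uri}:{kernel_name}:{digest}:L{line}"
```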

AI Recommendations

NCU Integration:

  • Official NVIDIA Nsight Compute recommendations
  • Architecture-specific optimization suggestions
  • Bottleneck identification and solutions
  • Performance improvement tracking

Example Data Structure

```json
{
  "version": "1.0",
  "lastUpdated": "2024-01-31T10:00:00Z",
  "kernels": {
    "file:///C:/projects/matmul.cu:matrixMul:a1b2c3:L45": {
      "kernelName": "matrixMul",
      "sourceFile": "matmul.cu",
      "lineNumber": 45,
      "sessions": [
        {
          "executionTime": 12.5,
          "smEfficiency": 85.2,
          "memoryThroughput": 450.8,
          "occupancy": 68.4,
          "l1CacheHitRate": 89.3,
          "recommendations": [
            "Consider increasing occupancy by reducing register usage",
            "Memory access pattern is well coalesced"
          ],
          "timestamp": 1706698800000,
          "performance": "fast"
        }
      ]
    }
  }
}
```

Optimization Tracking

View performance improvements over time and identify regression points
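
Because kernels.json is plain JSON, you can also post-process it yourself. A minimal sketch that reports the percent change in execution time from the first session to the most recent one per kernel (field names follow the example data structure above):

```python
import json

def execution_trend(path: str) -> dict:
    """Return {kernelName: percent change in executionTime} for every
    kernel with at least two profiling sessions. Negative means faster."""
    with open(path) as f:
        db = json.load(f)
    trends = {}
    for kernel in db["kernels"].values():
        sessions = kernel["sessions"]
        if len(sessions) < 2:
            continue
        first = sessions[0]["executionTime"]
        last = sessions[-1]["executionTime"]
        trends[kernel["kernelName"]] = (last - first) / first * 100.0
    return trends
```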

Session Persistence

Profiling data survives editor restarts and code modifications

Team Collaboration

Share profiling data across team members for collaborative optimization

Smart Identification

Content-based kernel IDs preserve history through code changes

Configuration

Both features work automatically with minimal setup:

User Preferences (Settings → RightNow AI):

  • cudaExperienceLevel: Beginner, Intermediate, Expert
  • cudaPrimaryUseCase: ML, Scientific, Graphics, General
  • cudaOptimizationFocus: Performance, Memory, Power, Balanced

File Management:

  • .rightnowrules: Version control recommended for team guidelines
  • .rightnow/profiling/: Add to .gitignore for personal profiling data

Pro tip: Commit .rightnowrules to share team coding guidelines, but keep .rightnow/profiling/ local for individual optimization tracking.