# Advanced Features

Personalized AI rules and profiling persistence for power users.
## `.rightnowrules` - Personalized AI Guidelines

RightNow AI's `.rightnowrules` feature provides hardware-aware, personalized AI assistance tailored to your CUDA development workflow and GPU architecture.
### Overview

The `.rightnowrules` file contains custom AI guidelines that enhance every interaction, from chat conversations to code completions. It combines:

- **Hardware Detection:** Automatic GPU architecture analysis
- **User Preferences:** Your experience level and optimization focus
- **Project Context:** Workspace-specific CUDA development guidelines
### Creating .rightnowrules

#### Automatic Generation

**Command Palette Method:**

1. Open the Command Palette (`Ctrl+Shift+P`)
2. Run `RightNow: Generate .rightnowrules File`
3. Confirm hardware detection and preferences
4. The file is created in your workspace root

**Auto-Suggestion:**

- RightNow AI detects CUDA files (`.cu`, `.cuh`, `.cuf`) in your workspace
- Shows a notification when no `.rightnowrules` file exists
- Click "Generate" for instant setup
#### Manual Creation

Create `.rightnowrules` in your workspace root:

```
# CUDA Development Guidelines
# Hardware: RTX 4090 (Ada Lovelace) | CUDA 12.3
# Profile: Intermediate | Machine Learning | Balanced

**Key Guidelines:**
- Optimize for tensor core utilization on Ada Lovelace
- Balance memory coalescing with occupancy
- Prioritize mixed precision for AI workloads
- Target 70%+ GPU utilization for optimal performance

## My Preferences
- Use explicit memory management over automatic
- Optimize for inference latency < 10ms
- Prefer readable code over micro-optimizations
- Focus on FP16/BF16 tensor operations
```
### Hardware-Aware Generation

RightNow AI automatically detects and optimizes for your specific GPU.

#### GPU Architecture Detection

**Supported Architectures:**

- **Pascal (GTX 10 series):** Focus on memory optimization
- **Turing (RTX 20 series):** RT cores and tensor cores
- **Ampere (RTX 30 series):** Sparse tensors and structural sparsity
- **Ada Lovelace (RTX 40 series):** 3rd-gen RT cores, shader efficiency
- **Hopper (H100):** Transformer engine, thread block clusters

**Detection Includes:**

- Compute capability and SM count
- Tensor core availability and generation
- Memory bandwidth and capacity
- Multi-GPU configurations
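Architecture detection like the above boils down to mapping a GPU's CUDA compute capability to an architecture family. A minimal sketch of that mapping (an illustration using NVIDIA's published compute-capability numbers, not RightNow AI's actual detection code):

```python
# Illustrative sketch: map a CUDA compute capability (major, minor) to the
# architecture families listed above. Not RightNow AI's implementation.
def detect_architecture(major: int, minor: int) -> str:
    families = {
        (6, 0): "Pascal", (6, 1): "Pascal",
        (7, 0): "Volta", (7, 5): "Turing",
        (8, 0): "Ampere", (8, 6): "Ampere",
        (8, 9): "Ada Lovelace",
        (9, 0): "Hopper",
    }
    return families.get((major, minor), "Unknown")

print(detect_architecture(8, 9))  # RTX 40 series -> Ada Lovelace
print(detect_architecture(9, 0))  # H100 -> Hopper
```

In practice the compute capability itself would come from the CUDA runtime (e.g. `cudaGetDeviceProperties`) or a tool such as `nvidia-smi`.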
### User Preference Integration

**Experience Level:**

- `beginner`: Focus on correctness and learning
- `intermediate`: Balance performance and readability
- `expert`: Aggressive optimizations and advanced features

**Primary Use Case:**

- `machine_learning`: Tensor operations, mixed precision
- `scientific_computing`: Double precision, memory bandwidth
- `graphics`: RT cores, rasterization pipelines
- `general`: Balanced approach across domains

**Optimization Focus:**

- `performance`: Maximum throughput and speed
- `memory`: Minimize memory usage and bandwidth
- `power`: Energy-efficient implementations
- `balanced`: Compromise across all factors
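Conceptually, generation combines one value from each of these three axes into guideline text. A hypothetical sketch of that combination (the guideline wording here is invented for illustration; only the preference values mirror the lists above):

```python
# Hypothetical sketch: combine the three preference axes into guideline
# lines for a generated .rightnowrules file. Wording is illustrative.
GUIDELINES = {
    "experience": {
        "beginner": "Favor correctness and clarity; explain each optimization.",
        "intermediate": "Balance performance and readability.",
        "expert": "Apply aggressive optimizations and advanced features.",
    },
    "use_case": {
        "machine_learning": "Prioritize tensor operations and mixed precision.",
        "scientific_computing": "Prefer double precision; optimize memory bandwidth.",
    },
    "focus": {
        "performance": "Maximize throughput and speed.",
        "balanced": "Compromise across performance, memory, and power.",
    },
}

def build_guidelines(experience: str, use_case: str, focus: str) -> list[str]:
    return [
        GUIDELINES["experience"][experience],
        GUIDELINES["use_case"][use_case],
        GUIDELINES["focus"][focus],
    ]

for line in build_guidelines("intermediate", "machine_learning", "balanced"):
    print(f"- {line}")
```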
### Example Generated Content

```
# CUDA Development Guidelines
# Auto-generated for Windows on 1/4/2025

**Hardware:** RTX 4090 (Ada Lovelace) | CUDA 12.3
**User Profile:** Intermediate developer | machine_learning focus | balanced optimization

**Key Guidelines:**
- Optimize memory coalescing and shared memory usage
- Balance occupancy with register pressure
- Prioritize tensor operations and mixed precision when available
- Your Ada Lovelace GPU supports 4th-gen tensor cores for AI workloads

## My Preferences
# Add your specific guidelines here:
# - Coding style (e.g., "prefer explicit memory management")
# - Performance targets (e.g., "optimize for latency < 10ms")
# - Project constraints (e.g., "limited to 4GB GPU memory")
# - Architecture preferences (e.g., "focus on Ampere features")
```
### Integration with AI Features

- **Chat Assistant:** System messages include your `.rightnowrules` as context for all conversations
- **Code Completion:** Autocomplete respects your guidelines and coding preferences
- **Quick Edit:** `Ctrl+K` editing follows your optimization priorities and style
### Multi-Workspace Support

- **Per-Workspace Rules:** Each workspace folder can have its own `.rightnowrules`
- **Rule Combination:** Multiple workspace folders combine their rules intelligently
- **Context Switching:** The AI automatically adapts when switching between projects
## Profiling Data Persistence

RightNow AI maintains comprehensive profiling history in `.rightnow/profiling/kernels.json` for tracking optimization progress across sessions.

### File Structure

```
.rightnow/
└── profiling/
    └── kernels.json    # Persistent profiling database
```
### What Gets Stored

#### Comprehensive Metrics

**Core Performance Data:**
- Execution time and GPU utilization
- Memory throughput and occupancy
- SM efficiency and warp efficiency
- Cache hit rates (L1/L2) and register usage

**Advanced Analytics:**
- Branch efficiency and instruction replay overhead
- Global/shared memory efficiency
- Temperature and power consumption
- Roofline analysis (compute- vs. memory-bound)
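Roofline classification compares a kernel's arithmetic intensity (FLOPs per byte moved) against the GPU's machine balance (peak FLOP rate divided by peak memory bandwidth). A simplified sketch, with illustrative rather than measured peak figures:

```python
# Simplified roofline classification: a kernel is memory-bound when its
# arithmetic intensity (FLOPs per byte) falls below the machine balance
# (peak FLOP/s divided by peak bandwidth). The peak numbers below are
# illustrative examples, not measurements of any specific GPU.
def classify_roofline(flops: float, bytes_moved: float,
                      peak_gflops: float, peak_gbps: float) -> str:
    intensity = flops / bytes_moved            # FLOPs per byte
    machine_balance = peak_gflops / peak_gbps  # ridge point of the roofline
    return "compute-bound" if intensity >= machine_balance else "memory-bound"

# A streaming kernel doing 2 FLOPs per 8 bytes moved, on a hypothetical GPU
# with 80 TFLOP/s peak compute and 1 TB/s bandwidth, is firmly memory-bound:
print(classify_roofline(flops=2, bytes_moved=8,
                        peak_gflops=80_000, peak_gbps=1_000))  # memory-bound
```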
#### Historical Tracking

**Session Management:**
- Multiple profiling sessions per kernel
- Timestamps for optimization timeline
- Performance trend analysis
- Before/after comparison data

**Content-Based Keys:**
- Stable kernel identification across code changes
- History preserved when line numbers change
- Distinguishes real kernel modifications from cosmetic edits
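One common way to build such content-based keys is to hash a normalized copy of the kernel source, so that whitespace and comment changes leave the key unchanged while real code changes produce a new one. A hypothetical sketch (RightNow AI's actual key format may differ):

```python
import hashlib
import re

# Hypothetical sketch of a content-based kernel key: hash the kernel body
# with comments and whitespace stripped, so cosmetic edits keep the same
# key while real code changes produce a new one.
def kernel_key(name: str, source: str) -> str:
    body = re.sub(r"//[^\n]*|/\*.*?\*/", "", source, flags=re.S)  # drop comments
    body = re.sub(r"\s+", " ", body).strip()                      # normalize whitespace
    digest = hashlib.sha256(body.encode()).hexdigest()[:6]
    return f"{name}:{digest}"

a = kernel_key("matrixMul", "__global__ void matrixMul() { /* old */ int i = 0; }")
b = kernel_key("matrixMul", "__global__ void matrixMul() {\n  int i = 0;  // tweak\n}")
c = kernel_key("matrixMul", "__global__ void matrixMul() { int i = 1; }")
print(a == b)  # True: cosmetic edit, same key
print(a == c)  # False: real code change, new key
```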
#### AI Recommendations

**NCU Integration:**
- Official NVIDIA Nsight Compute recommendations
- Architecture-specific optimization suggestions
- Bottleneck identification and solutions
- Performance improvement tracking
### Example Data Structure

```json
{
  "version": "1.0",
  "lastUpdated": "2024-01-31T10:00:00Z",
  "kernels": {
    "file:///C:/projects/matmul.cu:matrixMul:a1b2c3:L45": {
      "kernelName": "matrixMul",
      "sourceFile": "matmul.cu",
      "lineNumber": 45,
      "sessions": [
        {
          "executionTime": 12.5,
          "smEfficiency": 85.2,
          "memoryThroughput": 450.8,
          "occupancy": 68.4,
          "l1CacheHitRate": 89.3,
          "recommendations": [
            "Consider increasing occupancy by reducing register usage",
            "Memory access pattern is well coalesced"
          ],
          "timestamp": 1706698800000,
          "performance": "fast"
        }
      ]
    }
  }
}
```
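Because the store is plain JSON, session history is easy to post-process outside the editor. A minimal sketch that summarizes execution-time trends per kernel (field names follow the example structure; the inline data and the trend heuristic are illustrative):

```python
import json

# Minimal sketch: summarize per-kernel execution-time trends from a
# kernels.json-style store. Field names mirror the example structure;
# the inline data and trend heuristic are illustrative.
store = json.loads("""
{
  "kernels": {
    "matmul.cu:matrixMul:a1b2c3": {
      "kernelName": "matrixMul",
      "sessions": [
        {"executionTime": 12.5, "timestamp": 1706698800000},
        {"executionTime": 9.8,  "timestamp": 1706785200000}
      ]
    }
  }
}
""")

for entry in store["kernels"].values():
    times = [s["executionTime"] for s in entry["sessions"]]
    trend = "improved" if times[-1] < times[0] else "regressed or flat"
    print(f"{entry['kernelName']}: {times[0]} ms -> {times[-1]} ms ({trend})")
```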
- **Optimization Tracking:** View performance improvements over time and identify regression points
- **Session Persistence:** Profiling data survives editor restarts and code modifications
- **Team Collaboration:** Share profiling data across team members for collaborative optimization
- **Smart Identification:** Content-based kernel IDs preserve history through code changes
## Configuration

Both features work automatically with minimal setup.

**User Preferences** (Settings → RightNow AI):

- `cudaExperienceLevel`: Beginner, Intermediate, Expert
- `cudaPrimaryUseCase`: ML, Scientific, Graphics, General
- `cudaOptimizationFocus`: Performance, Memory, Power, Balanced

**File Management:**

- `.rightnowrules`: Version control recommended for team guidelines
- `.rightnow/profiling/`: Add to `.gitignore` for personal profiling data

**Pro tip:** Commit `.rightnowrules` to share team coding guidelines, but keep `.rightnow/profiling/` local for individual optimization tracking.