The NVIDIA GeForce RTX 2080 Super sits near the top of the consumer Turing lineup, with 3,072 CUDA cores and 8GB of GDDR6 memory. As a previous-generation card, it offers first-generation Tensor Cores without modern features such as TF32, which makes it best suited for budget CUDA work on the used market. For CUDA developers on tight budgets, it delivers reasonable FP32 compute and FP16 Tensor Core throughput at attractive used prices. While significantly slower than Ampere or Ada cards for ML workloads, it remains viable for inference and traditional CUDA programming. This guide covers realistic performance expectations, optimization strategies for the Turing architecture, and the scenarios where the RTX 2080 Super still makes sense in 2025.
| Specification | RTX 2080 Super |
|---|---|
| Architecture | Turing (TU104) |
| CUDA Cores | 3,072 |
| Tensor Cores | 384 |
| Memory | 8GB GDDR6 |
| Memory Bandwidth | 496 GB/s |
| Base / Boost Clock | 1650 / 1815 MHz |
| FP32 Performance | 11.2 TFLOPS |
| FP16 Performance | 22.3 TFLOPS |
| L2 Cache | 4MB |
| TDP | 250W |
| NVLink | No |
| MSRP | $699 |
| Release | July 2019 |
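The headline compute figures in the table follow directly from the core count and boost clock, which is a useful sanity check when comparing cards. A quick back-of-the-envelope calculation (actual sustained clocks vary with power and thermal limits):

```python
# Rough sanity check of the spec-sheet numbers above; real boost behavior
# depends on cooling and power limits.
cuda_cores = 3072
boost_clock_ghz = 1.815

# FP32: each CUDA core can retire one FMA (2 FLOPs) per clock
fp32_tflops = cuda_cores * 2 * boost_clock_ghz / 1000
print(f"Theoretical FP32: {fp32_tflops:.1f} TFLOPS")      # ~11.2

# Non-Tensor-Core FP16 runs at twice the FP32 rate on Turing
print(f"Theoretical FP16: {fp32_tflops * 2:.1f} TFLOPS")  # ~22.3
```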
This code snippet shows how to detect your RTX 2080 Super, check available memory, and configure optimal settings for the Turing (TU104) architecture.
```python
import torch
import pynvml
# Check whether a CUDA device (ideally the RTX 2080 Super) is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {torch.cuda.get_device_name(0) if device.type == 'cuda' else 'CPU'}")
# RTX 2080 Super Memory: 8GB - Optimal batch sizes
# Architecture: Turing (TU104)
# CUDA Cores: 3,072
# Memory-efficient training settings for the RTX 2080 Super
# Note: TF32 is an Ampere+ feature and has no effect on Turing (TU104, compute capability 7.5).
# Use FP16 autocast (torch.cuda.amp) instead to engage the first-generation Tensor Cores.
torch.backends.cuda.matmul.allow_tf32 = True   # no-op on Turing, harmless on newer GPUs
torch.backends.cudnn.allow_tf32 = True
# Check available memory
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"Free memory: {info.free / 1024**3:.1f} GB / 8 GB total")
# Rough batch size heuristic for the RTX 2080 Super's 8GB of VRAM
model_memory_gb = 2.0  # memory taken by weights/optimizer state; adjust for your model
batch_multiplier = (8 - model_memory_gb) / 4  # assumes ~4GB of activations per batch of 32
recommended_batch = int(batch_multiplier * 32)
print(f"Recommended batch size for RTX 2080 Super: {recommended_batch}")
```

| Task | Performance | Comparison |
|---|---|---|
| ResNet-50 Training FP16 (imgs/sec) | 380 | Requires explicit mixed precision |
| BERT-Base Inference FP16 (sentences/sec) | 640 | Usable for inference |
| Stable Diffusion (512x512, sec/img) | 12.5 | Slow vs modern cards |
| cuBLAS SGEMM 4096x4096 (TFLOPS) | 10.6 | 95% of theoretical peak |
| Memory Bandwidth (GB/s measured) | 466 | 94% of theoretical peak |
| LLaMA-7B Inference (tokens/sec) | 14 | Very limited, needs quantization |
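The ResNet-50 figure above assumes explicit FP16 mixed precision: on Turing nothing is accelerated automatically the way TF32 is on Ampere, so training code has to opt in via autocast and gradient scaling. A minimal sketch, with a toy model and random tensors standing in for a real training setup:

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler

# FP16 mixed-precision training sketch for Turing; the tiny model and random
# data are placeholders for your own model and data loader.
device = torch.device('cuda')
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
scaler = GradScaler()

for step in range(10):
    inputs = torch.randn(64, 1024, device=device)
    targets = torch.randint(0, 10, (64,), device=device)
    optimizer.zero_grad(set_to_none=True)
    with autocast():                       # matmuls/convs execute in FP16 on the Tensor Cores
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()          # loss scaling avoids FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```

The GradScaler step matters on this card because Turing has no BF16 path; plain FP16 gradients can underflow without scaling.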
| Use Case | Rating | Notes |
|---|---|---|
| Learning CUDA | Fair | Budget option but outdated architecture |
| Classical CUDA | Good | Good FP32 for traditional compute |
| Budget Inference | Fair | Works with FP16/INT8 |
| Small Model Training | Poor | Very slow without TF32/BF16 |
| Modern DL Training | Poor | Not recommended for serious ML |
| Extreme Budget | Good | Cheapest option with Tensor Cores |
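For the budget-inference case above, the simplest win on Turing is to run the whole model in FP16; INT8 typically requires TensorRT or a quantization toolkit and more setup. A minimal FP16 inference sketch (resnet50 is just a stand-in model; swap in your own):

```python
import torch
import torchvision.models as models

# FP16 inference sketch for an 8GB Turing card; resnet50 is an example model.
device = torch.device('cuda')
model = models.resnet50(weights=None).to(device).half().eval()

batch = torch.randn(32, 3, 224, 224, device=device, dtype=torch.float16)
with torch.inference_mode():
    logits = model(batch)
print(logits.shape)  # torch.Size([32, 1000])
```

Halving the weights also halves their memory footprint, which matters at least as much as the speedup on an 8GB card.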
Only at very low prices (under $200); better options, such as the RTX 3060 Ti, exist in the $250-300 range. The 2080 Super lacks TF32 and its first-generation Tensor Cores are slow, making it a poor fit for modern ML.
Yes, for basic CUDA programming and classical algorithms. However, you will miss modern features like TF32 and FP8 that matter for current ML workflows, so it is usually better to spend a bit more on an RTX 3060 Ti.
RTX 3060 is much better for ML. It has TF32, better Tensor Cores, and 12GB VRAM vs 8GB. Choose RTX 3060 unless the 2080 Super is essentially free.
All major frameworks support Turing (compute capability 7.5). However, you miss the performance benefits of TF32 and BF16; FP16 autocast still works, so the card runs fine but is slow for ML compared to Ampere or Ada.
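Framework compatibility can be confirmed at runtime from the device's compute capability; Turing reports 7.5, and TF32 needs 8.0 or newer, so portable code can branch on it. A short sketch:

```python
import torch

# Turing (TU104) reports compute capability 7.5; TF32 needs 8.0 (Ampere) or newer.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")

if (major, minor) >= (8, 0):
    # Ampere/Ada: TF32 is a free speedup for FP32 matmuls
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
else:
    # Turing: fall back to FP16 autocast for Tensor Core throughput
    print("TF32 unavailable; use torch.cuda.amp FP16 mixed precision instead")
```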
- RTX 3060: Much better for ML, 12GB VRAM
- RTX 3060 Ti: Faster, TF32 support, better value
- Slightly slower, cheaper used
- No Tensor Cores but very cheap
Ready to optimize your CUDA kernels for RTX 2080 Super? Download RightNow AI for real-time performance analysis.