The NVIDIA GeForce GTX 1660 Super is a Turing-architecture GPU without Tensor Cores, featuring 1,408 CUDA cores and 6GB of GDDR6 memory. As one of the most affordable CUDA-capable GPUs on the used market, it offers basic FP32 compute at rock-bottom prices, but it lacks the features modern ML workloads depend on. With no Tensor Cores, no ray tracing, and only 6GB of VRAM, its usefulness is limited to basic CUDA programming concepts and light general-purpose compute. This guide sets realistic expectations and identifies the narrow use cases where the GTX 1660 Super remains viable in 2025.
| Specification | Value |
|---|---|
| Architecture | Turing (TU116) |
| CUDA Cores | 1,408 |
| Tensor Cores | 0 |
| Memory | 6GB GDDR6 |
| Memory Bandwidth | 336 GB/s |
| Base / Boost Clock | 1530 / 1785 MHz |
| FP32 Performance | 5 TFLOPS |
| FP16 Performance | ~10 TFLOPS (2:1 via dedicated FP16 units, no Tensor Cores) |
| FP64 Performance | 0.16 TFLOPS (1:32) |
| L2 Cache | 1.5MB |
| TDP | 125W |
| NVLink | No |
| MSRP | $229 |
| Release | October 2019 |
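If you want to confirm these specifications on your own card, you can query the device properties from PyTorch. The sketch below is a minimal example; the figures it reports come from the driver and may differ slightly from the table above.

```python
import torch

# Query the properties of the first CUDA device to verify the specs above
props = torch.cuda.get_device_properties(0)

print(f"Name:               {props.name}")
print(f"Compute capability: {props.major}.{props.minor}")  # Turing TU116 reports 7.5
print(f"Total memory:       {props.total_memory / 1024**3:.1f} GB")
print(f"SM count:           {props.multi_processor_count}")  # 22 SMs x 64 FP32 lanes = 1,408 CUDA cores
```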
This code snippet shows how to detect your GTX 1660 Super, check available memory via NVML, and estimate a workable batch size for the 6GB Turing (TU116) card.
```python
import torch
import pynvml

# Check if a CUDA device (ideally the GTX 1660 Super) is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    print(f"Using device: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available, falling back to CPU")

# GTX 1660 Super: Turing (TU116), 1,408 CUDA cores, 6GB GDDR6
# Note: TF32 requires Ampere or newer; on Turing these flags are ignored,
# so all matmuls run in plain FP32.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Check available memory via NVML
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"Free memory: {info.free / 1024**3:.1f} GB / 6 GB total")

# Rough batch size heuristic for the 6GB card
model_memory_gb = 2.0  # Adjust based on your model
batch_multiplier = (6 - model_memory_gb) / 4  # Assume ~4GB per batch unit
recommended_batch = int(batch_multiplier * 32)
print(f"Recommended batch size for GTX 1660 Super: {recommended_batch}")
```

| Task | Performance | Comparison |
|---|---|---|
| ResNet-50 Training FP32 (imgs/sec) | 95 | Extremely slow, FP32 only |
| cuBLAS SGEMM 2048x2048 (TFLOPS) | 4.7 | 94% of theoretical peak |
| Memory Bandwidth (GB/s measured) | 315 | 94% of theoretical peak |
| Classical compute | Fair | Basic FP32 adequate |
| Any ML workload | Poor | No acceleration, not viable |
| Stable Diffusion | Unusable | 30+ seconds, impractical |
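For context, the SGEMM-style figure above can be approximated with a timed FP32 matrix multiply in PyTorch. This is a simplified sketch that assumes a CUDA device is present, not the exact benchmark used for the table, so expect results somewhat below the 5 TFLOPS theoretical peak.

```python
import time
import torch

assert torch.cuda.is_available(), "This sketch expects a CUDA-capable GPU"

n = 2048
a = torch.randn(n, n, device='cuda', dtype=torch.float32)
b = torch.randn(n, n, device='cuda', dtype=torch.float32)

# Warm-up so one-time CUDA/cuBLAS initialization does not skew the timing
for _ in range(10):
    torch.mm(a, b)
torch.cuda.synchronize()

iters = 100
start = time.time()
for _ in range(iters):
    torch.mm(a, b)
torch.cuda.synchronize()
elapsed = time.time() - start

# Each n x n matmul performs roughly 2 * n^3 floating-point operations
tflops = 2 * n**3 * iters / elapsed / 1e12
print(f"Measured FP32 matmul throughput: {tflops:.2f} TFLOPS")
```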
| Use Case | Rating | Notes |
|---|---|---|
| Learning CUDA Basics | Poor | Only if absolutely no budget |
| Classical Algorithms | Fair | Basic FP32 compute works |
| ML Training | Poor | Not viable without Tensor Cores |
| ML Inference | Poor | Too slow to be useful |
| Absolute Minimum Budget | Fair | Cheapest CUDA option ($90-110) |
| Any Serious Work | Poor | Insufficient for real workloads |
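To make the "Classical Algorithms" rating concrete, the sketch below runs a SAXPY-style vector update and a parallel reduction in FP32, the kind of data-parallel workload the card still handles adequately. It is an illustrative example only, with sizes chosen to fit comfortably in 6GB.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Classical data-parallel FP32 work: y = a * x + y, then a sum reduction.
# Two 50-million-element FP32 vectors are ~400MB, well within the 6GB budget.
n = 50_000_000
a = 2.5
x = torch.rand(n, device=device, dtype=torch.float32)
y = torch.rand(n, device=device, dtype=torch.float32)

y = a * x + y      # SAXPY: element-wise multiply-add across the vector
total = y.sum()    # Parallel reduction on the GPU
print(f"Sum of updated vector: {total.item():.2f}")
```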
**Is the GTX 1660 Super worth buying for machine learning in 2025?** No. With no Tensor Cores, ML training is 5-10x slower than even entry-level RTX cards. Only consider it if your budget is under $120 and you accept that it is only for basic CUDA learning, not ML.

**Can I learn CUDA on a GTX 1660 Super?** Only basic CUDA programming syntax and concepts. You cannot learn modern ML workflows, mixed precision training, or Tensor Core programming. We strongly recommend saving for an RTX 3060 12GB instead.

**What is the cheapest way to get started with CUDA?** A GTX 1660 Super at $90-110 used is the absolute minimum. However, for $200-250, an RTX 3060 Ti provides a vastly better learning experience with Tensor Cores, TF32, and 8GB of VRAM.

**What is the GTX 1660 Super still good for?** Classical CUDA algorithms, basic GPU programming concepts, video encoding, and light general compute. It is not suitable for machine learning, deep learning, or any modern AI workflows.
- Tensor Cores, 12GB, proper ML capability
- Has Tensor Cores, slightly more expensive
- Slightly faster, similar limitations
- Much better, worth saving for
Ready to optimize your CUDA kernels for GTX 1660 Super? Download RightNow AI for real-time performance analysis.