The NVIDIA GeForce GTX 1660 Ti is built on the Turing architecture but omits both Tensor Cores and RT cores, offering 1,536 CUDA cores and 6GB of GDDR6 memory. With only standard CUDA cores, it delivers decent FP32 throughput at very low used prices, but the missing Tensor Cores make it unsuitable for modern ML workloads.

For extreme-budget CUDA learning, the GTX 1660 Ti is one of the cheapest ways to get access to a modern CUDA architecture. However, without Tensor Cores and with only 6GB of VRAM, it is severely limited for anything beyond basic CUDA programming and classical algorithms. This guide sets realistic expectations for using the GTX 1660 Ti in 2025 and identifies the scenarios where it can still make sense.
| Specification | Value |
|---|---|
| Architecture | Turing (TU116) |
| CUDA Cores | 1,536 |
| Tensor Cores | 0 |
| Memory | 6GB GDDR6 |
| Memory Bandwidth | 288 GB/s |
| Base / Boost Clock | 1,500 / 1,770 MHz |
| FP32 Performance | 5.4 TFLOPS |
| FP16 Performance | ~10.9 TFLOPS (2:1 via dedicated FP16 units, no Tensor Cores) |
| L2 Cache | 1.5MB |
| TDP | 120W |
| NVLink | No |
| MSRP | $279 |
| Release | February 2019 |
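If you want to verify these specifications on your own card, PyTorch exposes the relevant device properties. This is a minimal sketch; the expected values for the GTX 1660 Ti are noted in the comments.

```python
import torch

# Query the properties PyTorch exposes for the GPU at index 0
props = torch.cuda.get_device_properties(0)

print(f"Name:               {props.name}")                     # e.g. "NVIDIA GeForce GTX 1660 Ti"
print(f"Compute capability: {props.major}.{props.minor}")      # 7.5 (Turing)
print(f"SM count:           {props.multi_processor_count}")    # 24 SMs x 64 FP32 cores = 1,536 CUDA cores
print(f"Total memory:       {props.total_memory / 1024**3:.1f} GB")  # ~6 GB minus driver reservations
```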
This code snippet shows how to detect your GTX 1660 Ti, check available memory, and estimate a workable batch size for the 6GB Turing (TU116) card.
import torch
import pynvml

# Check whether a CUDA device (GTX 1660 Ti) is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    print(f"Using device: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available, falling back to CPU")

# GTX 1660 Ti: Turing (TU116), 1,536 CUDA cores, 6GB GDDR6
# Note: TF32 requires Ampere or newer. These flags are harmless no-ops on
# Turing, which has no Tensor Cores, so all matmuls run in plain FP32.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Check available memory via NVML
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"Free memory: {info.free / 1024**3:.1f} GB / 6 GB total")
pynvml.nvmlShutdown()

# Rough batch-size heuristic for the 6GB card: reserve memory for the model,
# then scale the batch size with what is left (assumes ~4GB supports a batch of 32)
model_memory_gb = 2.0  # adjust based on your model
batch_multiplier = (6 - model_memory_gb) / 4
recommended_batch = max(1, int(batch_multiplier * 32))
print(f"Recommended batch size for GTX 1660 Ti: {recommended_batch}")| Task | Performance | Comparison |
|---|---|---|
| ResNet-50 Training FP32 (imgs/sec) | 110 | FP32 only, very slow |
| cuBLAS SGEMM 2048x2048 (TFLOPS) | 5.1 | 94% of theoretical peak |
| Memory Bandwidth (GB/s measured) | 270 | 94% of theoretical peak |
| Classical CUDA algorithms | Fair | Acceptable FP32 compute |
| Any DL workload | Poor | No Tensor Cores, not recommended |
| Stable Diffusion | Unusable | 25+ seconds per image, not practical |
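For a quick sanity check of the SGEMM row above, a rough PyTorch timing sketch is shown below. This is not the exact benchmark used for the table; results will vary with clocks, thermals, and driver version.

```python
import time
import torch

device = torch.device('cuda')
n = 2048
a = torch.randn(n, n, device=device)
b = torch.randn(n, n, device=device)

# Warm up so cuBLAS kernel selection happens before timing
for _ in range(10):
    torch.mm(a, b)
torch.cuda.synchronize()

iters = 100
start = time.perf_counter()
for _ in range(iters):
    torch.mm(a, b)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# A 2048x2048 SGEMM performs roughly 2 * n^3 floating-point operations
tflops = 2 * n**3 * iters / elapsed / 1e12
print(f"FP32 GEMM throughput: {tflops:.1f} TFLOPS (theoretical peak ~5.4)")
```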
| Use Case | Rating | Notes |
|---|---|---|
| Learning Basic CUDA | Fair | 6GB limiting even for basics |
| Classical CUDA Algorithms | Fair | Acceptable FP32 performance |
| ML Training | Poor | No Tensor Cores, extremely slow |
| ML Inference | Poor | No acceleration, too slow |
| Extreme Budget | Fair | Cheapest modern CUDA option ($100-130) |
| Any Modern Workflow | Poor | Insufficient for current needs |
For ML training: not recommended. With no Tensor Cores and only 6GB of VRAM, training is extremely slow (5-10x slower than cards with Tensor Cores). Only consider it if you have absolutely no alternative.
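If you do end up training on this card anyway, gradient accumulation is the usual way to work around the 6GB limit: run several small micro-batches per optimizer step so the effective batch size stays reasonable. The sketch below uses a placeholder model and made-up batch sizes, not measured recommendations.

```python
import torch
import torch.nn as nn

device = torch.device('cuda')

# Placeholder model and random data; substitute your own
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

micro_batch = 8   # small enough to fit in 6GB
accum_steps = 4   # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(micro_batch, 512, device=device)
    y = torch.randint(0, 10, (micro_batch,), device=device)

    loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated gradients average correctly
    loss.backward()                            # gradients accumulate across micro-batches

    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```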
For learning CUDA: fair for basic CUDA programming (kernels, memory management, and so on), but you cannot learn modern ML workflows that depend on Tensor Cores. For $50-100 more, an RTX 2060 Super or RTX 3060 Ti is vastly better.
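As an example of the kind of exercise this card handles fine, here is a minimal vector-add kernel written with Numba. This is a sketch that assumes Numba with CUDA support is installed, which this guide does not otherwise cover.

```python
import numpy as np
from numba import cuda

# A minimal elementwise kernel: each thread handles one array element
@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)      # global thread index
    if i < out.size:      # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

# Explicit host-to-device transfers: memory management is part of the exercise
d_a = cuda.to_device(a)
d_b = cuda.to_device(b)
d_out = cuda.device_array_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](d_a, d_b, d_out)

result = d_out.copy_to_host()
assert np.allclose(result, a + b)
```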
Best-fit workloads: classical CUDA algorithms, basic GPU programming practice, general compute tasks, and video encoding. It is not suitable for ML, DL, or any Tensor Core-dependent workflow.
If you can, save for an RTX 3060 12GB. The difference in ML capability is enormous (Tensor Cores, TF32, 12GB of VRAM). The GTX 1660 Ti only makes sense if your budget is under $130 and you only need basic CUDA learning.
Alternatives at a glance:
- Tensor Cores, 12GB, vastly better for ML
- Has Tensor Cores, similar price used
- Slightly slower, even cheaper
- Much better, worth saving for
Ready to optimize your CUDA kernels for the GTX 1660 Ti? Download RightNow AI for real-time performance analysis.