Sparse matrices have mostly zero entries, common in graph algorithms, PDEs, and network problems. Direct sparse solvers use specialized factorizations; iterative solvers use matrix-vector products. cuSPARSE provides sparse BLAS operations; cuSOLVER provides sparse direct solvers. Choice depends on matrix structure, sparsity pattern, and number of solves.
- Sparse LU/Cholesky for direct solves with fill-in minimization.
- Incomplete LU (ILU) for fast approximate solves inside iterative methods.
- AMD/RCM reordering to minimize fill-in during factorization (a host-side sketch follows this list).
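For the reordering step, cuSOLVER ships host-side pattern-analysis helpers. Below is a minimal sketch (the `reorder_rcm` wrapper name and the `h_`-prefixed host arrays are our own choices) that computes an RCM permutation with `cusolverSpXcsrsymrcmHost`; `cusolverSpXcsrsymamdHost` is the AMD counterpart. The permutation can then be applied to the matrix with `cusolverSpXcsrpermHost` before factorizing.

```c
#include <cusolverSp.h>

// Minimal sketch: compute an RCM fill-reducing permutation on the host.
// h_rowPtr/h_colIdx describe the CSR sparsity pattern (copied to host memory);
// h_p receives the permutation vector of length n.
void reorder_rcm(cusolverSpHandle_t solver, int n, int nnz,
                 const int* h_rowPtr, const int* h_colIdx, int* h_p) {
    cusparseMatDescr_t descr;
    cusparseCreateMatDescr(&descr);
    // Symmetric reverse Cuthill-McKee; swap in cusolverSpXcsrsymamdHost for AMD
    cusolverSpXcsrsymrcmHost(solver, n, nnz, descr, h_rowPtr, h_colIdx, h_p);
    cusparseDestroyMatDescr(descr);
}
```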
Converting to dense destroys the sparsity benefit: O(n²) memory and O(n³) time.
```c
// Converting sparse to dense - terrible for large sparse matrices!
void sparse_solve_naive(cusolverDnHandle_t dense,
                        int* d_rowPtr, int* d_colIdx, float* d_values,
                        float* d_b, float* d_x, int n, int nnz) {
    // Convert to dense (O(n²) memory!)
    float* d_A_dense;
    cudaMalloc(&d_A_dense, n * n * sizeof(float));
    sparse_to_dense(d_rowPtr, d_colIdx, d_values, d_A_dense, n);  // user scatter kernel

    // Dense LU solve (O(n³) time!)
    int lwork, *d_ipiv, *d_info;
    float* d_work;
    cusolverDnSgetrf_bufferSize(dense, n, n, d_A_dense, n, &lwork);
    cudaMalloc(&d_work, lwork * sizeof(float));
    cudaMalloc(&d_ipiv, n * sizeof(int));
    cudaMalloc(&d_info, sizeof(int));
    cusolverDnSgetrf(dense, n, n, d_A_dense, n, d_work, d_ipiv, d_info);
    cudaMemcpy(d_x, d_b, n * sizeof(float), cudaMemcpyDeviceToDevice);
    cusolverDnSgetrs(dense, CUBLAS_OP_N, n, 1, d_A_dense, n, d_ipiv, d_x, n, d_info);
}
```

A sparse direct solver exploits the structure instead; for very large systems, an iterative method with an ILU preconditioner is the usual choice.
```c
// Sparse direct solve via the low-level csrlu API. Note: these routines are
// declared in cusolverSp_LOWLEVEL_PREVIEW.h and execute on the host, so they
// take host-side arrays.
void sparse_solve_direct(cusolverSpHandle_t solver,
                         int n, int nnz,
                         int* h_rowPtr, int* h_colIdx, float* h_values,
                         float* h_b, float* h_x) {
    cusparseMatDescr_t descr;
    cusparseCreateMatDescr(&descr);

    // Analyze sparsity pattern (do once, reuse for multiple solves)
    csrluInfoHost_t info;
    cusolverSpCreateCsrluInfoHost(&info);
    cusolverSpXcsrluAnalysisHost(solver, n, nnz, descr, h_rowPtr, h_colIdx, info);

    // Get buffer sizes (internal metadata and LU workspace are reported separately)
    size_t internal_bytes, workspace_bytes;
    cusolverSpScsrluBufferInfoHost(solver, n, nnz, descr, h_values, h_rowPtr, h_colIdx,
                                   info, &internal_bytes, &workspace_bytes);
    void* buffer = malloc(workspace_bytes);

    // Numeric factorization (pivot threshold 1.0 = partial pivoting)
    cusolverSpScsrluFactorHost(solver, n, nnz, descr, h_values, h_rowPtr, h_colIdx,
                               info, 1.0f, buffer);

    // Solve; for multiple right-hand sides, call this repeatedly on the same info
    cusolverSpScsrluSolveHost(solver, n, h_b, h_x, info, buffer);

    free(buffer);
    cusolverSpDestroyCsrluInfoHost(info);
    cusparseDestroyMatDescr(descr);
}
```
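The csrlu path above runs on the host. To stay entirely on the GPU, cuSOLVER also offers high-level one-call routines that bundle analysis, factorization, and solve. Here is a minimal sketch using `cusolverSpScsrlsvqr` (QR factorization; `cusolverSpScsrlsvchol` is the Cholesky variant for SPD matrices); the tolerance and reorder values are illustrative, not tuned.

```c
// One-call direct solve on the device: sparse QR with optional reordering.
// d_-prefixed arrays live in device memory, 0-based CSR indexing assumed.
void sparse_solve_qr(cusolverSpHandle_t solver, int n, int nnz,
                     int* d_rowPtr, int* d_colIdx, float* d_values,
                     float* d_b, float* d_x) {
    cusparseMatDescr_t descr;
    cusparseCreateMatDescr(&descr);
    int singularity;  // index of the first zero pivot, or -1 if A is nonsingular
    cusolverSpScsrlsvqr(solver, n, nnz, descr,
                        d_values, d_rowPtr, d_colIdx, d_b,
                        1e-6f,  // tolerance for the singularity check
                        1,      // reorder = 1: symrcm to reduce fill-in
                        d_x, &singularity);
    cusparseDestroyMatDescr(descr);
}
```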
```c
// Iterative solve with ILU(0) preconditioner
void sparse_solve_iterative(cusparseHandle_t sparse, cublasHandle_t blas,
                            int n, int nnz, int* d_rowPtr, int* d_colIdx, float* d_values,
                            float* d_b, float* d_x, int max_iter, float tol) {
    cusparseMatDescr_t descr;
    cusparseCreateMatDescr(&descr);
    // Setup ILU(0): analysis, then factorization. The factors overwrite d_values,
    // so factor a copy if the original A is still needed for the SpMV.
    csrilu02Info_t iluInfo;
    cusparseCreateCsrilu02Info(&iluInfo);
    int bufSize;
    void* d_buffer;
    cusparseScsrilu02_bufferSize(sparse, n, nnz, descr, d_values, d_rowPtr, d_colIdx,
                                 iluInfo, &bufSize);
    cudaMalloc(&d_buffer, bufSize);
    cusparseScsrilu02_analysis(sparse, n, nnz, descr, d_values, d_rowPtr, d_colIdx,
                               iluInfo, CUSPARSE_SOLVE_POLICY_USE_LEVEL, d_buffer);
    cusparseScsrilu02(sparse, n, nnz, descr, d_values, d_rowPtr, d_colIdx,
                      iluInfo, CUSPARSE_SOLVE_POLICY_USE_LEVEL, d_buffer);
    // Preconditioned conjugate gradient (user routine; see the sketch below)
    pcg_solve(sparse, blas, d_rowPtr, d_colIdx, d_values, d_b, d_x, iluInfo, max_iter, tol);
}
```

| Metric | Naive | Optimized | Improvement |
|---|---|---|---|
| n=100k, nnz=1M | OOM | 450ms (direct) | Feasible |
| n=1M, nnz=10M (5-point stencil) | OOM | 2.1s (CG+ILU) | Feasible |
| Memory (n=100k) | 40 GB (dense) | 50 MB (sparse) | 800x less |

The dense figure is simply n² × 4 bytes: at n = 100k, that is 10¹⁰ entries × 4 B = 40 GB, which is why the naive path goes OOM.
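The `pcg_solve` call above is a user routine. As a reference for its structure, here is a minimal conjugate gradient loop built on the generic `cusparseSpMV` API (CUDA 11+) and cuBLAS. It is shown unpreconditioned for brevity (applying the ILU factors would add two `cusparseSpSV` triangular solves per iteration), assumes x starts at zero, and the vector and buffer names are our own.

```c
#include <cuda_runtime.h>
#include <cusparse.h>
#include <cublas_v2.h>
#include <math.h>

// Minimal unpreconditioned CG sketch: with x0 = 0, the initial residual r0 = b.
void cg_solve(cusparseHandle_t sp, cublasHandle_t bl,
              int n, int nnz, int* d_rowPtr, int* d_colIdx, float* d_values,
              float* d_b, float* d_x, int max_iter, float tol) {
    float *d_r, *d_p, *d_Ap;
    cudaMalloc(&d_r, n * sizeof(float));
    cudaMalloc(&d_p, n * sizeof(float));
    cudaMalloc(&d_Ap, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));                              // x0 = 0
    cudaMemcpy(d_r, d_b, n * sizeof(float), cudaMemcpyDeviceToDevice);  // r = b
    cudaMemcpy(d_p, d_r, n * sizeof(float), cudaMemcpyDeviceToDevice);  // p = r

    // Generic-API descriptors for A, p, and Ap
    cusparseSpMatDescr_t matA;
    cusparseDnVecDescr_t vecP, vecAp;
    cusparseCreateCsr(&matA, n, n, nnz, d_rowPtr, d_colIdx, d_values,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
    cusparseCreateDnVec(&vecP, n, d_p, CUDA_R_32F);
    cusparseCreateDnVec(&vecAp, n, d_Ap, CUDA_R_32F);

    float one = 1.0f, zero = 0.0f;
    size_t bufSize;
    void* d_buf;
    cusparseSpMV_bufferSize(sp, CUSPARSE_OPERATION_NON_TRANSPOSE, &one, matA, vecP,
                            &zero, vecAp, CUDA_R_32F, CUSPARSE_SPMV_ALG_DEFAULT, &bufSize);
    cudaMalloc(&d_buf, bufSize);

    float rs_old, rs_new;
    cublasSdot(bl, n, d_r, 1, d_r, 1, &rs_old);                         // rs = r·r
    for (int k = 0; k < max_iter && sqrtf(rs_old) > tol; ++k) {
        cusparseSpMV(sp, CUSPARSE_OPERATION_NON_TRANSPOSE, &one, matA, vecP,
                     &zero, vecAp, CUDA_R_32F, CUSPARSE_SPMV_ALG_DEFAULT, d_buf);
        float pAp;
        cublasSdot(bl, n, d_p, 1, d_Ap, 1, &pAp);
        float alpha = rs_old / pAp, neg_alpha = -alpha;
        cublasSaxpy(bl, n, &alpha, d_p, 1, d_x, 1);                     // x += alpha*p
        cublasSaxpy(bl, n, &neg_alpha, d_Ap, 1, d_r, 1);                // r -= alpha*Ap
        cublasSdot(bl, n, d_r, 1, d_r, 1, &rs_new);
        float beta = rs_new / rs_old;
        cublasSscal(bl, n, &beta, d_p, 1);                              // p = r + beta*p
        cublasSaxpy(bl, n, &one, d_r, 1, d_p, 1);
        rs_old = rs_new;
    }
    cusparseDestroySpMat(matA);
    cusparseDestroyDnVec(vecP);
    cusparseDestroyDnVec(vecAp);
    cudaFree(d_r); cudaFree(d_p); cudaFree(d_Ap); cudaFree(d_buf);
}
```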
Choose direct solvers for moderate sizes (n < 100k), many right-hand sides, when a solution to machine precision is needed, and when the sparsity pattern factorizes without excessive fill-in. Choose iterative solvers for very large systems (n > 1M) with a single right-hand side, when an approximate solution within tolerance is acceptable and the matrix is SPD or otherwise has a favorable spectrum.
CSR (Compressed Sparse Row) suits row-wise operations and SpMV; CSC suits column operations; COO suits construction and modification. CSR is the most common format for solving: assemble in COO, then convert COO→CSR for computation, as in the sketch below.
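For the COO→CSR step, cuSPARSE converts sorted COO row indices directly into CSR row offsets; a minimal sketch (the `coo_to_csr` wrapper name is our own):

```c
// Compress row-sorted COO row indices into CSR row offsets.
// The column-index and value arrays are shared by both formats unchanged.
void coo_to_csr(cusparseHandle_t sparse, const int* d_cooRows,
                int nnz, int n, int* d_csrRowPtr) {
    // d_cooRows must be sorted by row (use cusparseXcoosortByRow first if not)
    cusparseXcoo2csr(sparse, d_cooRows, nnz, n, d_csrRowPtr,
                     CUSPARSE_INDEX_BASE_ZERO);
}
```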
Related topics: Krylov methods for sparse systems; conjugate gradient, the classic iterative method for SPD sparse matrices.
Ready to optimize your CUDA code? Download RightNow AI and get real-time performance analysis for your kernels.