Jacobi iteration solves Ax = b via the update x^(k+1) = D^(-1)(b - (L+U)x^(k)), where A = D + L + U splits A into its diagonal, strictly lower, and strictly upper parts. Each component updates independently of the others, making the method embarrassingly parallel. While slow to converge as a standalone solver, Jacobi is valuable as a smoother in multigrid and as a simple preconditioner.
Three practical notes:
- Damped update: use `x_new = (1 - ω) * x + ω * x_jacobi` for stability and better smoothing.
- Kernel fusion: combine the diagonal scaling and the update in a single kernel to avoid an extra pass over the data.
- Damping factor: ω = 2/3 is optimal for smoothing the Poisson equation.
Dense Jacobi iteration first - inefficient for sparse matrices, but a clear baseline:
```cuda
// Dense Jacobi sweep: one thread per row, O(n) work per thread.
__global__ void jacobi_naive(float* A, float* b, float* x, float* x_new, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = b[i];
    float diag = A[i * n + i];
    // Accumulate b_i - sum_{j != i} A_ij * x_j
    for (int j = 0; j < n; j++) {
        if (j != i) {
            sum -= A[i * n + j] * x[j];
        }
    }
    x_new[i] = sum / diag;
}
```
```cuda
// Standalone dense solver loop (for demonstration; prefer CG/BiCGSTAB in practice).
void jacobi_solve(float* d_A, float* d_b, float* d_x, int n, int max_iter, float tol) {
    float* d_x_new;
    cudaMalloc(&d_x_new, n * sizeof(float));
    for (int iter = 0; iter < max_iter; iter++) {
        jacobi_naive<<<(n + 255) / 256, 256>>>(d_A, d_b, d_x, d_x_new, n);
        cudaMemcpy(d_x, d_x_new, n * sizeof(float), cudaMemcpyDeviceToDevice);
        // Check convergence (expensive: requires a full residual reduction per iteration)
        float residual = compute_residual(d_A, d_b, d_x, n);
        if (residual < tol) break;
    }
    cudaFree(d_x_new);
}
```

Sparse damped Jacobi with a precomputed diagonal inverse, suitable for smoothing:
```cuda
// One damped Jacobi sweep on a CSR matrix: x_new = (1-ω)x + ω D^{-1}(b - (L+U)x).
__global__ void jacobi_sparse_damped(int* rowPtr, int* colIdx, float* vals,
                                     float* diag_inv, float* b, float* x, float* x_new,
                                     int n, float omega) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = b[i];
    // Sparse row iteration: walk the nonzeros of row i, skipping the diagonal
    for (int k = rowPtr[i]; k < rowPtr[i+1]; k++) {
        int j = colIdx[k];
        if (j != i) {
            sum -= vals[k] * x[j];
        }
    }
    float x_jacobi = sum * diag_inv[i];
    // Damped update: x_new = (1-ω)x + ω*x_jacobi
    x_new[i] = (1.0f - omega) * x[i] + omega * x_jacobi;
}
```
```cuda
// Apply `sweeps` damped Jacobi sweeps in place on d_x (e.g., as a multigrid smoother).
void jacobi_smoother(int* d_rowPtr, int* d_colIdx, float* d_vals,
                     float* d_diag_inv, float* d_b, float* d_x,
                     int n, int sweeps, float omega) {
    float* d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));
    float* d_in = d_x;
    float* d_out = d_buf;
    for (int s = 0; s < sweeps; s++) {
        jacobi_sparse_damped<<<(n + 255) / 256, 256>>>(
            d_rowPtr, d_colIdx, d_vals, d_diag_inv, d_b, d_in, d_out, n, omega);
        // Ping-pong buffers between sweeps
        float* tmp = d_in; d_in = d_out; d_out = tmp;
    }
    // After an odd number of sweeps the result lives in d_buf; copy it back.
    if (d_in != d_x)
        cudaMemcpy(d_x, d_in, n * sizeof(float), cudaMemcpyDeviceToDevice);
    cudaFree(d_buf);
}
```

| Metric | Naive | Optimized | Improvement |
|---|---|---|---|
| Convergence rate (Poisson) | ρ ≈ 0.99 (slow) | ρ ≈ 0.97 (ω = 2/3) | ≈3x fewer iterations |
| As smoother (3 sweeps) | 3.5 ms | 3.5 ms | Same cost (both fully parallel) |
| vs. Gauss-Seidel | Fully parallel | Fully parallel | Gauss-Seidel is sequential within a sweep |
Use Jacobi as: (1) a smoother in multigrid (2-3 sweeps), (2) a simple preconditioner (diagonal scaling), or (3) a parallel baseline for comparison. Do not use it as a standalone solver; Krylov methods such as CG or BiCGSTAB converge far faster.
For the Poisson equation, ω = 2/3 is the optimal smoothing factor. For a general SPD matrix, ω = 2/(λ_min + λ_max), where the λ are eigenvalues of D^(-1)A, is optimal, but estimating them is expensive; starting with ω = 0.8 and adjusting works well in practice. For SPD systems, damped Jacobi converges whenever 0 < ω < 2/λ_max(D^(-1)A).
Ready to optimize your CUDA code? Download RightNow AI and get real-time performance analysis for your kernels.