SOR accelerates Gauss-Seidel by over-relaxation: x_new = ω*x_GS + (1-ω)*x_old. With the optimal ω, the spectral radius improves from 1 - O(1/n²) (Gauss-Seidel) to 1 - O(1/n), cutting the iteration count from O(n²) to O(n). Finding the optimal ω requires an eigenvalue estimate. For the 2D Poisson problem on an n×n grid: ω_opt = 2/(1 + sin(π/n)).
- Use power iteration to estimate the spectral radius, then compute ω_opt.
- Reuse the red-black coloring from Gauss-Seidel, with the SOR update in place of the plain update.
- Apply symmetric SOR (SSOR) as a preconditioner for CG.
The naive version runs the entire sweep in a single GPU thread, with no parallelism:
```cuda
__global__ void sor_sweep(float* A, float* b, float* x, int n, float omega);

void sor_naive(float* d_A, float* d_b, float* d_x, int n, float omega, int max_iter) {
    for (int iter = 0; iter < max_iter; iter++) {
        sor_sweep<<<1, 1>>>(d_A, d_b, d_x, n, omega);  // One thread: sequential!
    }
    cudaDeviceSynchronize();
}

// Single-threaded sweep: each update of x[i] depends on the updates before it.
__global__ void sor_sweep(float* A, float* b, float* x, int n, float omega) {
    for (int i = 0; i < n; i++) {
        float sigma = 0.0f;
        for (int j = 0; j < n; j++) {
            if (j != i) sigma += A[i * n + j] * x[j];
        }
        float x_gs = (b[i] - sigma) / A[i * n + i];
        x[i] = omega * x_gs + (1.0f - omega) * x[i];
    }
}
```

Red-black SOR with the optimal ω parallelizes the sweep for the Poisson equation:
```cuda
// Update the red points (i + j even). h2 is the squared grid spacing h*h.
__global__ void sor_red(float* x, float* b, float omega, float h2, int nx, int ny) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;
    int j = blockIdx.y * blockDim.y + threadIdx.y + 1;
    if (i < nx - 1 && j < ny - 1 && (i + j) % 2 == 0) {
        int idx = j * nx + i;
        // 5-point stencil Gauss-Seidel value, blended with omega.
        float x_gs = 0.25f * (x[idx-1] + x[idx+1] + x[idx-nx] + x[idx+nx] - h2 * b[idx]);
        x[idx] = omega * x_gs + (1.0f - omega) * x[idx];
    }
}

// Update the black points (i + j odd) using the freshly updated red values.
__global__ void sor_black(float* x, float* b, float omega, float h2, int nx, int ny) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;
    int j = blockIdx.y * blockDim.y + threadIdx.y + 1;
    if (i < nx - 1 && j < ny - 1 && (i + j) % 2 == 1) {
        int idx = j * nx + i;
        float x_gs = 0.25f * (x[idx-1] + x[idx+1] + x[idx-nx] + x[idx+nx] - h2 * b[idx]);
        x[idx] = omega * x_gs + (1.0f - omega) * x[idx];
    }
}
```
```cuda
// Optimal omega for the 2D Poisson model problem on an n x n grid.
float compute_optimal_omega(int n) {
    return 2.0f / (1.0f + sinf(M_PI / n));
}
```

| Metric | Naive | Optimized | Improvement |
|---|---|---|---|
| Convergence (Poisson 128²) | 1650 iters (GS) | 127 iters (SOR ω=1.93) | 13x fewer |
| GPU parallel efficiency | 0% | 50% (red-black) | Usable |
For model problems (Poisson): ω_opt = 2/(1 + sin(πh)), where h = 1/n is the grid spacing. For general matrices: estimate the Jacobi spectral radius ρ(G_J) via power iteration, then ω_opt = 2/(1 + sqrt(1 - ρ²)). If ρ is unknown, ω in the range 1.5-1.8 is a reasonable starting point.
Symmetric SOR (SSOR) applies a forward SOR sweep followed by a backward sweep. The resulting iteration is symmetric, so SSOR can precondition CG. The SSOR preconditioner is M = (D + ωL)D^{-1}(D + ωU)/(ω(2-ω)), a common choice for simple preconditioning.
Ready to optimize your CUDA code? Download RightNow AI and get real-time performance analysis for your kernels.