The condition number κ(A) = σ_max / σ_min measures how sensitive the solution of a linear system Ax=b is to perturbations in A or b. High condition numbers (roughly κ > 10^6 for float32) indicate ill-conditioning: small changes in the input cause large changes in the output. Computing the exact condition number requires both extreme singular values. For large matrices, estimation methods provide useful bounds at a fraction of the cost.
- Use power iteration for σ_max and inverse iteration for σ_min (a minimal sketch follows this list).
- Use a LAPACK-style 1-norm condition estimator instead of a full SVD.
- Approximate the extreme singular values via random projections.
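A minimal sketch of the power-iteration option, estimating σ_max by iterating on AᵀA with cuBLAS. The function name, the preallocated device vectors d_v (length n, initialized to a nonzero vector) and d_Av (length m), and the fixed iteration count are assumptions for illustration, not part of the original post; σ_min would additionally require inverse iteration, i.e. triangular solves with the factored matrix.

```cuda
// Hypothetical sketch: power iteration on A^T A to estimate sigma_max.
// Assumes a column-major m x n matrix d_A and preallocated device vectors
// d_v (length n, nonzero initial guess) and d_Av (length m).
float estimate_sigma_max(cublasHandle_t handle, const float* d_A, int m, int n,
                         float* d_v, float* d_Av, int iters) {
    const float one = 1.0f, zero = 0.0f;
    float norm = 0.0f;
    for (int it = 0; it < iters; it++) {
        // Av = A * v, then v = A^T * Av  (one power-iteration step on A^T A)
        cublasSgemv(handle, CUBLAS_OP_N, m, n, &one, d_A, m, d_v, 1, &zero, d_Av, 1);
        cublasSgemv(handle, CUBLAS_OP_T, m, n, &one, d_A, m, d_Av, 1, &zero, d_v, 1);
        // Renormalize v so the iterates do not overflow
        cublasSnrm2(handle, n, d_v, 1, &norm);
        float inv = 1.0f / norm;
        cublasSscal(handle, n, &inv, d_v, 1);
    }
    // sigma_max ~= ||A v||_2 for the (approximately) converged unit vector v
    cublasSgemv(handle, CUBLAS_OP_N, m, n, &one, d_A, m, d_v, 1, &zero, d_Av, 1);
    cublasSnrm2(handle, m, d_Av, 1, &norm);
    return norm;
}
```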
The naive approach below runs a full SVD, computing every singular value when only the two extremes are needed:
```cuda
// Naive: run a full SVD just to read off the two extreme singular values
float condition_number_svd(cusolverDnHandle_t handle, float* d_A, int m, int n) {
    int min_mn = (m < n) ? m : n;
    float* d_S;
    cudaMalloc(&d_S, min_mn * sizeof(float));
    // Query and allocate cuSOLVER workspace
    int lwork = 0;
    cusolverDnSgesvd_bufferSize(handle, m, n, &lwork);
    float* d_work;  cudaMalloc(&d_work, lwork * sizeof(float));
    int*   d_info;  cudaMalloc(&d_info, sizeof(int));
    // 'N','N': singular values only, no U or V^T (gesvd requires m >= n)
    cusolverDnSgesvd(handle, 'N', 'N', m, n, d_A, m, d_S,
                     NULL, m, NULL, n, d_work, lwork, NULL, d_info);
    // Singular values come back sorted in descending order
    float sigma_max, sigma_min;
    cudaMemcpy(&sigma_max, d_S, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&sigma_min, d_S + min_mn - 1, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_S); cudaFree(d_work); cudaFree(d_info);
    return sigma_max / sigma_min;
}
```

The optimized version estimates the condition number from matrix 1-norms instead, without computing the full SVD:
```cuda
// Estimate the condition number in the 1-norm without a full SVD.
// matrix_1_norm and estimate_inverse_1_norm are helper routines:
// the column-sum kernel below plus a LAPACK-style (xLACON) estimator.
float estimate_condition(cublasHandle_t handle, float* d_A, float* d_LU,
                         int* d_pivot, int n) {
    // ||A||_1 = maximum absolute column sum, taken on the original matrix
    float norm_A = matrix_1_norm(d_A, n);
    // LU-factor a copy so d_A stays intact; getrfBatched needs device-side
    // pointer and info arrays even for a batch of one
    cudaMemcpy(d_LU, d_A, (size_t)n * n * sizeof(float), cudaMemcpyDeviceToDevice);
    float** d_LU_array;  int* d_info;
    cudaMalloc(&d_LU_array, sizeof(float*));
    cudaMalloc(&d_info, sizeof(int));
    cudaMemcpy(d_LU_array, &d_LU, sizeof(float*), cudaMemcpyHostToDevice);
    cublasSgetrfBatched(handle, n, d_LU_array, n, d_pivot, d_info, 1);
    // Estimate ||A^{-1}||_1 iteratively from the LU factors
    float norm_Ainv = estimate_inverse_1_norm(handle, d_LU, d_pivot, n);
    cudaFree(d_LU_array); cudaFree(d_info);
    return norm_A * norm_Ainv;
}

// One thread per column: absolute column sums of a column-major n x n matrix
__global__ void matrix_1_norm_kernel(float* A, int n, float* col_sums) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= n) return;
    float sum = 0.0f;
    for (int row = 0; row < n; row++) {
        sum += fabsf(A[row + col * n]);
    }
    col_sums[col] = sum;
}
```
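The kernel only produces per-column sums; the matrix_1_norm helper called in estimate_condition still has to reduce them to their maximum. A minimal host-side sketch of that helper, assuming a square column-major matrix (the host-side reduction and the launch configuration are illustrative choices, not from the original code):

```cuda
#include <vector>
#include <algorithm>

// Hypothetical wrapper for matrix_1_norm: launch the column-sum kernel,
// then reduce the per-column sums to their maximum on the host.
float matrix_1_norm(float* d_A, int n) {
    float* d_col_sums;
    cudaMalloc(&d_col_sums, n * sizeof(float));
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    matrix_1_norm_kernel<<<blocks, threads>>>(d_A, n, d_col_sums);
    std::vector<float> col_sums(n);
    cudaMemcpy(col_sums.data(), d_col_sums, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_col_sums);
    // ||A||_1 is the largest absolute column sum
    return *std::max_element(col_sums.begin(), col_sums.end());
}
```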
| Metric | Naive | Optimized | Improvement |
|---|---|---|---|
| 4096x4096 matrix | 1.9s (full SVD) | 120ms (estimation) | 16x faster |
| Accuracy vs exact | Exact | Within 10x of true value | Trade-off |
Rule of thumb: solving Ax=b loses roughly log10(κ) decimal digits of accuracy, so trouble starts as κ approaches 1/ε, where ε is machine precision. For float32 (ε ≈ 1e-7), κ > 1e7 is problematic; for float64 (ε ≈ 1e-16), κ > 1e16. Values above these thresholds leave no significant digits.
Options for an ill-conditioned system: (1) Preconditioning - multiply by an approximate inverse; (2) Scaling - equilibrate rows and columns; (3) Regularization - add a small diagonal term (Tikhonov); (4) Use higher precision for the critical operations. A minimal sketch of option (3) follows.
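As an illustration of option (3), a Tikhonov-style kernel that adds a small λ to every diagonal entry of a column-major n×n matrix; the kernel name and launch configuration are assumptions for this sketch, not part of the original code:

```cuda
// Hypothetical sketch of option (3): Tikhonov regularization, A := A + lambda * I.
// Solving the regularized system trades a small bias for a much
// better-conditioned problem.
__global__ void add_tikhonov_kernel(float* A, int n, float lambda) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        A[i + i * n] += lambda;   // touch only the diagonal entries
    }
}

// Example launch: add_tikhonov_kernel<<<(n + 255) / 256, 256>>>(d_A, n, lambda);
```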
Ready to optimize your CUDA code? Download RightNow AI and get real-time performance analysis for your kernels.