The spectral norm ||A||_2 = σ_max(A) (largest singular value) measures the maximum stretch of any vector by the matrix. It's crucial for spectral normalization in GANs, Lipschitz-constrained networks, and stability analysis. Unlike full SVD which costs O(mn²), the largest singular value can be found efficiently via power iteration in O(mn) per iteration, typically converging in 10-20 iterations.
Iteratively compute dominant singular value/vectors without full SVD.
Reuse u, v vectors from previous forward pass for faster convergence.
During training, one iteration per forward pass often suffices.
Full SVD computes all singular values when only the largest is needed.
float spectral_norm_svd(cusolverDnHandle_t handle, float* d_A, int m, int n) {
int min_mn = min(m, n);
float* d_S;
cudaMalloc(&d_S, min_mn * sizeof(float));
// Full SVD computation (wasteful!)
cusolverDnSgesvd(handle, 'N', 'N', m, n, d_A, m, d_S, NULL, m, NULL, n, ...);
float sigma_max;
cudaMemcpy(&sigma_max, d_S, sizeof(float), D2H);
return sigma_max;
}Power iteration finds σ_max in O(mn) per iteration, typically 10-20 iterations.
void spectral_norm_power(cublasHandle_t handle, float* d_W, int m, int n,
float* d_u, float* d_v, float* sigma, int iters) {
float *d_Wv, *d_WTu;
cudaMalloc(&d_Wv, m * sizeof(float));
cudaMalloc(&d_WTu, n * sizeof(float));
float alpha = 1.0f, beta = 0.0f;
for (int i = 0; i < iters; i++) {
// v = W^T u / ||W^T u||
cublasSgemv(handle, CUBLAS_OP_T, m, n, &alpha, d_W, m, d_u, 1, &beta, d_WTu, 1);
float norm_v;
cublasSnrm2(handle, n, d_WTu, 1, &norm_v);
float inv = 1.0f / norm_v;
cublasSscal(handle, n, &inv, d_WTu, 1);
cudaMemcpy(d_v, d_WTu, n * sizeof(float), D2D);
// u = W v / ||W v||
cublasSgemv(handle, CUBLAS_OP_N, m, n, &alpha, d_W, m, d_v, 1, &beta, d_Wv, 1);
float norm_u;
cublasSnrm2(handle, m, d_Wv, 1, &norm_u);
inv = 1.0f / norm_u;
cublasSscal(handle, m, &inv, d_Wv, 1);
cudaMemcpy(d_u, d_Wv, m * sizeof(float), D2D);
*sigma = norm_u;
}
}| Metric | Naive | Optimized | Improvement |
|---|---|---|---|
| 1024x1024 matrix | 85ms (full SVD) | 0.8ms (10 power iters) | 106x faster |
| Training iteration (warm start) | 0.8ms (10 iters) | 0.09ms (1 iter) | 9x faster |
For random init, 10-20 iterations. With warm start during training, 1 iteration per forward pass often suffices since weights change slowly.
Spectral normalization divides weights by their spectral norm: W_sn = W / σ(W). This constrains Lipschitz constant to 1, stabilizing GAN training. Apply to discriminator weights.
Different matrix norm, cheaper to compute
Ratio of largest to smallest singular value
Ready to optimize your CUDA code? Download RightNow AI and get real-time performance analysis for your kernels.