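The IFFT reconstructs a time- or spatial-domain signal from its frequency-domain representation. Mathematically it is the forward FFT with the sign of the exponent flipped (complex-conjugated twiddle factors) and a normalization factor of 1/N. cuFFT's inverse transform does NOT normalize: a forward transform followed by an inverse transform returns the input scaled by N, so you must multiply the result by 1/N yourself.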
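To make the missing factor concrete, here is a minimal CPU-side sketch in plain C++ (it does not call cuFFT). It assumes a naive O(N²) DFT that follows the same unnormalized convention; the helper names `dft_unnormalized`, `round_trip_unnormalized`, and `round_trip_normalized` are illustrative and not part of any library.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) DFT with no scaling, mirroring cuFFT's convention.
// sign = -1 plays the role of CUFFT_FORWARD, sign = +1 of CUFFT_INVERSE.
std::vector<cplx> dft_unnormalized(const std::vector<cplx>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                sign * 2.0 * pi * static_cast<double>(j * k) / static_cast<double>(n);
            acc += x[j] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}

// Forward then inverse with no scaling: yields N * x, not x.
std::vector<cplx> round_trip_unnormalized(const std::vector<cplx>& x) {
    return dft_unnormalized(dft_unnormalized(x, -1), +1);
}

// Same round trip, followed by the 1/N scaling the caller has to apply.
std::vector<cplx> round_trip_normalized(const std::vector<cplx>& x) {
    std::vector<cplx> y = round_trip_unnormalized(x);
    const double scale = 1.0 / static_cast<double>(y.size());
    for (cplx& v : y) v *= scale;
    return y;
}
```

For a 4-point input, `round_trip_unnormalized` returns each sample multiplied by 4, while `round_trip_normalized` recovers the input up to floating-point rounding.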
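Fold the 1/N normalization into a kernel that already touches the data instead of launching a separate pass.
Use 1/sqrt(N) for both forward and inverse when you need a unitary transform.
The same cufftHandle works for both forward and inverse transforms; the direction is chosen at execution time.
An IFFT without normalization returns values that are N times too large.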
#include <cufft.h>

void ifft_naive(cufftComplex* d_data, int n) {
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);
    cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE);  // in-place inverse transform
    cufftDestroy(plan);
    // BUG: Missing normalization! Results are N times too large.
}

Correct IFFT with explicit normalization by 1/N.
// Scales every complex element by `factor`.
// Defined before the class so the launch in FFT::inverse compiles.
__global__ void normalize_complex(cufftComplex* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i].x *= factor;
        data[i].y *= factor;
    }
}

class FFT {
    cufftHandle plan;
    int n;
public:
    void init(int n_) {
        n = n_;
        cufftPlan1d(&plan, n, CUFFT_C2C, 1);  // one plan serves both directions
    }
    void forward(cufftComplex* d_in, cufftComplex* d_out) {
        cufftExecC2C(plan, d_in, d_out, CUFFT_FORWARD);
    }
    void inverse(cufftComplex* d_in, cufftComplex* d_out) {
        cufftExecC2C(plan, d_in, d_out, CUFFT_INVERSE);
        // cuFFT leaves the result scaled by N; apply 1/N explicitly.
        normalize_complex<<<(n + 255) / 256, 256>>>(d_out, n, 1.0f / n);
    }
    void destroy() {
        cufftDestroy(plan);  // release the plan when finished
    }
};
// Alternative: symmetric normalization
void fft_symmetric(cufftComplex* d_data, int n, bool forward) {
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);
    cufftExecC2C(plan, d_data, d_data, forward ? CUFFT_FORWARD : CUFFT_INVERSE);
    normalize_complex<<<(n+255)/256, 256>>>(d_data, n, 1.0f / sqrtf((float)n));
    cufftDestroy(plan);
}

| Metric | Naive | Optimized | Improvement |
|---|---|---|---|
| IFFT, 1M points | 0.8 ms (exec only) | 0.9 ms (exec + normalize) | ~12% overhead |
| Correctness | Results N times too large | Correct | Essential |
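cuFFT follows the FFTW convention: neither the forward nor the inverse transform applies any normalization. This leaves the choice of scheme to the caller: 1/N on the inverse, 1/sqrt(N) on both directions, or no separate normalization pass at all when the scale can be absorbed elsewhere, as in FFT-based convolution. You must normalize appropriately for your use case.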
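As an illustration of absorbing the scale elsewhere, the following CPU-side C++ sketch computes a circular convolution with unnormalized transforms and folds the single 1/N factor into the pointwise spectral product. It is a sketch under stated assumptions, not cuFFT code: `naive_dft` and `circular_convolve` are hypothetical helpers, and the O(N²) DFT stands in for the real transform.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive unnormalized DFT; sign = -1 is forward, sign = +1 is inverse.
std::vector<cplx> naive_dft(const std::vector<cplx>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                sign * 2.0 * pi * static_cast<double>(j * k) / static_cast<double>(n);
            acc += x[j] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}

// Circular convolution via the convolution theorem. The 1/N factor is applied
// while multiplying the spectra, so the inverse output needs no extra pass.
std::vector<cplx> circular_convolve(const std::vector<cplx>& a,
                                    const std::vector<cplx>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    const std::vector<cplx> A = naive_dft(a, -1);
    const std::vector<cplx> B = naive_dft(b, -1);
    const double scale = 1.0 / static_cast<double>(n);
    std::vector<cplx> prod(n);
    for (std::size_t i = 0; i < n; ++i) {
        prod[i] = A[i] * B[i] * scale;  // multiply and normalize in one step
    }
    return naive_dft(prod, +1);
}
```

On the GPU the same idea would mean scaling inside the kernel that multiplies the two spectra, which already reads and writes every element.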
Use 1/sqrt(N) on both forward and inverse when: (1) you want the FFT to be unitary, (2) you are computing power spectral density, or (3) you want Parseval's equality to hold without an extra factor of N. Most signal processing uses 1/N on the inverse only.
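The Parseval property can be checked with a small CPU-side C++ sketch. It assumes a naive O(N²) DFT scaled by 1/sqrt(N) in both directions; `dft_unitary` and `energy` are illustrative names, not library functions.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive DFT scaled by 1/sqrt(N), which makes the transform unitary.
// sign = -1 is forward, sign = +1 is inverse.
std::vector<cplx> dft_unitary(const std::vector<cplx>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    const double scale = 1.0 / std::sqrt(static_cast<double>(n));
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                sign * 2.0 * pi * static_cast<double>(j * k) / static_cast<double>(n);
            acc += x[j] * std::polar(1.0, angle);
        }
        out[k] = acc * scale;
    }
    return out;
}

// Sum of squared magnitudes of all elements.
double energy(const std::vector<cplx>& x) {
    double sum = 0.0;
    for (const cplx& v : x) sum += std::norm(v);
    return sum;
}
```

With this scaling, `energy(x)` and `energy(dft_unitary(x, -1))` agree up to rounding, and applying `dft_unitary` with `sign = +1` to the spectrum recovers `x`. With an unnormalized forward transform the spectrum's energy would instead be N times larger.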