Comparing FFT Implementations in Clojure
Exploration from the Scicloj DSP Study Group
Welcome! This exploration comes from our DSP study group, where we’re learning digital signal processing together using Clojure. We’re following the excellent book Think DSP by Allen B. Downey (available free online).
Introduction
The Fast Fourier Transform (FFT) is fundamental to signal processing, audio analysis, image compression, and scientific computing. If you’re working with frequency analysis in Clojure, you have several Java FFT libraries to choose from.
This post compares four approaches:
- Apache Commons Math - Mature library with general mathematical transforms
- JDSP - Digital signal processing library (uses Apache Commons Math internally)
- JTransforms - First multithreaded, pure-Java FFT library
- Fastmath - Clojure math library (wraps JTransforms with idiomatic API)
We’ll compute the FFT of the same test signal using each library, measure performance, and discuss their trade-offs.
The Libraries
Apache Commons Math
Apache Commons Math is a comprehensive mathematical library for Java. It provides FastFourierTransformer with various transform types and normalization options.
Key features:
- Part of Apache Commons (widely used, stable, maintained)
- Supports multiple normalization conventions
- 1D FFT, DCT/DST, Hadamard transforms
- Requires signal length to be a power of 2
- Returns
Complex[]arrays (allocation overhead)
JDSP
JDSP by Sambit Paul is a Java Digital Signal Processing library providing filters, transforms, peak detection, and more. It uses Apache Commons Math internally for FFT computation.
Key features:
- Convenient wrapper around Apache Commons Math
- Simple API:
new FastFourier(signal).transform() - Includes comprehensive DSP utilities (filters, wavelets, convolution, STFT)
- Good for projects needing broader DSP functionality beyond FFT
JTransforms
JTransforms by Piotr Wendykier is the first open-source, multithreaded FFT library written in pure Java. It’s optimized for performance with parallel processing support.
Key features:
- Parallelized split-radix and mixed-radix algorithms
- Supports 1D, 2D, and 3D transforms (FFT, DCT, DST, DHT)
- In-place mutations (efficient but not functional)
- Mixed-radix support: works with arbitrary sizes (not just power-of-2)
- Used internally by Fastmath and dtype-next
Fastmath
fastmath (version 3.x) by Tomasz Sulej is a Clojure library for fast primitive-based mathematics. Its fastmath.transform namespace wraps JTransforms with an idiomatic Clojure API.
Key features:
- Immutable, functional API (no in-place mutations)
- Leverages JTransforms’ parallelized performance and mixed-radix algorithms
- Protocol-based design:
transformer→forward-1d/reverse-1d - Supports 1D and 2D transforms, multiple transform types (FFT, DCT, DST, DHT)
- Additional signal processing utilities (wavelets, oscillators, envelopes, noise)
Test Signal: Two-Tone Sine Wave
We’ll create a simple test signal: a combination of two sine waves at 5 Hz and 12 Hz, sampled at 100 Hz for 1 second.
Hz
(def sample-rate 100.0)seconds
(def duration 1.0)Power of 2 (required by Apache Commons Math)
(def n-samples 128)(def time-points
(dfn/* (range n-samples) (/ duration n-samples)))(defn generate-test-signal
"Generate a two-tone sine wave (5 Hz + 12 Hz) test signal.
Returns a Java double array suitable for FFT libraries."
[n-samples]
;; Time vector in seconds: [0, 1/sample-rate, 2/sample-rate, ...]
(let [t (dfn/* (range n-samples) (/ 1.0 sample-rate))]
(dtype/->double-array
(dfn/+ (dfn/sin (dfn/* 2.0 Math/PI 5.0 t))
(dfn/sin (dfn/* 2.0 Math/PI 12.0 t))))))Convert to Java double array for FFT libraries (dtype-next functional ops work on sequences, but Java FFT expects double[])
(def signal (generate-test-signal n-samples))Let’s visualize the signal:
(-> (tc/dataset {:time (take 128 time-points)
:amplitude (take 128 signal)})
(plotly/base {:=x :time
:=y :amplitude
:=x-title "Time (s)"
:=y-title "Amplitude"
:=title "Test Signal: 5 Hz + 12 Hz"
:=width 700
:=height 300})
(plotly/layer-line {:=mark-color "steelblue"}))Visualization Helper
To visualize FFT results throughout this post, we’ll use a helper function:
(defn plot-fft-spectrum
"Plot FFT magnitude spectrum.
Parameters:
- magnitudes: sequence of magnitude values
- title: plot title
- color: line color (e.g., 'darkgreen', 'darkorange')
Returns Plotly visualization."
[magnitudes title color]
(let [n-bins (count magnitudes)
freq-bins (dfn/* (range n-bins) (/ sample-rate n-samples))]
(-> (tc/dataset {:frequency freq-bins
:magnitude magnitudes})
(plotly/base {:=x :frequency
:=y :magnitude
:=x-title "Frequency (Hz)"
:=y-title "Magnitude"
:=title title
:=width 700
:=height 300})
(plotly/layer-line {:=mark-color color}))))FFT Implementation #1: Apache Commons Math
Apache Commons Math requires creating a FastFourierTransformer with a normalization type, then calling transform with the signal.
(defn fft-apache-commons
"Compute FFT using Apache Commons Math."
[^doubles signal]
(let [transformer (FastFourierTransformer. DftNormalization/STANDARD)]
; Result is Complex[] array
(.transform transformer signal TransformType/FORWARD)))Let’s run it and measure time:
(def commons-result
(time (fft-apache-commons signal)))"Elapsed time: 3.297995 msecs"
The output is an array of Complex objects. To extract magnitudes:
(defn complex-magnitude
"Compute magnitude from Apache Commons Math Complex."
[^Complex c]
(.abs c))(def commons-magnitudes
(mapv complex-magnitude commons-result))Visualize the first half (positive frequencies):
(plot-fft-spectrum
(take (quot (count commons-magnitudes) 2) commons-magnitudes)
"FFT Spectrum (Apache Commons Math)"
"darkgreen")Note: You’ll notice the peaks around 5 Hz and 12 Hz aren’t perfectly sharp—energy spreads to neighboring frequency bins. This is called spectral leakage, which occurs when signal frequencies don’t align exactly with FFT bin frequencies. We’ll explore windowing techniques to address this in a future post.
FFT Implementation #2: JDSP
JDSP provides a wrapper around Apache Commons Math with a simpler API.
(defn fft-jdsp
"Compute FFT using JDSP."
[signal]
(let [fft (FastFourier. signal)]
(.transform fft)
; Get magnitude values (positive frequencies only)
(.getMagnitude fft true)))(def jdsp-result
(time (fft-jdsp signal)))"Elapsed time: 3.417683 msecs"
Visualize
(plot-fft-spectrum
jdsp-result
"FFT Spectrum (JDSP)"
"darkorange")FFT Implementation #3: JTransforms (Direct)
JTransforms mutates the input array in-place. The realForward method transforms the array into interleaved [real0, imag0, real1, imag1, ...] format.
(defn fft-jtransforms
"Compute FFT using JTransforms directly."
[signal]
; Copy signal to avoid mutating original
(let [fft-obj (DoubleFFT_1D. (long (count signal)))
signal-copy (double-array signal)]
(.realForward fft-obj signal-copy)
signal-copy))(def jtransforms-result
(time (fft-jtransforms signal)))"Elapsed time: 6.958577 msecs"
Extract magnitudes from interleaved format:
(defn jtransforms-magnitude
"Compute magnitudes from JTransforms interleaved [real, imag, ...] format."
[spectrum]
(let [n-bins (quot (count spectrum) 2)]
(mapv (fn [i]
(let [real (nth spectrum (* 2 i))
imag (nth spectrum (inc (* 2 i)))]
(Math/sqrt (+ (* real real) (* imag imag)))))
(range n-bins))))(def jtransforms-magnitudes
(jtransforms-magnitude jtransforms-result))Visualize
(plot-fft-spectrum
jtransforms-magnitudes
"FFT Spectrum (JTransforms)"
"purple")FFT Implementation #4: Fastmath
Fastmath provides the most Clojure-idiomatic API. Create a transformer, then use forward-1d.
(defn fft-fastmath
"Compute FFT using fastmath."
[signal]
(let [transformer (fm-transform/transformer :real :fft)]
(fm-transform/forward-1d transformer signal)))(def fastmath-result
(time (fft-fastmath signal)))"Elapsed time: 0.488341 msecs"
Extract magnitudes (Fastmath uses JTransforms format internally):
(defn fastmath-magnitude
"Compute magnitudes from fastmath FFT output."
[spectrum]
(let [n-bins (quot (dtype/ecount spectrum) 2)]
(mapv (fn [i]
(let [real (dtype/get-value spectrum (* 2 i))
imag (dtype/get-value spectrum (inc (* 2 i)))]
(Math/sqrt (+ (* real real) (* imag imag)))))
(range n-bins))))(def fastmath-magnitudes
(fastmath-magnitude fastmath-result))Visualize
(plot-fft-spectrum
fastmath-magnitudes
"FFT Spectrum (Fastmath)"
"crimson")Performance Comparison
To properly compare these libraries, we’ll use criterium for rigorous JVM benchmarking. Criterium handles JVM warmup, runs multiple iterations, and provides statistical analysis.
We’ll test two signal sizes: 128 samples (small, 2^7) and 131,072 samples (large, 2^17).
(defn benchmark-library
"Benchmark an FFT function using criterium."
[fft-fn signal]
(let [result (crit/quick-benchmark* (fn [] (fft-fn signal)) {})]
{:mean-ms (* (first (:mean result)) 1e3)
:lower-q-ms (* (first (:lower-q result)) 1e3)
:upper-q-ms (* (first (:upper-q result)) 1e3)}))Run benchmarks for both sizes
(def library-benchmarks
(for [size [128 131072]
[lib-name fft-fn] [["Apache Commons Math" fft-apache-commons]
["JDSP" fft-jdsp]
["JTransforms" fft-jtransforms]
["Fastmath" fft-fastmath]]]
(let [sig (generate-test-signal size)
result (benchmark-library fft-fn sig)]
(assoc result
:library lib-name
:size size))))Display results
(kind/table
(map (fn [{:keys [library size mean-ms lower-q-ms upper-q-ms]}]
{:library library
:size (if (= size 128) "128 (2^7)" "131K (2^17)")
:mean-time (format "%.6f ms" mean-ms)
:range (format "%.6f - %.6f ms" lower-q-ms upper-q-ms)})
library-benchmarks))| library | size | mean-time | range |
|---|---|---|---|
| Apache Commons Math | 128 (2^7) | 0.002549 ms | 0.001530 - 0.003715 ms |
| JDSP | 128 (2^7) | 0.002323 ms | 0.001799 - 0.003061 ms |
| JTransforms | 128 (2^7) | 0.003806 ms | 0.002108 - 0.005893 ms |
| Fastmath | 128 (2^7) | 0.003233 ms | 0.002548 - 0.003728 ms |
| Apache Commons Math | 131K (2^17) | 4.074209 ms | 3.636852 - 4.766877 ms |
| JDSP | 131K (2^17) | 6.113325 ms | 5.437638 - 6.811960 ms |
| JTransforms | 131K (2^17) | 5.189599 ms | 4.337465 - 7.382050 ms |
| Fastmath | 131K (2^17) | 4.924395 ms | 2.848666 - 7.202860 ms |
Exploring Parallelization: An Experiment
JTransforms advertises parallel processing support, which sounds promising for large FFTs. But does it actually help? Let’s explore.
Disclaimer: This section documents an exploratory experiment with uncertain conclusions. We’re learning as we go, and the results are puzzling!
How JTransforms Works: Three Algorithm Plans
Looking at the JTransforms source code, we discovered it uses three different execution plans:
- SPLIT_RADIX (Cooley-Tukey variant) - Used for power-of-2 sizes, supports multithreading
- MIXED_RADIX (Cooley-Tukey variant) - Used for factorable sizes (divisible by 2, 3, 4, 5), primarily sequential
- BLUESTEIN (Chirp Z-transform) - Used for arbitrary sizes with large prime factors, supports 1-4 threads
The plan is automatically selected in the constructor based on signal size:
- Power-of-2 → SPLIT_RADIX (fastest, our benchmarks use this)
- Factorable by small primes → MIXED_RADIX
- Large prime factors (remainder ≥ 211) → BLUESTEIN
This explains why arbitrary sizes like 100, 127, 1000 work—they use MIXED_RADIX or BLUESTEIN!
The JTransforms paper (Wendykier & Nagy 2008) discusses these implementations, noting that 1D transforms can use at most 2 or 4 threads due to the algorithm’s structure.
Testing Thread Performance
Let’s test with power-of-2 sizes to ensure we’re using the fast SPLIT_RADIX plan. We’ll use criterium for proper JVM benchmarking (warmup, statistics, GC handling):
(import 'pl.edu.icm.jlargearrays.ConcurrencyUtils
'org.jtransforms.utils.CommonUtils)org.jtransforms.utils.CommonUtils(defn benchmark-with-threads
"Benchmark FFT at specific thread count using criterium for statistical analysis."
[n-threads signal]
(let [previous-threads (ConcurrencyUtils/getNumberOfThreads)]
(try
(ConcurrencyUtils/setNumberOfThreads n-threads)
;; Use criterium's quick-bench for proper JVM warmup and statistics
(let [result (crit/quick-benchmark* (fn [] (fft-fastmath signal)) {})]
{:threads n-threads
;; Criterium returns [value (lower-ci upper-ci)] for each metric
:mean-ms (* (first (:mean result)) 1e3) ; Convert seconds to milliseconds
:variance-ms (* (first (:variance result)) 1e6) ; Variance in ms^2
:lower-q-ms (* (first (:lower-q result)) 1e3)
:upper-q-ms (* (first (:upper-q result)) 1e3)})
(finally
(ConcurrencyUtils/setNumberOfThreads previous-threads)))))We’ll test multiple power-of-2 signal sizes and various thread counts (including 8 and 16, which the paper says are unsupported for 1D)
Experiment setup:
- Signal sizes: 16384 (2^14), 65536 (2^16), 131072 (2^17), 262144 (2^18)
- Thread counts: 1, 2, 4, 8, 16
- Each configuration benchmarked with criterium (JVM warmup, multiple samples, statistical analysis)
(def threading-experiment
(for [size [16384 65536 131072 262144]
threads [1 2 4 8 16]]
(let [sig (generate-test-signal size)
result (benchmark-with-threads threads sig)]
(assoc result :size size))))Let’s visualize the results:
(-> (tc/dataset threading-experiment)
(plotly/base {:=x :threads
:=y :mean-ms
:=color :size
:=color-type :nominal
:=title "FFT Performance vs Thread Count (Fastmath/JTransforms)"
:=x-title "Number of Threads"
:=y-title "Mean Time per FFT (ms)"
:=width 800
:=height 500})
(plotly/layer-point {:=mark-size 10})
(plotly/layer-line {:=mark-opacity 0.6}))What We Found (Spoiler: It’s Confusing!)
Looking at the visualization above, we see:
- No consistent speedup from adding threads
- Performance is similar across all thread counts for each signal size
- Performance scales linearly with signal size (not with thread count)
- Sometimes 1 thread is fastest, sometimes 2 or 4, sometimes even 16!
- The paper said 1D can only use 2-4 threads, but 8 and 16 threads work (they just don’t help)
Key observation: Thread count makes little practical difference for 1D FFT at these sizes.
Why Doesn’t Threading Help?
We’re honestly not sure! Here are some hypotheses:
1. Memory Bandwidth Bottleneck
FFT is likely memory-bound, not compute-bound. The algorithm reads and writes the entire array multiple times. All threads compete for the same memory bus, which doesn’t scale with CPU cores.
2. Signal Sizes Too Small
Even our largest test (262K samples, 2^18) might be below the threshold where parallelization becomes effective. The overhead of thread coordination could outweigh any benefits.
3. 1D FFT Algorithm Limitations
The Cooley-Tukey algorithm for 1D FFT has log₂(n) stages that must execute sequentially. For 131K samples, that’s only 17 stages—not much to parallelize. Amdahl’s Law limits speedup when much of the work is sequential.
4. JVM/JIT Complexity
Despite using criterium for proper warmup, there may be JIT compilation differences between thread configurations, or thread pool initialization effects we’re not accounting for.
Practical Takeaway
Based on our experiments: Don’t worry about thread count for 1D FFT. Use the default (typically matching your CPU cores), or even just 1 thread. The performance difference is negligible for typical signal sizes, and single-threaded is more predictable.
If you need good performance, focus on:
- Using power-of-2 signal sizes (triggers fast SPLIT_RADIX plan)
- Choosing an appropriate library (JTransforms/Fastmath are typically fastest for 1D, though differences aren’t huge)
- Optimizing your overall algorithm to minimize FFT calls
Note: 2D and 3D FFTs may benefit more from parallelization since they have more work to distribute. We only tested 1D here.
Machine Configuration and Reproducibility
Performance results depend heavily on your hardware. Here’s how to inspect your environment:
(def machine-info
{:jvm-version (System/getProperty "java.version")
:jvm-vendor (System/getProperty "java.vendor")
:jvm-name (System/getProperty "java.vm.name")
:heap-max-mb (/ (.maxMemory (Runtime/getRuntime)) 1024.0 1024.0)
:heap-total-mb (/ (.totalMemory (Runtime/getRuntime)) 1024.0 1024.0)
:os-name (System/getProperty "os.name")
:os-arch (System/getProperty "os.arch")
:os-version (System/getProperty "os.version")
:available-processors (.availableProcessors (Runtime/getRuntime))
:clojure-version (clojure-version)})(kind/table
[(assoc machine-info :property "Machine Configuration")])| jvm-vendor | available-processors | jvm-version | heap-max-mb | property | jvm-name | os-version | clojure-version | os-name | os-arch | heap-total-mb |
|---|---|---|---|---|---|---|---|---|---|---|
| Ubuntu | 22 | 21.0.9 | 15936.0 | Machine Configuration | OpenJDK 64-Bit Server VM | 6.14.0-36-generic | 1.12.3 | Linux | amd64 | 456.0 |
This benchmark ran on:
- CPU: 22 cores (Linux 6.14.0-36-generic, amd64)
- JVM: OpenJDK 21.0.9 (Ubuntu)
- Heap: 15,936 MB max
- Clojure: 1.12.3
Signal Size Requirements
Different libraries have different requirements for signal lengths:
Apache Commons Math: Power-of-2 Required
Apache Commons Math strictly requires signal length to be a power of 2 (128, 256, 512, 1024, etc.). This is because it only implements the classic Cooley-Tukey algorithm.
JTransforms/Fastmath: Flexible (with caveats)
Based on our source code investigation (see the threading section above), JTransforms uses three execution plans:
- SPLIT_RADIX (Cooley-Tukey variant) - Fast, multithreaded, requires power-of-2
- MIXED_RADIX (Cooley-Tukey variant) - For factorable sizes, primarily sequential
- BLUESTEIN (Chirp Z-transform) - For arbitrary sizes with large prime factors, supports 1-4 threads
When you provide a non-power-of-2 size, JTransforms uses MIXED_RADIX or BLUESTEIN depending on the size’s prime factorization. This means:
- ✅ It works with any size (we tested 100, 127, 1000 successfully)
- ⚠️ Performance may be slower for non-power-of-2 sizes
- ⚠️ MIXED_RADIX is primarily sequential; BLUESTEIN supports limited parallelization
Recommendation: Use power-of-2 sizes when possible for best performance (triggers SPLIT_RADIX). If your data doesn’t fit, zero-pad to the next power of 2, or use JTransforms/Fastmath with the understanding that a different plan will be selected.
Conclusions and Recommendations
After comparing these libraries, here are my recommendations:
For most Clojure projects: Fastmath 3
Fastmath provides a good balance of performance and developer experience:
- Idiomatic Clojure API (functional, immutable)
- Leverages JTransforms’ performance
- Rich ecosystem (transforms, signal processing, statistics)
- Actively maintained
The fastmath.transform namespace makes FFT feel natural in Clojure:
Round-trip example
(comment
(let [fft (fm-transform/transformer :real :fft)]
(-> signal
(fm-transform/forward-1d fft)
(fm-transform/reverse-1d fft))))For direct Java interop: JTransforms
If you want to use JTransforms directly and don’t mind mutation:
- Use
DoubleFFT_1Ddirectly - In-place mutations avoid allocations
- May be slightly faster than Fastmath wrapper in some cases
- Good for real-time audio processing or when you need fine-grained control
For broader DSP needs: JDSP
If you need filters, wavelets, peak detection, and FFT in one package:
- Convenient, batteries-included library
- Simple API
- Good documentation
- Note: FFT performance is somewhat slower than JTransforms/Fastmath
Apache Commons Math: Consider alternatives
While Commons Math is excellent for general mathematics, for FFT specifically you might prefer other options:
- Somewhat slower than JTransforms/Fastmath
- Returns boxed
Complex[]objects (allocation overhead) - Still a reasonable choice if you’re already using Commons Math for other features
Summary
The Clojure ecosystem offers several good FFT options through Java interop. For typical signal processing tasks, Fastmath 3 is a solid choice: JTransforms’ performance wrapped in a Clojure-friendly API. For direct control, use JTransforms directly. And if you need comprehensive DSP utilities beyond FFT, JDSP is worth exploring.
source: src/dsp/fft_comparison.clj