Comparing FFT Implementations in Clojure

Author

Affiliations

Published

December 16, 2025

Exploration from the Scicloj DSP Study Group

Welcome! This exploration comes from our DSP study group, where we’re learning digital signal processing together using Clojure. We’re following the excellent book Think DSP by Allen B. Downey (available free online).

Introduction

The Fast Fourier Transform (FFT) is fundamental to signal processing, audio analysis, image compression, and scientific computing. If you’re working with frequency analysis in Clojure, you have several Java FFT libraries to choose from.

This post compares four approaches:

Apache Commons Math - Mature library with general mathematical transforms
JDSP - Digital signal processing library (uses Apache Commons Math internally)
JTransforms - First multithreaded, pure-Java FFT library
Fastmath - Clojure math library (wraps JTransforms with idiomatic API)

We’ll compute the FFT of the same test signal using each library, measure performance, and discuss their trade-offs.

The Libraries

Apache Commons Math

Apache Commons Math is a comprehensive mathematical library for Java. It provides FastFourierTransformer with various transform types and normalization options.

Key features:

Part of Apache Commons (widely used, stable, maintained)
Supports multiple normalization conventions
1D FFT, DCT/DST, Hadamard transforms
Requires signal length to be a power of 2
Returns Complex[] arrays (allocation overhead)

JDSP

JDSP by Sambit Paul is a Java Digital Signal Processing library providing filters, transforms, peak detection, and more. It uses Apache Commons Math internally for FFT computation.

Key features:

Convenient wrapper around Apache Commons Math
Simple API: new FastFourier(signal).transform()
Includes comprehensive DSP utilities (filters, wavelets, convolution, STFT)
Good for projects needing broader DSP functionality beyond FFT

JTransforms

JTransforms by Piotr Wendykier is the first open-source, multithreaded FFT library written in pure Java. It’s optimized for performance with parallel processing support.

Key features:

Parallelized split-radix and mixed-radix algorithms
Supports 1D, 2D, and 3D transforms (FFT, DCT, DST, DHT)
In-place mutations (efficient but not functional)
Mixed-radix support: works with arbitrary sizes (not just power-of-2)
Used internally by Fastmath and dtype-next

Fastmath

fastmath (version 3.x) by Tomasz Sulej is a Clojure library for fast primitive-based mathematics. Its fastmath.transform namespace wraps JTransforms with an idiomatic Clojure API.

Key features:

Immutable, functional API (no in-place mutations)
Leverages JTransforms’ parallelized performance and mixed-radix algorithms
Protocol-based design: transformer → forward-1d/reverse-1d
Supports 1D and 2D transforms, multiple transform types (FFT, DCT, DST, DHT)
Additional signal processing utilities (wavelets, oscillators, envelopes, noise)

Test Signal: Two-Tone Sine Wave

We’ll create a simple test signal: a combination of two sine waves at 5 Hz and 12 Hz, sampled at 100 Hz for 1 second.

(def sample-rate 100.0)

seconds

(def duration 1.0)

Power of 2 (required by Apache Commons Math)

(def n-samples 128)

(def time-points
  (dfn/* (range n-samples) (/ duration n-samples)))

(defn generate-test-signal
  "Generate a two-tone sine wave (5 Hz + 12 Hz) test signal.
  Returns a Java double array suitable for FFT libraries."
  [n-samples]
  ;; Time vector in seconds: [0, 1/sample-rate, 2/sample-rate, ...]
  (let [t (dfn/* (range n-samples) (/ 1.0 sample-rate))]
    (dtype/->double-array
     (dfn/+ (dfn/sin (dfn/* 2.0 Math/PI 5.0 t))
            (dfn/sin (dfn/* 2.0 Math/PI 12.0 t))))))

Convert to Java double array for FFT libraries (dtype-next functional ops work on sequences, but Java FFT expects double[])

(def signal (generate-test-signal n-samples))

Let’s visualize the signal:

(-> (tc/dataset {:time (take 128 time-points)
                 :amplitude (take 128 signal)})
    (plotly/base {:=x :time
                  :=y :amplitude
                  :=x-title "Time (s)"
                  :=y-title "Amplitude"
                  :=title "Test Signal: 5 Hz + 12 Hz"
                  :=width 700
                  :=height 300})
    (plotly/layer-line {:=mark-color "steelblue"}))

Visualization Helper

To visualize FFT results throughout this post, we’ll use a helper function:

(defn plot-fft-spectrum
  "Plot FFT magnitude spectrum.
  
  Parameters:

  - magnitudes: sequence of magnitude values
  - title: plot title
  - color: line color (e.g., 'darkgreen', 'darkorange')
  
  Returns Plotly visualization."
  [magnitudes title color]
  (let [n-bins (count magnitudes)
        freq-bins (dfn/* (range n-bins) (/ sample-rate n-samples))]
    (-> (tc/dataset {:frequency freq-bins
                     :magnitude magnitudes})
        (plotly/base {:=x :frequency
                      :=y :magnitude
                      :=x-title "Frequency (Hz)"
                      :=y-title "Magnitude"
                      :=title title
                      :=width 700
                      :=height 300})
        (plotly/layer-line {:=mark-color color}))))

FFT Implementation #1: Apache Commons Math

Apache Commons Math requires creating a FastFourierTransformer with a normalization type, then calling transform with the signal.

(defn fft-apache-commons
  "Compute FFT using Apache Commons Math."
  [^doubles signal]
  (let [transformer (FastFourierTransformer. DftNormalization/STANDARD)]
    ; Result is Complex[] array
    (.transform transformer signal TransformType/FORWARD)))

Let’s run it and measure time:

(def commons-result
  (time (fft-apache-commons signal)))

OUT

"Elapsed time: 3.297995 msecs"

The output is an array of Complex objects. To extract magnitudes:

(defn complex-magnitude
  "Compute magnitude from Apache Commons Math Complex."
  [^Complex c]
  (.abs c))

(def commons-magnitudes
  (mapv complex-magnitude commons-result))

Visualize the first half (positive frequencies):

(plot-fft-spectrum
 (take (quot (count commons-magnitudes) 2) commons-magnitudes)
 "FFT Spectrum (Apache Commons Math)"
 "darkgreen")

Note: You’ll notice the peaks around 5 Hz and 12 Hz aren’t perfectly sharp—energy spreads to neighboring frequency bins. This is called spectral leakage, which occurs when signal frequencies don’t align exactly with FFT bin frequencies. We’ll explore windowing techniques to address this in a future post.

FFT Implementation #2: JDSP

JDSP provides a wrapper around Apache Commons Math with a simpler API.

(defn fft-jdsp
  "Compute FFT using JDSP."
  [signal]
  (let [fft (FastFourier. signal)]
    (.transform fft)
    ; Get magnitude values (positive frequencies only)
    (.getMagnitude fft true)))

(def jdsp-result
  (time (fft-jdsp signal)))

OUT

"Elapsed time: 3.417683 msecs"

Visualize

(plot-fft-spectrum
 jdsp-result
 "FFT Spectrum (JDSP)"
 "darkorange")

FFT Implementation #3: JTransforms (Direct)

JTransforms mutates the input array in-place. The realForward method transforms the array into interleaved [real0, imag0, real1, imag1, ...] format.

(defn fft-jtransforms
  "Compute FFT using JTransforms directly."
  [signal]
  ; Copy signal to avoid mutating original
  (let [fft-obj (DoubleFFT_1D. (long (count signal)))
        signal-copy (double-array signal)]
    (.realForward fft-obj signal-copy)
    signal-copy))

(def jtransforms-result
  (time (fft-jtransforms signal)))

OUT

"Elapsed time: 6.958577 msecs"

Extract magnitudes from interleaved format:

(defn jtransforms-magnitude
  "Compute magnitudes from JTransforms interleaved [real, imag, ...] format."
  [spectrum]
  (let [n-bins (quot (count spectrum) 2)]
    (mapv (fn [i]
            (let [real (nth spectrum (* 2 i))
                  imag (nth spectrum (inc (* 2 i)))]
              (Math/sqrt (+ (* real real) (* imag imag)))))
          (range n-bins))))

(def jtransforms-magnitudes
  (jtransforms-magnitude jtransforms-result))

Visualize

(plot-fft-spectrum
 jtransforms-magnitudes
 "FFT Spectrum (JTransforms)"
 "purple")

FFT Implementation #4: Fastmath

Fastmath provides the most Clojure-idiomatic API. Create a transformer, then use forward-1d.

(defn fft-fastmath
  "Compute FFT using fastmath."
  [signal]
  (let [transformer (fm-transform/transformer :real :fft)]
    (fm-transform/forward-1d transformer signal)))

(def fastmath-result
  (time (fft-fastmath signal)))

OUT

"Elapsed time: 0.488341 msecs"

Extract magnitudes (Fastmath uses JTransforms format internally):

(defn fastmath-magnitude
  "Compute magnitudes from fastmath FFT output."
  [spectrum]
  (let [n-bins (quot (dtype/ecount spectrum) 2)]
    (mapv (fn [i]
            (let [real (dtype/get-value spectrum (* 2 i))
                  imag (dtype/get-value spectrum (inc (* 2 i)))]
              (Math/sqrt (+ (* real real) (* imag imag)))))
          (range n-bins))))

(def fastmath-magnitudes
  (fastmath-magnitude fastmath-result))

Visualize

(plot-fft-spectrum
 fastmath-magnitudes
 "FFT Spectrum (Fastmath)"
 "crimson")

Performance Comparison

To properly compare these libraries, we’ll use criterium for rigorous JVM benchmarking. Criterium handles JVM warmup, runs multiple iterations, and provides statistical analysis.

We’ll test two signal sizes: 128 samples (small, 2^7) and 131,072 samples (large, 2^17).

(defn benchmark-library
  "Benchmark an FFT function using criterium."
  [fft-fn signal]
  (let [result (crit/quick-benchmark* (fn [] (fft-fn signal)) {})]
    {:mean-ms (* (first (:mean result)) 1e3)
     :lower-q-ms (* (first (:lower-q result)) 1e3)
     :upper-q-ms (* (first (:upper-q result)) 1e3)}))

Run benchmarks for both sizes

(def library-benchmarks
  (for [size [128 131072]
        [lib-name fft-fn] [["Apache Commons Math" fft-apache-commons]
                           ["JDSP" fft-jdsp]
                           ["JTransforms" fft-jtransforms]
                           ["Fastmath" fft-fastmath]]]
    (let [sig (generate-test-signal size)
          result (benchmark-library fft-fn sig)]
      (assoc result
             :library lib-name
             :size size))))

Display results

(kind/table
 (map (fn [{:keys [library size mean-ms lower-q-ms upper-q-ms]}]
        {:library library
         :size (if (= size 128) "128 (2^7)" "131K (2^17)")
         :mean-time (format "%.6f ms" mean-ms)
         :range (format "%.6f - %.6f ms" lower-q-ms upper-q-ms)})
      library-benchmarks))

library	size	mean-time	range
Apache Commons Math	128 (2^7)	0.002549 ms	0.001530 - 0.003715 ms
JDSP	128 (2^7)	0.002323 ms	0.001799 - 0.003061 ms
JTransforms	128 (2^7)	0.003806 ms	0.002108 - 0.005893 ms
Fastmath	128 (2^7)	0.003233 ms	0.002548 - 0.003728 ms
Apache Commons Math	131K (2^17)	4.074209 ms	3.636852 - 4.766877 ms
JDSP	131K (2^17)	6.113325 ms	5.437638 - 6.811960 ms
JTransforms	131K (2^17)	5.189599 ms	4.337465 - 7.382050 ms
Fastmath	131K (2^17)	4.924395 ms	2.848666 - 7.202860 ms

Exploring Parallelization: An Experiment

JTransforms advertises parallel processing support, which sounds promising for large FFTs. But does it actually help? Let’s explore.

Disclaimer: This section documents an exploratory experiment with uncertain conclusions. We’re learning as we go, and the results are puzzling!

How JTransforms Works: Three Algorithm Plans

Looking at the JTransforms source code, we discovered it uses three different execution plans:

SPLIT_RADIX (Cooley-Tukey variant) - Used for power-of-2 sizes, supports multithreading
MIXED_RADIX (Cooley-Tukey variant) - Used for factorable sizes (divisible by 2, 3, 4, 5), primarily sequential
BLUESTEIN (Chirp Z-transform) - Used for arbitrary sizes with large prime factors, supports 1-4 threads

The plan is automatically selected in the constructor based on signal size:

Power-of-2 → SPLIT_RADIX (fastest, our benchmarks use this)
Factorable by small primes → MIXED_RADIX
Large prime factors (remainder ≥ 211) → BLUESTEIN

This explains why arbitrary sizes like 100, 127, 1000 work—they use MIXED_RADIX or BLUESTEIN!

The JTransforms paper (Wendykier & Nagy 2008) discusses these implementations, noting that 1D transforms can use at most 2 or 4 threads due to the algorithm’s structure.

Testing Thread Performance

Let’s test with power-of-2 sizes to ensure we’re using the fast SPLIT_RADIX plan. We’ll use criterium for proper JVM benchmarking (warmup, statistics, GC handling):

(import 'pl.edu.icm.jlargearrays.ConcurrencyUtils
        'org.jtransforms.utils.CommonUtils)

org.jtransforms.utils.CommonUtils

(defn benchmark-with-threads
  "Benchmark FFT at specific thread count using criterium for statistical analysis."
  [n-threads signal]
  (let [previous-threads (ConcurrencyUtils/getNumberOfThreads)]
    (try
      (ConcurrencyUtils/setNumberOfThreads n-threads)
      ;; Use criterium's quick-bench for proper JVM warmup and statistics
      (let [result (crit/quick-benchmark* (fn [] (fft-fastmath signal)) {})]
        {:threads n-threads
         ;; Criterium returns [value (lower-ci upper-ci)] for each metric
         :mean-ms (* (first (:mean result)) 1e3) ; Convert seconds to milliseconds
         :variance-ms (* (first (:variance result)) 1e6) ; Variance in ms^2
         :lower-q-ms (* (first (:lower-q result)) 1e3)
         :upper-q-ms (* (first (:upper-q result)) 1e3)})
      (finally
        (ConcurrencyUtils/setNumberOfThreads previous-threads)))))

We’ll test multiple power-of-2 signal sizes and various thread counts (including 8 and 16, which the paper says are unsupported for 1D)

Experiment setup:

Signal sizes: 16384 (2^14), 65536 (2^16), 131072 (2^17), 262144 (2^18)
Thread counts: 1, 2, 4, 8, 16
Each configuration benchmarked with criterium (JVM warmup, multiple samples, statistical analysis)

(def threading-experiment
  (for [size [16384 65536 131072 262144]
        threads [1 2 4 8 16]]
    (let [sig (generate-test-signal size)
          result (benchmark-with-threads threads sig)]
      (assoc result :size size))))

Let’s visualize the results:

(-> (tc/dataset threading-experiment)
    (plotly/base {:=x :threads
                  :=y :mean-ms
                  :=color :size
                  :=color-type :nominal
                  :=title "FFT Performance vs Thread Count (Fastmath/JTransforms)"
                  :=x-title "Number of Threads"
                  :=y-title "Mean Time per FFT (ms)"
                  :=width 800
                  :=height 500})
    (plotly/layer-point {:=mark-size 10})
    (plotly/layer-line {:=mark-opacity 0.6}))

What We Found (Spoiler: It’s Confusing!)

Looking at the visualization above, we see:

No consistent speedup from adding threads
Performance is similar across all thread counts for each signal size
Performance scales linearly with signal size (not with thread count)
Sometimes 1 thread is fastest, sometimes 2 or 4, sometimes even 16!
The paper said 1D can only use 2-4 threads, but 8 and 16 threads work (they just don’t help)

Key observation: Thread count makes little practical difference for 1D FFT at these sizes.

Why Doesn’t Threading Help?

We’re honestly not sure! Here are some hypotheses:

1. Memory Bandwidth Bottleneck

FFT is likely memory-bound, not compute-bound. The algorithm reads and writes the entire array multiple times. All threads compete for the same memory bus, which doesn’t scale with CPU cores.

2. Signal Sizes Too Small

Even our largest test (262K samples, 2^18) might be below the threshold where parallelization becomes effective. The overhead of thread coordination could outweigh any benefits.

3. 1D FFT Algorithm Limitations

The Cooley-Tukey algorithm for 1D FFT has log₂(n) stages that must execute sequentially. For 131K samples, that’s only 17 stages—not much to parallelize. Amdahl’s Law limits speedup when much of the work is sequential.

4. JVM/JIT Complexity

Despite using criterium for proper warmup, there may be JIT compilation differences between thread configurations, or thread pool initialization effects we’re not accounting for.

Practical Takeaway

Based on our experiments: Don’t worry about thread count for 1D FFT. Use the default (typically matching your CPU cores), or even just 1 thread. The performance difference is negligible for typical signal sizes, and single-threaded is more predictable.

If you need good performance, focus on:

Using power-of-2 signal sizes (triggers fast SPLIT_RADIX plan)
Choosing an appropriate library (JTransforms/Fastmath are typically fastest for 1D, though differences aren’t huge)
Optimizing your overall algorithm to minimize FFT calls

Note: 2D and 3D FFTs may benefit more from parallelization since they have more work to distribute. We only tested 1D here.

Machine Configuration and Reproducibility

Performance results depend heavily on your hardware. Here’s how to inspect your environment:

(def machine-info
  {:jvm-version (System/getProperty "java.version")
   :jvm-vendor (System/getProperty "java.vendor")
   :jvm-name (System/getProperty "java.vm.name")
   :heap-max-mb (/ (.maxMemory (Runtime/getRuntime)) 1024.0 1024.0)
   :heap-total-mb (/ (.totalMemory (Runtime/getRuntime)) 1024.0 1024.0)
   :os-name (System/getProperty "os.name")
   :os-arch (System/getProperty "os.arch")
   :os-version (System/getProperty "os.version")
   :available-processors (.availableProcessors (Runtime/getRuntime))
   :clojure-version (clojure-version)})

(kind/table
 [(assoc machine-info :property "Machine Configuration")])

jvm-vendor	available-processors	jvm-version	heap-max-mb	property	jvm-name	os-version	clojure-version	os-name	os-arch	heap-total-mb
Ubuntu	22	21.0.9	15936.0	Machine Configuration	OpenJDK 64-Bit Server VM	6.14.0-36-generic	1.12.3	Linux	amd64	456.0

This benchmark ran on:

CPU: 22 cores (Linux 6.14.0-36-generic, amd64)
JVM: OpenJDK 21.0.9 (Ubuntu)
Heap: 15,936 MB max
Clojure: 1.12.3

Signal Size Requirements

Different libraries have different requirements for signal lengths:

Apache Commons Math: Power-of-2 Required

Apache Commons Math strictly requires signal length to be a power of 2 (128, 256, 512, 1024, etc.). This is because it only implements the classic Cooley-Tukey algorithm.

JTransforms/Fastmath: Flexible (with caveats)

Based on our source code investigation (see the threading section above), JTransforms uses three execution plans:

SPLIT_RADIX (Cooley-Tukey variant) - Fast, multithreaded, requires power-of-2
MIXED_RADIX (Cooley-Tukey variant) - For factorable sizes, primarily sequential
BLUESTEIN (Chirp Z-transform) - For arbitrary sizes with large prime factors, supports 1-4 threads

When you provide a non-power-of-2 size, JTransforms uses MIXED_RADIX or BLUESTEIN depending on the size’s prime factorization. This means:

✅ It works with any size (we tested 100, 127, 1000 successfully)
⚠️ Performance may be slower for non-power-of-2 sizes
⚠️ MIXED_RADIX is primarily sequential; BLUESTEIN supports limited parallelization

Recommendation: Use power-of-2 sizes when possible for best performance (triggers SPLIT_RADIX). If your data doesn’t fit, zero-pad to the next power of 2, or use JTransforms/Fastmath with the understanding that a different plan will be selected.

Conclusions and Recommendations

After comparing these libraries, here are my recommendations:

For most Clojure projects: Fastmath 3

Fastmath provides a good balance of performance and developer experience:

Idiomatic Clojure API (functional, immutable)
Leverages JTransforms’ performance
Rich ecosystem (transforms, signal processing, statistics)
Actively maintained

The fastmath.transform namespace makes FFT feel natural in Clojure:

Round-trip example

(comment
  (let [fft (fm-transform/transformer :real :fft)]
    (-> signal
        (fm-transform/forward-1d fft)
        (fm-transform/reverse-1d fft))))

For direct Java interop: JTransforms

If you want to use JTransforms directly and don’t mind mutation:

Use DoubleFFT_1D directly
In-place mutations avoid allocations
May be slightly faster than Fastmath wrapper in some cases
Good for real-time audio processing or when you need fine-grained control

For broader DSP needs: JDSP

If you need filters, wavelets, peak detection, and FFT in one package:

Convenient, batteries-included library
Simple API
Good documentation
Note: FFT performance is somewhat slower than JTransforms/Fastmath

Apache Commons Math: Consider alternatives

While Commons Math is excellent for general mathematics, for FFT specifically you might prefer other options:

Somewhat slower than JTransforms/Fastmath
Returns boxed Complex[] objects (allocation overhead)
Still a reasonable choice if you’re already using Commons Math for other features

Summary

The Clojure ecosystem offers several good FFT options through Java interop. For typical signal processing tasks, Fastmath 3 is a solid choice: JTransforms’ performance wrapped in a Clojure-friendly API. For direct control, use JTransforms directly. And if you need comprehensive DSP utilities beyond FFT, JDSP is worth exploring.

source: src/dsp/fft_comparison.clj

Introduction

The Libraries

Apache Commons Math

JDSP

JTransforms

Fastmath

Test Signal: Two-Tone Sine Wave

Visualization Helper

FFT Implementation #1: Apache Commons Math

FFT Implementation #2: JDSP

FFT Implementation #3: JTransforms (Direct)

FFT Implementation #4: Fastmath

Performance Comparison

Exploring Parallelization: An Experiment

How JTransforms Works: Three Algorithm Plans

Testing Thread Performance

What We Found (Spoiler: It’s Confusing!)

Why Doesn’t Threading Help?

Practical Takeaway

Machine Configuration and Reproducibility

Signal Size Requirements

Apache Commons Math: Power-of-2 Required

JTransforms/Fastmath: Flexible (with caveats)

Related Functionality

Apache Commons Math

JDSP

JTransforms

Fastmath

Conclusions and Recommendations

Summary