Draft

Extracting Heart Rate from Smartphone Camera Video

Using signal processing to extract and validate heart rate from Camera PPG data against clinical ground truth
Published: November 21, 2025

Introduction

What is Camera PPG?

Photoplethysmography (PPG) is a technique for measuring blood volume changes in tissue, typically used to measure heart rate and blood oxygen levels. Traditional PPG sensors (like those in fitness watches) use dedicated LEDs (commonly green, red, or infrared) and photodetectors.

Camera PPG is a fascinating alternative: it uses an ordinary smartphone camera to detect these same blood volume changes by analyzing subtle color variations in the skin. (The related term remote PPG, or rPPG, usually refers to the contactless variant that films skin from a distance; here the finger touches the lens.) When you place your fingertip over the camera, each heartbeat causes a tiny change in blood volume that slightly alters the amount of light absorbed by the tissue. By recording video and analyzing the RGB color channels over time, we can extract vital signs like heart rate and SpO2 (blood oxygen saturation).
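To make the idea concrete, here is a minimal sketch of how a video could be reduced to an RGB time series: average each color channel over all pixels of a frame, one sample per frame. The frame representation (a sequence of [r g b] pixels), the helper name frame-mean-rgb, and the toy values are all hypothetical; whether the MTHS signals were produced in exactly this way is an assumption.

```clojure
;; Hypothetical sketch: a frame is modelled as a sequence of [r g b] pixels.
(defn frame-mean-rgb
  "Average each colour channel over all pixels of one frame."
  [pixels]
  (let [n (count pixels)]
    (mapv (fn [channel]
            (/ (reduce + (map #(double (nth % channel)) pixels)) n))
          [0 1 2])))

;; Two toy "frames" of two pixels each (made-up values).
(def toy-frames
  [[[250 18 17] [254 20 19]]
   [[252 17 16] [254 19 18]]])

;; One [R G B] sample per frame: this is the kind of time series we analyze.
(mapv frame-mean-rgb toy-frames)
;; => [[252.0 19.0 18.0] [253.0 18.0 17.0]]
```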

The MTHS Dataset

We’re using the MTHS dataset from the MEDVSE repository, which contains smartphone camera PPG data from 60 subjects. For each subject, the dataset provides:

  • Signal data (signal_x.npy): RGB time series sampled at 30 Hz from fingertip videos
  • Ground truth labels (label_x.npy): Heart rate (bpm) and SpO2 (%) sampled at 1 Hz, measured with clinical-grade equipment

What We’ll Explore

Our goal is to extract heart rate from smartphone camera video and validate it against clinical measurements.

The MTHS dataset provides both the raw camera data (RGB signals) and ground truth heart rate measurements from medical-grade equipment. This allows us to:

  1. Develop and test signal processing algorithms
  2. Validate our extracted heart rates against the ground truth
  3. Calibrate parameters (filter cutoffs, window sizes, etc.) for optimal accuracy

In this notebook, we’ll walk through the complete pipeline:

  1. Loading and visualizing raw RGB signals from multiple subjects
  2. Signal preprocessing: removing DC offset, bandpass filtering to isolate heart rate frequencies (0.5-5 Hz, corresponding to 30-300 bpm), and standardization
  3. Power spectrum analysis: using windowed FFT to identify the dominant frequency
  4. Heart rate extraction: finding the peak frequency and converting to bpm
  5. Validation: comparing our estimates with ground truth measurements
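
Steps 3 and 4 can be illustrated on a synthetic signal before touching real data. The sketch below is not the jdsp-based FFT used in this notebook: it swaps in a naive DFT scan over candidate heart rates, in plain Clojure, purely to show the idea of picking the strongest frequency and reading it as bpm. The names dft-power and dominant-bpm are hypothetical.

```clojure
(defn dft-power
  "Power of `xs` at frequency `f` (Hz), given sampling rate `fs` (Hz)."
  [xs fs f]
  (let [w  (/ (* 2.0 Math/PI f) fs)
        re (reduce + (map-indexed (fn [i x] (* x (Math/cos (* w i)))) xs))
        im (reduce + (map-indexed (fn [i x] (* x (Math/sin (* w i)))) xs))]
    (+ (* re re) (* im im))))

(defn dominant-bpm
  "Scan integer heart rates from 30 to 300 bpm (0.5-5 Hz) and return the
  one whose frequency carries the most power."
  [xs fs]
  (apply max-key
         (fn [bpm] (dft-power xs fs (/ bpm 60.0)))
         (range 30 301)))

;; Synthetic 10-second "pulse" at 1.2 Hz (72 bpm), sampled at 30 Hz.
(def synthetic
  (mapv #(Math/sin (* 2.0 Math/PI 1.2 (/ % 30.0))) (range 300)))

(dominant-bpm synthetic 30)
;; => 72
```

A real recording is noisier and its rate drifts over time, which is why the pipeline works on short windows rather than on the whole signal at once.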

This exploration demonstrates how Clojure’s ecosystem provides powerful tools for biomedical signal analysis: tablecloth for data wrangling, dtype-next for efficient numerical computation, jdsp for signal processing, and tableplot for visualization.

Setup

(ns dsp.mths
  (:require
   ;; Python interop for loading .npy files
   [libpython-clj2.require :refer [require-python]]
   [libpython-clj2.python :refer [py. py.. py.-] :as py]
   ;; Support for numpy array conversion to dtype-next
   [libpython-clj2.python.np-array]

   ;; Numerical computing with dtype-next
   [tech.v3.datatype :as dtype] ; Array operations, shapes, types
   [tech.v3.datatype.functional :as dfn] ; Vectorized math operations
   [tech.v3.tensor :as tensor] ; Multi-dimensional array operations
   [tech.v3.dataset :as ds] ; Dataset core functionality
   [tech.v3.dataset.tensor :as ds-tensor] ; Dataset <-> tensor conversions
   [tech.v3.parallel.for :as pfor] ; Parallel processing

   ;; Data manipulation with tablecloth
   [tablecloth.api :as tc] ; Dataset transformations
   [tablecloth.column.api :as tcc] ; Column-level operations

   ;; Visualization
   [scicloj.tableplot.v1.plotly :as plotly] ; Declarative plotting
   [scicloj.kindly.v4.kind :as kind] ; Rendering hints

   ;; Statistics
   [fastmath.stats :as stats] ; Statistical functions

   ;; Utilities
   [clojure.java.io :as io] ; File I/O
   [babashka.fs :as fs] ; Filesystem operations
   [clojure.string :as str]) ; String manipulation

  ;; Java DSP library (jdsp) - signal processing tools
  (:import
   [com.github.psambit9791.jdsp.filter Butterworth Chebyshev] ; Digital filters
   [com.github.psambit9791.jdsp.signal Detrend] ; DC removal
   [com.github.psambit9791.jdsp.signal.peaks FindPeak Peak] ; Peak detection
   [com.github.psambit9791.jdsp.transform DiscreteFourier FastFourier] ; FFT
   [com.github.psambit9791.jdsp.windows Hanning])) ; Window functions

(require-python '[numpy :as np])
:ok

Reading data

The MTHS dataset stores signals and labels as NumPy .npy files. Each subject has two files:

  • signal_x.npy: RGB time series (time samples × 3 channels)
  • label_x.npy: Ground truth measurements (heart rate and SpO2)

We’ll use libpython-clj to load these files via NumPy, then convert them to dtype-next structures for efficient processing in Clojure.

We assume you have downloaded the MEDVSE repo alongside your Clojure project repo.

(def data-base-path
  "../MEDVSE/MTHS/Data/")

Let’s see how we read one file as a NumPy array:

(np/load (str data-base-path "signal_12.npy"))
[[2.48598532e+02 7.92920525e-03 4.04724272e+01]
 [2.48792894e+02 7.06163194e-03 4.04879837e+01]
 [2.48882323e+02 6.91406250e-03 4.05330609e+01]
 ...
 [2.49076848e+02 1.34948881e-02 4.01680367e+01]
 [2.49276841e+02 1.38247492e-02 4.02729514e+01]
 [2.49379961e+02 7.99141590e-03 4.03729572e+01]]

Now, let’s read all data files and organize them in a single map. We’ll use vectors like [:signal 12] and [:label 12] (a keyword plus the subject number) as keys for easy access.

(def raw-data
  (-> data-base-path
      fs/list-dir
      (->> (map (fn [path]
                  (let [nam (-> path
                                fs/file-name
                                (str/replace #"\.npy" ""))]
                    [[(keyword (re-find #"signal|label" nam))
                      (-> nam
                          (str/split #"_")
                          last
                          Integer/parseInt)]
                     (-> path
                         str
                         np/load)])))
           (into {}))))
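
The key-building step can be tried in isolation, without NumPy or the dataset on disk. This is a minimal sketch that mirrors the parsing logic above; the helper name file-key is hypothetical and not part of the notebook.

```clojure
(require '[clojure.string :as str])

(defn file-key
  "Turn a file name such as \"signal_12.npy\" into a key such as [:signal 12]."
  [file-name]
  (let [nam (str/replace file-name #"\.npy$" "")]
    [(keyword (re-find #"signal|label" nam))          ; :signal or :label
     (Integer/parseInt (last (str/split nam #"_")))])) ; subject number

(file-key "signal_12.npy") ;; => [:signal 12]
(file-key "label_7.npy")   ;; => [:label 7]
```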

NumPy arrays can be inspected using dtype-next:

(dtype/shape (raw-data [:signal 23]))
[3570 3]

30 Hz signal → [n-samples, 3] array

(dtype/shape (raw-data [:label 23]))
[119 2]

1 Hz labels → [n-samples, 2] array (heart rate + SpO2)

The sampling rate is 30 Hz (30 samples per second):

(def sampling-rate 30)
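
The two shapes above are consistent with each other: 3570 samples at 30 Hz span 119 seconds, one per label row. A small sketch of that bookkeeping follows; the helper names duration-seconds and samples-for-second are hypothetical.

```clojure
(defn duration-seconds
  "Length of a recording in seconds, given its sample count and rate (Hz)."
  [n-samples fs]
  (/ n-samples fs))

(defn samples-for-second
  "Half-open range [start end) of signal sample indices that fall within
  the k-th second, i.e. alongside the k-th 1 Hz label row."
  [k fs]
  [(* k fs) (* (inc k) fs)])

(duration-seconds 3570 30)  ;; => 119, the number of label rows for subject 23
(samples-for-second 2 30)   ;; => [60 90]
```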

Ground Truth Data

Before we dive into signal processing, let’s look at the ground truth labels. These provide the reference heart rate values we’ll use to validate our algorithms.

(defn labels [i]
  (some-> [:label i]
          raw-data
          ds-tensor/tensor->dataset
          (tc/rename-columns [:heart-rate :spo2])))

For example, subject 23’s ground truth measurements:

(labels 23)

:_unnamed [119 2]:

:heart-rate :spo2
70.0 100.0
70.0 99.0
70.0 99.0
70.0 99.0
70.0 99.0
70.0 99.0
70.0 99.0
68.0 99.0
63.0 99.0
60.0 99.0
...
63.0 99.0
63.0 99.0
63.0 99.0
63.0 99.0
63.0 99.0
63.0 99.0
63.0 99.0
63.0 99.0
65.0 100.0
66.0 100.0
66.0 100.0

Let’s create a helper function to convert a subject’s raw signal into a tablecloth dataset with properly named columns (:R, :G, :B) and a time column (:t in seconds):

(defn signal [i]
  (some-> [:signal i]
          raw-data
          ds-tensor/tensor->dataset
          (tc/rename-columns [:R :G :B])
          (tc/add-column :t (tcc// (range) (double sampling-rate)))))

For example:

(signal 23)

:_unnamed [3570 4]:

:R :G :B :t
253.64658131 18.44284963 17.69192178 0.00000000
253.85280430 18.49430073 17.75038098 0.03333333
253.81535397 18.40544946 17.65379051 0.06666667
253.69167438 18.26294030 17.56666136 0.10000000
253.64916715 18.17253762 17.59384163 0.13333333
253.63671923 18.09752894 17.56237847 0.16666667
253.65624228 18.06113378 17.58690152 0.20000000
253.65709153 18.01346354 17.58241416 0.23333333
253.68944975 17.96046393 17.59604649 0.26666667
253.65804012 17.95549334 17.57225453 0.30000000
...
253.86924190 29.68705729 14.72164641 118.63333333
253.90138985 30.20249228 14.77345679 118.66666667
253.89115114 30.84974778 14.81303482 118.70000000
253.88201148 31.16283131 14.84677228 118.73333333
253.87836468 31.40066310 14.84951534 118.76666667
253.88823881 31.72274740 14.85886671 118.80000000
253.87952595 31.82796441 14.88614149 118.83333333
253.86784288 31.85746624 14.88736063 118.86666667
253.86660156 31.85836757 14.92308015 118.90000000
253.97774354 31.96165606 15.00338252 118.93333333
253.88016107 32.09662616 14.90804591 118.96666667

Plotting

Let’s create a plotting function to visualize all three RGB channels over time. In Camera PPG, different color channels can have different signal-to-noise ratios depending on skin tone and lighting conditions. Typically, the green channel is strongest due to hemoglobin’s absorption characteristics.

(defn plot-signal [s]
  (-> s
      (plotly/base {:=x :t
                    :=mark-opacity 0.7})
      (plotly/layer-line {:=y :R
                          :=mark-color "red"})
      (plotly/layer-line {:=y :G
                          :=mark-color "green"})
      (plotly/layer-line {:=y :B
                          :=mark-color "blue"})))

(-> 23
    signal
    plot-signal)

Signal Processing Pipeline

Raw Camera PPG signals are noisy and contain artifacts. Before we can extract heart rate, we need to clean them up through a series of transformations:

  1. DC removal (detrending): Removes the constant offset, leaving only the AC component (the oscillating part due to heartbeats)

  2. Bandpass filtering: Keeps only frequencies in the 0.5-5 Hz range, which corresponds to heart rates between 30 and 300 bpm. This removes high-frequency noise and low-frequency drift.

  3. Standardization: Scales the signal to zero mean and unit variance, making it easier to compare across subjects and channels.
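
Steps 1 and 3 are simple enough to sketch in plain Clojure. The notebook itself relies on jdsp and fastmath for these steps; the functions below (mean, remove-mean, standardize-sketch) are hypothetical stand-ins for illustration. Dividing by n - 1 for the variance is an assumption of this sketch, and a library may use a different convention.

```clojure
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn remove-mean
  "DC removal in its simplest form: subtract the mean from every sample."
  [xs]
  (let [m (mean xs)]
    (mapv #(- % m) xs)))

(defn standardize-sketch
  "Scale to zero mean and unit (sample) standard deviation."
  [xs]
  (let [centered (remove-mean xs)
        ;; sample variance (divide by n - 1): an assumption of this sketch
        variance (/ (reduce + (map #(* % %) centered))
                    (dec (count xs)))
        sd (Math/sqrt variance)]
    (mapv #(/ % sd) centered)))

;; A large offset with a small oscillation on top, like the R channel.
(remove-mean [253.5 254.0 253.0 253.5])
;; => [0.0 0.5 -0.5 0.0]

(standardize-sketch [1.0 2.0 3.0])
;; => [-1.0 0.0 1.0]
```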

Let’s create a utility function to visualize how each transformation affects all subjects:

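(defn plot-signals-with-transformations
  "Render a table with one row per subject: the raw signal plot followed by
  a plot after each cumulative transformation. `transformations` is a small
  literal map (name -> function); literal maps of up to 8 entries keep order."
  [transformations]
  (kind/table
   {:row-vectors (->> (range 2 62) ; subject ids 2..61
                      (pfor/pmap (fn [i]
                                   (->> transformations
                                        vals
                                        ;; keep every intermediate stage
                                        (reductions (fn [ds t]
                                                      (tc/update-columns
                                                       ds [:R :G :B] t))
                                                    (signal i))
                                        (map plot-signal)
                                        (cons (kind/md (str "### " i)))))))
    :column-names (concat ["subject" "raw"]
                          (keys transformations))}))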
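
The function above leans on reductions to keep every intermediate stage, not just the final result. A toy sketch of that pattern, with plain vectors instead of datasets and made-up transformation steps:

```clojure
;; Each step receives the output of the previous one; `reductions`
;; returns the initial value followed by the result after every step.
(def stages
  (reductions (fn [xs f] (f xs))
              [1.0 2.0 3.0]
              [(fn [xs] (mapv #(- % 2.0) xs))     ; step 1: subtract 2
               (fn [xs] (mapv #(* % 10.0) xs))])) ; step 2: scale by 10

stages
;; => ([1.0 2.0 3.0] [-1.0 0.0 1.0] [-10.0 0.0 10.0])
```

With two steps we get three stages, which is why the table has a "raw" column followed by one column per transformation.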

Transformation Functions

DC Removal (Detrending)

Removes the constant (DC) component from the signal. This is crucial because camera sensors capture both the steady ambient light level and the small oscillations from blood volume changes. We only care about the oscillations.

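(defn remove-dc
  "Subtract the constant (DC) component using jdsp's Detrend."
  [xs] ; named xs to avoid shadowing the `signal` function defined earlier
  (-> xs
      double-array
      (Detrend. "constant")
      .detrendSignal))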

Bandpass Filter

A Butterworth bandpass filter (4th order in the pipeline below) that keeps only frequencies in the physiologically plausible heart rate range (0.5-5 Hz = 30-300 bpm). Bandpass filtering is a standard technique in PPG signal processing.

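(defn bandpass-filter
  "Apply a Butterworth bandpass filter to `xs`; `fs` and the cutoffs are in Hz."
  [xs {:keys [fs order low-cutoff high-cutoff]}]
  (let [flt (Butterworth. (double fs))
        result (.bandPassFilter flt
                                (double-array xs)
                                (int order)
                                (double low-cutoff)
                                (double high-cutoff))]
    (vec result)))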
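
When tuning the cutoffs, it helps to keep two facts in view: a frequency in Hz maps to bpm by multiplying by 60, and a digital filter can only address frequencies below the Nyquist limit of half the sampling rate (15 Hz at 30 Hz sampling). The helpers below (hz->bpm, valid-band?) are hypothetical and only sketch that sanity check; they are not part of jdsp.

```clojure
(defn hz->bpm
  "Convert a frequency in Hz to beats per minute."
  [hz]
  (* 60.0 hz))

(defn valid-band?
  "True when 0 < low < high and high lies below the Nyquist frequency fs/2."
  [fs low high]
  (and (< 0 low high)
       (< high (/ fs 2.0))))

(hz->bpm 0.5)            ;; => 30.0
(hz->bpm 5)              ;; => 300.0
(valid-band? 30 0.5 5)   ;; => true
(valid-band? 30 0.5 20)  ;; => false, 20 Hz is above the 15 Hz Nyquist limit
```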

Applying the Pipeline

Now let’s apply our complete preprocessing pipeline to all subjects. We’ll visualize the raw signal alongside each transformation step to see how the signal quality improves:

(let [[low-cutoff high-cutoff] [0.5 5]]
  (plot-signals-with-transformations
   {:remove-dc remove-dc
    :bandpass #(bandpass-filter % {:fs sampling-rate
                                   :order 4
                                   :low-cutoff low-cutoff
                                   :high-cutoff high-cutoff})
    :standardize stats/standardize}))
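
[Output: a table of plots with columns subject, raw, remove-dc, bandpass, standardize, and one row per subject (2, 3, 4, ...); the plots are not reproduced here.]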