Extracting Heart Rate from Smartphone Camera Video
Introduction
What is Camera PPG?
Photoplethysmography (PPG) is a technique for measuring blood volume changes in tissue, typically used to measure heart rate and blood oxygen levels. Traditional PPG sensors (like those in fitness watches) use dedicated infrared LEDs and photodetectors.
Camera PPG is an alternative that uses an ordinary smartphone camera to detect these same blood volume changes by analyzing subtle color variations in the skin. (The term remote PPG, or rPPG, usually refers to the contactless variant that films skin from a distance; here we use the contact variant.) When you place your fingertip over the camera, each heartbeat causes a tiny change in blood volume that slightly alters the amount of light absorbed by the tissue. By recording video and analyzing the RGB color channels over time, we can extract vital signs like heart rate and SpO2 (blood oxygen saturation).
The MTHS Dataset
We’re using the MTHS dataset from the MEDVSE repository, which contains smartphone camera PPG data from 60 subjects. For each subject, the dataset provides:
- An RGB signal sampled at 30 Hz
- Ground truth heart rate and SpO2 labels sampled at 1 Hz
What We’ll Explore
Our goal is to extract heart rate from smartphone camera video and validate it against clinical measurements.
The MTHS dataset provides both the raw camera data (RGB signals) and ground truth heart rate measurements from medical-grade equipment. This allows us to:
- Develop and test signal processing algorithms
- Validate our extracted heart rates against the ground truth
- Calibrate parameters (filter cutoffs, window sizes, etc.) for optimal accuracy
In this notebook, we’ll walk through the complete pipeline:
- Loading and visualizing raw RGB signals from multiple subjects
- Signal preprocessing: removing DC offset, bandpass filtering to isolate heart rate frequencies (0.5-5 Hz, corresponding to 30-300 bpm), and standardization
- Power spectrum analysis: using windowed FFT to identify the dominant frequency
- Heart rate extraction: finding the peak frequency and converting to bpm
- Validation: comparing our estimates with ground truth measurements
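To make the power-spectrum and heart-rate steps concrete, here is a minimal sketch in plain Clojure of how an FFT bin index maps to beats per minute. The 10-second window length and the names `bin->bpm`, `fs-hz`, and `window-seconds` are assumptions chosen for illustration, not part of the dataset or of any library.

```clojure
;; Assumed values for illustration: 30 Hz sampling, 10-second analysis window.
(def fs-hz 30.0)
(def window-seconds 10.0)
(def n-samples (long (* fs-hz window-seconds))) ; samples per window

(defn bin->bpm
  "Heart rate (bpm) at the center of FFT bin k, for an n-point
  transform of a signal sampled at fs Hz."
  [k n fs]
  (* 60.0 (/ (* k fs) n)))

;; Spacing between adjacent FFT bins, expressed in bpm.
(def bpm-resolution (bin->bpm 1 n-samples fs-hz))

;; Heart rate implied by a spectral peak in bin 12 of this window.
(def example-bpm (bin->bpm 12 n-samples fs-hz))
```

Under these assumptions the bins are about 6 bpm apart and bin 12 corresponds to roughly 72 bpm, so the window length directly limits how finely heart rate can be resolved from the peak alone.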
This exploration shows how Clojure’s ecosystem supports biomedical signal analysis by combining tablecloth for data wrangling, dtype-next for efficient numerical computation, the Java library jdsp for signal processing, and tableplot for visualization.
Setup
(ns dsp.mths
  (:require
   ;; Python interop for loading .npy files
   [libpython-clj2.require :refer [require-python]]
   [libpython-clj2.python :refer [py. py.. py.-] :as py]
   ;; Support for numpy array conversion to dtype-next
   [libpython-clj2.python.np-array]
   ;; Numerical computing with dtype-next
   [tech.v3.datatype :as dtype]              ; Array operations, shapes, types
   [tech.v3.datatype.functional :as dfn]     ; Vectorized math operations
   [tech.v3.tensor :as tensor]               ; Multi-dimensional array operations
   [tech.v3.dataset :as ds]                  ; Dataset core functionality
   [tech.v3.dataset.tensor :as ds-tensor]    ; Dataset <-> tensor conversions
   [tech.v3.parallel.for :as pfor]           ; Parallel processing
   ;; Data manipulation with tablecloth
   [tablecloth.api :as tc]                   ; Dataset transformations
   [tablecloth.column.api :as tcc]           ; Column-level operations
   ;; Visualization
   [scicloj.tableplot.v1.plotly :as plotly]  ; Declarative plotting
   [scicloj.kindly.v4.kind :as kind]         ; Rendering hints
   ;; Statistics
   [fastmath.stats :as stats]                ; Statistical functions
   ;; Utilities
   [clojure.java.io :as io]                  ; File I/O
   [babashka.fs :as fs]                      ; Filesystem operations
   [clojure.string :as str])                 ; String manipulation
  ;; Java DSP library (jdsp) - signal processing tools
  (:import
   [com.github.psambit9791.jdsp.filter Butterworth Chebyshev]      ; Digital filters
   [com.github.psambit9791.jdsp.signal Detrend]                    ; DC removal
   [com.github.psambit9791.jdsp.signal.peaks FindPeak Peak]        ; Peak detection
   [com.github.psambit9791.jdsp.transform DiscreteFourier FastFourier] ; FFT
   [com.github.psambit9791.jdsp.windows Hanning]))                 ; Window functions

(require-python '[numpy :as np])

:ok

Reading data
The MTHS dataset stores signals and labels as NumPy .npy files. Each subject has two files:
- signal_X.npy: RGB time series (time samples × 3 channels)
- label_X.npy: Ground truth measurements (heart rate and SpO2)
We’ll use libpython-clj to load these files via NumPy, then convert them to dtype-next structures for efficient processing in Clojure.
We assume you have downloaded the MEDVSE repo alongside your Clojure project repo.
(def data-base-path
  "../MEDVSE/MTHS/Data/")

Let’s see how we read one file as a NumPy array:

(np/load (str data-base-path "/signal_12.npy"))

[[2.48598532e+02 7.92920525e-03 4.04724272e+01]
 [2.48792894e+02 7.06163194e-03 4.04879837e+01]
 [2.48882323e+02 6.91406250e-03 4.05330609e+01]
 ...
 [2.49076848e+02 1.34948881e-02 4.01680367e+01]
 [2.49276841e+02 1.38247492e-02 4.02729514e+01]
 [2.49379961e+02 7.99141590e-03 4.03729572e+01]]

Now, let’s read all data files and organize them in a single map. We’ll use vector keys like [:signal 12] and [:label 12] for easy access.
(def raw-data
  (-> data-base-path
      fs/list-dir
      (->> (map (fn [path]
                  (let [nam (-> path
                                fs/file-name
                                (str/replace #"\.npy$" ""))]
                    ;; Key: [:signal n] or [:label n], parsed from the file name
                    [[(keyword (re-find #"signal|label" nam))
                      (-> nam
                          (str/split #"_")
                          last
                          Integer/parseInt)]
                     ;; Value: the file contents as a NumPy array
                     (-> path
                         str
                         np/load)])))
           (into {}))))

NumPy arrays can be inspected using dtype-next:
(dtype/shape (raw-data [:signal 23]))

[3570 3]

30 Hz signal → [n-samples, 3] array

(dtype/shape (raw-data [:label 23]))

[119 2]

1 Hz labels → [n-samples, 2] array (heart rate + SpO2)
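The two shapes are consistent with each other, which is a useful sanity check before any processing. The sketch below redoes that arithmetic in plain Clojure; the shapes and rates come from the output above, while the helper name `duration-seconds` is only illustrative.

```clojure
;; Shapes reported above for subject 23, with the stated sampling rates.
(def n-signal-samples 3570)
(def n-labels 119)
(def signal-rate-hz 30)
(def label-rate-hz 1)

(defn duration-seconds
  "Recording length in seconds for n-samples taken at rate-hz."
  [n-samples rate-hz]
  (/ n-samples rate-hz))

;; Both recordings should cover the same time span.
(def signal-duration (duration-seconds n-signal-samples signal-rate-hz))
(def label-duration (duration-seconds n-labels label-rate-hz))

;; Number of signal samples that correspond to one label.
(def samples-per-label (quot n-signal-samples n-labels))
```

Both durations come out to 119 seconds, so each ground truth label lines up with 30 consecutive signal samples.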
The sampling rate is 30 Hz (30 samples per second):

(def sampling-rate 30)

Ground Truth Data
Before we dive into signal processing, let’s look at the ground truth labels. These provide the reference heart rate values we’ll use to validate our algorithms.
(defn labels [i]
  (some-> [:label i]
          raw-data
          ds-tensor/tensor->dataset
          (tc/rename-columns [:heart-rate :spo2])))

For example, subject 23’s ground truth measurements:

(labels 23)

:_unnamed [119 2]:
| :heart-rate | :spo2 |
|---|---|
| 70.0 | 100.0 |
| 70.0 | 99.0 |
| 70.0 | 99.0 |
| 70.0 | 99.0 |
| 70.0 | 99.0 |
| 70.0 | 99.0 |
| 70.0 | 99.0 |
| 68.0 | 99.0 |
| 63.0 | 99.0 |
| 60.0 | 99.0 |
| … | … |
| 63.0 | 99.0 |
| 63.0 | 99.0 |
| 63.0 | 99.0 |
| 63.0 | 99.0 |
| 63.0 | 99.0 |
| 63.0 | 99.0 |
| 63.0 | 99.0 |
| 63.0 | 99.0 |
| 65.0 | 100.0 |
| 66.0 | 100.0 |
| 66.0 | 100.0 |
Let’s create a helper function to convert a subject’s raw signal into a tablecloth dataset with properly named columns (:R, :G, :B) and a time column (:t in seconds):
(defn signal [i]
  (some-> [:signal i]
          raw-data
          ds-tensor/tensor->dataset
          (tc/rename-columns [:R :G :B])
          (tc/add-column :t (tcc// (range) 30.0))))

For example:

(signal 23)

:_unnamed [3570 4]:
| :R | :G | :B | :t |
|---|---|---|---|
| 253.64658131 | 18.44284963 | 17.69192178 | 0.00000000 |
| 253.85280430 | 18.49430073 | 17.75038098 | 0.03333333 |
| 253.81535397 | 18.40544946 | 17.65379051 | 0.06666667 |
| 253.69167438 | 18.26294030 | 17.56666136 | 0.10000000 |
| 253.64916715 | 18.17253762 | 17.59384163 | 0.13333333 |
| 253.63671923 | 18.09752894 | 17.56237847 | 0.16666667 |
| 253.65624228 | 18.06113378 | 17.58690152 | 0.20000000 |
| 253.65709153 | 18.01346354 | 17.58241416 | 0.23333333 |
| 253.68944975 | 17.96046393 | 17.59604649 | 0.26666667 |
| 253.65804012 | 17.95549334 | 17.57225453 | 0.30000000 |
| … | … | … | … |
| 253.86924190 | 29.68705729 | 14.72164641 | 118.63333333 |
| 253.90138985 | 30.20249228 | 14.77345679 | 118.66666667 |
| 253.89115114 | 30.84974778 | 14.81303482 | 118.70000000 |
| 253.88201148 | 31.16283131 | 14.84677228 | 118.73333333 |
| 253.87836468 | 31.40066310 | 14.84951534 | 118.76666667 |
| 253.88823881 | 31.72274740 | 14.85886671 | 118.80000000 |
| 253.87952595 | 31.82796441 | 14.88614149 | 118.83333333 |
| 253.86784288 | 31.85746624 | 14.88736063 | 118.86666667 |
| 253.86660156 | 31.85836757 | 14.92308015 | 118.90000000 |
| 253.97774354 | 31.96165606 | 15.00338252 | 118.93333333 |
| 253.88016107 | 32.09662616 | 14.90804591 | 118.96666667 |
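The :t column is simply the sample index divided by the sampling rate. As a small sketch in plain Clojure (the function name `time-axis` is hypothetical, and a bounded range is used here instead of the dataset machinery), the first and last time stamps can be reproduced like this:

```clojure
(defn time-axis
  "Time stamps in seconds for n-samples taken at rate-hz,
  starting at zero."
  [n-samples rate-hz]
  (mapv #(/ % (double rate-hz)) (range n-samples)))

;; Subject 23 has 3570 samples at 30 Hz.
(def t (time-axis 3570 30))

(def first-t (first t)) ; time of the first sample
(def last-t (peek t))   ; time of the last sample, (n - 1) / rate
```

The last value is (3570 - 1) / 30 seconds, which agrees with the final row of the table above.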
Plotting
Let’s create a plotting function to visualize all three RGB channels over time. In camera PPG, the color channels can have different signal-to-noise ratios depending on skin tone and lighting conditions. The green channel often carries the strongest pulsatile signal because hemoglobin absorbs green light strongly.
(defn plot-signal [s]
  (-> s
      (plotly/base {:=x :t
                    :=mark-opacity 0.7})
      (plotly/layer-line {:=y :R
                          :=mark-color "red"})
      (plotly/layer-line {:=y :G
                          :=mark-color "green"})
      (plotly/layer-line {:=y :B
                          :=mark-color "blue"})))

(-> 23
    signal
    plot-signal)

Signal Processing Pipeline
Raw Camera PPG signals are noisy and contain artifacts. Before we can extract heart rate, we need to clean them up through a series of transformations:
- DC removal (detrending): Removes the constant offset, leaving only the AC component (the oscillating part due to heartbeats).
- Bandpass filtering: Keeps only frequencies in the 0.5-5 Hz range, which corresponds to heart rates between 30 and 300 bpm. This removes high-frequency noise and low-frequency drift.
- Standardization: Scales the signal to zero mean and unit variance, making it easier to compare across subjects and channels.
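The first and last of these steps are simple enough to sketch in plain Clojure. This is an illustration on a toy trace, not the implementation used in this notebook; the function names are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption that may differ from a given library.

```clojure
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn remove-mean
  "Constant detrending: subtract the mean so only the
  oscillating (AC) part remains."
  [xs]
  (let [m (mean xs)]
    (mapv #(- % m) xs)))

(defn sample-stddev
  "Sample standard deviation (divides by n - 1).
  Assumes xs is not constant and has at least two values."
  [xs]
  (let [m  (mean xs)
        ss (reduce + (map #(let [d (- % m)] (* d d)) xs))]
    (Math/sqrt (/ ss (dec (count xs))))))

(defn standardize-sketch
  "Scale xs to zero mean and unit (sample) variance."
  [xs]
  (let [sd (sample-stddev xs)]
    (mapv #(/ % sd) (remove-mean xs))))

;; Toy trace: a large constant offset plus a small oscillation.
(def toy [250.0 251.0 250.0 249.0 250.0 251.0 250.0 249.0])

(def toy-ac (remove-mean toy))
(def toy-standardized (standardize-sketch toy))
```

On the toy trace, `remove-mean` strips the offset of 250 and leaves an oscillation between -1 and 1, and `standardize-sketch` rescales that oscillation so that traces with different amplitudes become comparable.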
Let’s create a utility function to visualize how each transformation affects all subjects:
(defn plot-signals-with-transformations [transformations]
  (kind/table
   {:row-vectors (->> (range 2 62)
                      (pfor/pmap (fn [i]
                                   (->> transformations
                                        vals
                                        ;; Apply each step on top of the previous one,
                                        ;; keeping every intermediate dataset
                                        (reductions (fn [ds t]
                                                      (tc/update-columns
                                                       ds [:R :G :B] t))
                                                    (signal i))
                                        (map plot-signal)
                                        (cons (kind/md (str "### " i))))))
                      kind/table)
    :column-names (concat ["subject" "raw"]
                          (keys transformations))}))

Transformation Functions
DC Removal (Detrending)
Removes the constant (DC) component from the signal. This is crucial because camera sensors capture both the steady ambient light level and the small oscillations from blood volume changes. We only care about the oscillations.
(defn remove-dc [signal]
  (-> signal
      double-array
      (Detrend. "constant") ; "constant" mode subtracts the mean
      .detrendSignal))

Bandpass Filter
We use a 4th-order Butterworth bandpass filter that keeps only the frequencies of plausible heart rates (0.5-5 Hz, or 30-300 bpm). Bandpass filtering of this kind is a standard technique in PPG signal processing.
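Before configuring a filter it helps to check the band edges against the sampling rate. The sketch below, in plain Clojure with hypothetical helper names, converts the cutoffs to bpm and confirms that the upper cutoff sits below the Nyquist frequency (half the sampling rate), which a digital bandpass filter requires.

```clojure
(defn hz->bpm [hz]
  (* 60.0 hz))

(defn nyquist-hz
  "Highest frequency representable at sampling rate fs (Hz)."
  [fs]
  (/ fs 2.0))

(defn valid-band?
  "True when 0 < low < high < Nyquist for sampling rate fs."
  [fs low high]
  (and (< 0 low high)
       (< high (nyquist-hz fs))))

;; Band edges and sampling rate used in this notebook.
(def low-bpm (hz->bpm 0.5))
(def high-bpm (hz->bpm 5))
(def nyquist (nyquist-hz 30))
(def band-ok? (valid-band? 30 0.5 5))
```

With a 30 Hz camera the Nyquist frequency is 15 Hz, so the 0.5-5 Hz band fits comfortably. A camera with a much lower frame rate would force a lower upper cutoff.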
(defn bandpass-filter [signal {:keys [fs order low-cutoff high-cutoff]}]
  (let [flt (Butterworth. fs)            ; fs: sampling rate in Hz
        result (.bandPassFilter flt
                                (double-array signal)
                                order
                                low-cutoff   ; Hz
                                high-cutoff)] ; Hz
    (vec result)))

Applying the Pipeline
Now let’s apply our complete preprocessing pipeline to all subjects. We’ll visualize the raw signal alongside each transformation step to see how the signal quality improves:
(let [[low-cutoff high-cutoff] [0.5 5.0]]
  (plot-signals-with-transformations
   {:remove-dc remove-dc
    :bandpass #(bandpass-filter % {:fs sampling-rate
                                   :order 4
                                   :low-cutoff low-cutoff
                                   :high-cutoff high-cutoff})
    :standardize stats/standardize}))

[Table: one row per subject, starting at subject 2, with plots of the R, G, and B channels in the columns subject | raw | remove-dc | bandpass | standardize]