Building a SPLOM using geom.viz

Progressive tutorial building scatter plot matrices step-by-step with thi.ng/geom.viz
Published: January 13, 2026

Keywords: datavis, splom, scatterplot-matrix, geomviz, tutorial

Introduction

A SPLOM (scatter plot matrix) displays pairwise relationships between multiple variables in a grid. It’s invaluable for exploratory data analysis.

We’ll build one using thi.ng/geom.viz, progressing from a simple scatter plot to a complete 4×4 matrix. Each step introduces exactly one new concept.
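To make the grid idea concrete, here is a minimal sketch in plain Clojure, assuming the four Iris measurement columns used in this tutorial. The names `splom-vars` and `splom-cells` are illustrative only and are not part of the notebook's code.

```clojure
;; Hypothetical names for illustration; not used elsewhere in this tutorial.
(def splom-vars
  [:sepal-length :sepal-width :petal-length :petal-width])

;; One cell per (row, column) pair: the row variable goes on the y axis,
;; the column variable on the x axis.
(def splom-cells
  (for [[row y-col] (map-indexed vector splom-vars)
        [col x-col] (map-indexed vector splom-vars)]
    {:row row
     :col col
     :x x-col
     :y y-col
     ;; Diagonal cells pair a variable with itself.
     :diagonal? (= row col)}))

(count splom-cells)                      ; => 16
(count (filter :diagonal? splom-cells))  ; => 4
```

Four variables give 16 cells, four of which lie on the diagonal. A common choice is to show each variable's own distribution in those diagonal cells rather than a scatter plot of a variable against itself.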

Here we focus on static SVG generation. We’re also exploring adding interactivity using D3.js for features like brushing and linking, but that’s beyond the scope of this notebook.

This tutorial is part of ongoing work on the Tableplot plotting library and the Real-World-Data dev group’s exploration of visualization APIs for Clojure. By building a SPLOM manually, we better understand what a high-level plotting library needs to provide.

Clojurians Zulip discussion (requires login): #data-science>AlgebraOfGraphics.jl

Setup

We’ll use thi.ng/geom.viz for low-level SVG rendering. thi.ng/geom is part of a comprehensive ecosystem of computational design tools created by Karsten Schmidt (aka “toxi”), with the thi.ng collection established in 2006 and thi.ng/geom starting around 2011. The collection as a whole spans ~320 sub-projects for geometry, visualization, and generative art, and has been actively maintained for nearly two decades.

This notebook uses several libraries from the Clojure data science ecosystem:

(ns data-visualization.splom-tutorial
  (:require
   ;; Tablecloth - Dataset manipulation
   [tablecloth.api :as tc]
   [tablecloth.column.api :as tcc]

   ;; Kindly - Notebook visualization protocol
   [scicloj.kindly.v4.kind :as kind]

   ;; thi.ng/geom - SVG rendering and visualization
   [thi.ng.geom.viz.core :as viz]
   [thi.ng.geom.svg.core :as svg]

   ;; Fastmath - Statistical computations
   [fastmath.stats :as stats]
   [fastmath.ml.regression :as regr]

   ;; RDatasets - Example datasets
   [scicloj.metamorph.ml.rdatasets :as rdatasets]))

Tablecloth provides our dataset API, wrapping tech.ml.dataset with a friendly interface. We use it to group rows by species, select columns, and extract rows as vectors.

Kindly is the visualization protocol that lets this notebook render in different environments (Clay, Portal, etc.).

thi.ng/geom provides low-level SVG rendering primitives. We use geom.viz for creating axes, scales, and plot layouts, and geom.svg for SVG element construction.

Fastmath handles statistical computations, including histogram binning (starting in Step 3) and linear regression (Steps 8-10). It’s a comprehensive math library for Clojure.

RDatasets provides classic datasets for examples. It is made available in Clojure through metamorph.ml.

The Data

We’ll use the classic Iris dataset: 150 flowers, 4 measurements each, 3 species.

(def iris (rdatasets/datasets-iris))
iris

https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv [150 6]:

:rownames :sepal-length :sepal-width :petal-length :petal-width :species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
...
140 6.9 3.1 5.4 2.1 virginica
141 6.7 3.1 5.6 2.4 virginica
142 6.9 3.1 5.1 2.3 virginica
143 5.8 2.7 5.1 1.9 virginica
144 6.8 3.2 5.9 2.3 virginica
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica

Three species, 50 flowers each:

(-> iris
    (tc/group-by [:species])
    (tc/aggregate {:count tc/row-count}))

_unnamed [3 2]:

:species :count
setosa 50
versicolor 50
virginica 50

Colors and Data Preparation

Here we define our color palette and derive species information from the data.

The species colors are inspired by ggplot2’s default discrete color scale.

(def colors
  {:grey-bg "#EBEBEB"
   :grid "#FFFFFF"
   :grey-points "#333333"
   :regression "#2C3E50" ; Dark blue-gray for regression lines
   :species ["#F8766D"
             "#619CFF"
             "#00BA38"]})

We derive species names from data:

(def species-names
  (-> iris :species distinct sort))

Create a species -> color mapping

(def species-color-map
  (zipmap species-names
          (:species colors)))
species-color-map
{"setosa" "#F8766D", "versicolor" "#619CFF", "virginica" "#00BA38"}

Group data by species for later use

(def species-groups
  (tc/group-by iris :species {:result-type :as-map}))

Compute domains from the data. We’ll use these throughout to avoid hard-coding ranges.

(defn compute-domain
  "Compute [min max] domain for a variable, with optional padding."
  ([var-data] (compute-domain var-data 0.05))
  ([var-data padding]
   (let [min-val (tcc/reduce-min var-data)
         max-val (tcc/reduce-max var-data)
         span (- max-val min-val)
         pad (* span padding)]
     [(- min-val pad) (+ max-val pad)])))
(def domains
  (-> iris
      (tc/select-columns :type/numerical)
      ;; We use the fact that a dataset is a map.
      (update-vals compute-domain)))
domains
{:rownames [-6.45 157.45],
 :sepal-length [4.12 8.08],
 :sepal-width [1.88 4.5200000000000005],
 :petal-length [0.705 7.195],
 :petal-width [-0.01999999999999999 2.62]}
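As a worked example of the padding arithmetic, the sketch below redoes the computation in plain Clojure. `padded-domain` is a hypothetical stand-in for `compute-domain` that accepts any sequence of numbers. The sample values are the minimum and maximum sepal length (4.3 and 7.9) plus one value in between.

```clojure
;; Plain-Clojure sketch of the same padding arithmetic; `padded-domain`
;; is a hypothetical helper, not part of the notebook.
(defn padded-domain
  [values padding]
  (let [lo (apply min values)
        hi (apply max values)
        pad (* (- hi lo) padding)]
    [(- lo pad) (+ hi pad)]))

;; Span is 7.9 - 4.3 = 3.6, so 5% padding adds about 0.18 on each side.
(padded-domain [4.3 5.8 7.9] 0.05)
```

Up to floating-point rounding this gives roughly [4.12 8.08], in line with the `:sepal-length` entry above. The same arithmetic explains the slightly negative lower bound for `:petal-width`: the padding is subtracted from a minimum that is already close to zero.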

Helper for integer axis labels:

(def int-label-fn (viz/default-svg-label
                   (fn [x]
                     (str (int x)))))

Common plotting constants

(def panel-size 400)
(def margin 60)

Grid constants (we’ll use these later for multi-panel layouts)

(def grid-panel-size 200)
(def grid-margin 40)

Step 0: SVG as Hiccup

Before we start plotting, let’s understand how we’ll render SVG.

Hiccup is a Clojure library for representing HTML and SVG as data structures using vectors like [:circle ...]. thi.ng/geom provides SVG functions that we can render as hiccup.

Let’s start with a simple example:

(kind/hiccup
 (svg/svg {:width 150
           :height 100}
          (svg/circle [30 50] 20 {:fill "#F8766D"
                                  :stroke "none"})
          (svg/circle [70 50] 20 {:fill "#619CFF"
                                  :stroke "none"})
          (svg/circle [110 50] 20 {:fill "#00BA38"
                                   :stroke "none"})))

This works because each circle is a valid hiccup element (vector starting with a keyword).

However, when we generate SVG elements programmatically, we often create collections of elements. Hiccup requires vectors to start with a tag keyword.

This won’t work:

[[:circle ...] [:circle ...] [:circle ...]]  ; No tag!

Our solution: wrap tagless vectors in [:g …], the SVG group element.

(require '[clojure.walk :as walk])
(defn hiccup-compatible
  "Make thi.ng/geom output hiccup-compatible.
  
  Wraps tagless vectors in `[:g ...]` (SVG group element):
  [[:tag1 ...] [:tag2 ...]] becomes [:g [:tag1 ...] [:tag2 ...]]"
  [form]
  (walk/postwalk
   (fn [x]
     (if (and (vector? x)
              (not (map-entry? x))
              (seq x)
              (not (keyword? (first x))))
       (vec (cons :g x))
       x))
   form))
(defn svg
  "Like thi.ng.geom.svg/svg, but hiccup-compatible.
  
  Wraps tagless vectors in [:g ...] and outputs kind/hiccup."
  [attrs & children]
  (-> (apply svg/svg attrs children)
      hiccup-compatible
      kind/hiccup))

Now we can generate circles programmatically:

(def three-circles
  (mapv (fn [[x color]]
          (svg/circle [x 50] 20 {:fill color
                                 :stroke "none"}))
        [[30 "#F8766D"]
         [70 "#619CFF"]
         [110 "#00BA38"]]))

Our helper automatically wraps the vector in [:g …]:

(svg {:width 150
      :height 100} three-circles)
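To see the wrapping rule in isolation, the sketch below applies it to small hand-written forms. `wrap-tagless` is a condensed copy of the walker above under a hypothetical name, repeated only so the snippet runs on its own. The `:custom` element is made up for illustration.

```clojure
(require '[clojure.walk :as walk])

;; Condensed copy of the walker above, repeated so this snippet is self-contained.
(defn wrap-tagless
  [form]
  (walk/postwalk
   (fn [x]
     (if (and (vector? x)
              (not (map-entry? x))
              (seq x)
              (not (keyword? (first x))))
       (into [:g] x)
       x))
   form))

;; A tagless vector of two elements gains a :g wrapper.
(wrap-tagless [[:circle {:r 1}] [:circle {:r 2}]])
;; => [:g [:circle {:r 1}] [:circle {:r 2}]]

;; An element that already starts with a tag is left alone.
(wrap-tagless [:circle {:r 1}])
;; => [:circle {:r 1}]

;; Caveat: a plain vector used as an attribute value is wrapped as well.
(wrap-tagless [:custom {:data [1 2 3]}])
;; => [:custom {:data [:g 1 2 3]}]
```

The last case shows a limit of this simple rule: it cannot tell a collection of elements from a vector-valued attribute. Whether that matters depends on the forms being walked.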

Step 1: Single Scatter Plot (Ungrouped)

Let’s start with the simplest possible visualization: a scatter plot of sepal length vs. sepal width, all points grey.

Let’s create helpers for x and y axes.

(defn axis [{:keys [column rng pos label-style]
             :or {label-style {}}}]
  (viz/linear-axis
   {:domain (domains column)
    :range rng
    :major 2.0
    :pos pos
    :label-dist 12
    :label #'int-label-fn
    :label-style label-style
    :major-size 3
    :minor-size 0
    :attribs {:stroke "none"}}))
(defn x-axis [column]
  (axis {:column column
         :rng [margin (- panel-size margin)]
         :pos (- panel-size margin)}))
(defn y-axis [column]
  (axis {:column column
         :rng [(- panel-size margin) margin]
         :pos margin
         :label-style {:text-anchor "end"}}))

Let’s see what these axis helpers produce:

(x-axis :sepal-length)
{:scale
 #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x12724081 "thi.ng.geom.viz.core$linear_scale$fn__82691@12724081"],
 :major-size 3,
 :pos 340,
 :major (6.0 8.0),
 :label-dist 12,
 :attribs {:stroke "none"},
 :label #'data-visualization.splom-tutorial/int-label-fn,
 :label-style
 {:fill "black",
  :stroke "none",
  :font-family "Arial, sans-serif",
  :font-size 10,
  :text-anchor "middle"},
 :minor nil,
 :domain [4.12 8.08],
 :minor-size 0,
 :visible true,
 :range [60 340]}
(y-axis :sepal-width)
{:scale
 #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x6eceac81 "thi.ng.geom.viz.core$linear_scale$fn__82691@6eceac81"],
 :major-size 3,
 :pos 60,
 :major (2.0 4.0),
 :label-dist 12,
 :attribs {:stroke "none"},
 :label #'data-visualization.splom-tutorial/int-label-fn,
 :label-style
 {:fill "black",
  :stroke "none",
  :font-family "Arial, sans-serif",
  :font-size 10,
  :text-anchor "end"},
 :minor nil,
 :domain [1.88 4.5200000000000005],
 :minor-size 0,
 :visible true,
 :range [340 60]}
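The `:major` entries in these maps are the tick positions: with `:major 2.0`, ticks land on the multiples of 2.0 that fall inside the padded domain. The sketch below reproduces that outcome in plain Clojure. `ticks-in-domain` is a hypothetical helper written for this explanation and is not thi.ng/geom's own tick code.

```clojure
;; Sketch only: lists the multiples of `step` that fall inside [lo hi].
(defn ticks-in-domain
  [[lo hi] step]
  (let [first-tick (* step (Math/ceil (/ lo step)))]
    (->> (iterate #(+ % step) first-tick)
         (take-while #(<= % hi)))))

(ticks-in-domain [4.12 8.08] 2.0)               ; sepal length => (6.0 8.0)
(ticks-in-domain [1.88 4.5200000000000005] 2.0) ; sepal width  => (2.0 4.0)
```

Both results agree with the `:major` values printed above. With so narrow a domain, a step of 2.0 yields only two labels per axis, which keeps small panels uncluttered.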

Now let’s assemble the complete plot specification.

(defn plot-spec [columns]
  (let [[x-col y-col] columns]
    {:x-axis (x-axis x-col)
     :y-axis (y-axis y-col)
     :grid {:attribs {:stroke (:grid colors)
                      :stroke-width 1}}
     :data [{:values (-> iris
                         (tc/select-columns columns)
                         tc/rows)
             :attribs {:fill (:grey-points colors)
                       :stroke "none"}
             :layout viz/svg-scatter-plot}]}))
(plot-spec [:sepal-length
            :sepal-width])
{:x-axis
 {:scale
  #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x935f500 "thi.ng.geom.viz.core$linear_scale$fn__82691@935f500"],
  :major-size 3,
  :pos 340,
  :major (6.0 8.0),
  :label-dist 12,
  :attribs {:stroke "none"},
  :label #'data-visualization.splom-tutorial/int-label-fn,
  :label-style
  {:fill "black",
   :stroke "none",
   :font-family "Arial, sans-serif",
   :font-size 10,
   :text-anchor "middle"},
  :minor nil,
  :domain [4.12 8.08],
  :minor-size 0,
  :visible true,
  :range [60 340]},
 :y-axis
 {:scale
  #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x66dff172 "thi.ng.geom.viz.core$linear_scale$fn__82691@66dff172"],
  :major-size 3,
  :pos 60,
  :major (2.0 4.0),
  :label-dist 12,
  :attribs {:stroke "none"},
  :label #'data-visualization.splom-tutorial/int-label-fn,
  :label-style
  {:fill "black",
   :stroke "none",
   :font-family "Arial, sans-serif",
   :font-size 10,
   :text-anchor "end"},
  :minor nil,
  :domain [1.88 4.5200000000000005],
  :minor-size 0,
  :visible true,
  :range [340 60]},
 :grid {:attribs {:stroke "#FFFFFF", :stroke-width 1}},
 :data
 [{:values
   [[5.1 3.5] [4.9 3.0] [4.7 3.2] [4.6 3.1] [5.0 3.6] [5.4 3.9] [4.6 3.4] [5.0 3.4] [4.4 2.9] [4.9 3.1] [5.4 3.7] [4.8 3.4] [4.8 3.0] [4.3 3.0] [5.8 4.0] [5.7 4.4] [5.4 3.9] [5.1 3.5] [5.7 3.8] [5.1 3.8] [5.4 3.4] [5.1 3.7] [4.6 3.6] [5.1 3.3] [4.8 3.4] [5.0 3.0] [5.0 3.4] [5.2 3.5] [5.2 3.4] [4.7 3.2] [4.8 3.1] [5.4 3.4] [5.2 4.1] [5.5 4.2] [4.9 3.1] [5.0 3.2] [5.5 3.5] [4.9 3.6] [4.4 3.0] [5.1 3.4] [5.0 3.5] [4.5 2.3] [4.4 3.2] [5.0 3.5] [5.1 3.8] [4.8 3.0] [5.1 3.8] [4.6 3.2] [5.3 3.7] [5.0 3.3] [7.0 3.2] [6.4 3.2] [6.9 3.1] [5.5 2.3] [6.5 2.8] [5.7 2.8] [6.3 3.3] [4.9 2.4] [6.6 2.9] [5.2 2.7] [5.0 2.0] [5.9 3.0] [6.0 2.2] [6.1 2.9] [5.6 2.9] [6.7 3.1] [5.6 3.0] [5.8 2.7] [6.2 2.2] [5.6 2.5] [5.9 3.2] [6.1 2.8] [6.3 2.5] [6.1 2.8] [6.4 2.9] [6.6 3.0] [6.8 2.8] [6.7 3.0] [6.0 2.9] [5.7 2.6] [5.5 2.4] [5.5 2.4] [5.8 2.7] [6.0 2.7] [5.4 3.0] [6.0 3.4] [6.7 3.1] [6.3 2.3] [5.6 3.0] [5.5 2.5] [5.5 2.6] [6.1 3.0] [5.8 2.6] [5.0 2.3] [5.6 2.7] [5.7 3.0] [5.7 2.9] [6.2 2.9] [5.1 2.5] [5.7 2.8] [6.3 3.3] [5.8 2.7] [7.1 3.0] [6.3 2.9] [6.5 3.0] [7.6 3.0] [4.9 2.5] [7.3 2.9] [6.7 2.5] [7.2 3.6] [6.5 3.2] [6.4 2.7] [6.8 3.0] [5.7 2.5] [5.8 2.8] [6.4 3.2] [6.5 3.0] [7.7 3.8] [7.7 2.6] [6.0 2.2] [6.9 3.2] [5.6 2.8] [7.7 2.8] [6.3 2.7] [6.7 3.3] [7.2 3.2] [6.2 2.8] [6.1 3.0] [6.4 2.8] [7.2 3.0] [7.4 2.8] [7.9 3.8] [6.4 2.8] [6.3 2.8] [6.1 2.6] [7.7 3.0] [6.3 3.4] [6.4 3.1] [6.0 3.0] [6.9 3.1] [6.7 3.1] [6.9 3.1] [5.8 2.7] [6.8 3.2] [6.7 3.3] [6.7 3.0] [6.3 2.5] [6.5 3.0] [6.2 3.4] [5.9 3.0]],
   :attribs {:fill "#333333", :stroke "none"},
   :layout
   #object[thi.ng.geom.viz.core$svg_scatter_plot 0x33789746 "thi.ng.geom.viz.core$svg_scatter_plot@33789746"]}]}

Render with grey background

(svg
 {:width panel-size
  :height panel-size}
 (svg/rect [0 0] panel-size panel-size {:fill (:grey-bg colors)})
 (viz/svg-plot2d-cartesian (plot-spec [:sepal-length
                                       :sepal-width])))
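[Figure: scatter plot of sepal length (x axis, ticks at 6 and 8) against sepal width (y axis, ticks at 2 and 4); all points grey on a grey background.]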

We can see the relationship between sepal length and width!

Step 2: Color by Species

Now let’s color the points by species to see the three clusters.

Scatter plot with multiple colored series (one per species)

(defn colored-plot-spec [columns]
  (let [[x-col y-col] columns]
    {:x-axis (x-axis x-col)
     :y-axis (y-axis y-col)
     :grid {:attribs {:stroke (:grid colors)
                      :stroke-width 1.5}}
     :data (map (fn [species color]
                  (let [data (species-groups species)
                        points (-> data
                                   (tc/select-columns columns)
                                   tc/rows)]
                    {:values points
                     :attribs {:fill color
                               :stroke "none"}
                     :layout viz/svg-scatter-plot}))
                species-names
                (:species colors))}))
(colored-plot-spec [:sepal-length
                    :sepal-width])
{:x-axis
 {:scale
  #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x6bb421d2 "thi.ng.geom.viz.core$linear_scale$fn__82691@6bb421d2"],
  :major-size 3,
  :pos 340,
  :major (6.0 8.0),
  :label-dist 12,
  :attribs {:stroke "none"},
  :label #'data-visualization.splom-tutorial/int-label-fn,
  :label-style
  {:fill "black",
   :stroke "none",
   :font-family "Arial, sans-serif",
   :font-size 10,
   :text-anchor "middle"},
  :minor nil,
  :domain [4.12 8.08],
  :minor-size 0,
  :visible true,
  :range [60 340]},
 :y-axis
 {:scale
  #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x719bcb3e "thi.ng.geom.viz.core$linear_scale$fn__82691@719bcb3e"],
  :major-size 3,
  :pos 60,
  :major (2.0 4.0),
  :label-dist 12,
  :attribs {:stroke "none"},
  :label #'data-visualization.splom-tutorial/int-label-fn,
  :label-style
  {:fill "black",
   :stroke "none",
   :font-family "Arial, sans-serif",
   :font-size 10,
   :text-anchor "end"},
  :minor nil,
  :domain [1.88 4.5200000000000005],
  :minor-size 0,
  :visible true,
  :range [340 60]},
 :grid {:attribs {:stroke "#FFFFFF", :stroke-width 1.5}},
 :data
 ({:values
   [[5.1 3.5] [4.9 3.0] [4.7 3.2] [4.6 3.1] [5.0 3.6] [5.4 3.9] [4.6 3.4] [5.0 3.4] [4.4 2.9] [4.9 3.1] [5.4 3.7] [4.8 3.4] [4.8 3.0] [4.3 3.0] [5.8 4.0] [5.7 4.4] [5.4 3.9] [5.1 3.5] [5.7 3.8] [5.1 3.8] [5.4 3.4] [5.1 3.7] [4.6 3.6] [5.1 3.3] [4.8 3.4] [5.0 3.0] [5.0 3.4] [5.2 3.5] [5.2 3.4] [4.7 3.2] [4.8 3.1] [5.4 3.4] [5.2 4.1] [5.5 4.2] [4.9 3.1] [5.0 3.2] [5.5 3.5] [4.9 3.6] [4.4 3.0] [5.1 3.4] [5.0 3.5] [4.5 2.3] [4.4 3.2] [5.0 3.5] [5.1 3.8] [4.8 3.0] [5.1 3.8] [4.6 3.2] [5.3 3.7] [5.0 3.3]],
   :attribs {:fill "#F8766D", :stroke "none"},
   :layout
   #object[thi.ng.geom.viz.core$svg_scatter_plot 0x33789746 "thi.ng.geom.viz.core$svg_scatter_plot@33789746"]}
  {:values
   [[7.0 3.2] [6.4 3.2] [6.9 3.1] [5.5 2.3] [6.5 2.8] [5.7 2.8] [6.3 3.3] [4.9 2.4] [6.6 2.9] [5.2 2.7] [5.0 2.0] [5.9 3.0] [6.0 2.2] [6.1 2.9] [5.6 2.9] [6.7 3.1] [5.6 3.0] [5.8 2.7] [6.2 2.2] [5.6 2.5] [5.9 3.2] [6.1 2.8] [6.3 2.5] [6.1 2.8] [6.4 2.9] [6.6 3.0] [6.8 2.8] [6.7 3.0] [6.0 2.9] [5.7 2.6] [5.5 2.4] [5.5 2.4] [5.8 2.7] [6.0 2.7] [5.4 3.0] [6.0 3.4] [6.7 3.1] [6.3 2.3] [5.6 3.0] [5.5 2.5] [5.5 2.6] [6.1 3.0] [5.8 2.6] [5.0 2.3] [5.6 2.7] [5.7 3.0] [5.7 2.9] [6.2 2.9] [5.1 2.5] [5.7 2.8]],
   :attribs {:fill "#619CFF", :stroke "none"},
   :layout
   #object[thi.ng.geom.viz.core$svg_scatter_plot 0x33789746 "thi.ng.geom.viz.core$svg_scatter_plot@33789746"]}
  {:values
   [[6.3 3.3] [5.8 2.7] [7.1 3.0] [6.3 2.9] [6.5 3.0] [7.6 3.0] [4.9 2.5] [7.3 2.9] [6.7 2.5] [7.2 3.6] [6.5 3.2] [6.4 2.7] [6.8 3.0] [5.7 2.5] [5.8 2.8] [6.4 3.2] [6.5 3.0] [7.7 3.8] [7.7 2.6] [6.0 2.2] [6.9 3.2] [5.6 2.8] [7.7 2.8] [6.3 2.7] [6.7 3.3] [7.2 3.2] [6.2 2.8] [6.1 3.0] [6.4 2.8] [7.2 3.0] [7.4 2.8] [7.9 3.8] [6.4 2.8] [6.3 2.8] [6.1 2.6] [7.7 3.0] [6.3 3.4] [6.4 3.1] [6.0 3.0] [6.9 3.1] [6.7 3.1] [6.9 3.1] [5.8 2.7] [6.8 3.2] [6.7 3.3] [6.7 3.0] [6.3 2.5] [6.5 3.0] [6.2 3.4] [5.9 3.0]],
   :attribs {:fill "#00BA38", :stroke "none"},
   :layout
   #object[thi.ng.geom.viz.core$svg_scatter_plot 0x33789746 "thi.ng.geom.viz.core$svg_scatter_plot@33789746"]})}
(svg
 {:width panel-size
  :height panel-size}
 (svg/rect [0 0] panel-size panel-size {:fill (:grey-bg colors)})
 (viz/svg-plot2d-cartesian (colored-plot-spec [:sepal-length
                                               :sepal-width])))
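[Figure: scatter plot of sepal length (x axis, ticks at 6 and 8) against sepal width (y axis, ticks at 2 and 4); points colored by species.]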

Now the species start to become distinguishable. Setosa (red) is clearly separated from the other two.
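The one-series-per-species idea does not depend on Tablecloth. The sketch below shows the same shape of computation on a tiny hand-made dataset in plain Clojure. `sample-rows`, `sample-palette`, and `sample-series` are hypothetical names standing in for `iris`, `(:species colors)`, and the `:data` entry of the plot spec.

```clojure
;; Hypothetical mini-dataset and palette for illustration.
(def sample-rows
  [{:species "setosa" :x 5.1 :y 3.5}
   {:species "versicolor" :x 7.0 :y 3.2}
   {:species "setosa" :x 4.9 :y 3.0}
   {:species "virginica" :x 6.3 :y 3.3}])

(def sample-palette ["#F8766D" "#619CFF" "#00BA38"])

;; Sort the species names so the name -> color pairing is stable,
;; then emit one series map per species.
(def sample-series
  (let [groups (group-by :species sample-rows)
        names (sort (keys groups))]
    (mapv (fn [species color]
            {:species species
             :values (mapv (juxt :x :y) (groups species))
             :attribs {:fill color :stroke "none"}})
          names
          sample-palette)))

(map :species sample-series)
;; => ("setosa" "versicolor" "virginica")
```

Passing two collections to `mapv` walks them in parallel, so the first sorted species name gets the first color, and so on. Sorting the names first keeps that pairing independent of row order.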

Step 3: Single Histogram

Before building a grid, let’s learn how to render a histogram. We’ll show the distribution of sepal width.

(-> (iris :sepal-width)
    (stats/histogram :sturges)
    :bins-maps)
({:min 2.0,
  :max 2.2666666666666666,
  :count 4,
  :step 0.2666666666666666,
  :mid 2.1333333333333333,
  :avg 2.1500000000000004,
  :probability 0.02666666666666667}
 {:min 2.2666666666666666,
  :max 2.533333333333333,
  :count 15,
  :step 0.2666666666666666,
  :mid 2.4,
  :avg 2.426666666666667,
  :probability 0.1}
 {:min 2.533333333333333,
  :max 2.8,
  :count 14,
  :step 0.2666666666666666,
  :mid 2.6666666666666665,
  :avg 2.664285714285714,
  :probability 0.09333333333333334}
 {:min 2.8,
  :max 3.066666666666667,
  :count 50,
  :step 0.26666666666666705,
  :mid 2.9333333333333336,
  :avg 2.924,
  :probability 0.3333333333333333}
 {:min 3.066666666666667,
  :max 3.3333333333333335,
  :count 30,
  :step 0.2666666666666666,
  :mid 3.2,
  :avg 3.1833333333333327,
  :probability 0.2}
 {:min 3.3333333333333335,
  :max 3.6,
  :count 18,
  :step 0.2666666666666666,
  :mid 3.466666666666667,
  :avg 3.4333333333333327,
  :probability 0.12}
 {:min 3.6,
  :max 3.866666666666667,
  :count 13,
  :step 0.26666666666666705,
  :mid 3.7333333333333334,
  :avg 3.7153846153846155,
  :probability 0.08666666666666667}
 {:min 3.866666666666667,
  :max 4.133333333333334,
  :count 4,
  :step 0.2666666666666666,
  :mid 4.0,
  :avg 3.975,
  :probability 0.02666666666666667}
 {:min 4.133333333333334,
  :max 4.4,
  :count 2,
  :step 0.2666666666666666,
  :mid 4.2666666666666675,
  :avg 4.300000000000001,
  :probability 0.013333333333333334})

The :sturges method automatically chooses a good number of bins based on the data size (Sturges’ rule: k = ⌈log₂(n) + 1⌉).
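As a quick check of the formula, here is a minimal sketch that evaluates Sturges’ rule directly. `sturges-bin-count` is a hypothetical helper for illustration; it is not Fastmath's implementation.

```clojure
;; Sturges' rule as stated above: k = ceil(log2(n) + 1).
(defn sturges-bin-count
  [n]
  (long (Math/ceil (+ 1.0 (/ (Math/log n) (Math/log 2.0))))))

(sturges-bin-count 150) ; all 150 flowers => 9
(sturges-bin-count 50)  ; a 50-flower subset => 7
```

For n = 150, log₂(150) is about 7.23, so k = ⌈8.23⌉ = 9, which matches the nine bins in the output above.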

We need to manually render bars as SVG rectangles. We’ll use viz/linear-scale to map data values → pixel coordinates.
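A linear scale is just linear interpolation from a data interval to a pixel interval. The sketch below spells that out in plain Clojure. `linear-scale-sketch` is a hypothetical stand-in written for this explanation, taking a domain and a range in the same order as `viz/linear-scale`; the domains and ranges used here are round numbers chosen for easy checking.

```clojure
;; Sketch of a linear scale: returns a function from data value to pixel.
(defn linear-scale-sketch
  [[d1 d2] [r1 r2]]
  (fn [x]
    (+ r1 (* (- r2 r1)
             (/ (- x d1) (- d2 d1))))))

;; x direction: data 0..10 mapped onto pixels 60..340.
(def sx (linear-scale-sketch [0.0 10.0] [60.0 340.0]))
;; y direction: the range is reversed, because SVG y grows downward.
(def sy (linear-scale-sketch [0.0 50.0] [340.0 60.0]))

(sx 5.0)  ; midpoint of the domain => 200.0
(sy 0.0)  ; zero sits at the bottom => 340.0
(sy 50.0) ; the maximum sits at the top => 60.0
```

Reversing the y range is what makes larger counts appear higher on screen, even though larger SVG y coordinates are further down.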

Let’s compute histogram data and scales in one go.

(defn compute-histogram-data [column]
  (let [hist (stats/histogram (iris column) :sturges)
        bins (:bins-maps hist)
        max-count (tcc/reduce-max (map :count bins))
        x-scale (viz/linear-scale (domains column) [margin (- panel-size margin)])
        y-scale (viz/linear-scale [0 max-count] [(- panel-size margin) margin])]
    {:bins bins
     :max-count max-count
     :x-scale x-scale
     :y-scale y-scale}))
(compute-histogram-data :sepal-width)
{:bins
 ({:min 2.0,
   :max 2.2666666666666666,
   :count 4,
   :step 0.2666666666666666,
   :mid 2.1333333333333333,
   :avg 2.1500000000000004,
   :probability 0.02666666666666667}
  {:min 2.2666666666666666,
   :max 2.533333333333333,
   :count 15,
   :step 0.2666666666666666,
   :mid 2.4,
   :avg 2.426666666666667,
   :probability 0.1}
  {:min 2.533333333333333,
   :max 2.8,
   :count 14,
   :step 0.2666666666666666,
   :mid 2.6666666666666665,
   :avg 2.664285714285714,
   :probability 0.09333333333333334}
  {:min 2.8,
   :max 3.066666666666667,
   :count 50,
   :step 0.26666666666666705,
   :mid 2.9333333333333336,
   :avg 2.924,
   :probability 0.3333333333333333}
  {:min 3.066666666666667,
   :max 3.3333333333333335,
   :count 30,
   :step 0.2666666666666666,
   :mid 3.2,
   :avg 3.1833333333333327,
   :probability 0.2}
  {:min 3.3333333333333335,
   :max 3.6,
   :count 18,
   :step 0.2666666666666666,
   :mid 3.466666666666667,
   :avg 3.4333333333333327,
   :probability 0.12}
  {:min 3.6,
   :max 3.866666666666667,
   :count 13,
   :step 0.26666666666666705,
   :mid 3.7333333333333334,
   :avg 3.7153846153846155,
   :probability 0.08666666666666667}
  {:min 3.866666666666667,
   :max 4.133333333333334,
   :count 4,
   :step 0.2666666666666666,
   :mid 4.0,
   :avg 3.975,
   :probability 0.02666666666666667}
  {:min 4.133333333333334,
   :max 4.4,
   :count 2,
   :step 0.2666666666666666,
   :mid 4.2666666666666675,
   :avg 4.300000000000001,
   :probability 0.013333333333333334}),
 :max-count 50,
 :x-scale
 #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x3af4a45f "thi.ng.geom.viz.core$linear_scale$fn__82691@3af4a45f"],
 :y-scale
 #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x7492f706 "thi.ng.geom.viz.core$linear_scale$fn__82691@7492f706"]}

Now we can render bars using the pre-computed data.

(defn histogram-bars [{:keys [bins x-scale y-scale]}]
  (map (fn [{:keys [min max count]}]
         (let [x1 (x-scale min)
               x2 (x-scale max)
               y (y-scale count)
               bar-width (- x2 x1)
               bar-height (- (- panel-size margin) y)]
           (svg/rect
            [x1 y]
            bar-width
            bar-height
            {:fill (:grey-points colors)
             :stroke "none"})))
       bins))
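An SVG rectangle is positioned by its top-left corner, so a bar's y coordinate is the scaled count and its height is the distance from there down to the baseline. The sketch below works through one bar with toy numbers. `bar-geometry`, the two toy scales, and the baseline value of 340 (that is, `panel-size` minus `margin`) are assumptions made for illustration.

```clojure
;; Toy scales with round numbers, standing in for the real x and y scales.
(def toy-x-scale (fn [v] (+ 60.0 (* 100.0 v))))
(def toy-y-scale (fn [c] (- 340.0 (* 4.0 c))))

;; Hypothetical helper: rectangle geometry for one histogram bin.
(defn bar-geometry
  [{bin-min :min bin-max :max bin-count :count} x-scale y-scale baseline]
  (let [x1 (x-scale bin-min)
        x2 (x-scale bin-max)
        y (y-scale bin-count)]
    {:x x1
     :y y                      ; top edge of the bar
     :width (- x2 x1)
     :height (- baseline y)})) ; extends down to the baseline

(bar-geometry {:min 1.0 :max 1.5 :count 50} toy-x-scale toy-y-scale 340.0)
;; => {:x 160.0, :y 140.0, :width 50.0, :height 200.0}
```

The destructuring renames `:min`, `:max`, and `:count` to local names of its own, which is one way to avoid shadowing the core functions of the same names inside a helper.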

And create axes and grid using the same data.

(defn histogram-axes [column {:keys [max-count]}]
  (let [x-axis (viz/linear-axis
                {:domain (domains column)
                 :range [margin (- panel-size margin)]
                 :major 2.0
                 :pos (- panel-size margin)
                 :label-dist 12
                 :label int-label-fn
                 :major-size 3
                 :minor-size 0
                 :attribs {:stroke "none"}})
        y-axis (viz/linear-axis
                {:domain [0 max-count]
                 :label int-label-fn
                 :range [(- panel-size margin) margin]
                 :major 5
                 :pos margin
                 :label-dist 12
                 :label-style {:text-anchor "end"}
                 :major-size 3
                 :minor-size 0
                 :attribs {:stroke "none"}})]
    [(viz/svg-x-axis-cartesian x-axis)
     (viz/svg-y-axis-cartesian y-axis)
     (viz/svg-axis-grid2d-cartesian x-axis y-axis
                                    {:attribs {:stroke (:grid colors)
                                               :stroke-width 1.5}})]))

Usage: compute once, pass to both functions

(let [column :sepal-width
      hist-data (compute-histogram-data column)]
  (svg {:width panel-size
        :height panel-size}
       (svg/rect [0 0]
                 panel-size panel-size
                 {:fill (:grey-bg colors)})
       (histogram-axes column hist-data)
       (histogram-bars hist-data)))
[Figure: histogram of sepal width (x axis, ticks at 2 and 4); the y axis shows counts from 0 to 50 in steps of 5.]

A simple histogram! Most flowers have sepal width around 3.0.

Step 4: Colored Histogram (Overlaid by Species)

Now let’s overlay three histograms (one per species) to see their different distributions.

Let’s compute histogram data for all three species.

(defn compute-colored-histogram-data [column]
  (let [;; Histogram for each species
        species-hists (mapv (fn [species]
                              {:species species
                               :hist (stats/histogram ((species-groups species) column) :sturges)})
                            species-names)
        ;; Find max count across all species for shared y-scale
        max-count (tcc/reduce-max
                   (mapcat (fn [{:keys [hist]}]
                             (map :count (:bins-maps hist)))
                           species-hists))
        x-scale (viz/linear-scale (domains column) [margin (- panel-size margin)])
        y-scale (viz/linear-scale [0 max-count] [(- panel-size margin) margin])]
    {:species-hists species-hists
     :max-count max-count
     :x-scale x-scale
     :y-scale y-scale}))
(compute-colored-histogram-data :sepal-width)
{:species-hists
 [{:species "setosa",
   :hist
   {:samples 50,
    :min 2.3,
    :bins-maps
    ({:min 2.3,
      :max 2.5999999999999996,
      :count 1,
      :step 0.2999999999999998,
      :mid 2.4499999999999997,
      :avg 2.3,
      :probability 0.02}
     {:min 2.5999999999999996,
      :max 2.9,
      :count 0,
      :step 0.30000000000000027,
      :mid 2.75,
      :avg ##NaN,
      :probability 0.0}
     {:min 2.9,
      :max 3.2,
      :count 11,
      :step 0.30000000000000027,
      :mid 3.05,
      :avg 3.027272727272728,
      :probability 0.22}
     {:min 3.2,
      :max 3.5,
      :count 16,
      :step 0.2999999999999998,
      :mid 3.35,
      :avg 3.325,
      :probability 0.32}
     {:min 3.5,
      :max 3.8000000000000003,
      :count 16,
      :step 0.30000000000000027,
      :mid 3.6500000000000004,
      :avg 3.63125,
      :probability 0.32}
     {:min 3.8000000000000003,
      :max 4.1,
      :count 3,
      :step 0.2999999999999994,
      :mid 3.95,
      :avg 3.9333333333333336,
      :probability 0.06}
     {:min 4.1,
      :max 4.4,
      :count 3,
      :step 0.3000000000000007,
      :mid 4.25,
      :avg 4.233333333333333,
      :probability 0.06}),
    :bins
    ([2.3 1]
     [2.5999999999999996 0]
     [2.9 11]
     [3.2 16]
     [3.5 16]
     [3.8000000000000003 3]
     [4.1 3]),
    :size 7,
    :frequencies
    {2.3 1,
     ##NaN 0,
     3.027272727272728 11,
     3.325 16,
     3.63125 16,
     3.9333333333333336 3,
     4.233333333333333 3},
    :max 4.4,
    :step 0.3000000000000001,
    :intervals
    (2.3 2.5999999999999996 2.9 3.2 3.5 3.8000000000000003 4.1 4.4)}}
  {:species "versicolor",
   :hist
   {:samples 50,
    :min 2.0,
    :bins-maps
    ({:min 2.0,
      :max 2.2,
      :count 1,
      :step 0.20000000000000018,
      :mid 2.1,
      :avg 2.0,
      :probability 0.02}
     {:min 2.2,
      :max 2.4,
      :count 5,
      :step 0.19999999999999973,
      :mid 2.3,
      :avg 2.2600000000000002,
      :probability 0.1}
     {:min 2.4,
      :max 2.6,
      :count 7,
      :step 0.20000000000000018,
      :mid 2.5,
      :avg 2.4571428571428577,
      :probability 0.14}
     {:min 2.6,
      :max 2.8,
      :count 8,
      :step 0.19999999999999973,
      :mid 2.7,
      :avg 2.6624999999999996,
      :probability 0.16}
     {:min 2.8,
      :max 3.0,
      :count 13,
      :step 0.20000000000000018,
      :mid 2.9,
      :avg 2.8538461538461535,
      :probability 0.26}
     {:min 3.0,
      :max 3.2,
      :count 11,
      :step 0.20000000000000018,
      :mid 3.1,
      :avg 3.027272727272727,
      :probability 0.22}
     {:min 3.2,
      :max 3.4,
      :count 5,
      :step 0.19999999999999973,
      :mid 3.3,
      :avg 3.2599999999999993,
      :probability 0.1}),
    :bins ([2.0 1] [2.2 5] [2.4 7] [2.6 8] [2.8 13] [3.0 11] [3.2 5]),
    :size 7,
    :frequencies
    {2.0 1,
     2.2600000000000002 5,
     2.4571428571428577 7,
     2.6624999999999996 8,
     2.8538461538461535 13,
     3.027272727272727 11,
     3.2599999999999993 5},
    :max 3.4,
    :step 0.19999999999999998,
    :intervals (2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4)}}
  {:species "virginica",
   :hist
   {:samples 50,
    :min 2.2,
    :bins-maps
    ({:min 2.2,
      :max 2.428571428571429,
      :count 1,
      :step 0.22857142857142865,
      :mid 2.3142857142857145,
      :avg 2.2,
      :probability 0.02}
     {:min 2.428571428571429,
      :max 2.657142857142857,
      :count 6,
      :step 0.2285714285714282,
      :mid 2.5428571428571427,
      :avg 2.533333333333333,
      :probability 0.12}
     {:min 2.657142857142857,
      :max 2.8857142857142857,
      :count 12,
      :step 0.22857142857142865,
      :mid 2.7714285714285714,
      :avg 2.766666666666667,
      :probability 0.24}
     {:min 2.8857142857142857,
      :max 3.1142857142857143,
      :count 18,
      :step 0.22857142857142865,
      :mid 3.0,
      :avg 3.011111111111111,
      :probability 0.36}
     {:min 3.1142857142857143,
      :max 3.3428571428571425,
      :count 8,
      :step 0.2285714285714282,
      :mid 3.2285714285714286,
      :avg 3.2375,
      :probability 0.16}
     {:min 3.3428571428571425,
      :max 3.571428571428571,
      :count 2,
      :step 0.22857142857142865,
      :mid 3.457142857142857,
      :avg 3.4,
      :probability 0.04}
     {:min 3.571428571428571,
      :max 3.8,
      :count 3,
      :step 0.22857142857142865,
      :mid 3.6857142857142855,
      :avg 3.733333333333333,
      :probability 0.06}),
    :bins
    ([2.2 1]
     [2.428571428571429 6]
     [2.657142857142857 12]
     [2.8857142857142857 18]
     [3.1142857142857143 8]
     [3.3428571428571425 2]
     [3.571428571428571 3]),
    :size 7,
    :frequencies
    {2.2 1,
     2.533333333333333 6,
     2.766666666666667 12,
     3.011111111111111 18,
     3.2375 8,
     3.4 2,
     3.733333333333333 3},
    :max 3.8,
    :step 0.2285714285714285,
    :intervals
    (2.2
     2.428571428571429
     2.657142857142857
     2.8857142857142857
     3.1142857142857143
     3.3428571428571425
     3.571428571428571
     3.8)}}],
 :max-count 18,
 :x-scale
 #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x440430a1 "thi.ng.geom.viz.core$linear_scale$fn__82691@440430a1"],
 :y-scale
 #object[thi.ng.geom.viz.core$linear_scale$fn__82691 0x2eb52007 "thi.ng.geom.viz.core$linear_scale$fn__82691@2eb52007"]}
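The shared `:max-count` can be checked by hand. The sketch below uses the per-species bin counts copied from the output above and computes a single maximum across all of them in plain Clojure. `counts-by-species` and `shared-max` are hypothetical names.

```clojure
;; Bin counts per species, copied from the :bins-maps output above.
(def counts-by-species
  {"setosa" [1 0 11 16 16 3 3]
   "versicolor" [1 5 7 8 13 11 5]
   "virginica" [1 6 12 18 8 2 3]})

;; Flatten all counts and take one maximum, so that every species
;; is drawn against the same y scale.
(def shared-max
  (apply max (mapcat val counts-by-species)))

shared-max ; => 18
```

The per-species maxima are 16, 13, and 18. Scaling each histogram by its own maximum would make all three peaks the same height, so bar heights could no longer be compared across species.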

Now we can render the overlaid colored bars.

(defn colored-histogram-bars [{:keys [species-hists x-scale y-scale]}]
  (mapcat
   (fn [idx {:keys [hist]}]
     (let [color ((:species colors) idx)]
       (map (fn [{:keys [min max count]}]
              (let [x1 (x-scale min)
                    x2 (x-scale max)
                    y (y-scale count)
                    bar-width (- x2 x1)
                    bar-height (- (- panel-size margin) y)]
                (svg/rect
                 [x1 y]
                 bar-width
                 bar-height
                 {:fill color
                  :stroke "none"
                  ;; semi-transparent for overlapping bars
                  :opacity 0.7})))
            (:bins-maps hist))))
   (range)
   species-hists))

And create the axes.

(defn colored-histogram-axes [column {:keys [max-count]}]
  (let [x-axis (viz/linear-axis
                {:domain (domains column)
                 :range [margin (- panel-size margin)]
                 :major 2.0
                 :pos (- panel-size margin)
                 :label-dist 12
                 :label int-label-fn
                 :major-size 3
                 :minor-size 0
                 :attribs {:stroke "none"}})
        y-axis (viz/linear-axis
                {:domain [0 max-count]
                 :label int-label-fn
                 :range [(- panel-size margin) margin]
                 :major 5
                 :pos margin
                 :label-dist 12
                 :label-style {:text-anchor "end"}
                 :major-size 3
                 :minor-size 0
                 :attribs {:stroke "none"}})]
    [(viz/svg-x-axis-cartesian x-axis)
     (viz/svg-y-axis-cartesian y-axis)
     (viz/svg-axis-grid2d-cartesian x-axis y-axis
                                    {:attribs {:stroke (:grid colors)
                                               :stroke-width 1.5}})]))

Usage: compute the histogram data once, then pass it to both functions.

(let [column :sepal-width
      hist-data (compute-colored-histogram-data column)]
  (svg
   {:width panel-size
    :height panel-size}
   (svg/rect [0 0] panel-size panel-size {:fill (:grey-bg colors)})
   (colored-histogram-axes column hist-data)
   (colored-histogram-bars hist-data)))
[Figure: overlaid, semi-transparent histograms of sepal width, one color per species]

Beautiful! We can see Setosa (red) has wider sepals on average.

Step 5: 2×2 Grid of Scatter Plots

As preparation for the full 4×4 SPLOM we’ll build later, let’s start with a simpler 2×2 grid. This will let us explore the relationships between just two variables: sepal width and petal length.

Let’s create a helper to render a scatter panel at any grid position.

(defn make-grid-scatter-panel [x-col y-col row col]
  (let [x-offset (* col grid-panel-size)
        y-offset (* row grid-panel-size)
        x-axis (viz/linear-axis
                {:domain (domains x-col)
                 :range [(+ x-offset grid-margin)
                         (+ x-offset grid-panel-size (- grid-margin))]
                 :major 2.0
                 :label int-label-fn
                 :pos (+ y-offset grid-panel-size (- grid-margin))
                 :label-dist 12
                 :major-size 2
                 :minor-size 0
                 :attribs {:stroke "none"}})
        y-axis (viz/linear-axis
                {:domain (domains y-col)
                 :range [(+ y-offset grid-panel-size (- grid-margin))
                         (+ y-offset grid-margin)]
                 :major 2.0
                 :pos (+ x-offset grid-margin)
                 :label int-label-fn
                 :label-dist 12
                 :label-style {:text-anchor "end"}
                 :major-size 2
                 :minor-size 0
                 :attribs {:stroke "none"}})
        series (mapv (fn [species color]
                       (let [data (species-groups species)
                             ;; create [[x1 y1] [x2 y2] ...] point pairs
                             points (mapv vector
                                          (data x-col)
                                          (data y-col))]
                         {:values points
                          :attribs {:fill color
                                    :stroke "none"}
                          :layout viz/svg-scatter-plot}))
                     species-names
                     (:species colors))]
    (viz/svg-plot2d-cartesian
     {:x-axis x-axis
      :y-axis y-axis
      :grid {:attribs {:stroke (:grid colors)
                       :stroke-width 1.5}}
      :data series})))
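The offset arithmetic in make-grid-scatter-panel can be checked in isolation. The sketch below pulls it into a hypothetical helper, panel-ranges, that takes the panel size and margin as arguments. The numbers used (100 and 10) are illustrative, not the tutorial’s grid-panel-size and grid-margin.

```clojure
(defn panel-ranges
  "Pixel ranges for the panel at (row, col). Hypothetical helper for illustration."
  [row col panel-size margin]
  (let [x-offset (* col panel-size)
        y-offset (* row panel-size)]
    {:x-range [(+ x-offset margin) (+ x-offset panel-size (- margin))]
     ;; y runs bottom-to-top because SVG's y axis points down
     :y-range [(+ y-offset panel-size (- margin)) (+ y-offset margin)]}))

(panel-ranges 0 1 100 10)
;; => {:x-range [110 190], :y-range [90 10]}

(panel-ranges 1 0 100 10)
;; => {:x-range [10 90], :y-range [190 110]}
```

Each panel occupies its own square of the canvas, so the same data domain maps to different pixel ranges depending on row and col.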

Assemble the 2×2 grid

(let [grid-total-size (* 2 grid-panel-size)]
  (svg
   {:width grid-total-size
    :height grid-total-size}

   ;; Backgrounds first (z-order)
   (svg/rect [0 0] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [grid-panel-size 0] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [0 grid-panel-size] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [grid-panel-size grid-panel-size] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})

   ;; The four scatter plots
   (make-grid-scatter-panel :sepal-width :sepal-width 0 0)
   (make-grid-scatter-panel :petal-length :sepal-width 0 1)
   (make-grid-scatter-panel :sepal-width :petal-length 1 0)
   (make-grid-scatter-panel :petal-length :petal-length 1 1)))
[Figure: 2×2 grid of scatter plots for sepal width and petal length, colored by species]

We can see relationships! Notice the diagonal panels show x=y which isn’t very informative. Let’s fix that next.

Step 6: 2×2 Grid with Diagonal Histograms

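The diagonal panels of our 2×2 grid plot each variable against itself, so every point falls on the line x=y. Let’s replace those panels with histograms showing the distribution of each variable.

Just as make-grid-scatter-panel renders a scatter plot at any grid position, we need a helper that renders a histogram at any grid position.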

(defn make-grid-histogram-panel [column row col]
  (let [x-offset (* col grid-panel-size)
        y-offset (* row grid-panel-size)
        ;; Compute histogram data for all species
        species-hists (mapv (fn [species]
                              {:species species
                               ;; 12 bins for visual consistency across small grid panels
                               :hist (stats/histogram ((species-groups species) column) 12)})
                            species-names)
        max-count (tcc/reduce-max
                   (mapcat (fn [{:keys [hist]}]
                             (map :count (:bins-maps hist)))
                           species-hists))
        x-scale (viz/linear-scale (domains column)
                                  [(+ x-offset grid-margin)
                                   (+ x-offset grid-panel-size (- grid-margin))])
        y-scale (viz/linear-scale [0 max-count]
                                  [(+ y-offset grid-panel-size (- grid-margin))
                                   (+ y-offset grid-margin)])
        ;; Create axes
        x-axis (viz/linear-axis
                {:domain (domains column)
                 :range [(+ x-offset grid-margin) (+ x-offset grid-panel-size (- grid-margin))]
                 :major 2.0
                 :pos (+ y-offset grid-panel-size (- grid-margin))
                 :label-dist 12
                 :label int-label-fn
                 :major-size 3
                 :minor-size 0
                 :attribs {:stroke "none"}})
        y-axis (viz/linear-axis
                {:domain [0 max-count]
                 :range [(+ y-offset grid-panel-size (- grid-margin)) (+ y-offset grid-margin)]
                 :major (if (> max-count 20) 5 2)
                 :pos (+ x-offset grid-margin)
                 :label-dist 12
                 :label int-label-fn
                 :label-style {:text-anchor "end"}
                 :major-size 3
                 :minor-size 0
                 :attribs {:stroke "none"}})
        ;; Create bars
        bars (mapcat (fn [idx {:keys [hist]}]
                       (let [color ((:species colors) idx)]
                         (map (fn [{:keys [min max count]}]
                                (let [x1 (x-scale min)
                                      x2 (x-scale max)
                                      y (y-scale count)
                                      bar-width (- x2 x1)
                                      bar-height (- (+ y-offset grid-panel-size (- grid-margin)) y)]
                                  (svg/rect [x1 y] bar-width bar-height
                                            {:fill color
                                             :stroke "none"
                                             :opacity 0.7})))
                              (:bins-maps hist))))
                     (range)
                     species-hists)]
    (svg/group {}
               (viz/svg-x-axis-cartesian x-axis)
               (viz/svg-y-axis-cartesian y-axis)
               (viz/svg-axis-grid2d-cartesian x-axis y-axis
                                              {:attribs {:stroke (:grid colors)
                                                         :stroke-width 1.5}})
               bars)))
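The helper shares one y-scale across all species by taking the largest bin count over every species histogram. Here is a dependency-free sketch of that idea. It uses a hand-written equal-width binning function (bin-counts, hypothetical) rather than stats/histogram, bins every group over the same range (an assumption of this sketch), and runs on made-up numbers rather than iris data.

```clojure
(defn bin-counts
  "Counts values into n equal-width bins over [lo, hi]."
  [values lo hi n]
  (let [step (/ (- hi lo) (double n))
        ;; the maximum value belongs to the last bin, so clamp the index
        idx  (fn [v] (min (dec n) (int (/ (- v lo) step))))]
    (reduce (fn [counts v] (update counts (idx v) inc))
            (vec (repeat n 0))
            values)))

(def toy-samples ;; made-up numbers, not iris measurements
  {"a" [1.0 1.2 1.4 1.5 2.9]
   "b" [2.0 2.1 2.2 2.3 2.4 3.0]})

(def toy-hists
  (update-vals toy-samples #(bin-counts % 1.0 3.0 4)))
;; => {"a" [3 1 0 1], "b" [0 0 5 1]}

;; One shared maximum, so every group's bars use the same y-scale.
(def toy-max-count
  (apply max (mapcat val toy-hists)))
;; => 5
```

If each group had its own y-scale, the overlaid bar heights would not be comparable.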

Now render the grid with colored histograms on the diagonal

(let [grid-total-size (* 2 grid-panel-size)]
  (svg
   {:width grid-total-size
    :height grid-total-size}

   ;; Background panels
   (svg/rect [0 0] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [grid-panel-size 0] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [0 grid-panel-size] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [grid-panel-size grid-panel-size] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})

   ;; Top-left: histogram for sepal.width
   (make-grid-histogram-panel :sepal-width 0 0)

   ;; Top-right: petal.length (x) vs sepal.width (y)
   (make-grid-scatter-panel :petal-length :sepal-width 0 1)

   ;; Bottom-left: sepal.width (x) vs petal.length (y)
   (make-grid-scatter-panel :sepal-width :petal-length 1 0)

   ;; Bottom-right: histogram for petal.length
   (make-grid-histogram-panel :petal-length 1 1)))
[Figure: 2×2 grid with per-species histograms on the diagonal and scatter plots off the diagonal]

Perfect! Now we can see both relationships and distributions by species.

Step 7: Single Scatter with Regression Line

Beyond seeing the scatter of points, we often want to understand the overall trend. Let’s add a linear regression line to quantify the relationship.

First, let’s compute a linear regression.

(defn compute-regression [x-col y-col]
  (let [xs (iris x-col)
        ys (iris y-col)
        xss (mapv vector xs)
        model (regr/lm ys xss)
        slope (first (:beta model))
        intercept (:intercept model)]
    {:slope slope
     :intercept intercept}))

See what the regression coefficients look like:

(compute-regression :sepal-length :sepal-width)
{:slope -0.061884797964144256, :intercept 3.4189468361038178}
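With a single predictor, an ordinary least-squares fit has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below implements that formula in plain Clojure. It is not fastmath’s implementation, and simple-ols is a hypothetical name. It assumes regr/lm performs an ordinary least-squares fit with an intercept, in which case both should agree up to floating-point error.

```clojure
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn simple-ols
  "Least-squares fit of y = intercept + slope * x for one predictor."
  [xs ys]
  (let [mx  (mean xs)
        my  (mean ys)
        ;; sum of cross-deviations and sum of squared x-deviations
        sxy (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx (reduce + (map (fn [x] (let [d (- x mx)] (* d d))) xs))
        slope (/ sxy sxx)]
    {:slope slope
     :intercept (- my (* slope mx))}))

;; Points lying exactly on y = 1 + 2x
(simple-ols [1 2 3 4] [3 5 7 9])
;; => {:slope 2.0, :intercept 1.0}
```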

Now let’s turn those coefficients into an SVG line.

(defn regression-line [x-col y-col regression-data]
  (let [{:keys [slope intercept]} regression-data
        [x-min x-max] (domains x-col)
        x-scale (viz/linear-scale (domains x-col) [margin (- panel-size margin)])
        y-scale (viz/linear-scale (domains y-col) [(- panel-size margin) margin])
        y1 (+ intercept (* slope x-min))
        y2 (+ intercept (* slope x-max))]
    (svg/line [(x-scale x-min) (y-scale y1)]
              [(x-scale x-max) (y-scale y2)]
              {:stroke (:regression colors)
               :stroke-width 2})))
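The line is drawn by evaluating the fitted equation at the two ends of the x domain. As a sketch of that step in data space, before any pixel scaling, the hypothetical helpers below compute the endpoints and check whether they stay inside a given y domain. regression-line does not clip, so a fitted line can, in principle, extend beyond the plotted y range.

```clojure
(defn line-endpoints
  "Data-space endpoints of y = intercept + slope * x across [x-min, x-max]."
  [{:keys [slope intercept]} [x-min x-max]]
  [[x-min (+ intercept (* slope x-min))]
   [x-max (+ intercept (* slope x-max))]])

(defn inside-domain?
  "True when both endpoints fall within [y-min, y-max]."
  [endpoints [y-min y-max]]
  (every? (fn [[_ y]] (<= y-min y y-max)) endpoints))

;; Illustrative coefficients and domains, not values from the iris fit.
(def demo-endpoints
  (line-endpoints {:slope 0.5 :intercept 1.0} [2.0 6.0]))
;; => [[2.0 2.0] [6.0 4.0]]

(inside-domain? demo-endpoints [0.0 5.0]) ;; => true
(inside-domain? demo-endpoints [0.0 3.0]) ;; => false
```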

Render scatter plot with regression overlay

(let [x-col :sepal-length
      y-col :sepal-width
      regression-data (compute-regression x-col y-col)
      plot (colored-plot-spec [x-col y-col])]
  (svg
   {:width panel-size
    :height panel-size}
   (svg/rect [0 0] panel-size panel-size {:fill (:grey-bg colors)})
   (viz/svg-plot2d-cartesian plot)
   (regression-line x-col y-col regression-data)))
[Figure: scatter plot of sepal length vs sepal width with a single regression line]

The red line shows the linear trend: sepal width slightly decreases as sepal length increases.

Step 8: Single Scatter with Regression Lines by Species

In Step 7 we computed a single regression ignoring species. But what if each species has its own distinct relationship? Let’s find out by computing separate regression lines for each species group.

The function below fits one linear model per species group and returns a map from species name to coefficients.

(defn compute-species-regressions [x-col y-col]
  (update-vals species-groups
               (fn [species-data]
                 (let [xs (species-data x-col)
                       ys (species-data y-col)
                       xss (mapv vector xs)
                       model (regr/lm ys xss)
                       slope (first (:beta model))
                       intercept (:intercept model)]
                   {:slope slope
                    :intercept intercept}))))

Per-species regression coefficients:

(compute-species-regressions :sepal-length :sepal-width)
{"setosa" {:slope 0.7985283006471516, :intercept -0.5694326730396391},
 "versicolor"
 {:slope 0.31971934554813414, :intercept 0.872145964826275},
 "virginica"
 {:slope 0.23189049503351364, :intercept 1.4463054187192146}}

Now let’s create the line SVGs for all species at once.

(defn species-regression-lines [x-col y-col species-regressions-data]
  (let [[x-min x-max] (domains x-col)
        x-scale (viz/linear-scale (domains x-col) [margin (- panel-size margin)])
        y-scale (viz/linear-scale (domains y-col) [(- panel-size margin) margin])]
    (mapv (fn [species]
            (let [{:keys [slope intercept]} (species-regressions-data species)
                  y1 (+ intercept (* slope x-min))
                  y2 (+ intercept (* slope x-max))]
              (svg/line [(x-scale x-min) (y-scale y1)]
                        [(x-scale x-max) (y-scale y2)]
                        {:stroke (species-color-map species)
                         :stroke-width 2
                         :opacity 0.7})))
          species-names)))

Render scatter plot with per-species regression lines

(let [x-col :sepal-length
      y-col :sepal-width
      regressions (compute-species-regressions x-col y-col)
      plot (colored-plot-spec [x-col y-col])]
  (svg
   {:width panel-size
    :height panel-size}
   (svg/rect [0 0] panel-size panel-size {:fill (:grey-bg colors)})
   (viz/svg-plot2d-cartesian plot)
   (species-regression-lines x-col y-col regressions)))
[Figure: scatter plot of sepal length vs sepal width with one regression line per species]

Each species has its own trend! All three species show a positive relationship (wider sepals with longer sepals): Setosa has slope +0.80, Versicolor +0.32, Virginica +0.23.

This demonstrates Simpson’s Paradox: if we ignore species and fit a single regression to all data (as in Step 7), we get a negative slope (-0.06)! The overall trend reverses because the three species clusters are separated in feature space: Setosa has shorter but wider sepals, while Virginica has longer but narrower ones. The between-group variation dominates the within-group pattern.
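The reversal is easy to reproduce on a tiny synthetic dataset. In the sketch below, both toy groups have a within-group slope of exactly 1, yet the pooled slope is negative because the group with larger x values has smaller y values. The data and the name toy-groups are made up for illustration and are unrelated to iris.

```clojure
(defn slope
  "Least-squares slope of ys on xs."
  [xs ys]
  (let [n  (double (count xs))
        mx (/ (reduce + xs) n)
        my (/ (reduce + ys) n)]
    (/ (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
       (reduce + (map (fn [x] (* (- x mx) (- x mx))) xs)))))

(def toy-groups ;; synthetic points, not iris
  {"low-x"  {:xs [1 2 3] :ys [5 6 7]}
   "high-x" {:xs [5 6 7] :ys [1 2 3]}})

(def within-slopes
  (update-vals toy-groups (fn [{:keys [xs ys]}] (slope xs ys))))
;; => {"low-x" 1.0, "high-x" 1.0}

;; Pooling ignores the grouping; the result is negative.
(def pooled-slope
  (slope (mapcat :xs (vals toy-groups))
         (mapcat :ys (vals toy-groups))))

(neg? pooled-slope) ;; => true
```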

Step 9: 2×2 Grid with Regression Lines

Now let’s bring together what we learned: we’ll add per-species regression lines to our 2×2 grid from Step 6, combining diagonal histograms with regression-overlaid scatter plots.

We’ll need per-species regression lines positioned in the grid.

(defn make-grid-regression-lines [x-col y-col row col species-regressions-data]
  (let [x-offset (* col grid-panel-size)
        y-offset (* row grid-panel-size)
        [x-min x-max] (domains x-col)
        x-scale (viz/linear-scale (domains x-col)
                                  [(+ x-offset grid-margin)
                                   (+ x-offset grid-panel-size (- grid-margin))])
        y-scale (viz/linear-scale (domains y-col)
                                  [(+ y-offset grid-panel-size (- grid-margin))
                                   (+ y-offset grid-margin)])]
    (mapv (fn [species]
            (let [{:keys [slope intercept]} (species-regressions-data species)
                  y1 (+ intercept (* slope x-min))
                  y2 (+ intercept (* slope x-max))]
              (svg/line [(x-scale x-min) (y-scale y1)]
                        [(x-scale x-max) (y-scale y2)]
                        {:stroke (species-color-map species)
                         :stroke-width 2
                         :opacity 0.7})))
          species-names)))

Render grid with histograms and per-species regression lines

(let [grid-total-size (* 2 grid-panel-size)
      ;; Compute regressions for the two scatter panels (row-col naming: 01 = row 0 col 1, etc.)
      regressions-01 (compute-species-regressions :petal-length :sepal-width) ; top-right panel
      regressions-10 (compute-species-regressions :sepal-width :petal-length)] ; bottom-left panel
  (svg
   {:width grid-total-size
    :height grid-total-size}

   ;; Background panels
   (svg/rect [0 0] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [grid-panel-size 0] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [0 grid-panel-size] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})
   (svg/rect [grid-panel-size grid-panel-size] grid-panel-size grid-panel-size {:fill (:grey-bg colors)})

   ;; Top-left: histogram for sepal.width
   (make-grid-histogram-panel :sepal-width 0 0)

   ;; Top-right: petal.length (x) vs sepal.width (y) with per-species regressions
   (make-grid-scatter-panel :petal-length :sepal-width 0 1)
   (make-grid-regression-lines :petal-length :sepal-width 0 1 regressions-01)

   ;; Bottom-left: sepal.width (x) vs petal.length (y) with per-species regressions
   (make-grid-scatter-panel :sepal-width :petal-length 1 0)
   (make-grid-regression-lines :sepal-width :petal-length 1 0 regressions-10)

   ;; Bottom-right: histogram for petal.length
   (make-grid-histogram-panel :petal-length 1 1)))
[Figure: 2×2 grid with diagonal histograms and per-species regression lines on the scatter panels]

Beautiful! Each species shows its own trend in both scatter panels. Notice how the three colored regression lines reveal different relationships for each species across the grid.

We’ve now seen the same pattern several times: render scatter plots with regressions on off-diagonal panels, and histograms on the diagonal. Let’s abstract this.

Step 10: Extract Helper Function

Looking at our 2×2 grid rendering, we’re repeating a pattern:

  • Diagonal panels (row = col) → histogram
  • Off-diagonal panels (row ≠ col) → scatter + regressions

Let’s abstract this into a single helper function that decides what to render based on position.

(defn render-panel [x-col y-col row col species-regressions-data]
  (if (= row col)
    ;; Diagonal: histogram
    (make-grid-histogram-panel x-col row col)
    ;; Off-diagonal: scatter + regression lines
    (list
     (make-grid-scatter-panel x-col y-col row col)
     (make-grid-regression-lines x-col y-col row col species-regressions-data))))
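The dispatch rule can be inspected without rendering anything. This sketch uses two hypothetical helpers, panel-kind and layout, to return a matrix of keywords describing what each grid cell would contain.

```clojure
(defn panel-kind
  "Same decision as render-panel, but returns a keyword instead of SVG."
  [row col]
  (if (= row col) :histogram :scatter+regression))

(defn layout
  "Matrix of panel kinds for a square grid over the given columns."
  [cols]
  (let [n (count cols)]
    (vec (for [row (range n)]
           (vec (for [col (range n)]
                  (panel-kind row col)))))))

(layout [:sepal-width :petal-length])
;; => [[:histogram :scatter+regression]
;;     [:scatter+regression :histogram]]
```

Separating the decision from the drawing like this is one possible design. It makes the rule testable without inspecting SVG output.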

Notice how render-panel chooses what to render:

  • Diagonal (row = col): returns a histogram
  • Off-diagonal (row ≠ col): returns a scatter plot plus regression lines

Now we can render the same 2×2 grid more concisely.

(let [grid-total-size (* 2 grid-panel-size)
      cols [:sepal-width :petal-length]
      ;; Pre-compute regressions for all variable pairs
      regressions-map {[:petal-length :sepal-width] 
                       (compute-species-regressions :petal-length :sepal-width)
                       [:sepal-width :petal-length]
                       (compute-species-regressions :sepal-width :petal-length)}]
  (svg
   {:width grid-total-size
    :height grid-total-size}
   
   ;; Background panels
   (for [row (range 2)
         col (range 2)]
     (svg/rect [(* col grid-panel-size) (* row grid-panel-size)]
               grid-panel-size grid-panel-size
               {:fill (:grey-bg colors)}))
   
   ;; Render all panels using our helper
   (for [row (range 2)
         col (range 2)]
     (let [x-col (cols col)
           y-col (cols row)
           regressions (get regressions-map [x-col y-col])]
       (render-panel x-col y-col row col regressions)))))
[Figure: the same 2×2 grid, rendered via render-panel]

Same result as before, but now the pattern is explicit and reusable!

Step 11: Scale to 4×4 Grid

Now that we have the abstraction, scaling to all 4 iris variables is straightforward. This creates a 4×4 grid = 16 panels total (4 histograms + 12 scatter plots).

All numerical columns from iris

(def all-cols [:sepal-length :sepal-width :petal-length :petal-width])

Pre-compute all pairwise regressions (we only need off-diagonal pairs)

(def all-regressions
  (into {}
        (for [row-idx (range 4)
              col-idx (range 4)
              :when (not= row-idx col-idx)]
          (let [x-col (all-cols col-idx)
                y-col (all-cols row-idx)]
            [[x-col y-col] (compute-species-regressions x-col y-col)]))))

That’s 12 regression pairs (6 unordered pairs × 2 directions), one for each off-diagonal panel. Here is a sample of the pre-computed regressions, showing just two pairs:

(select-keys all-regressions [[:sepal-length :sepal-width] 
                              [:petal-length :petal-width]])
{[:sepal-length :sepal-width]
 {"setosa" {:slope 0.7985283006471516, :intercept -0.5694326730396391},
  "versicolor"
  {:slope 0.31971934554813414, :intercept 0.872145964826275},
  "virginica"
  {:slope 0.23189049503351364, :intercept 1.4463054187192146}},
 [:petal-length :petal-width]
 {"setosa"
  {:slope 0.20124509405873586, :intercept -0.04822032751387184},
  "versicolor"
  {:slope 0.3310536044362289, :intercept -0.0842883548983339},
  "virginica"
  {:slope 0.1602969554030887, :intercept 1.1360313036020537}}}

Render the complete 4×4 SPLOM

(let [n 4
      grid-total-size (* n grid-panel-size)]
  (svg
   {:width grid-total-size
    :height grid-total-size}
   
   ;; Background panels
   (for [row (range n)
         col (range n)]
     (svg/rect [(* col grid-panel-size) (* row grid-panel-size)]
               grid-panel-size grid-panel-size
               {:fill (:grey-bg colors)}))
   
   ;; Render all 16 panels
   (for [row (range n)
         col (range n)]
     (let [x-col (all-cols col)
           y-col (all-cols row)
           regressions (get all-regressions [x-col y-col])]
       (render-panel x-col y-col row col regressions)))))
[Figure: complete 4×4 SPLOM of the four iris measurements]

A complete scatter plot matrix!

  • 4 diagonal histograms show distributions
  • 12 off-diagonal scatter plots show all pairwise relationships
  • Per-species regression lines reveal species-specific trends

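Notice the symmetry: the point clouds in the upper and lower triangles are mirror images of each other (x vs y in one panel corresponds to y vs x in the reflected panel). The regression lines are not exact mirrors, because regressing y on x is a different fit from regressing x on y. Some SPLOM designs only show one triangle to avoid redundancy.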
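As a rough sketch of the one-triangle variant, the panel enumeration only needs a different :when condition. The names demo-cols, off-diagonal-pairs and lower-triangle-pairs are hypothetical. The column keywords are the ones used in this tutorial.

```clojure
(def demo-cols [:sepal-length :sepal-width :petal-length :petal-width])

;; Every off-diagonal cell, as [x-col y-col] pairs (both directions).
(def off-diagonal-pairs
  (for [row (range 4)
        col (range 4)
        :when (not= row col)]
    [(demo-cols col) (demo-cols row)]))

;; Only the cells below the diagonal: each unordered pair appears once.
(def lower-triangle-pairs
  (for [row (range 4)
        col (range 4)
        :when (< col row)]
    [(demo-cols col) (demo-cols row)]))

(count off-diagonal-pairs)   ;; => 12
(count lower-triangle-pairs) ;; => 6
```

Showing a single triangle halves the number of scatter panels and regression fits, at the cost of losing the second regression direction for each pair.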

Reflection: What We’ve Built

Over the past 12 steps, we’ve built a complete scatter plot matrix from scratch. We started with basic scatter plots, added color encoding by species, learned to render histograms manually, arranged everything in grids, overlaid regression lines, and finally abstracted the whole pattern to scale from 2×2 to 4×4.

The code here is deliberately explicit. An upcoming library API would handle these details for you, with sensible defaults that remain extensible and composable. This will be the topic of another blog post, coming soon.

We’re also exploring interactive features using D3.js, including brushable SPLOMs where selections in one panel highlight points across all panels.

source: src/data_visualization/splom_tutorial.clj