Draft

Visual data summaries

Can we plot interesting charts for all columns of a dataset?
Published: January 11, 2026

Keywords: datavis, composition, operators

When exploring a new dataset, we face an immediate challenge: How do we quickly understand the structure and distribution of all our columns?

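This notebook explores a “show everything” approach: plot every column on its own, then every pair of columns, choosing the chart type from the column types.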

The goal is to enable rapid visual discovery of patterns and relationships.

Starting with a Complete Dataset

Let’s load a well-known dataset, the Palmer penguins, drop the rows with missing values, and explore how to present its columns effectively:

(def penguins
  (tc/drop-missing (rdatasets/palmerpenguins-penguins)))

Option 1: Print the data

We could just print the dataset, but the printout shows only a small sample of the 333 rows (here, the first and last few):

penguins

https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv [333 9]:

| :rownames | :species | :island | :bill-length-mm | :bill-depth-mm | :flipper-length-mm | :body-mass-g | :sex | :year |
|---:|---|---|---:|---:|---:|---:|---|---:|
| 1 | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| 2 | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| 3 | Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| 5 | Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| 6 | Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
| 7 | Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female | 2007 |
| 8 | Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | male | 2007 |
| 13 | Adelie | Torgersen | 41.1 | 17.6 | 182 | 3200 | female | 2007 |
| 14 | Adelie | Torgersen | 38.6 | 21.2 | 191 | 3800 | male | 2007 |
| 15 | Adelie | Torgersen | 34.6 | 21.1 | 198 | 4400 | male | 2007 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 334 | Chinstrap | Dream | 49.3 | 19.9 | 203 | 4050 | male | 2009 |
| 335 | Chinstrap | Dream | 50.2 | 18.8 | 202 | 3800 | male | 2009 |
| 336 | Chinstrap | Dream | 45.6 | 19.4 | 194 | 3525 | female | 2009 |
| 337 | Chinstrap | Dream | 51.9 | 19.5 | 206 | 3950 | male | 2009 |
| 338 | Chinstrap | Dream | 46.8 | 16.5 | 189 | 3650 | female | 2009 |
| 339 | Chinstrap | Dream | 45.7 | 17.0 | 195 | 3650 | female | 2009 |
| 340 | Chinstrap | Dream | 55.8 | 19.8 | 207 | 4000 | male | 2009 |
| 341 | Chinstrap | Dream | 43.5 | 18.1 | 202 | 3400 | female | 2009 |
| 342 | Chinstrap | Dream | 49.6 | 18.2 | 193 | 3775 | male | 2009 |
| 343 | Chinstrap | Dream | 50.8 | 19.0 | 210 | 4100 | male | 2009 |
| 344 | Chinstrap | Dream | 50.2 | 18.7 | 198 | 3775 | female | 2009 |

Option 2: Summary statistics

We could compute statistics for a single column:

(fms/stats-map (:bill-length-mm penguins))
{:MAD 4.700000000000003,
 :UOF 76.40000000000003,
 :Skewness 0.045340470420402026,
 :Max 59.6,
 :Variance 29.906333441875624,
 :Size 333,
 :LAV 32.1,
 :UIF 62.52500000000002,
 :Mode 41.1,
 :Mean 43.99279279279281,
 :Q1 39.4,
 :Q3 48.650000000000006,
 :Min 32.1,
 :LIF 25.524999999999988,
 :Range 27.5,
 :Total 14649.600000000006,
 :SD 5.468668342647561,
 :IQR 9.250000000000007,
 :Outliers (),
 :UAV 59.6,
 :LOF 11.649999999999977,
 :SEM 0.29968117914670855,
 :Kurtosis -0.8834182330572031,
 :Median 44.5}

But turning these numbers into a mental picture of the distribution takes effort, and we would have to repeat it for every column.
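Part of that effort comes from keys that are derived from other keys. As a minimal sketch in plain Clojure, the fence values in the map above appear to follow the usual Tukey rule of 1.5 and 3 interquartile ranges beyond the quartiles. The name `tukey-fences` is hypothetical and not part of fastmath; the rule itself is an assumption that the printed numbers happen to agree with.

```clojure
;; Hypothetical helper: recompute the fence values from Q1 and Q3.
;; Assumes the Tukey convention (1.5 x IQR inner, 3 x IQR outer).
(defn tukey-fences [q1 q3]
  (let [iqr (- q3 q1)]
    {:IQR iqr
     :LIF (- q1 (* 1.5 iqr))    ; lower inner fence
     :UIF (+ q3 (* 1.5 iqr))    ; upper inner fence
     :LOF (- q1 (* 3.0 iqr))    ; lower outer fence
     :UOF (+ q3 (* 3.0 iqr))})) ; upper outer fence

;; Q1 and Q3 of bill-length-mm, copied from the stats map above
(tukey-fences 39.4 48.650000000000006)
```

Up to floating-point rounding, this reproduces the :LIF (25.525), :UIF (62.525), :LOF (11.65), and :UOF (76.4) entries. Since :Min and :Max both lie inside the inner fences, the empty :Outliers list is consistent with this reading.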

Option 3: Visual summaries

What if we automatically plot the distribution of every column? This lets us see patterns at a glance.

Visualization inference

(def plot-width 100)
(def plot-height 100)

Type detection determines whether a column is shown as a histogram (numeric) or a bar chart (categorical):

(defn is-numeric-type? [col]
  (tcc/typeof? col :numerical))
(defn plot-basic [g]
  (let [{:keys [data mappings geometry]} (g 1)
        {:keys [x y]} mappings]
    (for [geom geometry]
      (case geom
        :bar (let [x-vals (remove nil? (data x))
                   categories (distinct x-vals)
                   counts (frequencies x-vals)
                   max-count (when (seq counts) (apply max (vals counts)))
                   bar-width (/ plot-width (count categories))]
               (when max-count
                 (for [[i cat] (map-indexed vector categories)]
                   (let [count (get counts cat 0)
                         bar-height (* (/ count max-count) plot-height)]
                     [:rect {:x (* i bar-width)
                             :y (- plot-height bar-height)
                             :width bar-width
                             :height bar-height
                             :fill "lightblue"
                             :stroke "gray"
                             :stroke-width 0.5}]))))
        :histogram (let [values (remove nil? (data x))
                         hist-result (when (seq values) (fms/histogram values))
                         bins (:bins-maps hist-result)]
                     (when (seq bins)
                       (let [max-count (apply max (map :count bins))
                             bin-width (/ plot-width (count bins))]
                         (for [[i bin] (map-indexed vector bins)]
                           (let [bar-height (* (/ (:count bin) max-count) plot-height)]
                             [:rect {:x (* i bin-width)
                                     :y (- plot-height bar-height)
                                     :width bin-width
                                     :height bar-height
                                     :fill "lightblue"
                                     :stroke "gray"
                                     :stroke-width 0.5}])))))
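        ;; :point and :line pair the x and y columns row by row and rescale each
        ;; column from its own [min, max] to the plot area (y is flipped because
        ;; SVG y grows downward). Assumes numeric columns without missing values.
        (:point :line)
        (let [rescale (fn [col size]
                        (let [lo (apply min col)
                              span (- (apply max col) lo)]
                          (map #(if (zero? span) (/ size 2.0) (double (* size (/ (- % lo) span))))
                               col)))
              xys (map vector
                       (rescale (data x) plot-width)
                       (map #(- plot-height %) (rescale (data y) plot-height)))]
          (case geom
            :point (for [[cx cy] xys]
                     [:circle {:r 2, :cx cx, :cy cy, :fill "lightblue"}])
            :line [:path {:d (str "M " (str/join "," (first xys))
                                  " L " (str/join " " (map #(str/join "," %) (rest xys))))}]))))))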
(defn plot-distribution [ds column geom]
  ^:kind/hiccup
  [:svg {:width   plot-width
         :viewBox (str/join " " [0 0 plot-width plot-height])
         :xmlns   "http://www.w3.org/2000/svg"
         :style {:border "solid 1px gray"}}
   [:g {:stroke "gray", :fill "none"}
    (plot-basic [:graphic {:data ds
                           :mappings {:x column}
                           :geometry geom}])]])
(plot-distribution penguins :bill-length-mm [:histogram])
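To make the histogram branch less of a black box, here is a minimal sketch of equal-width binning in plain Clojure. It is not what `fms/histogram` does internally (that function picks its own bin count and returns richer bin maps); the names `bin-counts`, `bar-heights`, and `sample`, as well as the fixed bin count of 4, are assumptions chosen for illustration.

```clojure
(defn bin-counts
  "Counts how many values fall into each of n-bins equal-width bins
  spanning [min, max]. The maximum value is placed in the last bin."
  [values n-bins]
  (let [lo (apply min values)
        hi (apply max values)
        width (/ (- hi lo) (double n-bins))
        bin-of (fn [v]
                 (if (zero? width)
                   0
                   (min (dec n-bins) (int (/ (- v lo) width)))))
        counts (frequencies (map bin-of values))]
    (mapv #(get counts % 0) (range n-bins))))

(defn bar-heights
  "Scales counts so that the tallest bar fills plot-height,
  mirroring the bar-height computation in plot-basic."
  [counts plot-height]
  (let [max-count (apply max counts)]
    (mapv #(/ (* plot-height %) (double max-count)) counts)))

;; The first ten bill lengths from the printed table above
(def sample [39.1 39.5 40.3 36.7 39.3 38.9 39.2 41.1 38.6 34.6])

(bin-counts sample 4)                   ; => [1 1 5 3]
(bar-heights (bin-counts sample 4) 100) ; => [20.0 20.0 100.0 60.0]
```

Each height then becomes the `:height` of one `[:rect ...]`, with `:y` set to the plot height minus the bar height so that bars grow upward from the bottom edge.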

Single Column Summaries

The summarize function automatically selects the right visualization type:

  • Numeric columns → histogram (shows distribution shape)
  • Categorical columns → bar chart (shows frequencies)
(defn summarize [ds column]
  (if (is-numeric-type? (ds column))
    (plot-distribution ds column [:histogram])
    (plot-distribution ds column [:bar])))
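The same dispatch can be sketched without a dataset library. The version below assumes a column is just a sequence of values and treats it as numeric when every non-missing value is a number. `chart-kind` is a hypothetical name, and this value-based check is only an approximation of `tcc/typeof?`, which looks at the column's datatype rather than at its individual values.

```clojure
(defn chart-kind
  "Returns :histogram for all-numeric values and :bar otherwise.
  nil counts as missing and is ignored; an empty column falls back to :bar."
  [values]
  (let [present (remove nil? values)]
    (if (and (seq present) (every? number? present))
      :histogram
      :bar)))

(chart-kind [39.1 39.5 nil 40.3])   ; => :histogram
(chart-kind ["Adelie" "Chinstrap"]) ; => :bar
(chart-kind [2007 2008 2009])       ; => :histogram
```

Under a rule like this, identifier-like or low-cardinality numeric columns such as a year are still treated as numeric, so they get a histogram rather than a bar chart.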

Companion function: provides numeric summaries alongside the visualizations.

  • Numeric columns → count, mean, standard deviation, min, and max
  • Categorical columns → count and number of unique values

(defn get-summary-stats [ds column]
  (let [col (ds column)]
    (if (is-numeric-type? col)
      (let [stats (tcc/descriptive-statistics col)]
        (format "n: %d, μ: %.2f, σ: %.2f, min: %.2f, max: %.2f"
                (:n-elems stats)
                (:mean stats)
                (:standard-deviation stats)
                (:min stats)
                (:max stats)))
      (let [values (tcc/drop-missing col)
            counts (frequencies values)]
        (str "n: " (count values) ", unique: " (count counts))))))
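For readers who want to see what goes into such a summary line, here is a minimal sketch in plain Clojure that works on ordinary sequences. The names `numeric-stats` and `summary-line` are hypothetical, and the standard deviation uses the sample convention (dividing by n − 1), which is an assumption about what `tcc/descriptive-statistics` reports.

```clojure
(defn numeric-stats
  "Count, mean, sample standard deviation (n - 1 denominator), min and max.
  Assumes at least two numbers and no nils."
  [values]
  (let [n (count values)
        mean (/ (reduce + values) (double n))
        squared-deviations (map #(let [d (- % mean)] (* d d)) values)
        variance (/ (reduce + squared-deviations) (dec n))]
    {:n n
     :mean mean
     :sd (Math/sqrt variance)
     :min (apply min values)
     :max (apply max values)}))

(defn summary-line
  "Formats a column summary in the same shape as get-summary-stats."
  [values]
  (let [present (remove nil? values)]
    (if (and (seq present) (every? number? present))
      (let [{:keys [n mean sd] lo :min hi :max} (numeric-stats present)]
        (format "n: %d, μ: %.2f, σ: %.2f, min: %.2f, max: %.2f"
                n mean sd (double lo) (double hi)))
      (str "n: " (count present) ", unique: " (count (distinct present))))))

;; Categorical values: missing entries are dropped before counting
(summary-line ["male" "female" nil "male"]) ; => "n: 3, unique: 2"

;; Numeric values: the first ten bill lengths from the printed table
(summary-line [39.1 39.5 40.3 36.7 39.3 38.9 39.2 41.1 38.6 34.6])
```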

Summary Table: All Columns at a Glance

The summary table combines the visualization and the statistics for every column, giving a complete overview of the dataset’s structure.

(defn visual-summary [ds]
  (kind/table
   (doall (for [column-name (tc/column-names ds)]
            [column-name (summarize ds column-name) (get-summary-stats ds column-name)]))))
(visual-summary penguins)
| Column | Chart | Summary |
|---|---|---|
| rownames | (histogram) | n: 333, μ: 174.32, σ: 98.39, min: 1.00, max: 344.00 |
| species | (bar chart) | n: 333, unique: 3 |
| island | (bar chart) | n: 333, unique: 3 |
| bill-length-mm | (histogram) | n: 333, μ: 43.99, σ: 5.47, min: 32.10, max: 59.60 |
| bill-depth-mm | (histogram) | n: 333, μ: 17.16, σ: 1.97, min: 13.10, max: 21.50 |
| flipper-length-mm | (histogram) | n: 333, μ: 200.97, σ: 14.02, min: 172.00, max: 231.00 |
| body-mass-g | (histogram) | n: 333, μ: 4207.06, σ: 805.22, min: 2700.00, max: 6300.00 |
| sex | (bar chart) | n: 333, unique: 2 |
| year | (histogram) | n: 333, μ: 2008.04, σ: 0.81, min: 2007.00, max: 2009.00 |

Matrix View: All Column Combinations

The next step: instead of showing each column separately, what if we show how every column relates to every other column? This is the idea behind the scatterplot matrix.

The matrix automatically chooses the right chart for each combination:

  • Numeric × Numeric → scatter plot (reveal relationships)
  • Otherwise → bar chart (currently the value counts of the x column only; the y column is not used)
(defn matrix [ds]
  (let [column-names (tc/column-names ds)
        c (count column-names)]
    ^:kind/hiccup
    [:svg {:width   "100%"
           :viewBox (str/join " " [0 0 (* plot-width c) (* plot-height c)])
           :xmlns   "http://www.w3.org/2000/svg"
           :style {:border "solid 1px gray"}}
     [:g {:stroke "gray", :fill "none"}
      (for [[a-idx a] (map-indexed vector column-names)
            [b-idx b] (map-indexed vector column-names)]
        (let [col-a (ds a)
              col-b (ds b)
              a-numeric? (is-numeric-type? col-a)
              b-numeric? (is-numeric-type? col-b)]
          [:g {:transform (str "translate(" (* a-idx plot-width) "," (* b-idx plot-height) ")")}
           [:rect {:x 0 :y 0 :width plot-width :height plot-height
                   :fill "none" :stroke "gray" :stroke-width 1}]
           (plot-basic [:graphic {:data ds
                                  :mappings {:x a :y b}
                                  :geometry (cond
                                              (and a-numeric? b-numeric?) [:point]
                                              :else [:bar])}])]))]]))
(matrix penguins)
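The grid layout itself is independent of the dataset library. As a minimal sketch in plain Clojure, the function below takes column names together with a numeric flag and returns, for every ordered pair, the SVG translate offset and the geometry that `matrix` would pick. The names `cell-size`, `matrix-cells`, and `cells` are hypothetical, and the three example columns with their flags are typed in by hand rather than detected.

```clojure
(def cell-size 100) ; plays the role of plot-width and plot-height

(defn matrix-cells
  "For an ordered seq of [column-name numeric?] pairs, returns one map per
  ordered pair of columns with its SVG translate offset and geometry."
  [columns]
  (for [[a-idx [a a-numeric?]] (map-indexed vector columns)
        [b-idx [b b-numeric?]] (map-indexed vector columns)]
    {:x a
     :y b
     :transform (str "translate(" (* a-idx cell-size) "," (* b-idx cell-size) ")")
     :geometry (if (and a-numeric? b-numeric?) :point :bar)}))

(def cells
  (matrix-cells [[:species false]
                 [:bill-length-mm true]
                 [:body-mass-g true]]))

(count cells) ; => 9
(nth cells 5)
;; => {:x :bill-length-mm, :y :body-mass-g,
;;     :transform "translate(100,200)", :geometry :point}
```

With c columns the grid has c × c cells, so the nine penguin columns produce 81 panels. The diagonal cells pair a column with itself, which may be a natural place for the single-column summaries from earlier.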
source: src/data_visualization/aog/column_combinations.clj