Data Visualization

From Data to Decisions

"Raw numbers are inert. A table of 10,000 pageview records tells you nothing until a human brain processes it. Visualization is the bridge between collected data and human understanding."

CSE 135 — Full Overview | Tutorial


Section 1: Why Visualize Data?

200 ms to see what a spreadsheet cannot tell you.

The Speed of Vision

The human visual system processes spatial patterns in 200–500 milliseconds — far faster than reading a table of numbers. This is not a minor efficiency gain; it is a qualitative difference in what the brain can discover.

A table with 10 rows is readable. 100 rows is tedious. 10,000 rows is useless without aggregation or visualization.
Trends, outliers, clusters, and gaps that are invisible in a spreadsheet become obvious in a well-chosen chart.

Anscombe's Quartet (1973)

Four datasets with nearly identical summary statistics (the same means, variances, correlation, and regression line to two or three decimal places) that look completely different when plotted.

Figure: Anscombe's Quartet as four scatter plots (Dataset I: linear; Dataset II: curve; Dataset III: outlier; Dataset IV: vertical).

Summary statistics can hide radically different data structures; plotting the data reveals what the numbers conceal. Anscombe demonstrated this over fifty years ago, and the Datasaurus Dozen (2017) extended the idea to absurdity.
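
The claim is easy to check numerically. The sketch below hard-codes Anscombe's published values (the same data ships with R as `datasets::anscombe`). It computes each set's means, the sample variance of y, the Pearson correlation, and the least-squares line with plain loops. The helper names are illustrative, not from any library.

```typescript
// Anscombe's Quartet (1973): datasets I-III share x; dataset IV has its own x.
const x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const quartet: Record<string, { x: number[]; y: number[] }> = {
  I: { x: x123, y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  II: { x: x123, y: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74] },
  III: { x: x123, y: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  IV: {
    x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
  },
};

const mean = (v: number[]): number => v.reduce((s, n) => s + n, 0) / v.length;

// Sample variance (n - 1 denominator).
function variance(v: number[]): number {
  const m = mean(v);
  return v.reduce((s, n) => s + (n - m) ** 2, 0) / (v.length - 1);
}

interface Summary { meanX: number; meanY: number; varY: number; r: number; slope: number; intercept: number }

function summarize(x: number[], y: number[]): Summary {
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  const slope = sxy / sxx; // ordinary least-squares slope
  return { meanX: mx, meanY: my, varY: variance(y), r: sxy / Math.sqrt(sxx * syy), slope, intercept: my - slope * mx };
}

for (const [label, d] of Object.entries(quartet)) {
  const s = summarize(d.x, d.y);
  console.log(label, s.meanX.toFixed(2), s.meanY.toFixed(2), s.varY.toFixed(2), s.r.toFixed(3),
    `y = ${s.intercept.toFixed(2)} + ${s.slope.toFixed(3)}x`);
}
```

The four printed rows come out nearly the same, which is Anscombe's point. Only a plot of each set reveals the line, the curve, the outlier, and the vertical cluster.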

Section 2: Visual Encoding

Bertin's Visual Channels

Visual Channel	Best For	Accuracy	Example
Position (x, y)	Quantitative	Highest	Scatter plot, line chart
Length	Quantitative	Very high	Bar chart
Angle / Slope	Quantitative	Moderate	Pie chart, line slope
Area	Quantitative	Low	Bubble chart, treemap
Color hue	Categorical	N/A	Legend groups
Color saturation	Ordered	Low	Heat map, choropleth
Shape	Categorical	N/A	Different point shapes
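
Position and length sit at the top of the table because both are read against a common linear scale. Below is a minimal sketch of that mapping. It assumes a hand-written `linearScale` helper (hypothetical; D3 offers a comparable `d3.scaleLinear`) and made-up page-load times.

```typescript
// Hypothetical linear scale: maps a data domain [d0, d1] onto a pixel range [r0, r1].
function linearScale(domain: [number, number], range: [number, number]): (value: number) => number {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value: number): number => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Made-up load times (ms) encoded as bar length on a domain that starts at 0.
const loadTimes: Record<string, number> = { "/home": 800, "/search": 1200, "/checkout": 2400 };
const barLength = linearScale([0, 2400], [0, 300]);
for (const [page, ms] of Object.entries(loadTimes)) {
  console.log(page, barLength(ms)); // about 100, 150, and 300 px
}
```

Starting the domain at 0 keeps bar length proportional to the value. A domain that starts higher produces the truncated y-axis listed among the misleading techniques.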

Section 4: Exploratory vs. Explanatory

Comparison

Dimension	Exploratory	Explanatory
Purpose	Discover what you don't know	Communicate what you do know
Audience	You (the analyst)	Others (stakeholders, team)
Volume	Many charts, most discarded	Few charts, carefully chosen
Polish	Messy, fast, disposable	Clean, labeled, annotated
Interaction	Filter, zoom, pivot freely	Guided narrative or fixed view
Tooling	Notebooks, ad hoc scripts	Dashboards, reports, presentations
Risk	Missing an insight	Miscommunicating an insight

Section 6: Chart Type Inventory

Question → Chart Type

Question	Chart Type	Watch Out For
How do categories compare?	Bar / Grouped bar	>15 bars unreadable; >4 groups confusing
How do categories compare?	Lollipop	Less familiar to some audiences
Parts of a whole?	Pie / Donut	>5 slices unreadable
Parts of a whole?	Stacked bar / Treemap	Interior segments hard to compare
Distribution?	Histogram / Box plot	Bin width changes story; box hides multi-modality
Change over time?	Line chart	Too many lines overlap
Change over time?	Area / Sparkline	Stacked areas distort upper layers
Relationships?	Scatter / Bubble	Overplotting; area encoding imprecise
Current status?	Gauge / KPI card	Gauge wastes space; KPI needs comparison
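
The histogram row warns that bin width changes the story. Here is a small sketch of that effect. It uses hypothetical page-load times that form two clusters with an empty gap between them; `histogram` is an illustrative helper, not a library function.

```typescript
// Hypothetical page-load times (seconds): one cluster near 1.3 s, another near 3.7 s.
const loadSeconds = [0.9, 1.1, 1.2, 1.3, 1.5, 1.8, 3.2, 3.5, 3.7, 3.8, 4.0, 4.2];

// Count values into `binCount` equal-width bins starting at `start`.
function histogram(values: number[], start: number, binWidth: number, binCount: number): number[] {
  const counts = new Array<number>(binCount).fill(0);
  for (const v of values) {
    const i = Math.floor((v - start) / binWidth);
    if (i >= 0 && i < binCount) counts[i] += 1; // values outside the binned range are dropped
  }
  return counts;
}

console.log(histogram(loadSeconds, 0, 2.5, 2)); // [6, 6]: two equal bars, the clusters vanish
console.log(histogram(loadSeconds, 0, 1, 5));   // [1, 5, 0, 4, 2]: two clusters and an empty 2-3 s bin
```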

Section 7: Client vs. Server Charting

Architecture Comparison

Factor	Client-Side	Server-Side	Hybrid
Interactivity	Full (hover, click, zoom)	None (static image)	Full
Data volume	Limited by browser	Limited by server	Aggregated — small
Latency	Fast after load	Round-trip per image	Fast after load
Rendering tech	Canvas, SVG	Image library	Canvas/SVG
Example tools	Chart.js, D3	matplotlib, QuickChart	Grafana, Metabase
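
The Hybrid column works because the server sends aggregates instead of raw rows. Below is a sketch of that server-side step. It assumes a hypothetical `Pageview` record with a millisecond UTC timestamp and counts views per UTC hour.

```typescript
// Hypothetical raw pageview record; timestamp is milliseconds since the epoch (UTC).
interface Pageview { url: string; timestamp: number }

// Server-side aggregation: collapse raw rows into per-hour counts so the browser
// receives a handful of points instead of every record.
function hourlyCounts(views: Pageview[]): Map<number, number> {
  const HOUR_MS = 60 * 60 * 1000;
  const buckets = new Map<number, number>();
  for (const v of views) {
    const hourStart = Math.floor(v.timestamp / HOUR_MS) * HOUR_MS; // key: start of the UTC hour
    buckets.set(hourStart, (buckets.get(hourStart) ?? 0) + 1);
  }
  return buckets;
}

// 10,000 synthetic pageviews spread evenly across one day collapse to 24 points.
const dayStart = Date.UTC(2024, 0, 1);
const views: Pageview[] = Array.from({ length: 10_000 }, (_, i) => ({
  url: "/",
  timestamp: dayStart + Math.floor((i / 10_000) * 24 * 60 * 60 * 1000),
}));
const series = Array.from(hourlyCounts(views).entries()).sort((a, b) => a[0] - b[0]);
console.log(series.length); // 24
```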

The Data Transfer Problem

Data Points	JSON Size	Transfer (3G)	JSON.parse()	Canvas Render
100	~10 KB	instant	< 1 ms	< 5 ms
10,000	~1 MB	~3 s	~20 ms	~30 ms
100,000	~10 MB	~30 s	~200 ms	~80 ms
1,000,000	~100 MB	~5 min	~2,000 ms	~300 ms
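
Once two figures are fixed, the transfer column is simple arithmetic. The sketch assumes what the table's own rows imply: about 100 bytes of JSON per point, and roughly 1 MB per 3 seconds over 3G. Both constants are assumptions; replace them with your own measurements.

```typescript
// Assumptions read off the table: 10,000 points ≈ 1 MB, and 1 MB takes ~3 s on 3G.
const BYTES_PER_POINT = 100;             // assumption
const BYTES_PER_SECOND = 1_000_000 / 3;  // assumption: ~333 KB/s effective throughput

function estimateTransferSeconds(points: number): number {
  return (points * BYTES_PER_POINT) / BYTES_PER_SECOND;
}

for (const n of [100, 10_000, 100_000, 1_000_000]) {
  console.log(n, estimateTransferSeconds(n).toFixed(2), "s"); // 0.03, 3.00, 30.00, 300.00
}
```

Transfer time grows linearly with the point count. Reducing the number of points, for example by aggregating on the server, therefore moves every column of the table.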

Alternative Wire Formats

Format	Size vs. JSON	Streamable	Parse Speed
CSV	~0.5×	Yes	Faster (no keys)
NDJSON	~0.95×	Yes	Row-at-a-time
MessagePack	~0.6×	No	2–5× faster
Apache Arrow	~0.3×	Yes	10–50× (zero-copy)
Protocol Buffers	~0.4×	With framing	5–10× faster
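
Much of JSON's size overhead comes from repeated key names. The sketch serializes the same hypothetical rows as a JSON array, as CSV, and as NDJSON, then compares their lengths. The exact ratios depend on key lengths and values, so expect numbers that differ from the table's approximations. MessagePack, Arrow, and Protocol Buffers need third-party libraries and are left out.

```typescript
// Hypothetical pageview rows; the same data is serialized three ways.
interface Row { url: string; loadMs: number; country: string }

const rows: Row[] = Array.from({ length: 1000 }, (_, i) => ({
  url: `/page/${i % 50}`,
  loadMs: 500 + (i % 700),
  country: i % 2 === 0 ? "US" : "DE",
}));

// JSON array of objects: every row repeats "url", "loadMs", "country".
const json = JSON.stringify(rows);

// CSV: key names appear once in the header; no field here contains a comma, so no quoting is needed.
const csv = ["url,loadMs,country", ...rows.map((r) => `${r.url},${r.loadMs},${r.country}`)].join("\n");

// NDJSON: one complete JSON object per line.
const ndjson = rows.map((r) => JSON.stringify(r)).join("\n");

// All strings are ASCII, so .length equals the UTF-8 byte count.
console.log({ json: json.length, csv: csv.length, ndjson: ndjson.length });

// Row-at-a-time parsing: any single NDJSON line can be parsed on its own.
const firstRow = JSON.parse(ndjson.split("\n")[0]) as Row;
console.log(firstRow.url);
```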

Section 8: Rendering Technologies

Five Rendering Surfaces

Surface	Model	Interactivity	Accessibility	Examples
Static image	Pre-rendered	None	`alt` text	matplotlib, node-canvas
CSS	Retained (DOM)	DOM events	Full semantic	CSS-only bars
Canvas 2D	Immediate	Manual hit-testing	Opaque (ARIA)	Chart.js, ZingChart
SVG	Retained (DOM)	Native per-element	Traversable	D3.js
WebGL/WebGPU	Immediate (GPU)	Raycasting	Opaque (ARIA)	deck.gl, Mapbox GL
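
"Manual hit-testing" in the Canvas row means the application, not the browser, works out which mark sits under the pointer. An immediate-mode canvas keeps only pixels. Below is a minimal sketch. It assumes the app records each bar's rectangle when it draws it; `BarRect` and `hitTest` are hypothetical names.

```typescript
// Hypothetical record of a bar the app drew, in canvas pixel coordinates.
interface BarRect { label: string; x: number; y: number; width: number; height: number }

// Return the bar under (px, py), or undefined if the point hits empty canvas.
function hitTest(bars: BarRect[], px: number, py: number): BarRect | undefined {
  // Walk back to front so the mark drawn last (visually on top) wins when marks overlap.
  for (let i = bars.length - 1; i >= 0; i--) {
    const b = bars[i];
    if (px >= b.x && px < b.x + b.width && py >= b.y && py < b.y + b.height) return b;
  }
  return undefined;
}

const bars: BarRect[] = [
  { label: "/home", x: 10, y: 50, width: 40, height: 150 },
  { label: "/search", x: 60, y: 100, width: 40, height: 100 },
];
console.log(hitTest(bars, 70, 150)?.label); // "/search"
console.log(hitTest(bars, 55, 150)?.label); // undefined: the gap between bars
```

In a page, `px` and `py` would typically come from a pointer event, such as `event.clientX - canvas.getBoundingClientRect().left`. With SVG, each `<rect>` is a DOM node that can take its own event listener, so this bookkeeping disappears.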

Decision Framework

Dimension	Static Img	CSS	Canvas	SVG	WebGL
Explanatory (reports, email)	Yes	Inline
Interactive analysis			Yes	Yes	Yes
< 50 data points		Yes		Yes
50 – 5K points			Yes	Yes
5K – 100K points			Yes
> 100K points					Yes
Real-time streaming			Yes		Yes
Works without JS	Yes	Yes
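
The point-count rows of this table can be read as a simple rule. The following sketch shows one way to encode them. The thresholds come from the table; `candidateSurfaces` and its options are hypothetical, and the real-time streaming row is not modeled.

```typescript
type Surface = "Static image" | "CSS" | "Canvas" | "SVG" | "WebGL";

interface Needs { interactive: boolean; requiresNoJs: boolean }

// Hypothetical helper encoding the point-count rows of the decision table.
function candidateSurfaces(points: number, needs: Needs): Surface[] {
  if (needs.requiresNoJs) {
    // Only static images and CSS work without JS; the table lists CSS only for < 50 points.
    return points < 50 ? ["Static image", "CSS"] : ["Static image"];
  }
  let bySize: Surface[];
  if (points < 50) bySize = ["CSS", "SVG"];
  else if (points <= 5_000) bySize = ["Canvas", "SVG"];
  else if (points <= 100_000) bySize = ["Canvas"];
  else bySize = ["WebGL"];
  // CSS is absent from the "Interactive analysis" row, so drop it when interaction matters.
  return needs.interactive ? bySize.filter((s) => s !== "CSS") : bySize;
}

console.log(candidateSurfaces(30, { interactive: true, requiresNoJs: false }));      // ["SVG"]
console.log(candidateSurfaces(250_000, { interactive: true, requiresNoJs: false })); // ["WebGL"]
```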

Section 9: Declarative vs. Coded

Comparison Table

Factor	Declarative (Chart.js)	Imperative (D3)
Learning curve	Low — configure, not code	High — selections, joins, scales
Time to first chart	Minutes	Hours
Customization ceiling	Limited to library API	Unlimited — every pixel
Animation	Built-in (limited)	Full control (enter/update/exit)
Accessibility	Canvas (no DOM nodes)	SVG (DOM nodes, can add ARIA)
Best for	Standard charts, dashboards	Custom/novel visualizations
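
"Configure, not code" means a declarative chart is described as data. The sketch builds a bar-chart configuration in the `type` / `data` / `options` shape that Chart.js documents. It does not import Chart.js, so it runs on its own; the pageview counts are made up.

```typescript
// Minimal typing of the parts of a Chart.js bar configuration used here.
interface BarConfig {
  type: "bar";
  data: { labels: string[]; datasets: { label: string; data: number[] }[] };
  options: { responsive: boolean; scales: { y: { beginAtZero: boolean } } };
}

// Hypothetical helper: turn per-page counts into a declarative chart description.
function pageviewBarConfig(counts: Record<string, number>): BarConfig {
  return {
    type: "bar",
    data: {
      labels: Object.keys(counts),
      datasets: [{ label: "Pageviews", data: Object.values(counts) }],
    },
    // beginAtZero keeps bar lengths honest (no truncated y-axis).
    options: { responsive: true, scales: { y: { beginAtZero: true } } },
  };
}

const config = pageviewBarConfig({ "/home": 1200, "/search": 640, "/checkout": 310 });
// With Chart.js loaded in a page, rendering is one call: new Chart(canvasElement, config);
console.log(JSON.stringify(config, null, 2));
```

An equivalent D3 version would create the scales, axes, and `<rect>` elements itself. That is where both its extra lines and its extra control come from.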

Section 11: Data Storytelling

The Martini Glass (Segel & Heer, 2010)

Phase	Mode	What Happens
1. Stem	Author-driven	Guided path through a focused narrative — annotations, curated views, controlled sequence
2. Transition	Handoff	Key conclusion delivered — the single insight the author wants to land
3. Bowl	Reader-driven	Opens up for free exploration — filter, drill-down, ask your own questions

Section 12: Dashboards

Dashboard Types

Type	Purpose	Update Frequency	Example
Operational	Real-time health	Seconds–minutes	Error rates, active users
Analytical	Trends & patterns	Daily–weekly	Funnels, A/B tests
Strategic	High-level KPIs	Weekly–monthly	Revenue, retention
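
An operational dashboard's headline number is often a rate over a short trailing window. Below is a sketch. It assumes a hypothetical `LoggedRequest` log and treats only HTTP 5xx responses as errors.

```typescript
// Hypothetical request log entry; timestamp in milliseconds.
interface LoggedRequest { timestamp: number; status: number }

// Error rate over the trailing window (now - windowMs, now].
// Returns null when the window is empty so the UI can show "no data" rather than 0%.
function errorRate(log: LoggedRequest[], now: number, windowMs: number): number | null {
  const recent = log.filter((e) => e.timestamp > now - windowMs && e.timestamp <= now);
  if (recent.length === 0) return null;
  const errors = recent.filter((e) => e.status >= 500).length; // assumption: only 5xx count as errors
  return errors / recent.length;
}

const sampleLog: LoggedRequest[] = [
  { timestamp: 1_000, status: 500 },
  { timestamp: 20_000, status: 200 },
  { timestamp: 50_000, status: 200 },
  { timestamp: 59_000, status: 200 },
];
console.log(errorRate(sampleLog, 60_000, 60_000)); // 0.25
console.log(errorRate(sampleLog, 60_000, 30_000)); // 0
```

Returning `null` for an empty window avoids displaying a misleadingly healthy 0% error rate when no traffic arrived at all.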

Section 13: Misleading Visualizations

Common Misleading Techniques

Technique	How It Misleads	How to Detect
Truncated y-axis	Starting above zero exaggerates bar differences	Check y-axis origin
Aspect ratio manipulation	Stretching/compressing changes perceived slope	Check axis interval consistency
3D distortion	Perspective makes front elements look larger	Ask: does the 3rd dimension encode data?
Dual y-axes	Independent scale choices can fabricate an apparent correlation	Re-scale one axis mentally
Cherry-picked time range	Shows only a favorable trend	Ask: why this start date?
Inverted axes	"Up" means "worse"	Check axis direction
Area/radius confusion	2× radius = 4× area	Check if size scales by area or radius
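
Two rows of this table reduce to arithmetic. The sketch uses hypothetical values 96 and 98. It shows how far a y-axis starting at 95 inflates their difference, and how to size bubbles so that area, not radius, tracks the value.

```typescript
// Drawn bar height is proportional to (value - axisMin), so the visual ratio of two bars is:
function apparentRatio(a: number, b: number, axisMin: number): number {
  return (b - axisMin) / (a - axisMin);
}

console.log(98 / 96);                   // true ratio: about 1.02
console.log(apparentRatio(96, 98, 0));  // axis at zero: about 1.02, honest
console.log(apparentRatio(96, 98, 95)); // axis at 95: 3, the second bar looks three times larger

// Bubble sizing: area grows with radius squared, so scale the radius by the square root.
function radiusForValue(value: number, maxValue: number, maxRadius: number): number {
  return maxRadius * Math.sqrt(value / maxValue);
}

console.log(radiusForValue(50, 100, 20)); // about 14.14: half the area of the largest bubble
```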

Section 14: Raw Data & Tables

Interactive Table Features

Feature	What It Does	Analytics Use Case
Column sorting	Click header to sort	Find slowest pages, top referrers
Filtering / search	Narrow rows to criteria	Errors from a specific URL
Pagination	Page through large sets	Navigate thousands of pageviews
Virtual scrolling	Render only visible rows	Handle 10K+ rows without DOM explosion
Conditional styling	Color cells by value	Red for CLS > 0.25, green for good LCP
CSV/JSON export	Download visible data	Share findings with team
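
Virtual scrolling keeps the DOM small by rendering only the rows near the viewport. Below is a sketch of the index arithmetic. It assumes fixed-height rows and a hypothetical `visibleRange` helper.

```typescript
// Hypothetical helper: which rows to render for the current scroll position.
// Assumes every row has the same fixed height in pixels.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan = 5, // extra rows above and below so fast scrolling does not flash blank space
): { first: number; last: number; offsetTop: number } {
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  const first = Math.max(0, firstVisible - overscan);
  const last = Math.min(totalRows - 1, lastVisible + overscan);
  // The rendered slice is shifted down by offsetTop inside a spacer of height totalRows * rowHeight.
  return { first, last, offsetTop: first * rowHeight };
}

// 10,000 rows of 24 px in a 480 px viewport, scrolled so row 1,000 is at the top:
const r = visibleRange(24_000, 480, 24, 10_000);
console.log(r.first, r.last, r.last - r.first + 1); // 995 1024 30
```

Only about 30 row elements exist at any moment, instead of 10,000.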

Section 15: Data Literacy

10-Question Checklist

#	Question	What to Look For
1	What are the axes?	Labels, units, scale (linear vs. log)
2	Does the y-axis start at zero?	Truncation exaggerates bar differences
3	What is the sample size?	"Up 50%" might mean 2 users → 3
4	What time period is shown?	Cherry-picked ranges create false narratives
5	Compared to what?	A number without baseline is meaningless
6	Correlation or causation?	Two lines moving together ≠ one causes the other
7	What is not shown?	Survivorship bias, excluded data
8	Who made this and why?	Advocacy vs. analytical intent
9	Can I access the raw data?	Unverifiable claims deserve caution
10	Would a different chart change the story?	If encoding seems designed to emphasize, consider alternatives

Summary: Key Takeaways

Key Terms (Subset)

Term	Definition
Visual encoding	Mapping data values to visual properties (position, length, color, etc.)
Decoding	Extracting data values from visual marks; accuracy varies by channel
Anscombe's Quartet	Four datasets with identical statistics but different visual patterns
Cleveland & McGill	Empirical hierarchy: position > length > angle > area > color
Exploratory vs. Explanatory	Discover unknowns vs. communicate knowns
Declarative vs. Imperative	Chart.js (config) vs. D3 (code) — fast vs. powerful
Martini Glass	Author-driven narrative → conclusion → reader-driven exploration
Immediate vs. Retained	Canvas (pixels, forgotten) vs. SVG (DOM nodes, persistent)
Chartjunk	Visual elements that do not encode data (Tufte); some studies suggest embellishment can aid memorability
Data literacy	Ability to read, interpret, and critically evaluate visualizations

Section Summary

#	Section	Key Point
1	Why Visualize?	200 ms pattern recognition; Anscombe's Quartet; vision ≠ boolean
2	Visual Encoding	Bertin's channels; not all channels equally accurate
3	Decoding	Position > length > angle > area > color (measured, not opinion)
4	Explore vs. Explain	30 charts → 1 insight; don't dump notebooks into presentations
5	Conventions	240 years of shared visual language; breaking convention has a cost
6	Chart Inventory	Choose by question asked, not appearance; there is no "best chart"
7	Client vs. Server	Transfer + parse dwarfs render at scale; hybrid aggregation wins
8	Rendering	5 surfaces; immediate vs. retained; DOM explosion limits SVG at scale
9	Declarative vs. Coded	Chart.js = fast + standard; D3 = powerful + custom
10	Encoding Demo	Same data, different encoding = different story; editorial choice
11	Storytelling	Martini Glass; annotation is critical; guide then open
12	Dashboards	Visual hierarchy; five-second test; showing everything shows nothing
13	Misleading Viz	Truncated axes, 3D, cherry-picked ranges; chartjunk debate
14	Data Tables	Charts + tables = complete picture; drill-down; progressive enhancement
15	Data Literacy	10-question checklist; honest chart + illiterate viewer = greatest risk
16	D3 Demo	Primitives, not chart types; a radial lollipop is impractical in declarative libs
17	Pipeline	Viz is last mile; quality bounded by upstream; feedback loop

Section 1: Why Visualize Data?

The Speed of Vision

Anscombe's Quartet (1973)

Vision Isn't Boolean

Section 2: Visual Encoding

Bertin's Visual Channels

Subjectivity & Culture

Section 3: Decoding & Perception

Cleveland & McGill Ranking (1984)

Pie Charts: Not Forbidden

Section 4: Exploratory vs. Explanatory

Comparison

Section 5: Chart Conventions

Timeline of Dataviz Milestones

Charts as Visual Contracts

Dataviz Didn't Start with Computing

Section 6: Chart Type Inventory

Question → Chart Type

Section 7: Client vs. Server Charting

Architecture Comparison

The Data Transfer Problem

JSON Is the Problem Child

Alternative Wire Formats

Section 8: Rendering Technologies

Five Rendering Surfaces

Immediate vs. Retained Mode

Performance Envelopes

Decision Framework

Section 9: Declarative vs. Coded

Chart.js vs. D3 Side-by-Side

Chart.js (Declarative, ~15 lines)

D3 (Imperative, ~25 lines)

Comparison Table

Section 10: Encoding Demo

Three Encodings, One Dataset

Grouped Bar

Stacked Bar

Line Chart

Section 11: Data Storytelling

The Martini Glass (Segel & Heer, 2010)

Annotation Is Critical

Without Annotation

With Annotation

Section 12: Dashboards

Design Principles

Dashboard Types

Dashboard Anti-Patterns

Section 13: Misleading Visualizations

Common Misleading Techniques

Truncated Y-Axis

Y starts at 0 (honest)

Y starts at 95 (misleading)

Chartjunk & Tufte

Section 14: Raw Data & Tables

Chart + Table Pairing

Interactive Table Features

Progressive Enhancement

Section 15: Data Literacy

10-Question Checklist

The Greatest Risk

Deception by Defaults

Section 16: D3 in Action

Radial Lollipop Chart

Section 17: Analytics Pipeline

Pipeline: Collection to Decision

The Feedback Loop

Summary: Key Takeaways

Key Terms (Subset)

Section Summary