From Data to Decisions
"Raw numbers are inert. A table of 10,000 pageview records tells you nothing until a human brain processes it. Visualization is the bridge between collected data and human understanding."
CSE 135 — Full Overview | Tutorial
200 ms to see what a spreadsheet cannot tell you.
The human visual system processes spatial patterns in 200–500 milliseconds — far faster than reading a table of numbers. This is not a minor efficiency gain; it is a qualitative difference in what the brain can discover.
Anscombe's Quartet: four datasets with nearly identical summary statistics (same mean, variance, correlation, regression line) that look completely different when plotted.
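To make the point concrete, the sketch below computes the summary statistics for Anscombe's Quartet using only the standard library. The data values are the published ones from Anscombe (1973); the helper name `summarize` and the `SUMMARIES` dictionary are illustrative choices, not part of the course material.

```python
from math import sqrt

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean, sample variance, correlation, and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four rows come out essentially the same, which is exactly why the numbers alone cannot distinguish a linear trend from a curve or a single outlier.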
Mapping data values to visual marks — the fundamental building block.
Jacques Bertin (Sémiologie Graphique, 1967) identified the core visual channels:
| Visual Channel | Best For | Accuracy | Example |
|---|---|---|---|
| Position (x, y) | Quantitative | Highest | Scatter plot, line chart |
| Length | Quantitative | Very high | Bar chart |
| Angle / Slope | Quantitative | Moderate | Pie chart, line slope |
| Area | Quantitative | Low | Bubble chart, treemap |
| Color hue | Categorical | N/A | Legend groups |
| Color saturation | Ordered | Low | Heat map, choropleth |
| Shape | Categorical | N/A | Different point shapes |
Some visual encodings are culture-dependent: conventions such as color meaning and reading direction do not carry across all cultures.
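As a rough illustration of the position and length channels, the sketch below maps a data value to a pixel coordinate with a linear scale. The function name `linear_scale` and the domain and pixel ranges are assumptions chosen for the example; charting libraries provide their own scale utilities.

```python
def linear_scale(domain, pixel_range):
    """Build a function that maps a data value to a pixel position."""
    d0, d1 = domain
    r0, r1 = pixel_range
    if d0 == d1:
        raise ValueError("domain must span more than one value")

    def scale(value):
        # Normalize to 0..1 within the domain, then stretch to the pixel range.
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical chart: 0-500 pageviews drawn in a 300 x 200 pixel area.
x = linear_scale((0, 500), (0, 300))   # bar length, left to right
y = linear_scale((0, 500), (200, 0))   # flipped: screen y grows downward

print(x(250))  # midpoint of the domain lands at the midpoint of the range
print(y(500))  # the largest value sits at the top of the plot area
```

Every other channel in the table can be driven the same way: the scale's output becomes a radius, an opacity, or an index into a color palette instead of a coordinate.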
How accurately can humans extract values from visual marks?
Empirical ranking of perceptual accuracy from Cleveland & McGill, from most to least accurate: position > length > angle > area > color.
Replicated by Heer & Bostock (2010) via crowdsourced experiments on Mechanical Turk.
Pie charts are legitimate when:
Two fundamentally different purposes for visualization.
| Dimension | Exploratory | Explanatory |
|---|---|---|
| Purpose | Discover what you don't know | Communicate what you do know |
| Audience | You (the analyst) | Others (stakeholders, team) |
| Volume | Many charts, most discarded | Few charts, carefully chosen |
| Polish | Messy, fast, disposable | Clean, labeled, annotated |
| Interaction | Filter, zoom, pivot freely | Guided narrative or fixed view |
| Tooling | Notebooks, ad hoc scripts | Dashboards, reports, presentations |
| Risk | Missing an insight | Miscommunicating an insight |
Charts are a shared visual language, built on ~240 years of practice.
A bar chart implicitly promises: "the y-axis starts at zero, and bar height is proportional to the value." Breaking that promise violates the contract — even if you add a label.
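The effect of breaking the zero-baseline promise is simple arithmetic. In this sketch, `apparent_ratio` is a hypothetical helper that returns how many times taller one bar is drawn than another for a given axis start; the values 95 and 100 are made up for illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights (b over a) when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must be above the baseline")
    return (b - baseline) / (a - baseline)


honest = apparent_ratio(95, 100)                  # axis starts at zero
truncated = apparent_ratio(95, 100, baseline=90)  # axis starts at 90

print(round(honest, 3))  # bars differ by about 5 percent
print(truncated)         # the second bar is drawn twice as tall
```

A 5 percent difference in the data becomes a 2x difference on screen, and no axis label undoes what the bar lengths have already told the eye.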
Choose the chart by the question, not the appearance.
| Question | Chart Type | Watch Out For |
|---|---|---|
| How do categories compare? | Bar / Grouped bar | >15 bars unreadable; >4 groups confusing |
| | Lollipop | Less familiar to some audiences |
| Parts of a whole? | Pie / Donut | >5 slices unreadable |
| | Stacked bar / Treemap | Interior segments hard to compare |
| Distribution? | Histogram / Box plot | Bin width changes story; box hides multi-modality |
| Change over time? | Line chart | Too many lines overlap |
| | Area / Sparkline | Stacked areas distort upper layers |
| Relationships? | Scatter / Bubble | Overplotting; area encoding imprecise |
| Current status? | Gauge / KPI card | Gauge wastes space; KPI needs comparison |
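The histogram caveat in the table ("bin width changes story") can be seen with a few lines of code. This is a minimal sketch with made-up load times; `histogram` is an illustrative helper that only reports non-empty bins.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin, keyed by the lower edge of each non-empty bin."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical page load times in milliseconds: two clusters, fast and slow.
LOAD_TIMES_MS = [120, 130, 140, 150, 160, 340, 350, 360, 370, 380]

wide = histogram(LOAD_TIMES_MS, 500)    # one bin swallows both clusters
narrow = histogram(LOAD_TIMES_MS, 100)  # two separate clusters appear

print(wide)
print(narrow)
```

With the wide bin the data looks like a single lump; with the narrow bin the gap between the two groups is visible, which is the kind of multi-modality a box plot would also hide.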
Where the chart gets rendered — and the data transfer problem.
| Factor | Client-Side | Server-Side | Hybrid |
|---|---|---|---|
| Interactivity | Full (hover, click, zoom) | None (static image) | Full |
| Data volume | Limited by browser | Limited by server | Aggregated — small |
| Latency | Fast after load | Round-trip per image | Fast after load |
| Rendering tech | Canvas, SVG | Image library | Canvas/SVG |
| Example tools | Chart.js, D3 | matplotlib, QuickChart | Grafana, Metabase |
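The "Aggregated, small" entry in the Hybrid column can be sketched as a server-side roll-up that runs before any data reaches the browser. The record shape (a `timestamp` field holding an ISO 8601 string) and the function name `pageviews_per_hour` are assumptions for illustration, not a description of any particular tool.

```python
from collections import Counter
from datetime import datetime


def pageviews_per_hour(records):
    """Collapse raw pageview records into one count per hour."""
    counts = Counter()
    for record in records:
        moment = datetime.fromisoformat(record["timestamp"])
        counts[moment.strftime("%Y-%m-%dT%H:00")] += 1
    return [{"hour": hour, "views": views} for hour, views in sorted(counts.items())]


# Hypothetical raw records as the server might hold them.
RAW = [
    {"url": "/", "timestamp": "2024-01-15T09:05:00"},
    {"url": "/about", "timestamp": "2024-01-15T09:40:00"},
    {"url": "/", "timestamp": "2024-01-15T10:02:00"},
]

print(pageviews_per_hour(RAW))
```

However many raw rows exist, the client receives at most one point per hour, so the chart stays interactive without shipping the full dataset.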
Client-side rendering has a hidden cost: every data point must be transferred, parsed, then rendered.
| Data Points | JSON Size | Transfer (3G) | JSON.parse() | Canvas Render |
|---|---|---|---|---|
| 100 | ~10 KB | instant | < 1 ms | < 5 ms |
| 10,000 | ~1 MB | 3s | ~20 ms | ~30 ms |
| 100,000 | ~10 MB | 30s | ~200 ms | ~80 ms |
| 1,000,000 | ~100 MB | 5 min | ~2,000 ms | ~300 ms |
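The size and transfer columns follow from two rough constants that can be read off the table: about 100 bytes of JSON per data point, and about 1 MB every 3 seconds on the 3G link. Both constants are back-of-envelope assumptions derived from the table's own rows, not measurements.

```python
BYTES_PER_POINT = 100                   # assumption: ~1 MB per 10,000 points
THREE_G_BYTES_PER_SEC = 1_000_000 / 3   # assumption: ~1 MB transferred in 3 s


def estimate_transfer(points):
    """Return (payload size in bytes, transfer time in seconds)."""
    size_bytes = points * BYTES_PER_POINT
    return size_bytes, size_bytes / THREE_G_BYTES_PER_SEC


for points in (100, 10_000, 100_000, 1_000_000):
    size_bytes, seconds = estimate_transfer(points)
    print(f"{points:>9,} points  {size_bytes / 1_000_000:>7.2f} MB  {seconds:>7.1f} s")
```

Under these assumptions the cost grows linearly with point count, so a million points means roughly five minutes of transfer before parsing or rendering even begins.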
JSON.parse() is synchronous: it blocks until the entire string is processed.
| Format | Size vs. JSON | Streamable | Parse Speed |
|---|---|---|---|
| CSV | ~0.5× | Yes | Faster (no keys) |
| NDJSON | ~0.95× | Yes | Row-at-a-time |
| MessagePack | ~0.6× | No | 2–5× faster |
| Apache Arrow | ~0.3× | Yes | 10–50× (zero-copy) |
| Protocol Buffers | ~0.4× | With framing | 5–10× faster |
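The text formats in the table can be compared directly with the standard library. The sketch below encodes the same hypothetical rows as JSON, NDJSON, and CSV and measures the byte sizes; the row fields and helper names are made up for the example, and the exact ratios will vary with key lengths and values.

```python
import csv
import io
import json


def encoded_sizes(rows):
    """Return the UTF-8 byte size of the same rows in three text formats."""
    as_json = json.dumps(rows)
    as_ndjson = "\n".join(json.dumps(row) for row in rows)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()       # keys appear once, in the header
    writer.writerows(rows)
    return {
        "json": len(as_json.encode("utf-8")),
        "ndjson": len(as_ndjson.encode("utf-8")),
        "csv": len(buffer.getvalue().encode("utf-8")),
    }


def iter_ndjson(text):
    """Parse NDJSON one row at a time instead of all at once."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


# Hypothetical pageview rows.
ROWS = [{"url": f"/page/{i % 5}", "load_ms": 100 + i, "status": 200} for i in range(1000)]
SIZES = encoded_sizes(ROWS)
print(SIZES)
```

CSV comes out smallest because the keys are written once rather than per row, while NDJSON saves almost nothing on size; its benefit is that each line can be parsed and rendered as it arrives.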
Five surfaces — how the browser draws charts.
| Surface | Model | Interactivity | Accessibility | Examples |
|---|---|---|---|---|
| Static image | Pre-rendered | None | alt text | matplotlib, node-canvas |
| CSS | Retained (DOM) | DOM events | Full semantic | CSS-only bars |
| Canvas 2D | Immediate | Manual hit-testing | Opaque (ARIA) | Chart.js, ZingChart |
| SVG | Retained (DOM) | Native per-element | Traversable | D3.js |
| WebGL/WebGPU | Immediate (GPU) | Raycasting | Opaque (ARIA) | deck.gl, Mapbox GL |
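"Manual hit-testing" on an immediate-mode surface means the application must remember where it drew each mark and check the pointer position against that list itself. A minimal sketch of the idea, assuming rectangular bars; the `Bar` class and `hit_test` function are illustrative names, not a library API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bar:
    label: str
    x: float       # left edge in pixels
    y: float       # top edge in pixels
    width: float
    height: float


def hit_test(bars: List[Bar], px: float, py: float) -> Optional[Bar]:
    """Return the bar under the pointer, or None if the pointer is on empty space."""
    # Walk in reverse draw order so the most recently drawn bar wins on overlap.
    for bar in reversed(bars):
        if bar.x <= px <= bar.x + bar.width and bar.y <= py <= bar.y + bar.height:
            return bar
    return None


# Hypothetical layout: two bars sitting on a baseline at y = 200.
BARS = [
    Bar("/home", x=20, y=80, width=40, height=120),
    Bar("/about", x=80, y=140, width=40, height=60),
]

hovered = hit_test(BARS, 95, 150)
print(hovered.label if hovered else "nothing")
```

On a retained surface such as SVG this bookkeeping is unnecessary, because each mark is a DOM element that receives its own events.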
| Dimension | Static Img | CSS | Canvas | SVG | WebGL |
|---|---|---|---|---|---|
| Explanatory (reports, email) | Yes | Inline | |||
| Interactive analysis | Yes | Yes | Yes | ||
| < 50 data points | Yes | Yes | |||
| 50 – 5K points | Yes | Yes | |||
| 5K – 100K points | Yes | ||||
| > 100K points | Yes | ||||
| Real-time streaming | Yes | Yes | |||
| Works without JS | Yes | Yes |
Two programming models for driving rendering surfaces.
| Factor | Declarative (Chart.js) | Imperative (D3) |
|---|---|---|
| Learning curve | Low — configure, not code | High — selections, joins, scales |
| Time to first chart | Minutes | Hours |
| Customization ceiling | Limited to library API | Unlimited — every pixel |
| Animation | Built-in (limited) | Full control (enter/update/exit) |
| Accessibility | Canvas (no DOM nodes) | SVG (DOM nodes, can add ARIA) |
| Best for | Standard charts, dashboards | Custom/novel visualizations |
Same data, three encodings, three different stories.
Five pages measured across three days (Mon, Wed, Fri):
Grouped bar chart. Best for: comparing individual pages on a given day
"Which page had the most traffic on Monday?"
Stacked bar chart. Best for: total volume per day + each page's share
"How did overall traffic change?"
Line chart. Best for: trend over time for each page
"Is traffic for each page going up, down, or flat?"
Data + visual + narrative = action.
Segel & Heer studied narrative visualization structures and identified a common three-phase pattern:
| Phase | Mode | What Happens |
|---|---|---|
| 1. Stem | Author-driven | Guided path through a focused narrative — annotations, curated views, controlled sequence |
| 2. Transition | Handoff | Key conclusion delivered — the single insight the author wants to land |
| 3. Bowl | Reader-driven | Opens up for free exploration — filter, drill-down, ask your own questions |
This pattern appears in the best data journalism (NYT, Guardian, FiveThirtyEight) and in well-designed analytics dashboards.
Research consistently shows that annotated charts are more effective than unannotated ones.
Persistent, multi-chart displays for ongoing monitoring.
| Type | Purpose | Update Frequency | Example |
|---|---|---|---|
| Operational | Real-time health | Seconds–minutes | Error rates, active users |
| Analytical | Trends & patterns | Daily–weekly | Funnels, A/B tests |
| Strategic | High-level KPIs | Weekly–monthly | Revenue, retention |
How charts deceive — deliberately or through ignorance.
| Technique | How It Misleads | How to Detect |
|---|---|---|
| Truncated y-axis | Starting above zero exaggerates bar differences | Check y-axis origin |
| Aspect ratio manipulation | Stretching/compressing changes perceived slope | Check axis interval consistency |
| 3D distortion | Perspective makes front elements look larger | Ask: does the 3rd dimension encode data? |
| Dual y-axes | Scale choices fabricate apparent correlation | Re-scale one axis mentally |
| Cherry-picked time range | Shows only a favorable trend | Ask: why this start date? |
| Inverted axes | "Up" means "worse" | Check axis direction |
| Area/radius confusion | 2× radius = 4× area | Check if size scales by area or radius |
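The area/radius row can be checked numerically. In this sketch, `radius_for` is a hypothetical helper that sizes a bubble so that its area, rather than its radius, is proportional to the value; the numbers are illustrative.

```python
from math import pi, sqrt


def radius_for(value, max_value, max_radius):
    """Radius that makes circle area proportional to `value`."""
    return max_radius * sqrt(value / max_value)


def area(radius):
    return pi * radius ** 2


big = radius_for(100, max_value=100, max_radius=20)
half = radius_for(50, max_value=100, max_radius=20)

print(round(half, 2))                    # noticeably more than half of 20
print(round(area(half) / area(big), 2))  # area ratio matches the data ratio
```

Scaling the radius linearly instead (10 px for the half-size value) would draw a circle with only a quarter of the area, understating the smaller value by a factor of two.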
Edward Tufte coined chartjunk: visual elements that do not encode data — 3D effects, gradient fills, decorative icons.
Charts summarize; tables preserve. Both are essential.
A chart is a lossy compression of data — it shows patterns at the cost of individual values. A table preserves every value but obscures patterns. Together, they are a complete picture.
Assistive technology cannot readily interpret a chart drawn in <canvas> or <svg>. A data table provides the same information in a format assistive technology can read. A chart without a table is inaccessible by default.
| Feature | What It Does | Analytics Use Case |
|---|---|---|
| Column sorting | Click header to sort | Find slowest pages, top referrers |
| Filtering / search | Narrow rows to criteria | Errors from a specific URL |
| Pagination | Page through large sets | Navigate thousands of pageviews |
| Virtual scrolling | Render only visible rows | Handle 10K+ rows without DOM explosion |
| Conditional styling | Color cells by value | Red for CLS > 0.25, green for good LCP |
| CSV/JSON export | Download visible data | Share findings with team |
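Virtual scrolling comes down to computing which slice of rows intersects the viewport and rendering only that slice. A minimal sketch, assuming fixed-height rows; the function name `visible_rows` and the `overscan` padding parameter are illustrative.

```python
from math import ceil


def visible_rows(scroll_top, viewport_height, row_height, total_rows, overscan=2):
    """Return the half-open range (first, last) of row indexes to render."""
    first = scroll_top // row_height
    last = ceil((scroll_top + viewport_height) / row_height)
    # Render a few extra rows on each side so fast scrolling does not show gaps.
    first = max(0, first - overscan)
    last = min(total_rows, last + overscan)
    return first, last


# Hypothetical table: 10,000 rows, 30 px each, in a 600 px tall viewport.
first, last = visible_rows(scroll_top=3000, viewport_height=600,
                           row_height=30, total_rows=10_000)
print(first, last, last - first)
```

Only a couple of dozen rows exist in the DOM at any time, regardless of how many rows the dataset holds.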
<table>: works without JS, accessible, printable
The greatest risk: an honest chart + an illiterate viewer.
Ask these when reading any chart:
| # | Question | What to Look For |
|---|---|---|
| 1 | What are the axes? | Labels, units, scale (linear vs. log) |
| 2 | Does the y-axis start at zero? | Truncation exaggerates bar differences |
| 3 | What is the sample size? | "Up 50%" might mean 2 users → 3 |
| 4 | What time period is shown? | Cherry-picked ranges create false narratives |
| 5 | Compared to what? | A number without baseline is meaningless |
| 6 | Correlation or causation? | Two lines moving together ≠ one causes the other |
| 7 | What is not shown? | Survivorship bias, excluded data |
| 8 | Who made this and why? | Advocacy vs. analytical intent |
| 9 | Can I access the raw data? | Unverifiable claims deserve caution |
| 10 | Would a different chart change the story? | If encoding seems designed to emphasize, consider alternatives |
Primitives, not chart types — unlimited ceiling, steep curve.
A visualization that no declarative charting library provides. Each spoke radiates from center, length proportional to pageviews, circle at the tip.
There is no type: 'radialLollipop'. You compute angles, convert polar to Cartesian, draw SVG lines and circles, and animate with transitions. This is the power, and the cost, of imperative visualization.
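The geometry behind the radial lollipop is independent of D3 and can be sketched in a few lines. The version below computes the spoke tips and emits a static SVG string; the function names, the 12 o'clock starting angle, and the styling are assumptions for illustration, and transitions are left out.

```python
from math import cos, pi, sin


def spoke_tips(values, cx, cy, max_length):
    """Convert each value to the (x, y) tip of a spoke radiating from (cx, cy)."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("at least one value must be positive")
    tips = []
    for index, value in enumerate(values):
        angle = 2 * pi * index / len(values) - pi / 2  # first spoke points up
        length = max_length * value / peak             # length encodes the value
        # Polar to Cartesian; screen y grows downward, so -pi/2 is "up".
        tips.append((cx + length * cos(angle), cy + length * sin(angle)))
    return tips


def radial_lollipop_svg(values, size=200):
    """Render the spokes as SVG lines with a circle at each tip."""
    center = size / 2
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for x, y in spoke_tips(values, center, center, center - 10):
        parts.append(f'<line x1="{center}" y1="{center}" x2="{x:.1f}" y2="{y:.1f}" stroke="black"/>')
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="4"/>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical pageview counts for four pages.
print(radial_lollipop_svg([100, 50, 100, 50]))
```

In D3 the same calculations would typically sit inside scale and attribute callbacks, with the data join creating the line and circle elements.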
Visualization is the last mile — quality bounded by upstream data.
18 sections of data visualization in two tables.
| Term | Definition |
|---|---|
| Visual encoding | Mapping data values to visual properties (position, length, color, etc.) |
| Decoding | Extracting data values from visual marks; accuracy varies by channel |
| Anscombe's Quartet | Four datasets with identical statistics but different visual patterns |
| Cleveland & McGill | Empirical hierarchy: position > length > angle > area > color |
| Exploratory vs. Explanatory | Discover unknowns vs. communicate knowns |
| Declarative vs. Imperative | Chart.js (config) vs. D3 (code) — fast vs. powerful |
| Martini Glass | Author-driven narrative → conclusion → reader-driven exploration |
| Immediate vs. Retained | Canvas (pixels, forgotten) vs. SVG (DOM nodes, persistent) |
| Chartjunk | Visual elements that do not encode data (Tufte); some decoration may still aid memorability |
| Data literacy | Ability to read, interpret, and critically evaluate visualizations |
| # | Section | Key Point |
|---|---|---|
| 1 | Why Visualize? | 200 ms pattern recognition; Anscombe's Quartet; vision ≠ boolean |
| 2 | Visual Encoding | Bertin's channels; not all channels equally accurate |
| 3 | Decoding | Position > length > angle > area > color (measured, not opinion) |
| 4 | Explore vs. Explain | 30 charts → 1 insight; don't dump notebooks into presentations |
| 5 | Conventions | 240 years of shared visual language; breaking convention has a cost |
| 6 | Chart Inventory | Choose by question asked, not appearance; there is no "best chart" |
| 7 | Client vs. Server | Transfer + parse dwarfs render at scale; hybrid aggregation wins |
| 8 | Rendering | 5 surfaces; immediate vs. retained; DOM explosion limits SVG at scale |
| 9 | Declarative vs. Coded | Chart.js = fast + standard; D3 = powerful + custom |
| 10 | Encoding Demo | Same data, different encoding = different story; editorial choice |
| 11 | Storytelling | Martini Glass; annotation is critical; guide then open |
| 12 | Dashboards | Visual hierarchy; five-second test; showing everything shows nothing |
| 13 | Misleading Viz | Truncated axes, 3D, cherry-picked ranges; chartjunk debate |
| 14 | Data Tables | Charts + tables = complete picture; drill-down; progressive enhancement |
| 15 | Data Literacy | 10-question checklist; honest chart + illiterate viewer = greatest risk |
| 16 | D3 Demo | Primitives, not types; radial lollipop impossible in declarative libs |
| 17 | Pipeline | Viz is last mile; quality bounded by upstream; feedback loop |