Data Visualization

From Data to Decisions

"Raw numbers are inert. A table of 10,000 pageview records tells you nothing until a human brain processes it. Visualization is the bridge between collected data and human understanding."

CSE 135

Section 1: Why Visualize Data?

200 ms to see what a spreadsheet cannot tell you.

The Speed of Vision

The human visual system processes spatial patterns in 200–500 milliseconds — far faster than reading a table of numbers. This is not a minor efficiency gain; it is a qualitative difference in what the brain can discover.

  • A table with 10 rows is readable. 100 rows is tedious. 10,000 rows is useless without aggregation or visualization.
  • Trends, outliers, clusters, and gaps invisible in a spreadsheet become obvious in a well-chosen chart.

Anscombe's Quartet (1973)

Four datasets with nearly identical summary statistics (mean, variance, correlation, regression line) that look completely different when plotted.

[Figure: Anscombe's Quartet as four scatter plots. Dataset I: linear; Dataset II: curve; Dataset III: outlier; Dataset IV: vertical]
Summary statistics can hide radically different data structures. Plotting the data reveals what the numbers conceal. This was demonstrated over fifty years ago — extended to absurdity in 2017 by the Datasaurus Dozen.
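
To make the claim concrete, the following sketch computes the mean of y and the Pearson correlation for all four datasets. The numbers are the values published in Anscombe's 1973 paper, not data from this course, and the helper names `mean` and `pearson` are illustrative.

```javascript
// Anscombe's Quartet (values from the 1973 paper). Datasets I-III share x.
const x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const quartet = {
  I:   { x: x123, y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  II:  { x: x123, y: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74] },
  III: { x: x123, y: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  IV:  { x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
         y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89] },
};

const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;

// Pearson correlation coefficient between two equal-length arrays.
function pearson(xs, ys) {
  const mx = mean(xs), my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < xs.length; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) ** 2;
    syy += (ys[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

const summary = Object.entries(quartet).map(([name, { x, y }]) => ({
  name,
  meanY: mean(y).toFixed(2),
  r: pearson(x, y).toFixed(2),
}));
console.table(summary);
```

All four rows report the same mean (7.50) and correlation (0.82), which is why only a plot can tell the datasets apart.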

Vision Isn't Boolean

  • Visual interpretation is not always accurate — optical illusions are real and charts can resemble one
  • Visual interpretation can be misleading — encoding choices shape the story (more in Section 13)
  • Visual interpretation can be subjective — culture, context, and familiarity matter
Accessibility: Vision is a spectrum, not a boolean. People with variable vision need alternative ways to interpret data. Data tables, ARIA labels, and CSV exports are not "nice to have" — they are requirements.

Section 2: Visual Encoding

Mapping data values to visual marks — the fundamental building block.

Bertin's Visual Channels

Jacques Bertin (Sémiologie Graphique, 1967) identified the core visual channels:

| Visual Channel | Best For | Accuracy | Example |
| --- | --- | --- | --- |
| Position (x, y) | Quantitative | Highest | Scatter plot, line chart |
| Length | Quantitative | Very high | Bar chart |
| Angle / Slope | Quantitative | Moderate | Pie chart, line slope |
| Area | Quantitative | Low | Bubble chart, treemap |
| Color hue | Categorical | N/A | Legend groups |
| Color saturation | Ordered | Low | Heat map, choropleth |
| Shape | Categorical | N/A | Different point shapes |

Not all channels are equal. Choosing a bar chart encodes count as length (high accuracy). Choosing a pie chart encodes the same count as angle (lower accuracy). The data is the same — the legibility is not.
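
As a rough illustration of what "encoding a count as length" means in code, here is a minimal linear scale written from scratch. The function name `linearScale`, the domain, and the pixel range are assumptions for the example; the idea mirrors `d3.scaleLinear`, but this is not its implementation.

```javascript
// Build a function that maps a data value in [d0, d1] to a length in [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Pageview counts from 0 to 400 encoded as bar lengths from 0 to 200 pixels.
const barLength = linearScale([0, 400], [0, 200]);

console.log(barLength(100)); // 50
console.log(barLength(200)); // 100
```

Because the domain starts at zero, a value twice as large produces a bar twice as long, which is the property that makes length an accurate channel.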

Subjectivity & Culture

Encoding into visual forms has aspects that do not cross all cultures.

  • An icon, shape, or color might imply one thing in one culture and something else in another
  • Japan is awash in radar/spider charts (even on food packages); they are rare in the West
  • The "big three" (line, bar, pie) far exceed other chart types — familiarity, not just accuracy
Dataviz is less math than you think. It is communication, complete with subjectivity, both personal and cultural. No amount of coding or platform complexity changes that truth.

Section 3: Decoding & Perception

How accurately can humans extract values from visual marks?

Cleveland & McGill Ranking (1984)

Empirical ranking of perceptual accuracy, from most to least accurate:

Most Accurate
├── 1. Position on common scale (scatter, dot plot)
├── 2. Position on non-aligned scale (small multiples)
├── 3. Length (bar chart)
├── 4. Angle / Slope (pie chart, line slope)
├── 5. Area (bubble chart, treemap)
├── 6. Color saturation / lightness (heat map, choropleth)
└── 7. Volume / curvature (3D charts)
Least Accurate

Replicated by Heer & Bostock (2010) via crowdsourced experiments on Mechanical Turk.

Bar charts beat pie charts because humans decode length more accurately than angle. This is a measured, empirical fact — not an aesthetic preference.

Pie Charts: Not Forbidden

Pie charts are legitimate when:

  • Very few slices (2–3)
  • Emphasizing a single dominant proportion (e.g., "mobile is 72% of traffic")
  • Exact comparison between slices is not important
Norms matter too. The big three (line, bar, pie) far exceed usage of other viz types. Much of this is familiarity rather than accuracy — social expectations play a part in appropriate dataviz.

Section 4: Exploratory vs. Explanatory

Two fundamentally different purposes for visualization.

Comparison

| Dimension | Exploratory | Explanatory |
| --- | --- | --- |
| Purpose | Discover what you don't know | Communicate what you do know |
| Audience | You (the analyst) | Others (stakeholders, team) |
| Volume | Many charts, most discarded | Few charts, carefully chosen |
| Polish | Messy, fast, disposable | Clean, labeled, annotated |
| Interaction | Filter, zoom, pivot freely | Guided narrative or fixed view |
| Tooling | Notebooks, ad hoc scripts | Dashboards, reports, presentations |
| Risk | Missing an insight | Miscommunicating an insight |

The Anti-Pattern: Dumping a Jupyter notebook of 30 charts into a presentation and expecting the audience to find the insight. If you made 30 charts to find one insight, show the one chart that communicates it.

Section 5: Chart Conventions

Charts are shared visual language — ~240 years of practice.

Timeline of Dataviz Milestones

1786  William Playfair: first line and bar charts
1858  Florence Nightingale: polar area diagram (Crimean War mortality)
1869  Charles Minard: Napoleon's March, "the best statistical graphic ever drawn"
1914  Willard Brinton: Graphic Methods for Presenting Facts
1967  Jacques Bertin: Sémiologie Graphique, visual channel theory
1977  John Tukey: Exploratory Data Analysis, box plot
1983  Edward Tufte: The Visual Display of Quantitative Information
2011  Mike Bostock: D3.js, web-native, data-driven documents

Charts as Visual Contracts

A bar chart implicitly promises: "the y-axis starts at zero, and bar height is proportional to the value." Breaking that promise violates the contract — even if you add a label.

Breaking convention forces every viewer to re-learn the visual language. That cognitive cost should be paid only when the payoff is significant.

Dataviz Didn't Start with Computing

  • Maps, commerce artifacts, cave paintings, hieroglyphics — visual communication may have been the earliest form
  • Tablets with grids, inventories, and what appear to be line/scatter chart ideas predate modern history
Dataviz isn't something that started with computing, but it did get more powerful because of it.

Section 6: Chart Type Inventory

Choose the chart by the question, not the appearance.

Question → Chart Type

| Question | Chart Type | Watch Out For |
| --- | --- | --- |
| How do categories compare? | Bar / Grouped bar | >15 bars unreadable; >4 groups confusing |
| | Lollipop | Less familiar to some audiences |
| Parts of a whole? | Pie / Donut | >5 slices unreadable |
| | Stacked bar / Treemap | Interior segments hard to compare |
| Distribution? | Histogram / Box plot | Bin width changes story; box hides multi-modality |
| Change over time? | Line chart | Too many lines overlap |
| | Area / Sparkline | Stacked areas distort upper layers |
| Relationships? | Scatter / Bubble | Overplotting; area encoding imprecise |
| Current status? | Gauge / KPI card | Gauge wastes space; KPI needs comparison |

There is no "best chart." If someone says "always use bar charts" or "never use pie charts," they are oversimplifying. The right chart depends on the question, the data, and the audience.

Section 7: Client vs. Server Charting

Where the chart gets rendered — and the data transfer problem.

Architecture Comparison

CLIENT-SIDE: DB → Server → JSON → Browser → Canvas/SVG chart
SERVER-SIDE: DB → Server → PNG/SVG image → Browser → <img>
HYBRID:      DB → Server (aggregate) → small JSON → Browser → Canvas chart

| Factor | Client-Side | Server-Side | Hybrid |
| --- | --- | --- | --- |
| Interactivity | Full (hover, click, zoom) | None (static image) | Full |
| Data volume | Limited by browser | Limited by server | Aggregated (small) |
| Latency | Fast after load | Round-trip per image | Fast after load |
| Rendering tech | Canvas, SVG | Image library | Canvas/SVG |
| Example tools | Chart.js, D3 | matplotlib, QuickChart | Grafana, Metabase |

The Data Transfer Problem

Client-side rendering has a hidden cost: every data point must be transferred, parsed, then rendered.

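| Data Points | JSON Size | Transfer (3G) | JSON.parse() | Canvas Render |
| --- | --- | --- | --- | --- |
| 100 | ~10 KB | instant | < 1 ms | < 5 ms |
| 10,000 | ~1 MB | 3 s | ~20 ms | ~30 ms |
| 100,000 | ~10 MB | 30 s | ~200 ms | ~80 ms |
| 1,000,000 | ~100 MB | 5 min | ~2,000 ms | ~300 ms |
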
At scale, the bottleneck is not rendering — it is getting the data to the renderer. Canvas can paint a million points in under a second. Shipping 100 MB of JSON and parsing it takes orders of magnitude longer.
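
The table's arithmetic can be reproduced roughly. The sketch below serializes hypothetical pageview records and estimates transfer time. The record shape and the 333 KB/s throughput (picked to match the table's ~1 MB in 3 s) are assumptions, not measurements.

```javascript
// Hypothetical pageview record; the field names are illustrative only.
function makeRecords(count) {
  return Array.from({ length: count }, (_, i) => ({
    sessionId: `s-${i}`,
    url: '/products/item',
    timestamp: '2024-01-01T00:00:00.000Z',
    loadTimeMs: 1200,
  }));
}

// Size of the JSON payload in bytes (UTF-8 encoded).
function jsonBytes(records) {
  return new TextEncoder().encode(JSON.stringify(records)).length;
}

// Seconds needed to ship `bytes` at an assumed sustained throughput.
function transferSeconds(bytes, bytesPerSecond = 333_000) {
  return bytes / bytesPerSecond;
}

for (const count of [100, 10_000]) {
  const bytes = jsonBytes(makeRecords(count));
  console.log(count, 'points:', bytes, 'bytes,', transferSeconds(bytes).toFixed(1), 's');
}
```

Each record serializes to roughly 100 bytes, so payload size and transfer time grow linearly with the number of points while render time stays comparatively small.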

JSON Is the Problem Child

  • Verbose: Every record repeats every key name — 100K rows = millions of redundant strings
  • Parse-all-or-nothing: JSON.parse() blocks until the entire string is processed
  • No streaming: Cannot render data as it arrives
  • Text overhead: Numbers stored as text; timestamps especially wasteful
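
The key-repetition cost is easy to see by encoding the same rows two ways. This is an illustrative sketch with made-up rows; `toCsv` is a hypothetical helper that does no quoting or escaping, so it only suits simple values like these.

```javascript
// Made-up rows for illustration.
const rows = Array.from({ length: 1000 }, (_, i) => ({
  page: '/blog',
  views: i,
  bounceRate: 0.42,
}));

// Array-of-objects JSON repeats "page", "views", "bounceRate" in every row.
const asJson = JSON.stringify(rows);

// CSV states the keys once in a header line (no quoting or escaping handled).
function toCsv(records) {
  const keys = Object.keys(records[0]);
  const lines = records.map((record) => keys.map((k) => record[k]).join(','));
  return [keys.join(','), ...lines].join('\n');
}
const asCsv = toCsv(rows);

console.log('JSON chars:', asJson.length);
console.log('CSV chars: ', asCsv.length);
console.log('CSV / JSON:', (asCsv.length / asJson.length).toFixed(2));
```

The exact ratio depends on how long the key names are relative to the values, so treat the printed figure as specific to this toy data.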

Alternative Wire Formats

| Format | Size vs. JSON | Streamable | Parse Speed |
| --- | --- | --- | --- |
| CSV | ~0.5× | Yes | Faster (no keys) |
| NDJSON | ~0.95× | Yes | Row-at-a-time |
| MessagePack | ~0.6× | No | 2–5× faster |
| Apache Arrow | ~0.3× | Yes | 10–50× (zero-copy) |
| Protocol Buffers | ~0.4× | With framing | 5–10× faster |

The real question: does the client need all the data? Server-side aggregation (SQL GROUP BY) reduces 100K rows to 50 — the hybrid approach gets fast data reduction and interactive rendering.
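
What `GROUP BY` does to the payload can be sketched in plain JavaScript. The event shape and the `groupCount` helper are assumptions for illustration; in the hybrid architecture this reduction would run in the database or on the server, not in the browser.

```javascript
// Hypothetical raw events: one row per pageview.
const pagePaths = ['/', '/about', '/blog', '/pricing', '/checkout'];
const events = Array.from({ length: 100_000 }, (_, i) => ({
  page: pagePaths[i % pagePaths.length],
  ts: i,
}));

// Equivalent in spirit to: SELECT page, COUNT(*) FROM events GROUP BY page
function groupCount(rows, key) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) ?? 0) + 1);
  }
  return [...counts].map(([value, count]) => ({ [key]: value, count }));
}

const aggregated = groupCount(events, 'page');
console.log(events.length, 'rows reduced to', aggregated.length);
console.log(aggregated[0]); // { page: '/', count: 20000 }
```

Only the five aggregated rows need to cross the network, and the client can still render them interactively.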

Section 8: Rendering Technologies

Five surfaces — how the browser draws charts.

Five Rendering Surfaces

| Surface | Model | Interactivity | Accessibility | Examples |
| --- | --- | --- | --- | --- |
| Static image | Pre-rendered | None | alt text | matplotlib, node-canvas |
| CSS | Retained (DOM) | DOM events | Full semantic | CSS-only bars |
| Canvas 2D | Immediate | Manual hit-testing | Opaque (ARIA) | Chart.js, ZingChart |
| SVG | Retained (DOM) | Native per-element | Traversable | D3.js |
| WebGL/WebGPU | Immediate (GPU) | Raycasting | Opaque (ARIA) | deck.gl, Mapbox GL |

Immediate vs. Retained Mode

IMMEDIATE MODE (Canvas, WebGL)      RETAINED MODE (SVG, CSS)
──────────────────────────────      ───────────────────────────
fillRect(10, 20, 100, 50)           <rect x="10" y="20" ...>
        │                                   │
        ▼                                   ▼
Pixels painted → object GONE        DOM node PERSISTS
To change: clear + redraw ALL       To change: modify attribute
Memory: O(pixels), constant         Memory: O(elements), grows
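
Because an immediate-mode surface forgets what it drew, interactivity means keeping your own list of shapes and testing the pointer against it. A minimal sketch, assuming axis-aligned bars; `bars` and `hitTest` are hypothetical names, and a real chart would call this from a `mousemove` or `click` handler with coordinates relative to the canvas.

```javascript
// The application's own record of what was painted (Canvas keeps none).
const bars = [
  { label: 'Home',  x: 10,  y: 40, width: 50, height: 160 },
  { label: 'About', x: 70,  y: 90, width: 50, height: 110 },
  { label: 'Blog',  x: 130, y: 60, width: 50, height: 140 },
];

// Return the shape under the point (px, py), or null if there is none.
function hitTest(shapes, px, py) {
  // Search back to front so the most recently drawn shape wins on overlap.
  for (let i = shapes.length - 1; i >= 0; i--) {
    const s = shapes[i];
    const inside = px >= s.x && px <= s.x + s.width &&
                   py >= s.y && py <= s.y + s.height;
    if (inside) return s;
  }
  return null;
}

console.log(hitTest(bars, 80, 100)?.label); // About
console.log(hitTest(bars, 5, 5));           // null
```

In retained mode none of this bookkeeping is needed, since each `<rect>` is a DOM node that receives its own events.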

Performance Envelopes

[Figure: performance envelopes on a log scale of data points (10, 100, 1K, 5K, 10K, 50K, 100K, 1M), with one bar each for CSS, SVG, Canvas, and WebGL]
DOM Explosion: At 5,000+ SVG elements, reflow and style recalculation become measurable. At 100,000+, the page freezes. This is one reason Chart.js renders to Canvas.

Decision Framework

DimensionStatic ImgCSSCanvasSVGWebGL
Explanatory (reports, email)YesInline
Interactive analysisYesYesYes
< 50 data pointsYesYes
50 – 5K pointsYesYes
5K – 100K pointsYes
> 100K pointsYes
Real-time streamingYesYes
Works without JSYesYes
For the dashboard project: Canvas via Chart.js. Standard chart types, moderate data, interactivity matters. For D3 modules: SVG — custom visualizations with modest element counts.

Section 9: Declarative vs. Coded

Two programming models for driving rendering surfaces.

Chart.js vs. D3 Side-by-Side

Chart.js (Declarative, ~15 lines)

    // ctx is the <canvas> element (or its 2D context) the chart draws into.
    new Chart(ctx, {
      type: 'bar',
      data: {
        labels: ['Home', 'About', 'Blog'],
        datasets: [{
          label: 'Pageviews',
          data: [420, 280, 350],
          backgroundColor: '#16a085'
        }]
      },
      options: {
        scales: { y: { beginAtZero: true } }
      }
    });

D3 (Imperative, ~25 lines)

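    // Assumes d3 is loaded and the page contains <svg id="chart" width="400" height="240">.
    const data = [
      { page: 'Home', views: 420 },
      { page: 'About', views: 280 },
      { page: 'Blog', views: 350 }
    ];
    const svg = d3.select('#chart');

    // Band scale: page name → x position of each bar
    const x = d3.scaleBand()
      .domain(data.map(d => d.page))
      .range([40, 380]).padding(0.2);

    // Linear scale: views → y pixel (range is inverted because SVG y grows downward)
    const y = d3.scaleLinear()
      .domain([0, d3.max(data, d => d.views)])
      .range([220, 10]);

    svg.selectAll('rect')
      .data(data).join('rect')
      .attr('x', d => x(d.page))
      .attr('y', d => y(d.views))
      .attr('width', x.bandwidth())   // without a width the bars would not render
      .attr('height', d => 220 - y(d.views))
      .attr('fill', '#16a085');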
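
To see what the band scale in the D3 example computes, the arithmetic can be reproduced without the library. This is a from-scratch sketch under the assumption that inner and outer padding are equal and the bars are centered in the range (the behavior of `.padding(0.2)` with default alignment); `bandScale` is a hypothetical helper, not D3's code.

```javascript
// Compute bar positions the way a band scale does, for equal inner/outer padding.
function bandScale(domain, [r0, r1], padding) {
  const n = domain.length;
  // One "step" is a bar plus the gap that follows it.
  const step = (r1 - r0) / Math.max(1, n - padding + 2 * padding);
  const bandwidth = step * (1 - padding);
  // Center the bars in the range, leaving outer padding on both sides.
  const start = r0 + (r1 - r0 - step * (n - padding)) / 2;
  const scale = (key) => start + step * domain.indexOf(key);
  scale.bandwidth = bandwidth;
  scale.step = step;
  return scale;
}

const x = bandScale(['Home', 'About', 'Blog'], [40, 380], 0.2);
console.log(['Home', 'About', 'Blog'].map((page) => x(page).toFixed(2)));
// [ '61.25', '167.50', '273.75' ]
console.log(x.bandwidth.toFixed(2)); // 85.00
```

Each bar is 85 pixels wide with a 21.25-pixel gap, so the three bars and four gaps exactly fill the 340-pixel range.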

Comparison Table

| Factor | Declarative (Chart.js) | Imperative (D3) |
| --- | --- | --- |
| Learning curve | Low: configure, not code | High: selections, joins, scales |
| Time to first chart | Minutes | Hours |
| Customization ceiling | Limited to library API | Unlimited, down to every pixel |
| Animation | Built-in (limited) | Full control (enter/update/exit) |
| Accessibility | Canvas (no DOM nodes) | SVG (DOM nodes, can add ARIA) |
| Best for | Standard charts, dashboards | Custom/novel visualizations |

Course recommendation: Use Chart.js for the dashboard project (fast, standard). Study D3 to understand how visualization works at a fundamental level — scales, selections, data binding.

Section 10: Encoding Demo

Same data, three encodings, three different stories.

Three Encodings, One Dataset

Five pages measured across three days (Mon, Wed, Fri):

Grouped Bar

Best for: comparing individual pages on a given day

"Which page had the most traffic on Monday?"

Stacked Bar

Best for: total volume per day + each page's share

"How did overall traffic change?"

Line Chart

Best for: trend over time for each page

"Is traffic for each page going up, down, or flat?"

Encoding choice is editorial choice. None of these charts is wrong, but each emphasizes a different aspect. Choosing one encoding over another is a framing decision — similar to choosing which quote to lead a news article with.
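
The three encodings ask for three different reshapes of the same records. A sketch with invented numbers (the actual pageviews are not listed here); `byDay`, `stackedByDay`, and `seriesByPage` are illustrative names.

```javascript
// Invented pageviews: five pages measured on three days.
const days = ['Mon', 'Wed', 'Fri'];
const views = {
  Home:    [420, 380, 450],
  About:   [120, 140, 110],
  Blog:    [350, 300, 410],
  Pricing: [90, 150, 200],
  Contact: [60, 55, 70],
};
const pageNames = Object.keys(views);

// Grouped bar: for each day, one bar per page (compare pages within a day).
const byDay = days.map((day, i) => ({
  day,
  bars: pageNames.map((page) => ({ page, value: views[page][i] })),
}));

// Stacked bar: for each day, segments with running [start, end] offsets.
const stackedByDay = days.map((day, i) => {
  let offset = 0;
  const segments = pageNames.map((page) => {
    const start = offset;
    offset += views[page][i];
    return { page, start, end: offset };
  });
  return { day, total: offset, segments };
});

// Line chart: for each page, one series of (day, value) points.
const seriesByPage = pageNames.map((page) => ({
  page,
  points: days.map((day, i) => ({ day, value: views[page][i] })),
}));

console.log(byDay[0].bars[0]);                           // { page: 'Home', value: 420 }
console.log(stackedByDay.map((d) => d.total));           // [ 1040, 1025, 1240 ]
console.log(seriesByPage[2].points.map((p) => p.value)); // [ 350, 300, 410 ]
```

The grouped shape puts pages side by side within a day, the stacked shape makes the daily total explicit, and the series shape follows one page across time, which is the editorial difference described above.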

Section 11: Data Storytelling

Data + visual + narrative = action.

The Martini Glass (Segel & Heer, 2010)

Segel & Heer studied narrative visualization structures and identified a common three-phase pattern:

| Phase | Mode | What Happens |
| --- | --- | --- |
| 1. Stem | Author-driven | Guided path through a focused narrative: annotations, curated views, controlled sequence |
| 2. Transition | Handoff | Key conclusion delivered: the single insight the author wants to land |
| 3. Bowl | Reader-driven | Opens up for free exploration: filter, drill-down, ask your own questions |

This pattern appears in the best data journalism (NYT, Guardian, FiveThirtyEight) and in well-designed analytics dashboards.

Dashboard connection: Summary view (KPI cards, headline charts) = author-driven stem. Drill-down into filtered detail views = reader-driven bowl.

Annotation Is Critical

Research consistently shows that annotated charts are more effective than unannotated ones.

Without Annotation

  • Viewer must discover the spike
  • Viewer must calculate magnitude
  • Viewer must guess at the cause

With Annotation

  • "Bounce rate spiked 40% after the redesign" written directly on the data point
  • Insight communicated instantly
A chart by itself is not a story. A story requires three components: data (what happened), visual (the chart), and narrative (why it matters). Data storytelling combines all three to move an audience from observation to action.

Section 12: Dashboards

Persistent, multi-chart displays for ongoing monitoring.

Design Principles

  • Visual hierarchy: Most important metrics most visually prominent. KPI cards at top, details below.
  • Progressive disclosure: Summary first, detail on demand.
  • Consistent color: If blue means "Chrome" in one chart, it means "Chrome" everywhere. Don't reuse a color with a different meaning.
  • Appropriate chart types: Use the inventory from Section 6.
  • Five-second test: A new viewer identifies purpose and main finding within 5 seconds.

Dashboard Types

| Type | Purpose | Update Frequency | Example |
| --- | --- | --- | --- |
| Operational | Real-time health | Seconds–minutes | Error rates, active users |
| Analytical | Trends & patterns | Daily–weekly | Funnels, A/B tests |
| Strategic | High-level KPIs | Weekly–monthly | Revenue, retention |

Dashboard Anti-Patterns

  • The "everything" dashboard: 20+ charts on one screen — if everything is highlighted, nothing is
  • Pie overload: Multiple pie charts side by side — cannot compare angles across circles
  • Decorative gauges: Large space for a single number a KPI card shows in 1/10 the space
  • No comparison period: "12,450 sessions" — up, down, or flat?
  • Rainbow colors: Every color in the palette with no semantic meaning
A dashboard that shows everything shows nothing. If you cannot articulate what decision each chart supports, remove it. Dashboard design is editing, not adding.
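
The "no comparison period" anti-pattern is cheap to avoid. A minimal sketch of a KPI label that always carries its baseline; `formatKpi` and the previous-period figure are hypothetical, and only the 12,450 sessions come from the example above.

```javascript
// Format a KPI together with its change against a comparison period.
function formatKpi(label, current, previous) {
  const value = current.toLocaleString('en-US');
  if (!previous) return `${value} ${label} (no comparison data)`;
  const changePct = ((current - previous) / previous) * 100;
  const direction = changePct > 0 ? 'up' : changePct < 0 ? 'down' : 'flat';
  const magnitude = Math.abs(changePct).toFixed(1);
  return direction === 'flat'
    ? `${value} ${label} (flat vs. previous period)`
    : `${value} ${label} (${direction} ${magnitude}% vs. previous period)`;
}

console.log(formatKpi('sessions', 12450, 11500));
// 12,450 sessions (up 8.3% vs. previous period)
```

Returning an explicit "no comparison data" string keeps a missing baseline visible instead of silently showing a bare number.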

Section 13: Misleading Visualizations

How charts deceive — deliberately or through ignorance.

Common Misleading Techniques

| Technique | How It Misleads | How to Detect |
| --- | --- | --- |
| Truncated y-axis | Starting above zero exaggerates bar differences | Check y-axis origin |
| Aspect ratio manipulation | Stretching/compressing changes perceived slope | Check axis interval consistency |
| 3D distortion | Perspective makes front elements look larger | Ask: does the 3rd dimension encode data? |
| Dual y-axes | Scale choices fabricate apparent correlation | Re-scale one axis mentally |
| Cherry-picked time range | Shows only a favorable trend | Ask: why this start date? |
| Inverted axes | "Up" means "worse" | Check axis direction |
| Area/radius confusion | 2× radius = 4× area | Check if size scales by area or radius |
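
The area/radius row is worth a worked example. If a bubble's radius is made proportional to the value, its area (what the eye compares) grows with the square of the value. A sketch of the usual fix, scaling the radius by the square root; the function names and the 40-pixel maximum radius are assumptions.

```javascript
// Misleading: radius proportional to value, so area grows with value squared.
function radiusLinear(value, maxValue, maxRadius) {
  return (value / maxValue) * maxRadius;
}

// Honest: area proportional to value, so radius grows with sqrt(value).
function radiusByArea(value, maxValue, maxRadius) {
  return Math.sqrt(value / maxValue) * maxRadius;
}

const area = (r) => Math.PI * r * r;

// Compare a value of 50 against a value of 100 (a true ratio of 0.5).
const linearRatio = area(radiusLinear(50, 100, 40)) / area(radiusLinear(100, 100, 40));
const areaRatio = area(radiusByArea(50, 100, 40)) / area(radiusByArea(100, 100, 40));

console.log(linearRatio.toFixed(2)); // 0.25, the smaller bubble looks too small
console.log(areaRatio.toFixed(2));   // 0.50, matches the data
```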

Truncated Y-Axis

Y starts at 0 (honest)

100 │  █  █  █  █  █
    │  █  █  █  █  █
 50 │  █  █  █  █  █
    │  █  █  █  █  █
  0 └────────────────
       A  B  C  D  E
Values: 97 100 98 102 99
Looks: tiny fluctuations

Y starts at 95 (misleading)

[Figure: the same five bars (A to E) drawn on a y-axis running from 95 to 102]
Values: 97 100 98 102 99
Looks: DRAMATIC swings!

A bar that appears twice as tall is decoded as "twice the value." Truncation breaks this assumption. Bar charts should almost always start at zero. Line charts have more flexibility (viewers decode slope, not height).
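
The distortion can be quantified with the values from the two charts above. This sketch compares how tall the tallest bar is drawn relative to the shortest under each baseline; `drawnHeightRatio` is an illustrative helper.

```javascript
const values = [97, 100, 98, 102, 99];

// Ratio of tallest to shortest drawn bar when the y-axis starts at `baseline`.
function drawnHeightRatio(data, baseline) {
  const heights = data.map((v) => v - baseline);
  return Math.max(...heights) / Math.min(...heights);
}

console.log(drawnHeightRatio(values, 0).toFixed(2));  // 1.05
console.log(drawnHeightRatio(values, 95).toFixed(2)); // 3.50
```

A real difference of about 5% is drawn as a 3.5× difference in bar height once the axis starts at 95.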

Chartjunk & Tufte

Edward Tufte coined chartjunk: visual elements that do not encode data — 3D effects, gradient fills, decorative icons.

The standard view: 3D bar charts are the canonical example of chartjunk. The third dimension encodes no data but distorts perception.
The counterpoint: Studies show "monster chart junk" examples have better memorability than clean Tufte-style charts. The enticing visual draws readers in.
Be wary of people who claim "true ways" in a subjective field like dataviz.

Section 14: Raw Data & Tables

Charts summarize; tables preserve. Both are essential.

Chart + Table Pairing

A chart is a lossy compression of data — it shows patterns at the cost of individual values. A table preserves every value but obscures patterns. Together, they are a complete picture.

Aggregated Chart (patterns) → click segment (drill-down) → Detail Table (individual values) → download (export) → CSV / JSON (verification)

Accessibility: Screen readers cannot interpret the pixels inside a <canvas>, and an <svg> is only readable if its elements are labeled. A data table provides the same information in a format assistive technology can read. A chart without a table is inaccessible by default.

Interactive Table Features

| Feature | What It Does | Analytics Use Case |
| --- | --- | --- |
| Column sorting | Click header to sort | Find slowest pages, top referrers |
| Filtering / search | Narrow rows to criteria | Errors from a specific URL |
| Pagination | Page through large sets | Navigate thousands of pageviews |
| Virtual scrolling | Render only visible rows | Handle 10K+ rows without DOM explosion |
| Conditional styling | Color cells by value | Red for CLS > 0.25, green for good LCP |
| CSV/JSON export | Download visible data | Share findings with team |

Progressive Enhancement

  1. Semantic HTML <table> — works without JS, accessible, printable
  2. Vanilla JavaScript — add sorting, filtering, pagination with zero dependencies
  3. Grid component — optionally upgrade (ZingGrid, AG Grid) for virtual scrolling
Shneiderman's mantra: "Overview first, zoom and filter, then details on demand" — maps directly to chart → click → table.
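
Step 2 of the enhancement path needs very little code. A sketch of sorting and filtering as pure functions over row objects, assuming the rows have already been read out of the `<table>`; `sortRows`, `filterRows`, and the sample data are hypothetical.

```javascript
// Sample rows; in a page these would be read from the <table> body.
const tableRows = [
  { url: '/checkout', loadMs: 2400, errors: 3 },
  { url: '/', loadMs: 800, errors: 0 },
  { url: '/blog', loadMs: 1500, errors: 1 },
];

// Return a sorted copy; numbers compare numerically, everything else as text.
function sortRows(rows, key, direction = 'asc') {
  const sign = direction === 'desc' ? -1 : 1;
  return [...rows].sort((a, b) => {
    const left = a[key], right = b[key];
    const order = typeof left === 'number' && typeof right === 'number'
      ? left - right
      : String(left).localeCompare(String(right));
    return sign * order;
  });
}

// Keep rows where any cell contains the search text (case-insensitive).
function filterRows(rows, query) {
  const needle = query.toLowerCase();
  return rows.filter((row) =>
    Object.values(row).some((cell) => String(cell).toLowerCase().includes(needle)));
}

console.log(sortRows(tableRows, 'loadMs', 'desc').map((r) => r.url));
// [ '/checkout', '/blog', '/' ]
console.log(filterRows(tableRows, 'blog').map((r) => r.url));
// [ '/blog' ]
```

Both functions return new arrays and leave the input untouched, so the original row order is always available for a "reset" control.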

Section 15: Data Literacy

The greatest risk: an honest chart + an illiterate viewer.

10-Question Checklist

Ask these when reading any chart:

| # | Question | What to Look For |
| --- | --- | --- |
| 1 | What are the axes? | Labels, units, scale (linear vs. log) |
| 2 | Does the y-axis start at zero? | Truncation exaggerates bar differences |
| 3 | What is the sample size? | "Up 50%" might mean 2 users → 3 |
| 4 | What time period is shown? | Cherry-picked ranges create false narratives |
| 5 | Compared to what? | A number without baseline is meaningless |
| 6 | Correlation or causation? | Two lines moving together ≠ one causes the other |
| 7 | What is not shown? | Survivorship bias, excluded data |
| 8 | Who made this and why? | Advocacy vs. analytical intent |
| 9 | Can I access the raw data? | Unverifiable claims deserve caution |
| 10 | Would a different chart change the story? | If encoding seems designed to emphasize, consider alternatives |

The Greatest Risk

The greatest risk in data visualization is not a dishonest chart — it is an honest chart read by an illiterate viewer. A viewer who cannot detect a truncated y-axis is vulnerable to any chart, regardless of intent. Data literacy is a defensive skill.

Deception by Defaults

  • What appears to be purposeful chart deception is often just trusting the defaults
  • GenAI-based visualization and decoding produce errors in the ~30% range
  • Always check your defaults and AI outputs
Data literacy must be widespread in any organization that claims to be "data-driven."

Section 16: D3 in Action

Primitives, not chart types — unlimited ceiling, steep curve.

Radial Lollipop Chart

A visualization that typical declarative charting libraries do not offer as a built-in type. Each spoke radiates from the center, with length proportional to pageviews and a circle at the tip.

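    // Assumes d3 is loaded and svg, data, pages, maxViews, innerR, outerR, cx, cy are defined.
    // Radial scale: pageviews → spoke length
    const r = d3.scaleLinear().domain([0, maxViews]).range([innerR, outerR]);

    // Angular scale: page → angle in radians. A band scale spaces the pages evenly
    // around the circle without the first and last landing on the same angle.
    const angle = d3.scaleBand().domain(pages).range([0, 2 * Math.PI]);

    // Polar → Cartesian conversion (angle 0 points straight up)
    function polarX(a, rad) { return cx + rad * Math.sin(a); }
    function polarY(a, rad) { return cy - rad * Math.cos(a); }

    // Draw spokes: start collapsed at the inner radius, then animate outward
    svg.selectAll('.spoke').data(data).join('line')
      .attr('class', 'spoke')
      .attr('stroke', '#16a085')
      .attr('x1', d => polarX(angle(d.page), innerR))
      .attr('y1', d => polarY(angle(d.page), innerR))
      .attr('x2', d => polarX(angle(d.page), innerR))
      .attr('y2', d => polarY(angle(d.page), innerR))
      .transition().duration(800)
      .attr('x2', d => polarX(angle(d.page), r(d.views)))
      .attr('y2', d => polarY(angle(d.page), r(d.views)));
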
D3 gives you primitives, not chart types. There is no type: 'radialLollipop'. You compute angles, convert polar to Cartesian, draw SVG lines and circles, and animate with transitions. This is the power — and the cost — of imperative visualization.
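
The polar-to-Cartesian step is the part most worth checking by hand. A dependency-free sketch using the same convention as the snippet above (angle 0 points up, angles increase clockwise); the center, radius, and page list are made-up values.

```javascript
// Made-up geometry for illustration.
const cx = 200, cy = 200;
const pages = ['Home', 'About', 'Blog', 'Pricing'];

// Evenly spaced angles: page i of n sits at i / n of a full turn.
const angleOf = (page) => (pages.indexOf(page) / pages.length) * 2 * Math.PI;

// Polar to Cartesian. SVG's y axis points down, hence the minus sign on cos.
const polarX = (a, rad) => cx + rad * Math.sin(a);
const polarY = (a, rad) => cy - rad * Math.cos(a);

for (const page of pages) {
  const a = angleOf(page);
  console.log(page, polarX(a, 100).toFixed(1), polarY(a, 100).toFixed(1));
}
// Home 200.0 100.0     (straight up)
// About 300.0 200.0    (right)
// Blog 200.0 300.0     (down)
// Pricing 100.0 200.0  (left)
```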

Section 17: Analytics Pipeline

Visualization is the last mile — quality bounded by upstream data.

Pipeline: Collection to Decision

Collect -----> Process -----> Store -----> Query -----> Visualize
(JS/PHP)       (validate,     (MySQL)      (GROUP BY)   (Chart.js/D3)
               sessionize)
   |              |             |            |              |
                                                            v
beacons        validate &     time-series  aggregate    Decide
events         sessionize     persist      reduce       (human)

  • If the collector doesn't capture a dimension, you cannot visualize it
  • If processing drops malformed events, they disappear from charts
  • If the schema doesn't support an aggregation, the query can't produce it
Visualization quality is bounded by upstream data quality. A beautiful chart built on garbage data is worse than no chart — it creates false confidence.
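
As an illustration of the Process stage, here is a minimal sketch of validating and sessionizing beacon events. The event fields (`visitorId`, `ts`, `url`) and the 30-minute inactivity timeout are assumptions, not details given in this overview.

```javascript
const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // assumed inactivity cutoff

// Validation: events failing this check are dropped and never reach a chart.
function isValid(event) {
  return typeof event?.visitorId === 'string' &&
         Number.isFinite(event?.ts) &&
         typeof event?.url === 'string';
}

// Group each visitor's events into sessions split by gaps longer than the timeout.
function sessionize(events) {
  const valid = events.filter(isValid).sort((a, b) => a.ts - b.ts);
  const openSessions = new Map(); // visitorId -> that visitor's current session
  const sessions = [];
  for (const event of valid) {
    const current = openSessions.get(event.visitorId);
    if (current && event.ts - current.lastTs <= SESSION_TIMEOUT_MS) {
      current.events.push(event);
      current.lastTs = event.ts;
    } else {
      const session = { visitorId: event.visitorId, events: [event], lastTs: event.ts };
      openSessions.set(event.visitorId, session);
      sessions.push(session);
    }
  }
  return sessions;
}

const minute = 60 * 1000;
const sample = [
  { visitorId: 'a', ts: 0, url: '/' },
  { visitorId: 'a', ts: 5 * minute, url: '/checkout' },
  { visitorId: 'a', ts: 50 * minute, url: '/' },   // 45 min gap: new session
  { visitorId: 'b', ts: 2 * minute, url: '/blog' },
  { visitorId: 'b', url: '/blog' },                // malformed: no timestamp
];
console.log(sessionize(sample).map((s) => [s.visitorId, s.events.length]));
// [ [ 'a', 2 ], [ 'b', 1 ], [ 'a', 1 ] ]
```

The malformed event is silently absent from the output, which is the point made above: whatever processing drops, no chart downstream can show.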

The Feedback Loop

FEEDBACK LOOP

Dashboard reveals drop-off on /checkout
        │
        ▼
Add granular event tracking to /checkout
        │
        ▼
New data flows through pipeline
        │
        ▼
Dashboard reveals root cause → decision → improvement

Everything upstream exists to serve the visualization moment — and the visualization moment exists to serve the decision.

Summary: Key Takeaways

17 sections of data visualization in two tables.

Key Terms (Subset)

| Term | Definition |
| --- | --- |
| Visual encoding | Mapping data values to visual properties (position, length, color, etc.) |
| Decoding | Extracting data values from visual marks; accuracy varies by channel |
| Anscombe's Quartet | Four datasets with identical statistics but different visual patterns |
| Cleveland & McGill | Empirical hierarchy: position > length > angle > area > color |
| Exploratory vs. Explanatory | Discover unknowns vs. communicate knowns |
| Declarative vs. Imperative | Chart.js (config) vs. D3 (code): fast vs. powerful |
| Martini Glass | Author-driven narrative → conclusion → reader-driven exploration |
| Immediate vs. Retained | Canvas (pixels, forgotten) vs. SVG (DOM nodes, persistent) |
| Chartjunk | Visual elements not encoding data (Tufte); but memorability may counter |
| Data literacy | Ability to read, interpret, and critically evaluate visualizations |

Section Summary

| # | Section | Key Point |
| --- | --- | --- |
| 1 | Why Visualize? | 200 ms pattern recognition; Anscombe's Quartet; vision ≠ boolean |
| 2 | Visual Encoding | Bertin's channels; not all channels equally accurate |
| 3 | Decoding | Position > length > angle > area > color (measured, not opinion) |
| 4 | Explore vs. Explain | 30 charts → 1 insight; don't dump notebooks into presentations |
| 5 | Conventions | 240 years of shared visual language; breaking convention has a cost |
| 6 | Chart Inventory | Choose by question asked, not appearance; there is no "best chart" |
| 7 | Client vs. Server | Transfer + parse dwarfs render at scale; hybrid aggregation wins |
| 8 | Rendering | 5 surfaces; immediate vs. retained; DOM explosion limits SVG at scale |
| 9 | Declarative vs. Coded | Chart.js = fast + standard; D3 = powerful + custom |
| 10 | Encoding Demo | Same data, different encoding = different story; editorial choice |
| 11 | Storytelling | Martini Glass; annotation is critical; guide then open |
| 12 | Dashboards | Visual hierarchy; five-second test; showing everything shows nothing |
| 13 | Misleading Viz | Truncated axes, 3D, cherry-picked ranges; chartjunk debate |
| 14 | Data Tables | Charts + tables = complete picture; drill-down; progressive enhancement |
| 15 | Data Literacy | 10-question checklist; honest chart + illiterate viewer = greatest risk |
| 16 | D3 Demo | Primitives, not types; radial lollipop is not a built-in type in declarative libs |
| 17 | Pipeline | Viz is last mile; quality bounded by upstream; feedback loop |