Module 12: Force-Directed Graph — Referral Network

Standard bar charts, line charts, and doughnut charts handle most analytics dashboards. But some questions are about relationships, not measurements. When the data is a network of connections between entities — pages linking to pages, referrers driving traffic to destinations — a force-directed graph reveals structure that no table or bar chart can.

This capstone module builds an interactive referral network visualization that answers: "Which pages refer traffic to which other pages, and where does external traffic enter the site?"

Why a force-directed graph? When the data consists of connections between entities rather than independent measurements, a network visualization reveals structure that tabular data hides. Clusters of tightly connected pages emerge. Hub pages that funnel traffic become visually obvious. Dead-end pages with no outgoing links stand out. None of this is apparent from a spreadsheet of referrer/destination pairs.

Demo Files

Run: Open force-graph.html in a browser. No server or build step required — D3 v7 loads from CDN.

When Standard Charts Are Not Enough

Consider a typical analytics report showing top referral sources:

Referrer          Destination    Count
─────────────────────────────────────
google.com        /home          1800
(direct)          /home          1500
/home             /products      1200
google.com        /products       900
/home             /about          800

This table tells you the individual numbers, but it cannot answer structural questions: Which page is the biggest hub? Are there clusters of pages that users navigate between? Do social referrers drive traffic to different pages than search engines? A force-directed graph makes these patterns visible at a glance.

[Figure: example referral network linking external sources (google.com, bing.com, reddit.com, twitter.com) to internal pages (/home, /products, /about, /pricing, /contact, /blog, /docs). Nodes sized by traffic volume. Links weighted by referral count. Colors encode category: internal, search, social, referral, direct.]

Force Simulation Concepts

D3's force simulation models the graph as a physics system. Each node is a charged particle and each link is a spring. The simulation runs iteratively: on every tick it applies the forces, while a "temperature" parameter called alpha gradually cools until the motion stops. The result approximates equilibrium, a layout where connected nodes are near each other and unconnected nodes are pushed apart.

The Five Core Forces

Force          D3 Method                    Effect
──────────────────────────────────────────────────────────────────────────
Link           d3.forceLink()               Pulls connected nodes toward each other (spring). Distance and strength are configurable.
Many-Body      d3.forceManyBody()           Repels all nodes from each other (electrostatic charge). Negative strength = repulsion.
Center         d3.forceCenter()             Translates all nodes together so their average position sits at a center point, preventing drift without distorting the layout.
Collision      d3.forceCollide()            Prevents nodes from overlapping by treating each as a circle with a radius.
Positioning    d3.forceX() / d3.forceY()    Gently pushes nodes toward a target x or y coordinate. Useful for semi-structured layouts.

Creating the Simulation

// Assumes nodes, links, width, height, and radiusScale (defined below) are in scope.
const simulation = d3.forceSimulation(nodes)
    .force('link', d3.forceLink(links)
        .id(d => d.id)              // Resolve link source/target strings to nodes by id
        .distance(100)               // Target link length in px
        .strength(d => Math.min(d.value / 1000, 1))  // Stronger springs for heavier links, capped at 1
    )
    .force('charge', d3.forceManyBody()
        .strength(-300)              // Negative strength = repulsion
    )
    .force('center', d3.forceCenter(width / 2, height / 2))
    .force('collide', d3.forceCollide()
        .radius(d => radiusScale(d.views) + 2)   // Node radius plus 2px padding prevents overlap
    );

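When the link force initializes, forceLink().id() replaces each link's source and target id strings with references to the matching node objects. On each tick, D3 then updates the x and y (and velocity vx and vy) properties of every node. Because each link now points at those same node objects, d.source.x and d.target.x always reflect current positions. Your rendering code reads these properties and moves the SVG elements accordingly.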

Graph Data Structure

A force graph needs two arrays: nodes and links. Each node has an id and whatever attributes you want to encode visually. Each link has a source and target (matching node IDs) and a value for weight.

// sample-referrals.json
{
  "nodes": [
    { "id": "/home",      "label": "Home",    "views": 4500, "category": "internal" },
    { "id": "google.com", "label": "Google",  "views": 3200, "category": "search" },
    { "id": "(direct)",   "label": "Direct",  "views": 2100, "category": "direct" }
    // ...
  ],
  "links": [
    { "source": "google.com", "target": "/home", "value": 1800 },
    { "source": "(direct)",   "target": "/home", "value": 1500 }
    // ...
  ]
}
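Node IDs must be unique. D3's forceLink().id() builds a lookup from each node's id to the node object and uses it to resolve link source/target strings. If two nodes share an id, the later one wins and links silently attach only to it; if a link names an id that no node has, D3 throws a "node not found" error when the link force initializes.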
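Because duplicate ids fail silently and missing ones only surface when the simulation starts, it can help to check the data up front. The sketch below is a hypothetical helper (validateGraph is not part of D3 or of this module's demo files) that reports duplicate node ids and link endpoints with no matching node. It accepts endpoints either as id strings or as already-resolved node objects.

```javascript
// Hypothetical pre-flight check (not a D3 API): finds duplicate node ids,
// which forceLink().id() would silently collapse, and link endpoints with no
// matching node, which would make D3 throw "node not found" at startup.
function validateGraph(graph) {
    const seen = new Set();
    const duplicates = [];
    for (const node of graph.nodes) {
        if (seen.has(node.id)) duplicates.push(node.id);
        seen.add(node.id);
    }

    // Endpoints may be id strings (raw JSON) or node objects (after forceLink
    // has resolved them), so normalize both to an id.
    const idOf = end => (typeof end === 'object' && end !== null ? end.id : end);
    const dangling = [];
    graph.links.forEach((link, index) => {
        for (const end of [link.source, link.target]) {
            const id = idOf(end);
            if (!seen.has(id)) dangling.push({ index, id });
        }
    });

    return { ok: duplicates.length === 0 && dangling.length === 0, duplicates, dangling };
}

// Usage: one duplicated node and one link from an unknown referrer
const report = validateGraph({
    nodes: [{ id: '/home' }, { id: 'google.com' }, { id: '/home' }],
    links: [
        { source: 'google.com', target: '/home' },
        { source: 'bing.com',   target: '/home' }
    ]
});
console.log(report.ok, report.duplicates, report.dangling);
```

Here report.ok is false, '/home' is listed as a duplicate, and link 1 is flagged because no node has the id 'bing.com'.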

Visual Encoding

The force graph maps three data dimensions onto four visual channels (referral count is encoded twice, in link thickness and link opacity):

Visual Channel        Data Dimension                Implementation
──────────────────────────────────────────────────────────────────────────
Node size (radius)    Traffic volume (pageviews)    d3.scaleSqrt() mapping views to radius 8–35px
Node color            Category                      Internal = teal, Search = blue, Social = pink, Referral = orange, Direct = gray
Link thickness        Referral count                d3.scaleSqrt() mapping value to width 1–8px
Link opacity          Referral count                Higher count = more opaque (0.2–0.7)

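Using scaleSqrt() instead of scaleLinear() for radius is important. Circle area is proportional to radius squared, so with a linear radius scale a node with twice the views would look about four times as large. A square-root scale makes area, rather than radius, grow in step with the value, which matches how viewers compare circles. Because the range starts at 8px rather than 0, the proportionality is approximate; the floor keeps small nodes visible.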

// Node radius: sqrt scale maps views to 8–35px radius
const radiusScale = d3.scaleSqrt()
    .domain([0, d3.max(nodes, d => d.views)])
    .range([8, 35]);

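// Link width: sqrt scale maps referral count to 1–8px
const linkWidthScale = d3.scaleSqrt()
    .domain([0, d3.max(links, d => d.value)])
    .range([1, 8]);

// Link opacity: linear scale maps referral count to 0.2–0.7
// (the resting opacity restored after a hover highlight)
const opacityScale = d3.scaleLinear()
    .domain([0, d3.max(links, d => d.value)])
    .range([0.2, 0.7]);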
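To see what the radius scale actually computes, here is a plain-JavaScript model of a square-root scale. The name sqrtScale is illustrative. In the demo you would use d3.scaleSqrt() itself, which adds clamping, invert(), and other features omitted here. The sketch takes 4500, the /home view count from the sample data, as the maximum. That is an assumption, since the real maximum depends on the full, elided data set.

```javascript
// Plain-JS model of d3.scaleSqrt().domain([d0, d1]).range([r0, r1]):
// square-root the input, normalize it to t in [0, 1] in that transformed
// space, then interpolate linearly. Assumes non-negative inputs; no clamping.
function sqrtScale([d0, d1], [r0, r1]) {
    const s0 = Math.sqrt(d0);
    const s1 = Math.sqrt(d1);
    return value => {
        const t = (Math.sqrt(value) - s0) / (s1 - s0);
        return r0 + (r1 - r0) * t;
    };
}

// 4500 = /home views in the sample JSON, assumed here to be the maximum
const radius = sqrtScale([0, 4500], [8, 35]);
console.log(radius(0));     // 8   (domain minimum -> range minimum)
console.log(radius(4500));  // 35  (domain maximum -> range maximum)
console.log(radius(1125));  // about 21.5: 1/4 of the views -> 1/2 of the 27px span
```

Because of the 8px floor, a node with a quarter of the top node's views gets half of the extra 27px of radius, not half of the total radius.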

Color Encoding by Category

Nodes are colored by their category to distinguish traffic sources at a glance:

const categoryColors = {
    internal: '#16a085',   // Teal — your site's pages
    search:   '#2980b9',   // Blue — Google, Bing, etc.
    social:   '#e84393',   // Pink — Twitter, LinkedIn, etc.
    referral: '#e67e22',   // Orange — GitHub, Reddit, blogs
    direct:   '#95a5a6'    // Gray — direct/bookmarks
};

This palette uses warm colors (pink, orange) for social and referral sources and cool colors (teal, blue) for internal pages and search engines. The gray for direct traffic makes it visually recede, since direct traffic carries no referrer information.

Drag Interaction

Dragging lets the user manually reposition nodes to explore the graph structure. D3's d3.drag() integrates with the force simulation:

function drag(simulation) {
    function dragStarted(event, d) {
        if (!event.active) simulation.alphaTarget(0.3).restart();
        d.fx = d.x;   // Fix the node at its current position
        d.fy = d.y;
    }

    function dragged(event, d) {
        d.fx = event.x;   // Move the fixed position to the cursor
        d.fy = event.y;
    }

    function dragEnded(event, d) {
        if (!event.active) simulation.alphaTarget(0);
        d.fx = null;   // Release the node back to the simulation
        d.fy = null;
    }

    return d3.drag()
        .on('start', dragStarted)
        .on('drag', dragged)
        .on('end', dragEnded);
}

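The key mechanism: setting d.fx and d.fy "pins" a node at a fixed position, overriding the simulation's calculated coordinates. Setting them back to null releases the node. On drag start, alphaTarget(0.3) reheats the simulation and keeps it warm for the duration of the drag, so other nodes react to the dragged node's movement. On drag end, alphaTarget(0) lets it cool and settle again. The !event.active check makes only the first of several simultaneous gestures (for example, multi-touch) change the target. Apply the behavior to the node selection with node.call(drag(simulation)).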
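The pinning rule is easiest to see in the position update D3 runs for every node on each tick, after the forces have adjusted velocities. The function below is a simplified restatement of that step, not D3's source. integrate and retain are illustrative names. The value retain = 0.6 corresponds to D3's default simulation.velocityDecay(0.4), meaning 40% of each node's velocity is lost per tick.

```javascript
// Simplified restatement (not D3's source) of the per-node position update
// that follows the force calculations on every tick. retain = 0.6 matches
// D3's default velocityDecay(0.4): 40% of the velocity is lost per tick.
function integrate(nodes, retain = 0.6) {
    for (const n of nodes) {
        if (n.fx == null) {          // free: damp velocity, then move
            n.vx *= retain;
            n.x += n.vx;
        } else {                     // pinned: snap to fx and drop velocity
            n.x = n.fx;
            n.vx = 0;
        }
        if (n.fy == null) {
            n.vy *= retain;
            n.y += n.vy;
        } else {
            n.y = n.fy;
            n.vy = 0;
        }
    }
}

const free = { x: 0, y: 0, vx: 10, vy: 0 };
const pinned = { x: 0, y: 0, vx: 10, vy: 0, fx: 50, fy: 20 };  // as if mid-drag
integrate([free, pinned]);
console.log(free.x, pinned.x, pinned.y);  // 6 50 20

pinned.fx = pinned.fy = null;   // drag ended: release the node
pinned.vx = 10;
integrate([pinned]);
console.log(pinned.x);          // 56: it moves freely again
```

Whatever the forces compute, a pinned node ends every tick exactly at (fx, fy). Once fx and fy are cleared, its motion is governed by the forces again.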

Zoom and Pan

For graphs with many nodes, zoom and pan are essential. D3's d3.zoom() applies a geometric transform to an SVG group:

const svg = d3.select('#graph')
    .attr('viewBox', [0, 0, width, height]);

const g = svg.append('g');   // All graph elements go inside this group

svg.call(d3.zoom()
    .scaleExtent([0.25, 5])   // Min 25% zoom, max 500%
    .on('zoom', (event) => {
        g.attr('transform', event.transform);
    })
);

The zoom behavior handles mouse wheel (zoom in/out), click-and-drag on the background (pan), and pinch gestures on touch devices. The scaleExtent prevents users from zooming in so far that they lose context or zooming out so far that the graph becomes invisible.
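Under the hood, a zoom transform is just a scale k and a translation (x, y). A data point p appears on screen at p × k + (x, y), and event.transform serializes to the "translate(x,y) scale(k)" string applied to the group. The simplified model below shows how wheel zooming keeps the point under the cursor fixed and how scaleExtent clamps the result. zoomAbout and toAttr are hypothetical helpers, not d3-zoom internals.

```javascript
// Simplified model of a zoom transform (not d3-zoom's internals):
// screen = data * k + (x, y), i.e. the "translate(x,y) scale(k)" on <g>.
const SCALE_EXTENT = [0.25, 5];   // same limits as scaleExtent() above

function zoomAbout(t, factor, [px, py]) {
    // Clamp the requested scale to the allowed range.
    const k = Math.min(SCALE_EXTENT[1], Math.max(SCALE_EXTENT[0], t.k * factor));
    // Data coordinates currently under the pointer...
    const dataX = (px - t.x) / t.k;
    const dataY = (py - t.y) / t.k;
    // ...must land on the same screen point after rescaling.
    return { k, x: px - dataX * k, y: py - dataY * k };
}

const toAttr = tr => `translate(${tr.x},${tr.y}) scale(${tr.k})`;

let t = { k: 1, x: 0, y: 0 };
t = zoomAbout(t, 2, [100, 50]);   // wheel-zoom 2x around screen point (100, 50)
console.log(toAttr(t));           // translate(-100,-50) scale(2)
t = zoomAbout(t, 10, [100, 50]);  // asks for 20x; clamped to the 5x maximum
console.log(toAttr(t));           // translate(-400,-200) scale(5)
```

In both steps, the data point under (100, 50) is (100, 50), and it stays under the cursor: 100 × 2 − 100 = 100 and 100 × 5 − 400 = 100.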

Zoom vs. drag conflict: Both zoom (pan) and node drag listen for mouse drag events. D3 resolves this by having node drag stop propagation so that dragging a node does not pan the background. This works automatically when you apply d3.drag() to node elements and d3.zoom() to the SVG container.

Hover to Highlight Connected Nodes

Hovering over a node should highlight it and all of its direct neighbors while dimming unconnected nodes. This focus+context technique makes it easy to trace individual referral paths:

// Build an adjacency set for fast lookup. Run this after the simulation is
// created, so forceLink has already replaced source/target ids with node objects.
const adjacency = new Set();
links.forEach(l => {
    adjacency.add(`${l.source.id}-${l.target.id}`);
    adjacency.add(`${l.target.id}-${l.source.id}`);
});

function isConnected(a, b) {
    return a === b || adjacency.has(`${a}-${b}`);
}

// The ".highlight" namespace lets other mouseenter handlers (such as the
// tooltip) coexist; a second plain 'mouseenter' listener would replace this one.
node.on('mouseenter.highlight', function(event, d) {
    // Dim everything not directly connected to the hovered node
    node.style('opacity', n => isConnected(d.id, n.id) ? 1 : 0.15);
    link.style('opacity', l =>
        (l.source.id === d.id || l.target.id === d.id) ? 0.8 : 0.05
    );
    label.style('opacity', n => isConnected(d.id, n.id) ? 1 : 0.1);
})
.on('mouseleave.highlight', function() {
    // Restore resting opacities
    node.style('opacity', 1);
    link.style('opacity', d => opacityScale(d.value));
    label.style('opacity', 1);
});

This technique reveals which nodes are connected to the hovered node while preserving the overall graph layout as context. The dimmed elements are still visible enough to maintain spatial awareness.
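The string keys in the adjacency set can collide when ids themselves contain a hyphen. For example, the pairs ('a-b', 'c') and ('a', 'b-c') both produce 'a-b-c'. A Map from each id to a Set of neighbor ids avoids this and also gives each node's neighbor count directly. The sketch below is one way to build it. buildNeighbors and areNeighbors are hypothetical names.

```javascript
// id -> Set of neighbor ids. Endpoints may be id strings or node objects
// (after forceLink has resolved them). Links are treated as undirected so
// hovering either end highlights the other.
function buildNeighbors(links) {
    const idOf = end => (typeof end === 'object' && end !== null ? end.id : end);
    const neighbors = new Map();
    const add = (a, b) => {
        if (!neighbors.has(a)) neighbors.set(a, new Set());
        neighbors.get(a).add(b);
    };
    for (const l of links) {
        const s = idOf(l.source);
        const t = idOf(l.target);
        add(s, t);
        add(t, s);
    }
    return neighbors;
}

function areNeighbors(neighbors, a, b) {
    return a === b || (neighbors.has(a) && neighbors.get(a).has(b));
}

const sampleLinks = [
    { source: 'google.com', target: '/home' },
    { source: '/home', target: '/products' },
    { source: { id: '(direct)' }, target: { id: '/home' } }
];
const nb = buildNeighbors(sampleLinks);
console.log(areNeighbors(nb, '/products', '/home'));       // true
console.log(areNeighbors(nb, '/products', 'google.com'));  // false
console.log(nb.get('/home').size);                         // 3
```

In the hover handler, areNeighbors(nb, d.id, n.id) would take the place of isConnected(d.id, n.id). Note that nb.get(id).size counts distinct neighbors. A pair of pages linked in both directions therefore counts once, whereas filtering the links array counts it twice.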

Tooltip on Hover

A tooltip provides detailed information that the visual encoding alone cannot convey — exact pageview counts, the node's category label, and the number of connections:

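// HTML tooltip, positioned absolutely over the SVG. pointer-events: none
// keeps it from intercepting mouse events meant for the graph.
const tooltip = d3.select('body').append('div')
    .attr('class', 'tooltip')
    .style('position', 'absolute')
    .style('pointer-events', 'none')
    .style('opacity', 0);

// The ".tooltip" namespace coexists with the highlight handlers instead of replacing them
node.on('mouseenter.tooltip', function(event, d) {
    const connections = links.filter(
        l => l.source.id === d.id || l.target.id === d.id
    ).length;

    tooltip.transition().duration(150).style('opacity', 1);
    tooltip.html(`
        <strong>${d.label}</strong><br>
        Category: ${d.category}<br>
        Views: ${d.views.toLocaleString()}<br>
        Connections: ${connections}
    `)
    .style('left', (event.pageX + 12) + 'px')
    .style('top', (event.pageY - 10) + 'px');
})
.on('mouseleave.tooltip', function() {
    // Hide the tooltip when the pointer leaves the node
    tooltip.transition().duration(150).style('opacity', 0);
});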
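The template literal passes d.label straight into tooltip.html(), which parses it as HTML. In a referral graph, node ids and labels can come from referrer strings that external sites control, so escaping them first is a sensible precaution. Below is a minimal sketch, assuming labels may contain arbitrary text. escapeHtml and tooltipHtml are illustrative helpers, not part of D3.

```javascript
// Escape the characters that are significant in HTML so that an untrusted
// label such as "<img src=x onerror=...>" renders as literal text.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')    // must run first so later entities are not re-escaped
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Same fields as the tooltip above, with the text fields escaped.
function tooltipHtml(d, connections) {
    return `<strong>${escapeHtml(d.label)}</strong><br>` +
        `Category: ${escapeHtml(d.category)}<br>` +
        `Views: ${d.views.toLocaleString()}<br>` +
        `Connections: ${connections}`;
}

const html = tooltipHtml({ label: '<img src=x onerror=alert(1)>', category: 'referral', views: 42 }, 1);
console.log(html);
```

Inside the handler, tooltip.html(tooltipHtml(d, connections)) would replace the inline template. Building the tooltip from child elements with .text() is an equally valid alternative that avoids HTML parsing altogether.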

Building the Graph from Raw Data

The referral-data.js file provides functions that transform raw pageview/referrer records into the nodes+links format the force graph expects:

import { buildGraph, categorizeNode } from './referral-data.js';

// Raw analytics data: each record is a referrer→page pair with a count
const pageviews = [
    { page: '/home',     referrer: 'google.com', count: 1800 },
    { page: '/home',     referrer: '(direct)',   count: 1500 },
    { page: '/products', referrer: '/home',      count: 1200 },
    // ...
];

const graph = buildGraph(pageviews);
// graph.nodes → array of { id, label, views, category }
// graph.links → array of { source, target, value }

The buildGraph function deduplicates nodes, sums traffic volumes, categorizes each URL, and produces the exact format that d3.forceSimulation consumes. The categorizeNode function classifies URLs as internal, search, social, referral, or direct.
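The contents of referral-data.js are not shown in this module, so the following is only a hypothetical sketch of how buildGraph and categorizeNode could work. Several details are assumptions, and the real file may differ:

- the domain lists;
- the label rule (label = id);
- the volume rule: a node's views are the traffic it receives, or, for pure sources like google.com, the traffic it sends.

```javascript
// Hypothetical stand-ins for referral-data.js; domain lists and rules are assumptions.
const SEARCH_DOMAINS = ['google.com', 'bing.com'];
const SOCIAL_DOMAINS = ['twitter.com', 'linkedin.com'];

function categorizeNode(id) {
    if (id === '(direct)') return 'direct';
    if (id.startsWith('/')) return 'internal';            // site-relative path
    if (SEARCH_DOMAINS.includes(id)) return 'search';
    if (SOCIAL_DOMAINS.includes(id)) return 'social';
    return 'referral';                                     // any other external host
}

function buildGraph(pageviews) {
    const linkByPair = new Map();   // JSON [referrer, page] -> { source, target, value }
    const inflow = new Map();       // id -> traffic received
    const outflow = new Map();      // id -> traffic sent
    const bump = (map, key, n) => map.set(key, (map.get(key) || 0) + n);

    for (const { page, referrer, count } of pageviews) {
        const key = JSON.stringify([referrer, page]);   // unambiguous composite key
        if (!linkByPair.has(key)) {
            linkByPair.set(key, { source: referrer, target: page, value: 0 });
        }
        linkByPair.get(key).value += count;   // merge repeated referrer->page rows
        bump(outflow, referrer, count);
        bump(inflow, page, count);
    }

    // Every id seen as a page or a referrer becomes exactly one node.
    const ids = new Set([...inflow.keys(), ...outflow.keys()]);
    const nodes = [...ids].map(id => ({
        id,
        label: id,                                   // assumption: no display-name mapping
        views: inflow.get(id) || outflow.get(id),    // received traffic, else sent traffic
        category: categorizeNode(id)
    }));
    return { nodes, links: [...linkByPair.values()] };
}

const graph = buildGraph([
    { page: '/home',     referrer: 'google.com', count: 1800 },
    { page: '/home',     referrer: '(direct)',   count: 1500 },
    { page: '/products', referrer: '/home',      count: 1200 },
    { page: '/products', referrer: 'google.com', count: 900 }
]);
console.log(graph.nodes);
```

With these four rows, /home gets 3300 views (1800 + 1500 received), and google.com gets 2700 (the traffic it sends). The output arrays are plain objects with string endpoints, which is what forceLink().id() expects.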

The Simulation Tick Loop

The simulation fires a tick event on each iteration. Your rendering code updates SVG element positions to match the simulation's calculated coordinates:

simulation.on('tick', () => {
    // Update link positions
    link
        .attr('x1', d => d.source.x)
        .attr('y1', d => d.source.y)
        .attr('x2', d => d.target.x)
        .attr('y2', d => d.target.y);

    // Update node positions
    node
        .attr('cx', d => d.x)
        .attr('cy', d => d.y);

    // Update label positions
    label
        .attr('x', d => d.x)
        .attr('y', d => d.y + radiusScale(d.views) + 14);
});

Under D3's default cooling schedule, this handler runs about 300 times before the simulation stops. D3 computes the many-body force with a Barnes-Hut approximation (roughly O(n log n) per tick instead of O(n²) for all pairs), which keeps it efficient even with hundreds of nodes.
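The number of ticks follows from D3's default cooling schedule. Alpha starts at 1, and each tick moves it a fraction alphaDecay of the way toward alphaTarget (0 by default). The simulation stops once alpha falls below alphaMin (0.001). The default alphaDecay is 1 − 0.001^(1/300), chosen so that this happens after roughly 300 ticks. The sketch below (countTicks is an illustrative name) just replays that arithmetic.

```javascript
// Replays D3's default cooling arithmetic: alpha starts at 1 and each tick
// moves it alphaDecay of the way toward alphaTarget; the simulation stops
// once alpha < alphaMin.
function countTicks({
    alpha = 1,
    alphaMin = 0.001,
    alphaTarget = 0,
    alphaDecay = 1 - Math.pow(0.001, 1 / 300)   // D3's default, about 0.0228
} = {}) {
    // If the target is at or above alphaMin (e.g. 0.3 during a drag),
    // alpha never falls below alphaMin and the simulation keeps running.
    if (alphaTarget >= alphaMin) return Infinity;
    let ticks = 0;
    while (alpha >= alphaMin) {
        alpha += (alphaTarget - alpha) * alphaDecay;
        ticks++;
    }
    return ticks;
}

console.log(countTicks());                       // about 300
console.log(countTicks({ alphaDecay: 0.05 }));   // fewer ticks: faster but rougher layout
console.log(countTicks({ alphaTarget: 0.3 }));   // Infinity (never cools while dragging)
```

Raising alphaDecay trades layout quality for speed. The last call shows why the drag handler's alphaTarget(0.3) keeps the graph moving until alphaTarget(0) is restored.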

Why This Capstone

The referral network visualization brings together every skill from this tutorial:

Skill                            Module               Application Here
──────────────────────────────────────────────────────────────────────────
Coordinate systems               01 (Canvas)          SVG viewBox and the simulation's x/y coordinate space
Scale functions                  02 (Line Charts)     scaleSqrt for radius, scaleLinear for opacity
Data transformation              03 (Data Shaping)    buildGraph() converts raw records to graph format
Color encoding                   05–06 (Chart.js)     Category colors for node types
Selections and data binding      10 (Hello D3)        selectAll().data().join() for nodes, links, labels
Transitions and interactivity    11 (Transitions)     Hover highlights, drag, zoom/pan, tooltips

More importantly, this is a chart that no standard charting library provides. There is no type: 'forceGraph' in Chart.js. D3 gives you the building blocks — force simulation, scales, selections, interaction handlers — and you assemble them into something custom. That is the power of imperative visualization.

Why SVG works here: This graph uses SVG because the node count is modest (typically under 100 referral sources). For a graph with thousands of nodes, you would render on Canvas instead — d3-force works with any rendering target, not just SVG. The force simulation computes x/y coordinates; you can draw them with ctx.arc() on a Canvas just as easily as with <circle> elements in SVG. See Module 09 for the performance envelopes that guide this decision.
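To make the Canvas alternative concrete, here is a hypothetical drawGraph function. It paints positions produced by d3-force using only standard CanvasRenderingContext2D methods (clearRect, beginPath, moveTo, lineTo, stroke, arc, fill). radiusOf and colorOf are assumed callbacks, for example d => radiusScale(d.views) and d => categoryColors[d.category]. You would call it from the simulation's tick handler in place of the SVG attribute updates.

```javascript
// Hypothetical Canvas renderer for a graph already laid out by d3-force
// (nodes carry x/y; links' source/target are node objects).
function drawGraph(ctx, { nodes, links }, { width, height, radiusOf, colorOf }) {
    ctx.clearRect(0, 0, width, height);

    // Links first, batched into one path, so nodes paint on top of them.
    ctx.strokeStyle = '#999';
    ctx.lineWidth = 1;
    ctx.beginPath();
    for (const l of links) {
        ctx.moveTo(l.source.x, l.source.y);
        ctx.lineTo(l.target.x, l.target.y);
    }
    ctx.stroke();

    // One filled circle per node, colored by category.
    for (const n of nodes) {
        ctx.beginPath();
        ctx.arc(n.x, n.y, radiusOf(n), 0, 2 * Math.PI);
        ctx.fillStyle = colorOf(n);
        ctx.fill();
    }
}

// In the browser, for example:
// const ctx = document.querySelector('canvas').getContext('2d');
// simulation.on('tick', () => drawGraph(ctx, { nodes, links }, {
//     width, height,
//     radiusOf: d => radiusScale(d.views),
//     colorOf: d => categoryColors[d.category]
// }));
```

Batching every link into one path is fast, but it draws all links at the same width. Reproducing linkWidthScale would need a separate stroke, with its own lineWidth, per link. Canvas also has no per-element event listeners, so hover and drag need hit testing, for example with simulation.find(x, y, radius).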

Summary