Web Analytics

Measuring the Web

"Analytics answers the questions you cannot answer just by looking at your own code. It addresses what happens when users meet the executing code."

CSE 135


Section 1: Why Analytics & The Stack

Three purposes, one pipeline — from collection to insight.

Three Purposes of Analytics

Purpose	Who Cares	What It Answers
Error Tracking	Developers	Broken pages, failed requests, JS exceptions users never report
User Behavior	Business / Product	Which features are used, where users abandon tasks, conversion rates
Usability Measurement	Users (indirectly)	Does the interface work? Does content resonate? Do changes help or hurt?

The Analytics Pipeline

┌──────────┐        ┌────────────┐        ┌──────────┐        ┌───────────┐
│  Client  │──────▶ │ Collection │──────▶ │ Storage  │──────▶ │ Reporting │
│ (Browser)│  HTTP  │  (Server)  │ INSERT │(Database)│ SELECT │(Dashboard)│
└──────────┘  POST  └────────────┘        └──────────┘        └───────────┘
     │                    │                    │                    │
JS events,          Receives &            Time-series         Charts, tables,
performance         validates             queries             alerts, exports
timing, errors      payloads

Analytics applies to all three participant groups: developers track errors, business tracks conversions, users benefit from a better experience. The same data answers different questions depending on who is asking.
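
As a rough illustration of the "receives & validates" step in the pipeline, the sketch below checks an incoming JSON payload before it would be inserted into storage. The field names (`type`, `url`, `timestamp`) and the list of allowed event types are assumptions made for this example, not a format defined in the course material.

```javascript
// Hypothetical event types a collector might accept (assumed for illustration).
const ALLOWED_TYPES = new Set(["pageview", "click", "error", "performance"]);

// Validate the raw body of an HTTP POST.
// Returns { ok: true, event } or { ok: false, reason }.
function validatePayload(rawBody) {
  let data;
  try {
    data = JSON.parse(rawBody);
  } catch {
    return { ok: false, reason: "invalid JSON" };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, reason: "payload must be an object" };
  }
  if (!ALLOWED_TYPES.has(data.type)) {
    return { ok: false, reason: "unknown event type" };
  }
  if (typeof data.url !== "string" || data.url.length === 0) {
    return { ok: false, reason: "missing url" };
  }
  if (!Number.isFinite(data.timestamp)) {
    return { ok: false, reason: "missing timestamp" };
  }
  // Copy only the expected fields so unvetted data never reaches storage.
  const event = { type: data.type, url: data.url, timestamp: data.timestamp };
  return { ok: true, event };
}
```

Copying only known fields is one way to keep the storage schema predictable; a real collector would likely also cap payload size and rate-limit clients.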

Section 2: Data Collection Methods

Three Collection Methods Compared

Aspect	Server Logs	Network Capture	Client-Side Scripts
What it captures	HTTP requests received	All traffic on the wire	Any browser event or state
Requires code changes?	No — built in	No — passive	Yes — must add JS
Client events?	No — requests only	No — network only	Yes — clicks, scrolls, errors
Works with HTTPS?	Yes — on the server	Metadata only	Yes — in the browser
Privacy concerns	Moderate	High	Very High

Section 3: What Can Be Collected

Four Data Sources

Automatic (no code needed)

Source	Examples
HTTP Headers	IP address, User-Agent, Referer, Accept-Language, Cookies
URL	Path, query params, UTM codes (`utm_source`, `utm_campaign`)
Server	Timestamp, status code, response size, processing time
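
To make the URL row concrete, here is a minimal sketch that pulls the path and any `utm_*` parameters out of a request URL using the standard `URL` API. The function name `extractUrlData` and the `https://example.com` base are placeholders chosen for this example.

```javascript
// Extract the path and UTM campaign parameters from a request URL.
// `base` is only used to resolve relative URLs, such as those in access logs.
function extractUrlData(requestUrl, base = "https://example.com") {
  const url = new URL(requestUrl, base);
  const utm = {};
  for (const [key, value] of url.searchParams) {
    if (key.startsWith("utm_")) {
      utm[key] = value;
    }
  }
  return { path: url.pathname, utm };
}
```

For a request such as `/landing?utm_source=newsletter&utm_campaign=spring&id=7`, this keeps the two UTM codes and ignores `id`.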

Requires JavaScript

Data	API
Viewport dimensions	`window.innerWidth/Height`
Scroll depth	Scroll events
Click coordinates	Click events
Performance timing	Performance API
JS errors	`window.onerror`
Custom events	Anything you define
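
Scroll depth is not exposed directly by the browser; it has to be derived from scroll events. The sketch below shows one common way to compute it. The helper name `scrollDepthPercent` is made up for this example, and the browser wiring at the bottom only runs where `window` and `document` exist.

```javascript
// Percentage of the page the user has scrolled past, from 0 to 100.
//   scrollTop:      pixels scrolled from the top (e.g. window.scrollY)
//   viewportHeight: visible height (e.g. window.innerHeight)
//   documentHeight: full page height (e.g. document.documentElement.scrollHeight)
function scrollDepthPercent(scrollTop, viewportHeight, documentHeight) {
  if (documentHeight <= 0) return 0;
  const seen = scrollTop + viewportHeight;
  const percent = (seen / documentHeight) * 100;
  return Math.min(100, Math.max(0, Math.round(percent)));
}

// Browser-only wiring: remember the deepest point reached during the visit.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  let maxDepth = 0;
  window.addEventListener(
    "scroll",
    () => {
      const depth = scrollDepthPercent(
        window.scrollY,
        window.innerHeight,
        document.documentElement.scrollHeight
      );
      maxDepth = Math.max(maxDepth, depth);
    },
    { passive: true }
  );
}
```

Tracking only the maximum keeps the data volume small: one number per page view rather than one event per scroll.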

Section 4: Enriching Server Logs

Log Formats & Client Hints

Log Format	Fields	Analytics Value
Common (CLF)	IP, identity, user, timestamp, request, status, size	Basic hit counting only
Combined	CLF + Referer + User-Agent	Traffic sources, browser breakdown — the analytics minimum
Custom / Extended	Any header, cookie, variable, timing	Rich analytics: response time, viewport, device hints
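
To show what the Combined format adds over CLF, the following sketch parses one Combined log line into named fields. The regular expression is a simplification written for this example: it assumes the Referer and User-Agent values contain no escaped quotes, which real logs occasionally do.

```javascript
// Combined Log Format: CLF fields followed by quoted Referer and User-Agent.
const COMBINED_LOG_PATTERN =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

// Parse one log line; returns null when the line does not match the format.
function parseCombinedLogLine(line) {
  const match = COMBINED_LOG_PATTERN.exec(line);
  if (!match) return null;
  const [, ip, identity, user, timestamp, request, status, size, referer, userAgent] = match;
  // The request field looks like: GET /path HTTP/1.1
  const [method, path, protocol] = request.split(" ");
  return {
    ip,
    identity,
    user,
    timestamp,
    method,
    path,
    protocol,
    status: Number(status),
    size: size === "-" ? 0 : Number(size), // "-" means no body was sent
    referer: referer === "-" ? null : referer,
    userAgent,
  };
}
```

The last two fields are what make traffic-source and browser breakdowns possible; with CLF alone, only the first seven are available.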

Client Hints: Structured Device Info

Hint Header	Provides	Example
`Sec-CH-UA`	Browser brand + version	`"Chrome";v="124"`
`Sec-CH-UA-Mobile`	Mobile device?	`?0` / `?1`
`Sec-CH-UA-Platform`	Operating system	`"macOS"`
`ECT`	Effective connection type	`4g`, `3g`, `slow-2g`
`RTT`	Round-trip time (ms)	`50`
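
Because Client Hints arrive as structured values rather than one long string, they are easier to parse than `User-Agent`. The sketch below reads `Sec-CH-UA` and `Sec-CH-UA-Mobile` values on the server. It is a simplified parser written for this example, not a full implementation of the structured-header syntax these headers use.

```javascript
// Parse a Sec-CH-UA value such as:
//   "Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"
// into a list of { brand, version } objects.
function parseSecChUa(headerValue) {
  const brands = [];
  const pattern = /"([^"]*)";\s*v="([^"]*)"/g;
  for (const match of headerValue.matchAll(pattern)) {
    brands.push({ brand: match[1], version: match[2] });
  }
  return brands;
}

// Sec-CH-UA-Mobile is a structured boolean: "?1" is true, "?0" is false.
function parseSecChUaMobile(headerValue) {
  return headerValue.trim() === "?1";
}
```

Browsers typically send several brands, including a deliberately meaningless one, so analytics code usually has to pick the brand it recognizes rather than taking the first entry.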

Section 5: First-Party vs Third-Party

Ownership Comparison

Aspect	First-Party	Third-Party
Data ownership	You own it completely	Vendor stores and may use it
Cookie scope	Your domain only	Vendor's domain (cross-site capable)
Privacy compliance	Simpler — you control data flows	Complex — data leaves your control
Ad blocker impact	Usually unblocked (same domain)	Often blocked (known tracking domains)
Setup effort	High — build and maintain	Low — add a script tag
Cost	Infrastructure + dev time	Perceptually "free" (you pay with data)

Section 6: Fingerprinting & User ID

Browser Fingerprint Attributes

Attribute	Source	How It Helps
User-Agent	HTTP header	Browser + OS + version
Screen resolution	`screen.width/height`	Device type
Canvas rendering	Canvas API	GPU/driver-specific pixel differences
WebGL renderer	WebGL API	GPU hardware identifier
Installed fonts	Canvas/JS	Highly variable across systems
Timezone	`Intl.DateTimeFormat`	Geographic signal
Hardware concurrency	`navigator.hardwareConcurrency`	CPU core count
Audio fingerprint	AudioContext API	Audio stack differences
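
Individually these attributes say little; a fingerprint comes from combining them into a single identifier. The sketch below illustrates that combining step with a simple 32-bit FNV-1a hash. The attribute names passed in are hypothetical, production fingerprinting libraries use many more signals and stronger hashes, and the technique requires consent under GDPR.

```javascript
// 32-bit FNV-1a hash over the UTF-16 code units of a string,
// returned as 8 lowercase hex characters.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash.toString(16).padStart(8, "0");
}

// Combine attribute values into one identifier.
// Keys are sorted so the result does not depend on property order.
function fingerprintFrom(attributes) {
  const keys = Object.keys(attributes).sort();
  const serialized = keys.map((key) => `${key}=${attributes[key]}`).join("|");
  return fnv1a(serialized);
}
```

A change to any single attribute, such as a browser update altering the User-Agent, produces a different hash, which is why fingerprints are only stable until the browser or OS changes.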

Cookies vs Fingerprinting

Aspect	Cookies	Fingerprinting
User can clear	Yes	No
Blocked by ad blockers	Sometimes	Harder to block
Stability	Until cleared	Until browser/OS updates
Accuracy	Exact (unique ID)	Probabilistic (~90%+)
Privacy regulation	Covered by GDPR	Also covered by GDPR
Cross-device	No (unless synced)	No (different hardware)

Section 7: Privacy, Consent & Collection Philosophy

Broad vs Targeted Collection

Aspect	Broad ("collect everything")	Targeted ("collect specific things")
Data volume	Very high	Low to moderate
Discovering unknowns	Strong — data is already there	Weak — must add instrumentation
Privacy risk	High — may collect PII unknowingly	Low — you know exactly what you have
GDPR alignment	Poor — violates data minimization	Good — purpose-limited collection

What Requires Consent?

Activity	Consent?	Reasoning
Server access logs	Usually not	Operational logging = legitimate interest
First-party analytics cookies	Yes (ePrivacy)	Non-essential cookie
Fingerprinting	Yes (GDPR)	Creates personal data through processing
Session replay	Yes (GDPR)	Records behavior, may capture PII
Third-party analytics (GA)	Yes (both)	Data leaves your control; third-party cookies
Aggregate, cookie-free metrics	Often not	No personal data — no individual tracking

Section 8: Error Tracking & Performance

Error Detection & sendBeacon

Why sendBeacon?

Method	On Page Unload
`fetch()`	May be cancelled by the browser (unless `keepalive: true` is set)
`XMLHttpRequest`	May be cancelled by the browser
`sendBeacon()`	Queued by the browser and sent after unload (best effort)
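
A minimal sketch of unload-safe sending follows. It tries `navigator.sendBeacon()` first and falls back to `fetch()` with `keepalive: true`. The `/collect` endpoint and the payload fields are assumptions for this example; `nav` and `fetchFn` are injectable only so the function can be exercised outside a browser.

```javascript
// Send an analytics payload in a way that can survive page unload.
// Returns which transport was used: "beacon", "fetch", or "dropped".
function sendAnalytics(endpoint, payload, nav = globalThis.navigator, fetchFn = globalThis.fetch) {
  const body = JSON.stringify(payload);
  if (nav && typeof nav.sendBeacon === "function") {
    // sendBeacon returns true when the browser has queued the data.
    if (nav.sendBeacon(endpoint, body)) return "beacon";
  }
  if (typeof fetchFn === "function") {
    // keepalive asks the browser to let the request outlive the page.
    fetchFn(endpoint, { method: "POST", body, keepalive: true }).catch(() => {});
    return "fetch";
  }
  return "dropped";
}

// Browser-only wiring: report uncaught JS errors to a hypothetical /collect endpoint.
if (typeof window !== "undefined") {
  window.addEventListener("error", (event) => {
    sendAnalytics("/collect", {
      type: "error",
      message: event.message,
      source: event.filename,
      line: event.lineno,
    });
  });
}
```

Neither transport returns a response the page can act on during unload, so the collector should treat every payload as fire-and-forget.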

Core Web Vitals & Performance Monitoring

Metric	Measures	Good Threshold	Why It Matters
LCP (Largest Contentful Paint)	Loading — main content visible	≤ 2.5s	Users see content quickly
INP (Interaction to Next Paint)	Responsiveness — input to visual update	≤ 200ms	Feels snappy to interact
CLS (Cumulative Layout Shift)	Stability — unexpected layout movement	≤ 0.1	Nothing jumps around
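
As a sketch of how RUM samples might be checked against these thresholds, the code below takes a list of measurements for one metric and tests their 75th percentile against the "good" value from the table. Using the 75th percentile follows Google's published convention for Core Web Vitals; the function names are invented for this example.

```javascript
// "Good" thresholds from the table above (LCP and INP in milliseconds).
const GOOD_THRESHOLDS = { LCP: 2500, INP: 200, CLS: 0.1 };

// 75th percentile using the nearest-rank method.
function percentile75(samples) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length);
  return sorted[rank - 1];
}

// Report whether a metric's 75th percentile meets the "good" threshold.
// Returns null for an unknown metric or an empty sample list.
function meetsGoodThreshold(metric, samples) {
  const p75 = percentile75(samples);
  if (p75 === null || !(metric in GOOD_THRESHOLDS)) return null;
  return { metric, p75, good: p75 <= GOOD_THRESHOLDS[metric] };
}
```

Judging by a percentile rather than the mean keeps a handful of very slow sessions from hiding the typical experience, and keeps a fast majority from hiding a slow tail.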

RUM vs Synthetic Monitoring

Aspect	RUM (Real User Monitoring)	Synthetic Monitoring
Data source	Actual user sessions	Scripted bots from controlled locations
Variability	High — real devices, real networks	Low — consistent environment
Coverage	Only pages users visit	Any page you configure
Use case	Analytics — real experience	Testing — baseline performance

Section 9: User Behavior & Session Replay

Session Replay

Aspect	Detail
How	DOM mutations, not video
Size	~100–500KB JSON/session
Must mask	Passwords, PII, payments
Consent	Required (GDPR)
Tools	rrweb, FullStory, Hotjar
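
The "must mask" row can be illustrated with a small filter applied to recorded input events before they leave the browser. The event shape (`fieldName`, `inputType`, `value`) and the list of sensitive name fragments are assumptions for this sketch; they are not the format used by rrweb or any other tool named above.

```javascript
// Field-name fragments treated as sensitive (assumed list, extend as needed).
const SENSITIVE_FIELD_PATTERN = /pass|card|cvv|cvc|ssn|email|phone/i;

// Replace the value of a sensitive input event with asterisks of the same length.
// Non-sensitive events are returned unchanged.
function maskInputEvent(event) {
  const isSensitive =
    event.inputType === "password" ||
    SENSITIVE_FIELD_PATTERN.test(event.fieldName || "");
  if (!isSensitive) return event;
  return { ...event, value: "*".repeat(String(event.value ?? "").length) };
}
```

Masking on the client, before transmission, means the raw value never reaches the collection server at all. A name-based filter like this one will miss sensitive fields with unusual names, so masking everything by default and allowing specific fields is the more cautious design.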

Section 10: Observability, Bots & Data Quality

Bot Traffic & Data Quality

Bot Types & Impact

Category	Examples	Analytics Impact
Good bots	Googlebot, Bingbot, uptime monitors	Inflate page views, but wanted
Bad bots	Scrapers, spam bots, DDoS, click fraud	Corrupt all metrics, stress servers
Gray area	AI crawlers (GPTBot, ClaudeBot)	High volume, active debate on value
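
A first-pass filter for the categories above can be sketched as a User-Agent lookup. The rule list is an assumption for this example and is far from complete. It only catches bots that identify themselves; bad bots commonly spoof browser User-Agents, so real detection also needs behavioral signals.

```javascript
// Rules are checked in order; the first matching pattern wins.
const BOT_RULES = [
  { category: "good", pattern: /Googlebot|bingbot/i },
  { category: "gray", pattern: /GPTBot|ClaudeBot/i },
  { category: "unknown-bot", pattern: /bot|crawler|spider|curl|wget|python-requests/i },
];

// Classify a request by its User-Agent header.
// A missing User-Agent is treated as automated traffic.
function classifyUserAgent(userAgent) {
  if (!userAgent) return "unknown-bot";
  for (const rule of BOT_RULES) {
    if (rule.pattern.test(userAgent)) return rule.category;
  }
  return "human";
}
```

Tagging traffic with a category, rather than discarding it, lets reports exclude bots while still leaving the raw counts available for capacity planning.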

Summary: Key Takeaways

Web Analytics at a Glance

Concept	Key Takeaway
Why analytics	Find errors users never report, understand behavior, measure what works
Analytics stack	Collector → Storage → Reporting — you build all three
Collection methods	Server logs (automatic), network capture (obsolete), client-side (flexible)
What to collect	Headers & URLs automatically; JS adds clicks, scrolls, errors, timing
Enriching logs	Custom formats + Client Hints + script-to-header = rich log data
1st vs 3rd party	First-party = you own data; third-party = vendor builds cross-site profiles
Fingerprinting	Identifies users without cookies — effective but ethically fraught
Collection philosophy	Collect what you need, not what you might need; GDPR requires a stated purpose
Privacy & consent	Consent required for cookies, fingerprinting, replay; aggregate metrics may be exempt
Error tracking	`window.onerror` + `sendBeacon()` captures silent failures
Performance	LCP, INP, CLS — Google ranking signals, measured via RUM
User behavior	Funnel analysis reveals drop-off; absence of action is data
Session replay	DOM reconstruction, not video — powerful but privacy-sensitive
Observability	Logs + Metrics + Traces; OpenTelemetry is the vendor-neutral standard
Bot traffic	30–50% of traffic is automated; filter it out or your metrics are meaningless
Data quality	Every method has blind spots — combine methods, accept approximation
