Web Analytics

Measuring the Web

"Analytics answers the questions you cannot answer just by looking at your own code. They address what happens when users meet the executing code."

CSE 135: Full Overview

Section 1: Why Analytics & The Stack

Three purposes, one pipeline — from collection to insight.

Three Purposes of Analytics

Purpose | Who Cares | What It Answers
Error Tracking | Developers | Broken pages, failed requests, JS exceptions users never report
User Behavior | Business / Product | Which features are used, where users abandon tasks, conversion rates
Usability Measurement | Users (indirectly) | Does the interface work? Does content resonate? Do changes help or hurt?

The Analytics Pipeline

┌──────────┐        ┌────────────┐        ┌──────────┐        ┌───────────┐
│  Client  │──────▶ │ Collection │──────▶ │ Storage  │──────▶ │ Reporting │
│ (Browser)│  HTTP  │  (Server)  │ INSERT │(Database)│ SELECT │(Dashboard)│
└──────────┘  POST  └────────────┘        └──────────┘        └───────────┘
     │                    │                    │                    │
JS events,          Receives &            Time-series         Charts, tables,
performance         validates             queries             alerts, exports
timing, errors      payloads

Analytics applies to all three participant groups: developers track errors, business tracks conversions, users benefit from a better experience. The same data answers different questions depending on who is asking.
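
The collection stage in the pipeline "receives and validates payloads". A minimal sketch of that validation step follows, assuming a hypothetical event shape (`type`, `url`, `ts`) and a hypothetical list of event types; neither is defined in the source.

```javascript
// Hypothetical event types accepted by the collector (an assumption, not from the source).
const ALLOWED_TYPES = new Set(["pageview", "error", "performance", "custom"]);

// Returns { ok: true, event } with a normalized event, or { ok: false, reason }.
function validatePayload(raw) {
  let data;
  try {
    data = typeof raw === "string" ? JSON.parse(raw) : raw;
  } catch {
    return { ok: false, reason: "invalid JSON" };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, reason: "payload must be an object" };
  }
  if (!ALLOWED_TYPES.has(data.type)) {
    return { ok: false, reason: "unknown event type" };
  }
  if (typeof data.url !== "string" || data.url.length > 2048) {
    return { ok: false, reason: "missing or oversized url" };
  }
  if (!Number.isFinite(data.ts)) {
    return { ok: false, reason: "missing timestamp" };
  }
  // Keep only known fields so unexpected data never reaches storage.
  return { ok: true, event: { type: data.type, url: data.url, ts: data.ts } };
}
```

Dropping unknown fields at the collector is one way to keep accidental PII out of storage; a real collector would likely validate more fields than this.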

Section 2: Data Collection Methods

Three approaches capturing data at different points in client-server communication.

Three Collection Methods Compared

Aspect | Server Logs | Network Capture | Client-Side Scripts
What it captures | HTTP requests received | All traffic on the wire | Any browser event or state
Requires code changes? | No, built in | No, passive | Yes, must add JS
Client events? | No, requests only | No, network only | Yes: clicks, scrolls, errors
Works with HTTPS? | Yes, on the server | Metadata only | Yes, in the browser
Privacy concerns | Moderate | High | Very High

HTTPS makes network capture mostly obsolete for content analysis. You can see metadata (IP, timing), but not request/response content. Modern analytics relies on server logs and client-side scripts.

Section 3: What Can Be Collected

Four data sources — from automatic headers to custom JavaScript events.

Four Data Sources

Automatic (no code needed)

Source | Examples
Connection & HTTP headers | Client IP address (from the connection), User-Agent, Referer, Accept-Language, Cookies
URL | Path, query params, UTM codes (utm_source, utm_campaign)
Server | Timestamp, status code, response size, processing time

Requires JavaScript

Data | API
Viewport dimensions | window.innerWidth / window.innerHeight
Scroll depth | Scroll events
Click coordinates | Click events
Performance timing | Performance API
JS errors | window.onerror
Custom events | Anything you define

Combine both sources. Server logs capture every request (including bots); client-side scripts capture user interaction (but require JS). Neither alone gives the full picture.
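
As an illustration of the "automatic" URL source, the sketch below pulls the path and any `utm_*` parameters out of a request URL. The function name and the placeholder base origin are assumptions made for the example.

```javascript
// Extracts the path and any utm_* campaign parameters from a request URL.
// The base origin is only needed to parse relative URLs; its value is arbitrary.
function extractUrlFields(requestUrl) {
  const url = new URL(requestUrl, "https://example.com");
  const utm = {};
  for (const [key, value] of url.searchParams) {
    if (key.startsWith("utm_")) utm[key] = value;
  }
  return { path: url.pathname, utm };
}
```

For example, `extractUrlFields("/pricing?utm_source=newsletter&page=2")` keeps `utm_source` and ignores `page`.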

Section 4: Enriching Server Logs

From 7 fields to rich analytics — log formats, Client Hints, and script-to-header.

Log Formats & Client Hints

Log Format | Fields | Analytics Value
Common (CLF) | IP, identity, user, timestamp, request, status, size | Basic hit counting only
Combined | CLF + Referer + User-Agent | Traffic sources, browser breakdown: the analytics minimum
Custom / Extended | Any header, cookie, variable, timing | Rich analytics: response time, viewport, device hints
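
To make the Combined format concrete, here is a minimal parsing sketch. The regular expression assumes the standard field order listed above and does not handle escaped quotes inside the request, Referer, or User-Agent fields; the function and field names are illustrative.

```javascript
// One capture group per Combined Log Format field:
// ip identity user [timestamp] "request" status size "referer" "user-agent"
const COMBINED_RE =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

// Returns a parsed record, or null when the line does not match the format.
function parseCombinedLogLine(line) {
  const m = COMBINED_RE.exec(line);
  if (!m) return null;
  const [method, path, protocol] = m[5].split(" ");
  return {
    ip: m[1],
    identity: m[2],
    user: m[3],
    timestamp: m[4],
    method,
    path,
    protocol,
    status: Number(m[6]),
    size: m[7] === "-" ? 0 : Number(m[7]), // "-" means no body was sent
    referer: m[8] === "-" ? null : m[8],
    userAgent: m[9],
  };
}
```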

Client Hints — Structured Device Info

Hint Header | Provides | Example
Sec-CH-UA | Browser brand + major version | "Chrome";v="124"
Sec-CH-UA-Mobile | Mobile device? | ?0 / ?1
Sec-CH-UA-Platform | Operating system | "macOS"
ECT | Effective connection type | 4g, 3g, slow-2g
RTT | Round-trip time (ms) | 50

Client Hints are a philosophical shift: instead of the client announcing everything (a bloated User-Agent), the server asks for what it needs, listing the hints it wants in an Accept-CH response header. Better for privacy, better for analytics. Support is limited to Chromium-based browsers; Firefox and Safari do not send these hints.
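
The `Sec-CH-UA` and `Sec-CH-UA-Mobile` values are structured header fields, so they need a little parsing before they can be stored. The sketch below is a simplified reader for the common case, not a full structured-fields parser; the function names are illustrative.

```javascript
// Parses a Sec-CH-UA value such as:
//   "Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"
// into [{ brand, version }, ...]. Escaped quotes inside brands are not handled.
function parseSecChUa(headerValue) {
  const brands = [];
  const re = /"([^"]*)";\s*v="([^"]*)"/g;
  for (const m of headerValue.matchAll(re)) {
    brands.push({ brand: m[1], version: m[2] });
  }
  return brands;
}

// Structured-field booleans are serialized as ?1 (true) and ?0 (false).
function parseSecChUaMobile(headerValue) {
  return headerValue.trim() === "?1";
}
```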

Script-to-Header & Log Forwarding

Script-to-Header Technique

JavaScript sets cookie      Browser sends cookie       Server logs cookie value
                            on next request            from access log format
──────────────────────▶     ─────────────────────▶     ─────────────────────▶
document.cookie =           Cookie: _viewport=         %{_viewport}C (Apache)
'_viewport=1440x900'        1440x900                   $cookie__viewport (Nginx)

Client data appears in server logs without a separate beacon endpoint. Limitation: cookie-based data is one request behind — works best for data that changes infrequently (viewport, timezone, DPR).
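
Both halves of the script-to-header technique can be sketched as small functions: one builds the cookie string on the client, the other reads it back from the raw Cookie header on the server. The `Path`, `Max-Age`, and `SameSite` attributes and the function names are assumptions chosen for illustration.

```javascript
// Client side: builds the string to assign to document.cookie.
// Max-Age and SameSite values are illustrative assumptions.
function buildViewportCookie(width, height) {
  return `_viewport=${width}x${height}; Path=/; Max-Age=86400; SameSite=Lax`;
}

// Server side: reads one cookie value out of a raw Cookie request header.
function readCookie(cookieHeader, name) {
  for (const part of cookieHeader.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) return part.slice(eq + 1).trim();
  }
  return null;
}

// In a browser this would run as:
//   document.cookie = buildViewportCookie(window.innerWidth, window.innerHeight);
```

When the web server's log format already references the cookie, as in the Apache and Nginx directives shown above, the server-side function is only needed if application code also wants the value.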

Log Forwarding

Web Server ──writes──▶ access.log ──▶ Log Shipper ──HTTPS──▶ Analysis Service
                                      (Filebeat,             (ELK, Splunk,
                                      Fluentd,               Datadog, or
                                      Fluent Bit)            custom)

Collection is first-party; analysis may be third-party. The browser never talks to the third party — you decide what fields to forward and what to redact.
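
Deciding "what fields to forward and what to redact" can be as simple as an allowlist applied before a record leaves your infrastructure. The sketch below assumes a hypothetical record shape and allowlist, and truncates IPv4 addresses as one possible redaction choice.

```javascript
// Fields allowed to leave the first-party boundary (a hypothetical allowlist).
const FORWARD_FIELDS = ["timestamp", "path", "status", "size", "referer", "userAgent"];

// Returns a copy of a log record with only allowlisted fields, plus a
// truncated IPv4 address (last octet zeroed) in place of the full client IP.
function redactForForwarding(record) {
  const out = {};
  for (const field of FORWARD_FIELDS) {
    if (field in record) out[field] = record[field];
  }
  if (typeof record.ip === "string" && /^\d+\.\d+\.\d+\.\d+$/.test(record.ip)) {
    out.ip = record.ip.replace(/\.\d+$/, ".0");
  }
  return out;
}
```

An allowlist fails safe: a new field added to the log format is not forwarded until someone explicitly permits it.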

Section 5: First-Party vs Third-Party

Who collects the data and where does it go?

Ownership Comparison

Aspect | First-Party | Third-Party
Data ownership | You own it completely | Vendor stores and may use it
Cookie scope | Your domain only | Vendor's domain (cross-site capable)
Privacy compliance | Simpler: you control data flows | Complex: data leaves your control
Ad blocker impact | Usually unblocked (same domain) | Often blocked (known tracking domains)
Setup effort | High: build and maintain | Low: add a script tag
Cost | Infrastructure + dev time | Seemingly "free" (you pay with data)

When a service is free, you or your users are the product. Third-party vendors aggregate data across all their customers' sites, building cross-site user profiles. This is exactly what GDPR targets.

Section 6: Fingerprinting & User ID

Identifying users without cookies — powerful but ethically fraught.

Browser Fingerprint Attributes

Attribute | Source | How It Helps
User-Agent | HTTP header | Browser + OS + version
Screen resolution | screen.width / screen.height | Device type
Canvas rendering | Canvas API | GPU/driver-specific pixel differences
WebGL renderer | WebGL API | GPU hardware identifier
Installed fonts | Canvas/JS | Highly variable across systems
Timezone | Intl.DateTimeFormat | Geographic signal
Hardware concurrency | navigator.hardwareConcurrency | CPU core count
Audio fingerprint | AudioContext API | Audio stack differences

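Collect attributes → concatenate into a string → hash with SHA-256 → fingerprint ID. Some studies report that over 90% of desktop browsers can be uniquely identified this way, though estimates vary with the sample and the attributes used.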

Fingerprinting works precisely because every browser is slightly different. The same features that let websites adapt to your device can be combined to identify you.
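
The collect, concatenate, hash sequence can be sketched in a few lines. This version runs under Node.js using its built-in `crypto` module; in a browser the equivalent hashing call would be the asynchronous `crypto.subtle.digest("SHA-256", ...)`. The attribute names in the usage note are illustrative, and collecting them from real users requires consent, as discussed in the consent section.

```javascript
const { createHash } = require("crypto");

// Builds a fingerprint ID from a plain object of already-collected attributes.
function fingerprintId(attributes) {
  // Sort keys so the same attributes always produce the same string.
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${String(attributes[key])}`)
    .join("|");
  // SHA-256 of the canonical string, as 64 hexadecimal characters.
  return createHash("sha256").update(canonical).digest("hex");
}
```

Calling `fingerprintId({ timezone: "America/Los_Angeles", cores: 8, screen: "1440x900" })` returns the same ID on every visit until one of the attributes changes, which is why the comparison below describes fingerprints as stable only "until browser/OS updates".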

Cookies vs Fingerprinting

Aspect | Cookies | Fingerprinting
User can clear | Yes | No
Blocked by ad blockers | Sometimes | Harder to block
Stability | Until cleared | Until browser/OS updates
Accuracy | Exact (unique ID) | Probabilistic (~90%+)
Privacy regulation | Covered by GDPR | Also covered by GDPR
Cross-device | No (unless synced) | No (different hardware)

Incognito mode does NOT defeat fingerprinting: hardware, fonts, and GPU are still the same. And unlike cookies, a fingerprint cannot be seen or cleared by the user. GDPR treats fingerprints as personal data requiring consent.

Section 7: Privacy, Consent & Collection Philosophy

Broad vs targeted collection, GDPR principles, and consent requirements.

Broad vs Targeted Collection

Aspect | Broad ("collect everything") | Targeted ("collect specific things")
Data volume | Very high | Low to moderate
Discovering unknowns | Strong: data is already there | Weak: must add instrumentation
Privacy risk | High: may collect PII unknowingly | Low: you know exactly what you have
GDPR alignment | Poor: conflicts with data minimization | Good: purpose-limited collection

Six GDPR Principles for Analytics

  1. Lawfulness: you need a legal basis; for analytics this is usually consent or legitimate interest
  2. Purpose limitation: collect for a stated purpose; don't repurpose later
  3. Data minimization: collect only what is necessary
  4. Accuracy: keep data correct and up to date
  5. Storage limitation: don't keep data longer than needed
  6. Integrity & confidentiality: protect the data you hold

GDPR's data minimization: "We might need it someday" is not a purpose. Define why you need data, how long you'll keep it, and who will access it before collecting.

What Requires Consent?

Activity | Consent? | Reasoning
Server access logs | Usually not | Operational logging = legitimate interest
First-party analytics cookies | Yes (ePrivacy) | Non-essential cookie
Fingerprinting | Yes (GDPR) | Creates personal data through processing
Session replay | Yes (GDPR) | Records behavior, may capture PII
Third-party analytics (e.g., Google Analytics) | Yes (both) | Data leaves your control; third-party cookies
Aggregate, cookie-free metrics | Often not | No personal data, no individual tracking

Consent Mechanisms

  • Cookie banners / CMPs: present choices before analytics loads (required in the EU for non-essential cookies)
  • Do Not Track (DNT): browser signal, widely ignored by sites, effectively dead
  • Global Privacy Control (GPC): successor to DNT, recognized as a valid opt-out signal under the CCPA
  • Privacy-preserving tools: Plausible, Fathom, Umami; designed to work without cookies so that consent is often not needed
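
A consent gate is the piece of code that enforces these mechanisms before any analytics script runs. The sketch below assumes a hypothetical stored consent value (`"granted"` or `"denied"`) and treats a GPC signal as an opt-out that overrides it; that precedence is a policy choice for illustration, not legal guidance.

```javascript
// Server side: GPC arrives as the request header "Sec-GPC: 1".
// `headers` is assumed to use lower-case names, as Node's http module provides.
function gpcFromHeaders(headers) {
  return headers["sec-gpc"] === "1";
}

// Decides whether analytics may run for this visitor.
// `consent` is a hypothetical stored choice: "granted", "denied", or undefined.
// `gpcEnabled` mirrors navigator.globalPrivacyControl or the Sec-GPC header.
function analyticsAllowed({ consent, gpcEnabled }) {
  if (gpcEnabled) return false; // treat GPC as an opt-out
  return consent === "granted"; // no recorded choice means no tracking
}
```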

Section 8: Error Tracking & Performance

Finding silent failures and measuring real user experience.

Error Detection & sendBeacon

Server-Side

  • Filter logs for 4xx (broken links, missing resources)
  • Filter logs for 5xx (crashes, timeouts)
  • No code changes needed — just log analysis

Client-Side

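  • window.addEventListener('error', ...) catches JS exceptions
  • window.addEventListener('unhandledrejection', ...) catches unhandled Promise rejections
  • Resource load failures (images, scripts, CSS) fire error events on the element that do not bubble; listen in the capture phase to see them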

Why sendBeacon?

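Method | On Page Unload
fetch() | Usually cancelled by the browser (unless keepalive: true is set)
XMLHttpRequest | Asynchronous requests usually cancelled by the browser
sendBeacon() | Queued by the browser and sent in the background (best effort)

navigator.sendBeacon() was designed for analytics: the browser queues the request and attempts delivery even after the page unloads. Delivery is best effort rather than strictly guaranteed (the call returns false if the payload cannot be queued), but it is far more reliable than a plain fetch() or XMLHttpRequest for capturing exit events and errors.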
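
Putting the client-side handlers and sendBeacon together might look like the sketch below. The payload shape and the `/collect` endpoint are hypothetical, and `navigator` and `fetch` are passed in as parameters only so the functions can be exercised outside a browser.

```javascript
// Converts an error or unhandledrejection event into a small JSON-safe payload.
function serializeError(event) {
  const reason = event.reason;
  return {
    type: "error",
    message: String(event.message || (reason && reason.message) || reason || "unknown"),
    source: event.filename || null,
    line: event.lineno || null,
    column: event.colno || null,
    ts: Date.now(),
  };
}

// Sends the payload with sendBeacon when it can be queued, else fetch with keepalive.
function report(endpoint, payload, nav, fetchFn) {
  const body = JSON.stringify(payload);
  if (nav && typeof nav.sendBeacon === "function" && nav.sendBeacon(endpoint, body)) {
    return "beacon";
  }
  if (typeof fetchFn === "function") {
    // Swallow network failures so reporting never raises a new unhandled rejection.
    Promise.resolve(fetchFn(endpoint, { method: "POST", body, keepalive: true })).catch(() => {});
    return "fetch";
  }
  return "dropped";
}

// Browser wiring (the "/collect" endpoint is hypothetical):
//   window.addEventListener("error", (e) =>
//     report("/collect", serializeError(e), navigator, fetch));
//   window.addEventListener("unhandledrejection", (e) =>
//     report("/collect", serializeError(e), navigator, fetch));
```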

Core Web Vitals & Performance Monitoring

Metric | Measures | Good Threshold | Why It Matters
LCP (Largest Contentful Paint) | Loading: main content visible | ≤ 2.5s | Users see content quickly
INP (Interaction to Next Paint) | Responsiveness: input to visual update | ≤ 200ms | Feels snappy to interact
CLS (Cumulative Layout Shift) | Stability: unexpected layout movement | ≤ 0.1 | Nothing jumps around
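
A reporting layer typically turns raw measurements into ratings. The sketch below uses the "good" thresholds from the table; the "poor" cut-offs (4.0s, 500ms, 0.25) and the convention of judging a page at the 75th percentile of real-user samples come from Google's published guidance rather than from this overview, so treat them as added assumptions.

```javascript
// "Good" limits match the table above; "poor" limits are Google's published
// cut-offs and are an addition not stated in this overview.
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 }, // milliseconds
  INP: { good: 200, poor: 500 },   // milliseconds
  CLS: { good: 0.1, poor: 0.25 },  // unitless score
};

// Rates one measurement as "good", "needs improvement", or "poor".
function rateVital(name, value) {
  const t = THRESHOLDS[name];
  if (!t) throw new Error(`unknown metric: ${name}`);
  if (value <= t.good) return "good";
  if (value <= t.poor) return "needs improvement";
  return "poor";
}

// 75th percentile using the nearest-rank method.
function percentile75(samples) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length);
  return sorted[rank - 1];
}
```

With LCP samples of 1800, 2100, 2600, and 5200 ms, the 75th percentile is 2600 ms, which rates as "needs improvement" even though half the visits were comfortably good.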

RUM vs Synthetic Monitoring

Aspect | RUM (Real User Monitoring) | Synthetic Monitoring
Data source | Actual user sessions | Scripted bots from controlled locations
Variability | High: real devices, real networks | Low: consistent environment
Coverage | Only pages users visit | Any page you configure
Use case | Analytics: real experience | Testing: baseline performance

Performance analytics is where developer, business, and user perspectives converge. Developers see slow queries. Business sees lost conversions. Users see a sluggish experience. A single LCP measurement captures all three.

Section 9: User Behavior & Session Replay

From aggregate funnels to individual story — what users do (and don't do).

The Conversion Funnel

Page Views (100%) ──▶ Engagement (60%) ──▶ Action (15%) ──▶ Conversion (5%)
All visitors          Scroll, click,       Add to cart,     Purchase,
                      spend >10s           start form       submit, sign up
             ~60% continue        ~25% continue    ~33% continue
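
The "continue" rates in the funnel are each step's count divided by the previous step's count. A small sketch of that arithmetic, with illustrative step names and counts matching the percentages above:

```javascript
// For each funnel step, computes its share of the first step (ofTotal)
// and its share of the step immediately before it (ofPrevious).
function funnelRates(steps) {
  return steps.map((step, i) => {
    const prev = i === 0 ? step.count : steps[i - 1].count;
    return {
      name: step.name,
      ofTotal: steps[0].count ? step.count / steps[0].count : 0,
      ofPrevious: prev ? step.count / prev : 0,
    };
  });
}

const rates = funnelRates([
  { name: "Page Views", count: 100 },
  { name: "Engagement", count: 60 },
  { name: "Action", count: 15 },
  { name: "Conversion", count: 5 },
]);
```

Here `ofPrevious` comes out to 0.6 for Engagement, 0.25 for Action, and one third for Conversion, so the steepest drop is between engaging and acting.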

Key Behavioral Metrics

  • Bounce rate: share of visits that leave after one page
  • Scroll depth: how far users read
  • Click patterns: what users interact with
  • Rage clicks: rapid repeated clicks in one spot, a frustration signal
  • Form abandonment: the fields where users give up
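
Rage clicks have no standard definition; one common heuristic is several clicks close together in both time and position. The sketch below uses illustrative defaults (3 clicks within 1 second and 30 pixels) that are assumptions, not values from this overview.

```javascript
// Flags a rage click when at least `minClicks` clicks land within `radiusPx`
// of the first click in a burst, all inside `windowMs`. Defaults are illustrative.
// Each click is { t: timestampMs, x: pageX, y: pageY }.
function hasRageClick(clicks, { minClicks = 3, windowMs = 1000, radiusPx = 30 } = {}) {
  const sorted = [...clicks].sort((a, b) => a.t - b.t);
  for (let i = 0; i < sorted.length; i++) {
    const first = sorted[i];
    let count = 1;
    for (let j = i + 1; j < sorted.length; j++) {
      const c = sorted[j];
      if (c.t - first.t > windowMs) break; // burst window has ended
      if (Math.hypot(c.x - first.x, c.y - first.y) <= radiusPx) count++;
    }
    if (count >= minClicks) return true;
  }
  return false;
}
```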

Session Replay

Aspect | Detail
How | DOM mutations, not video
Size | ~100–500 KB of JSON per session
Must mask | Passwords, PII, payments
Consent | Required (GDPR)
Tools | rrweb, FullStory, Hotjar

The biggest usability insights come from what users do NOT do. If a page gets 1000 views but 0 clicks on its call to action (CTA), the CTA is either invisible or irrelevant.

Section 10: Observability, Bots & Data Quality

System telemetry, automated traffic, and the limits of measurement.

Observability & OpenTelemetry

                 ┌───────────────┐
                 │ Observability │
                 └──────┬────────┘
      ┌─────────────────┼─────────────────┐
      ▼                 ▼                 ▼
 ┌──────────┐      ┌──────────┐      ┌──────────┐
 │   Logs   │      │ Metrics  │      │  Traces  │
 │          │      │          │      │          │
 │ What     │      │ How is   │      │ Where    │
 │ happened │      │ it doing │      │ did the  │
 │          │      │          │      │ request  │
 │ Discrete │      │ Numeric  │      │ go       │
 │ events   │      │ time-    │      │          │
 │ with     │      │ series   │      │ End-to-  │
 │ context  │      │ data     │      │ end path │
 └──────────┘      └──────────┘      └──────────┘

  • Analytics = observability for user behavior
  • Observability = analytics for system behavior
  • OpenTelemetry (OTel) = vendor-neutral standard for traces, metrics, logs
OpenTelemetry is to observability what HTTP is to the web: a shared protocol that lets different tools work together. Instrument once, send data to any backend. Prefer protocols over platforms.

Bot Traffic & Data Quality

Bot Types & Impact

Category | Examples | Analytics Impact
Good bots | Googlebot, Bingbot, uptime monitors | Inflate page views, but wanted
Bad bots | Scrapers, spam bots, DDoS, click fraud | Corrupt all metrics, stress servers
Gray area | AI crawlers (GPTBot, ClaudeBot) | High volume, active debate on value

Bot Detection Flow

Request ─┬── Check IP reputation ──▶ Known bot IP?  ──▶ Flag or block
         ├── Check User-Agent ─────▶ Known bot UA?  ──▶ Flag or block
         ├── Check rate ───────────▶ Abnormal freq? ──▶ Rate limit
         ├── Check JS execution ───▶ Beacon fired?  ──▶ No = likely bot
         └── Check behavior ───────▶ Mouse/scroll?  ──▶ No = likely bot
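
The five checks in the flow can be sketched as a single function that collects reasons for suspicion. The request fields, the User-Agent pattern, and the rate limit are all hypothetical; a production filter would use maintained bot lists and would likely weight signals rather than flagging on any single one.

```javascript
// Illustrative pattern only; real deployments use maintained bot lists.
const BOT_UA_RE = /bot|crawler|spider|curl|wget/i;

// `req` fields are hypothetical:
//   ip, userAgent, requestsLastMinute, beaconFired, hadInteraction
function classifyRequest(req, { knownBotIps = new Set(), maxPerMinute = 120 } = {}) {
  const reasons = [];
  if (knownBotIps.has(req.ip)) reasons.push("known bot IP");
  if (BOT_UA_RE.test(req.userAgent || "")) reasons.push("bot user-agent");
  if (req.requestsLastMinute > maxPerMinute) reasons.push("abnormal request rate");
  if (!req.beaconFired) reasons.push("no JS beacon");
  if (!req.hadInteraction) reasons.push("no mouse or scroll activity");
  return { likelyBot: reasons.length > 0, reasons };
}
```

Returning the reasons alongside the verdict makes it possible to treat wanted crawlers (flag, keep serving) differently from abusive traffic (rate limit or block).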

Data Quality Threats

  • Bots inflate views
  • Ad blockers block scripts
  • Cookie clearing breaks sessions
  • CDN caching hides requests
  • Device switching splits identity
  • VPN skews geo
  • JS disabled = no client analytics
  • Incognito = no persistence

No analytics system gives you ground truth. Every method has blind spots. The simplest bot filter: compare server logs to client-side beacon data — no beacon = likely bot. Combine methods and accept approximation.

Summary: Key Takeaways

17 sections of web analytics in one table.

Web Analytics at a Glance

Concept | Key Takeaway
Why analytics | Find errors users never report, understand behavior, measure what works
Analytics stack | Collector → Storage → Reporting: you build all three
Collection methods | Server logs (automatic), network capture (largely obsolete under HTTPS), client-side (flexible)
What to collect | Headers & URLs automatically; JS adds clicks, scrolls, errors, timing
Enriching logs | Custom formats + Client Hints + script-to-header = rich log data
1st vs 3rd party | First-party = you own data; third-party = vendor builds cross-site profiles
Fingerprinting | Identifies users without cookies: effective but ethically fraught
Collection philosophy | Collect what you need, not what you might; GDPR requires purpose
Privacy & consent | Consent required for cookies, fingerprinting, replay; aggregate metrics may be exempt
Error tracking | window.onerror + sendBeacon() capture silent failures
Performance | LCP, INP, CLS: Google ranking signals, measured via RUM
User behavior | Funnel analysis reveals drop-off; absence of action is data
Session replay | DOM reconstruction, not video: powerful but privacy-sensitive
Observability | Logs + Metrics + Traces; OpenTelemetry is the vendor-neutral standard
Bot traffic | By some industry estimates, 30–50% of traffic is automated; filter it or your metrics are unreliable
Data qualityEvery method has blind spots — combine methods, accept approximation