Measuring the Web
"Analytics answers the questions you cannot answer just by looking at your own code. They address what happens when users meet the executing code."
CSE 135
Three purposes, one pipeline — from collection to insight.
| Purpose | Who Cares | What It Answers |
|---|---|---|
| Error Tracking | Developers | Broken pages, failed requests, JS exceptions users never report |
| User Behavior | Business / Product | Which features are used, where users abandon tasks, conversion rates |
| Usability Measurement | Users (indirectly) | Does the interface work? Does content resonate? Do changes help or hurt? |
Three approaches capturing data at different points in client-server communication.
| Aspect | Server Logs | Network Capture | Client-Side Scripts |
|---|---|---|---|
| What it captures | HTTP requests received | All traffic on the wire | Any browser event or state |
| Requires code changes? | No — built in | No — passive | Yes — must add JS |
| Client events? | No — requests only | No — network only | Yes — clicks, scrolls, errors |
| Works with HTTPS? | Yes — on the server | Metadata only | Yes — in the browser |
| Privacy concerns | Moderate | High | Very High |
Four data sources — from automatic headers to custom JavaScript events.
| Source | Examples |
|---|---|
| HTTP Headers | IP address, User-Agent, Referer, Accept-Language, Cookies |
| URL | Path, query params, UTM codes (utm_source, utm_campaign) |
| Server | Timestamp, status code, response size, processing time |
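
As a rough illustration of the URL row above, the path and UTM codes can be pulled out of a request URL with the standard `URL` class. The helper name `extractUrlFields` and the example URL are hypothetical, chosen only for this sketch.

```javascript
// Extract the URL-derived analytics fields from a request URL.
// `extractUrlFields` is a hypothetical helper, not part of any library.
function extractUrlFields(requestUrl) {
  const url = new URL(requestUrl);
  const utm = {};
  // Keep only the campaign parameters (utm_source, utm_campaign, ...).
  for (const [key, value] of url.searchParams) {
    if (key.startsWith('utm_')) utm[key] = value;
  }
  return { path: url.pathname, utm };
}

const fields = extractUrlFields(
  'https://example.com/pricing?utm_source=newsletter&utm_campaign=spring&ref=42'
);
console.log(fields);
```

Non-UTM parameters such as `ref` are deliberately left out of the `utm` object; whether to log them is a separate decision.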

| Data | API |
|---|---|
| Viewport dimensions | window.innerWidth/Height |
| Scroll depth | Scroll events |
| Click coordinates | Click events |
| Performance timing | Performance API |
| JS errors | window.onerror |
| Custom events | Anything you define |
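
One way the scroll depth row might be implemented is sketched below. The function name `scrollDepthPercent` and the "maximum depth seen" bookkeeping are assumptions for illustration; the browser wiring is skipped when no DOM is present.

```javascript
// Percentage of the document the user has scrolled past (0-100).
function scrollDepthPercent(scrollTop, viewportHeight, documentHeight) {
  if (documentHeight <= viewportHeight) return 100; // whole page already visible
  const seen = scrollTop + viewportHeight;
  return Math.min(100, Math.round((seen / documentHeight) * 100));
}

// Browser wiring: track the deepest point reached during the visit.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  let maxDepth = 0;
  window.addEventListener('scroll', () => {
    const depth = scrollDepthPercent(
      window.scrollY,
      window.innerHeight,
      document.documentElement.scrollHeight
    );
    maxDepth = Math.max(maxDepth, depth);
  }, { passive: true });
}
```

Recording only the maximum keeps the payload to a single number per page view instead of one event per scroll.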
From 7 fields to rich analytics — log formats, Client Hints, and script-to-header.
| Log Format | Fields | Analytics Value |
|---|---|---|
| Common (CLF) | IP, identity, user, timestamp, request, status, size | Basic hit counting only |
| Combined | CLF + Referer + User-Agent | Traffic sources, browser breakdown — the analytics minimum |
| Custom / Extended | Any header, cookie, variable, timing | Rich analytics: response time, viewport, device hints |
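
To make the Combined row concrete, here is a minimal parsing sketch. It assumes the usual layout of the seven CLF fields followed by quoted Referer and User-Agent, and it does not handle escaped quotes inside a field; the sample line is invented.

```javascript
// Seven CLF fields, then "referer" and "user-agent" in quotes.
const COMBINED =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;

function parseCombinedLine(line) {
  const m = COMBINED.exec(line);
  if (!m) return null; // line is not in Combined format
  const [, ip, identity, user, timestamp, request, status, size, referer, userAgent] = m;
  return {
    ip, identity, user, timestamp, request,
    status: Number(status),
    size: size === '-' ? 0 : Number(size), // "-" means no body was sent
    referer,
    userAgent,
  };
}

const sampleLine =
  '203.0.113.7 - - [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 ' +
  '"https://example.com/start" "Mozilla/5.0 (X11; Linux x86_64)"';
console.log(parseCombinedLine(sampleLine));
```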

| Hint Header | Provides | Example |
|---|---|---|
| Sec-CH-UA | Browser brand + version | "Chrome";v="124" |
| Sec-CH-UA-Mobile | Mobile device? | ?0 / ?1 |
| Sec-CH-UA-Platform | Operating system | "macOS" |
| ECT | Connection type | 4g, 3g, slow-2g |
| RTT | Round-trip time (ms) | 50 |
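
The hint values in the table are compact, so a server usually has to unpack them before logging. The sketch below is a simplified regex-based reader, not a full structured-header parser, and the function names are hypothetical. Real `Sec-CH-UA` values typically list several brands, including a deliberately meaningless placeholder entry.

```javascript
// Turn a Sec-CH-UA value into a list of { brand, version } pairs.
function parseSecChUa(headerValue) {
  const brands = [];
  const pattern = /"([^"]*)";\s*v="([^"]*)"/g;
  for (const match of headerValue.matchAll(pattern)) {
    brands.push({ brand: match[1], version: match[2] });
  }
  return brands;
}

// Sec-CH-UA-Mobile is a boolean encoded as ?0 (false) or ?1 (true).
function parseSecChUaMobile(headerValue) {
  return headerValue.trim() === '?1';
}

console.log(parseSecChUa('"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"'));
console.log(parseSecChUaMobile('?0'));
```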
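Script-to-header: client-side JavaScript writes values into a cookie, the browser sends that cookie with later requests, and a custom log format records it. Client data then appears in server logs without a separate beacon endpoint. Limitation: cookie-based data is one request behind, so the technique works best for data that changes infrequently (viewport, timezone, DPR).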
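
A minimal sketch of that round trip, assuming a cookie named `client_info` and a pipe-separated value (both invented for this example): one function packs the values on the client, the other unpacks them from the `Cookie` request header on the server.

```javascript
// Client side: pack slow-changing values into one cookie string.
function buildClientCookie({ width, height, dpr, timezone }) {
  const value = [width, height, dpr, timezone].join('|');
  return `client_info=${encodeURIComponent(value)}; Path=/; Max-Age=86400; SameSite=Lax`;
}

// Server side: recover the values from the Cookie request header.
function readClientCookie(cookieHeader) {
  for (const part of cookieHeader.split(';')) {
    const [name, ...rest] = part.trim().split('=');
    if (name === 'client_info') {
      const [width, height, dpr, timezone] =
        decodeURIComponent(rest.join('=')).split('|');
      return { width: Number(width), height: Number(height), dpr: Number(dpr), timezone };
    }
  }
  return null; // first request of a visit: the cookie is not there yet
}

// In a browser the cookie would be written roughly like this:
// document.cookie = buildClientCookie({
//   width: window.innerWidth, height: window.innerHeight,
//   dpr: window.devicePixelRatio,
//   timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
// });
```

The `null` branch is the "one request behind" limitation in code form: the very first request of a visit carries no client data.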
Collection is first-party; analysis may be third-party. The browser never talks to the third party — you decide what fields to forward and what to redact.
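
The forward-and-redact step could look something like the sketch below. The allowlist contents, the event shape, and the choice to zero the last IPv4 octet are all assumptions for illustration, not a recommendation for any particular vendor.

```javascript
// Fields this hypothetical site has decided are safe to share with a vendor.
const FORWARD_ALLOWLIST = ['path', 'referer', 'viewport', 'timestamp'];

// Zero the last octet of an IPv4 address so it no longer points at one host.
function truncateIpv4(ip) {
  const octets = ip.split('.');
  if (octets.length !== 4) return null; // not IPv4: drop it rather than guess
  octets[3] = '0';
  return octets.join('.');
}

// Build the record that leaves your server; everything else stays behind.
function redactForVendor(event) {
  const out = {};
  for (const key of FORWARD_ALLOWLIST) {
    if (key in event) out[key] = event[key];
  }
  if (event.ip) out.ip = truncateIpv4(event.ip);
  return out;
}

console.log(redactForVendor({
  ip: '203.0.113.77', path: '/checkout', email: 'user@example.com', timestamp: 1700000000,
}));
```

An allowlist fails closed: a field added to the collector later is not forwarded until someone deliberately lists it.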
Who collects the data and where does it go?
| Aspect | First-Party | Third-Party |
|---|---|---|
| Data ownership | You own it completely | Vendor stores and may use it |
| Cookie scope | Your domain only | Vendor's domain (cross-site capable) |
| Privacy compliance | Simpler — you control data flows | Complex — data leaves your control |
| Ad blocker impact | Usually unblocked (same domain) | Often blocked (known tracking domains) |
| Setup effort | High — build and maintain | Low — add a script tag |
| Cost | Infrastructure + dev time | Nominally "free" (you pay with data) |
Identifying users without cookies — powerful but ethically fraught.
| Attribute | Source | How It Helps |
|---|---|---|
| User-Agent | HTTP header | Browser + OS + version |
| Screen resolution | screen.width/height | Device type |
| Canvas rendering | Canvas API | GPU/driver-specific pixel differences |
| WebGL renderer | WebGL API | GPU hardware identifier |
| Installed fonts | Canvas/JS | Highly variable across systems |
| Timezone | Intl.DateTimeFormat | Geographic signal |
| Hardware concurrency | navigator.hardwareConcurrency | CPU core count |
| Audio fingerprint | AudioContext API | Audio stack differences |
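Collect attributes → concatenate into string → SHA-256 hash = fingerprint ID. Some studies report that over 90% of desktop browsers can be uniquely identified this way; the rate depends on the sample and on which attributes are collected.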
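
The collect, concatenate, hash pipeline can be sketched as follows. This version runs under Node and uses its `crypto` module; in a browser, `crypto.subtle.digest` would play the same role. The attribute names and the `key=value|...` layout are assumptions made for the example.

```javascript
const { createHash } = require('crypto');

// Hash a bag of collected attributes into a stable fingerprint ID.
function fingerprintId(attributes) {
  // Sort keys so the same attributes always produce the same string.
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${attributes[key]}`)
    .join('|');
  return createHash('sha256').update(canonical).digest('hex');
}

console.log(fingerprintId({
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64)',
  screen: '1920x1080',
  timezone: 'Europe/Berlin',
  hardwareConcurrency: 8,
}));
```

Because the hash changes whenever any single attribute changes, a browser or OS update produces a new ID, which is the stability limit noted in the comparison that follows.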
| Aspect | Cookies | Fingerprinting |
|---|---|---|
| User can clear | Yes | No |
| Blocked by ad blockers | Sometimes | Harder to block |
| Stability | Until cleared | Until browser/OS updates |
| Accuracy | Exact (unique ID) | Probabilistic (~90%+) |
| Privacy regulation | Covered by GDPR | Also covered by GDPR |
| Cross-device | No (unless synced) | No (different hardware) |
Broad vs targeted collection, GDPR principles, and consent requirements.
| Aspect | Broad ("collect everything") | Targeted ("collect specific things") |
|---|---|---|
| Data volume | Very high | Low to moderate |
| Discovering unknowns | Strong — data is already there | Weak — must add instrumentation |
| Privacy risk | High — may collect PII unknowingly | Low — you know exactly what you have |
| GDPR alignment | Poor — violates data minimization | Good — purpose-limited collection |

| Activity | Consent? | Reasoning |
|---|---|---|
| Server access logs | Usually not | Operational logging = legitimate interest |
| First-party analytics cookies | Yes (ePrivacy) | Non-essential cookie |
| Fingerprinting | Yes (GDPR) | Creates personal data through processing |
| Session replay | Yes (GDPR) | Records behavior, may capture PII |
| Third-party analytics (GA) | Yes (both) | Data leaves your control; third-party cookies |
| Aggregate, cookie-free metrics | Often not | No personal data — no individual tracking |
Finding silent failures and measuring real user experience.
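- 4xx (broken links, missing resources)
- 5xx (crashes, timeouts)
- window.addEventListener('error', ...) catches JS exceptions
- unhandledrejection catches Promise failures

| Method | On Page Unload |
|---|---|
| fetch() | Usually cancelled by the browser (unless sent with keepalive: true) |
| XMLHttpRequest | Usually cancelled by the browser |
| sendBeacon() | Queued by the browser and sent after unload |

navigator.sendBeacon() was designed for analytics: it hands a small payload to the browser, which sends it in the background even while the page unloads. Delivery is reliable in practice rather than strictly guaranteed (the call returns false if the browser cannot queue the data). Essential for capturing exit events and errors.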
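
A minimal sketch of wiring the two error listeners to `sendBeacon()` follows. The endpoint `/collect/errors`, the payload fields, and the helper name `reportError` are assumptions; the `nav` parameter exists only so the function can be exercised outside a browser.

```javascript
// Serialize an error and hand it to the browser's beacon queue.
// Returns true when the browser accepted the payload for delivery.
function reportError(endpoint, errorInfo, nav = globalThis.navigator) {
  const payload = JSON.stringify({
    message: errorInfo.message,
    source: errorInfo.source || null,
    line: errorInfo.line || null,
    url: errorInfo.url || null,
  });
  if (!nav || typeof nav.sendBeacon !== 'function') return false;
  return nav.sendBeacon(endpoint, payload);
}

// Browser wiring (skipped when there is no window object).
if (typeof window !== 'undefined') {
  window.addEventListener('error', (event) => {
    reportError('/collect/errors', {
      message: event.message,
      source: event.filename,
      line: event.lineno,
      url: window.location.href,
    });
  });
  window.addEventListener('unhandledrejection', (event) => {
    reportError('/collect/errors', {
      message: String(event.reason),
      url: window.location.href,
    });
  });
}
```

A string payload is sent as plain text, so the collecting endpoint would need to parse the body as JSON itself.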
| Metric | Measures | Good Threshold | Why It Matters |
|---|---|---|---|
| LCP (Largest Contentful Paint) | Loading — main content visible | ≤ 2.5s | Users see content quickly |
| INP (Interaction to Next Paint) | Responsiveness — input to visual update | ≤ 200ms | Feels snappy to interact |
| CLS (Cumulative Layout Shift) | Stability — unexpected layout movement | ≤ 0.1 | Nothing jumps around |
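
To turn the thresholds above into a pass/fail check over collected samples, something like the sketch below could be used. Judging each metric at the 75th percentile follows Google's published Core Web Vitals guidance and is not stated in the table, so treat it as an assumption; the function names are hypothetical.

```javascript
// "Good" thresholds from the table; LCP and INP in milliseconds, CLS unitless.
const GOOD_THRESHOLDS = { LCP: 2500, INP: 200, CLS: 0.1 };

// Nearest-rank percentile of a list of numeric samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// A metric passes when its 75th-percentile sample is within the threshold.
function isGood(metric, samples) {
  return percentile(samples, 75) <= GOOD_THRESHOLDS[metric];
}

console.log(isGood('LCP', [1200, 1800, 2100, 5200]));
```

Using a percentile rather than the mean keeps one very slow visit from hiding an otherwise good experience, or one fast visit from hiding a poor one.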

| Aspect | RUM (Real User Monitoring) | Synthetic Monitoring |
|---|---|---|
| Data source | Actual user sessions | Scripted bots from controlled locations |
| Variability | High — real devices, real networks | Low — consistent environment |
| Coverage | Only pages users visit | Any page you configure |
| Use case | Analytics — real experience | Testing — baseline performance |
From aggregate funnels to individual story — what users do (and don't do).
| Aspect | Detail |
|---|---|
| How | DOM mutations, not video |
| Size | ~100–500KB JSON/session |
| Must mask | Passwords, PII, payments |
| Consent | Required (GDPR) |
| Tools | rrweb, FullStory, Hotjar |
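
The "must mask" row is the part most easily gotten wrong, so here is a small sketch of masking input values before they enter a recording. The event shape (`inputType`, `fieldName`, `value`) and the list of sensitive names are invented for illustration and do not reflect the data model of any listed tool.

```javascript
// Input types and field-name patterns treated as sensitive in this sketch.
const SENSITIVE_TYPES = new Set(['password', 'email', 'tel']);
const SENSITIVE_NAME = /card|cvv|cvc|ssn|password/i;

// Replace the value of a sensitive input event with same-length asterisks.
function maskInputEvent(event) {
  const sensitive =
    SENSITIVE_TYPES.has(event.inputType) || SENSITIVE_NAME.test(event.fieldName || '');
  if (!sensitive) return event;
  return { ...event, value: '*'.repeat(String(event.value).length) };
}

console.log(maskInputEvent({ inputType: 'password', fieldName: 'pw', value: 'hunter2' }));
```

Masking in the browser, before the event is sent, means the raw value never reaches the collector at all.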
System telemetry, automated traffic, and the limits of measurement.
| Category | Examples | Analytics Impact |
|---|---|---|
| Good bots | Googlebot, Bingbot, uptime monitors | Inflate page views, but wanted |
| Bad bots | Scrapers, spam bots, DDoS, click fraud | Corrupt all metrics, stress servers |
| Gray area | AI crawlers (GPTBot, ClaudeBot) | High volume, active debate on value |
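
A first-pass filter for the bots named in the table can match on the User-Agent string, as sketched below. This only catches bots that identify themselves; bad bots commonly send browser-like User-Agents, so the `unclassified` result means "not a known bot", not "human". The category labels are this sketch's own.

```javascript
// Self-identifying crawlers from the table, with the table's categories.
const KNOWN_BOTS = [
  { pattern: /Googlebot/i, category: 'good' },
  { pattern: /bingbot/i, category: 'good' },
  { pattern: /GPTBot/i, category: 'gray' },
  { pattern: /ClaudeBot/i, category: 'gray' },
];

function classifyUserAgent(userAgent) {
  if (!userAgent) return 'suspect'; // real browsers rarely omit the header
  for (const { pattern, category } of KNOWN_BOTS) {
    if (pattern.test(userAgent)) return category;
  }
  return 'unclassified';
}

console.log(classifyUserAgent('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'));
```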
17 sections of web analytics in one table.
| Concept | Key Takeaway |
|---|---|
| Why analytics | Find errors users never report, understand behavior, measure what works |
| Analytics stack | Collector → Storage → Reporting — you build all three |
| Collection methods | Server logs (automatic), network capture (obsolete), client-side (flexible) |
| What to collect | Headers & URLs automatically; JS adds clicks, scrolls, errors, timing |
| Enriching logs | Custom formats + Client Hints + script-to-header = rich log data |
| 1st vs 3rd party | First-party = you own data; third-party = vendor builds cross-site profiles |
| Fingerprinting | Identifies users without cookies — effective but ethically fraught |
| Collection philosophy | Collect what you need, not what you might — GDPR requires purpose |
| Privacy & consent | Consent required for cookies, fingerprinting, replay; aggregate metrics may be exempt |
| Error tracking | window.onerror + sendBeacon() capture silent failures |
| Performance | LCP, INP, CLS — Google ranking signals, measured via RUM |
| User behavior | Funnel analysis reveals drop-off; absence of action is data |
| Session replay | DOM reconstruction, not video — powerful but privacy-sensitive |
| Observability | Logs + Metrics + Traces; OpenTelemetry is the vendor-neutral standard |
| Bot traffic | 30–50% of traffic is automated — filter or metrics are meaningless |
| Data quality | Every method has blind spots — combine methods, accept approximation |