Behavioral Analytics

From Clicks to Decisions

"Behavioral analytics sits at the intersection of engineering, product management, marketing, and ethics. The same infrastructure that helps a developer discover a confusing checkout flow can also enable surveillance advertising."

CSE 135 — Full Overview

Section 1: The Site Visitation Model

The shared vocabulary for measuring user activity — from raw hits to actionable segments.

The Measurement Hierarchy

+-----------+
| Segments  |  Clusters sharing traits
+-----+-----+
      |
+-----+-----+
|   Users   |  Ideally a person; really an identifier
+-----+-----+
      |
+-----+-----+
| Sessions  |  30 min inactivity timeout (convention)
+-----+-----+
      |
+-----+-----+
|  Events   |  Clicks, scrolls, form submits
+-----+-----+
      |
+-----+-----+
| Pageviews |  Page load or SPA virtual navigation
+-----+-----+
      |
+-----+-----+
|   Hits    |  Any HTTP request (images, CSS, JS...)
+-----------+
Bottom up: Each layer builds on the one below. Hits are noise; pageviews approximate attention; events capture action; sessions group visits; users attempt identity; segments enable action.

Key Definitions

Level     What It Is                                  Watch Out
--------  ------------------------------------------  ------------------------------------------
Hit       Any server request (1 page = dozens)        Useless for behavior; useful for capacity
Pageview  One HTML document load                      SPAs break this; bots inflate it
Event     Discrete user action within a page          GA4 treats everything as events — including pageviews
Session   Group of interactions; 30 min timeout       Timeout is convention, not law; can be tuned to shape narrative
User      An identifier (cookie, login, fingerprint)  Same person = 2–4 "unique users" across devices
Segment   Filtered subset by traits                   Aggregate numbers hide the story; segments reveal it
"Unique visitor" is a useful fiction. Cookie-based counts are inflated 20–40%. Treat them as rough order of magnitude, never headcounts.
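The session layer above is just a grouping rule applied to timestamps. A minimal sessionizer sketch using the conventional 30-minute inactivity timeout (the sample timestamps are illustrative):

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # convention, not law

def sessionize(timestamps):
    """Group a list of event timestamps (one identifier) into sessions.

    A new session starts whenever the gap since the previous
    event exceeds the inactivity timeout.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_TIMEOUT:
            sessions[-1].append(ts)  # continue current session
        else:
            sessions.append([ts])    # gap too large: new session
    return sessions

# Three events in quick succession, then a 2-hour gap -> 2 sessions
events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 9, 25),
    datetime(2024, 1, 1, 11, 30),
]
print(len(sessionize(events)))  # -> 2
```

Changing `SESSION_TIMEOUT` changes the session count — which is exactly how the timeout can be tuned to shape a narrative.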

Connecting the Hierarchy to Outcomes

Activity data is meaningless without outcomes — the conversions and KPIs that give it purpose:

  • Hit-level: Did the tracking beacon fire?
  • Pageview-level: Which pages appear in converting sessions?
  • Event-level: Events are the micro-conversions (add to cart, form complete)
  • Session-level: Conversion rate per session; sessions before first conversion
  • User-level: Lifetime value; which channels produce converting users?
  • Segment-level: Which segments convert at higher rates? Where to invest?
Jargon is rhetoric. Choosing to report "1.5 million hits" vs. "1,133 visitors" is the same data told two ways. Data stories can be shaped as easily as normal stories — data !== truth.

Section 2: Engagement Metrics

Measuring the quality of a visit, not just the quantity.

Scroll Depth & Time Metrics

Metric          Definition                                            Measurement
--------------  ----------------------------------------------------  ---------------------------------------------
Scroll Depth    How far down the page a user scrolls (25/50/75/100%)  Intersection Observer API (async, performant)
Time on Page    Gap between consecutive pageview timestamps           Undefined for the last page in a session
Dwell Time      SERP click to SERP return                             Search engine only — not available to site owners
Attention Time  Page visible + user active                            Page Visibility API + interaction heartbeat
The last-page problem: The page where the user found their answer and left satisfied has no "time on page" measurement at all — there is no next pageview to compute the gap.
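The gap computation, and the hole it leaves, can be shown directly (the timestamps here are illustrative, in seconds since session start):

```python
def time_on_page(pageview_times):
    """Time on page = gap to the next pageview; undefined for the last."""
    durations = []
    for i, ts in enumerate(pageview_times):
        if i + 1 < len(pageview_times):
            durations.append(pageview_times[i + 1] - ts)
        else:
            durations.append(None)  # the last-page problem: no next pageview
    return durations

views = [0, 45, 130]
print(time_on_page(views))  # -> [45, 85, None]
```

Note that a one-page session yields only `None` — every bounce has zero measurable time on page, for the same reason.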

Bounce Rate

Bounce rate = percentage of single-page sessions with no further interaction.

Classic GA (Universal Analytics)

Single-pageview session = bounce. Simple count.

GA4

Bounce = 1 − engagement rate. Engaged = >10s, or conversion, or 2+ pageviews.

Same user behavior, different bounce rates depending on tool.

Satisfied bounce vs. dissatisfied bounce: User finds pharmacy hours in 10 seconds and leaves = success. User clicks misleading ad and leaves in frustration = failure. Both register as identical bounces. Context is everything.
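A sketch of the two definitions side by side, assuming a simple session record (the field names are illustrative). The same 40-second, single-page visit bounces under UA but not under GA4:

```python
def ua_bounce(session):
    """Universal Analytics: single-pageview session = bounce."""
    return session["pageviews"] == 1

def ga4_bounce(session):
    """GA4: bounce = not engaged.

    Engaged means >10s on site, a conversion, or 2+ pageviews.
    """
    engaged = (
        session["duration_s"] > 10
        or session["converted"]
        or session["pageviews"] >= 2
    )
    return not engaged

# One pageview, 40 seconds reading, no conversion
s = {"pageviews": 1, "duration_s": 40, "converted": False}
print(ua_bounce(s), ga4_bounce(s))  # -> True False
```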

Click Patterns & Engagement Traps

  • Rage clicks: Rapid repeated clicking on unresponsive elements — frustration signal
  • Dead clicks: Clicks on non-interactive elements users expected to be clickable
  • Ghost clicks: Accidental taps on mobile from scrolling
Engagement is not goodness. A 10-second visit where the user finds their answer instantly is a better experience than a 5-minute visit where they struggle through confusing navigation. High engagement can mean frustration, not satisfaction.

Section 3: Outcomes & Dark Patterns

Conversions, KPIs, vanity metrics — and how organizations game them.

Conversions & the Funnel

  • Macro conversions: Primary objectives — purchase, signup, subscription
  • Micro conversions: Progress indicators — add to cart, watch video, download whitepaper
Awareness         100,000 visitors    Impressions, reach
  Interest         20,000 engaged     Scroll, time
    Consideration   5,000 signups     Form starts
      Conversion      500 purchases   Orders
        Retention     150 repeat

Conv Rate = 500 / 100,000 = 0.5%
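The funnel numbers above reduce to simple ratios; a sketch computing step-to-step pass-through alongside the overall 0.5% rate:

```python
funnel = [
    ("Awareness",     100_000),
    ("Interest",       20_000),
    ("Consideration",   5_000),
    ("Conversion",        500),
    ("Retention",         150),
]

def stage_rates(funnel):
    """Per-stage pass-through rate plus rate relative to the funnel top."""
    top = funnel[0][1]
    rows = []
    for (name, n), (_, prev) in zip(funnel[1:], funnel):
        rows.append((name, n / prev, n / top))
    return rows

for name, step, overall in stage_rates(funnel):
    print(f"{name:13s} step {step:6.1%}   overall {overall:6.2%}")
# The Conversion row reproduces the 500 / 100,000 = 0.50% figure
```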

KPIs vs. Vanity Metrics

Vanity Metric           Actionable KPI
----------------------  ----------------------------
Total pageviews         Pageviews per session
Total registered users  Monthly active users (MAU)
Social media followers  Conversion rate from social
App downloads           Day-7 retention rate
Email list size         Email click-through rate
The vanity metric trap: Adding pop-ups increases email signups (vanity) while decreasing conversion rate (KPI). Always ask: "If this number goes up, do we know what to do differently?"

Dark Patterns & Analytics Manipulation

A dark pattern is UI design that deliberately manipulates users into unintended actions.

Pattern            How It Works                                Metric Inflated
-----------------  ------------------------------------------  ---------------------------
Confirmshaming     "No thanks, I don't want to save money"     Email signups
Roach motel        Easy signup, impossible cancellation        Registered users, retention
Sneak into basket  Pre-added insurance/warranties in cart      Average order value
Disguised ads      Ads styled as content or download buttons   Click-through rate
Infinite scroll    Unnecessary pagination / content splitting  Pageviews, time on site
Dark patterns corrupt the feedback loop. You end up measuring your own manipulation, not user intent. Metrics go green; customer trust erodes invisibly.

Sections 4–5: Attribution & Its Challenges

Who gets credit for a conversion? Everyone claims it; nobody deserves all of it.

Attribution Models

User journey: Social → Search → Email → Paid Ad → Conversion

Model           Social  Search  Email  Paid Ad  Logic
--------------  ------  ------  -----  -------  --------------------------
Last-Click         0%      0%     0%     100%   Last touch gets all
First-Click      100%      0%     0%       0%   First touch gets all
Linear            25%     25%    25%      25%   Equal credit
Time-Decay        10%     20%    30%      40%   Recent touches weighted
Position-Based    40%     10%    10%      40%   40/20/40 first-last-middle
Data-Driven       15%     35%    20%      30%   ML black box
Attribution is a comforting fiction. It shows which touchpoints preceded a conversion, not which caused it. A user going to buy anyway still clicks an ad — the ad gets credit.
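The weight schemes in the table are mechanical; a sketch of a few of them (time-decay and data-driven are omitted because their weights depend on model parameters rather than position alone):

```python
def attribute(touchpoints, model):
    """Split one conversion's credit across an ordered touchpoint list."""
    n = len(touchpoints)
    if model == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "position_based":
        # 40% first, 40% last, remaining 20% split across the middle
        middle = 0.2 / (n - 2) if n > 2 else 0.0
        weights = [0.4] + [middle] * (n - 2) + [0.4]
    else:
        raise ValueError(f"unknown model: {model}")
    return dict(zip(touchpoints, weights))

journey = ["Social", "Search", "Email", "Paid Ad"]
print(attribute(journey, "position_based"))
# -> {'Social': 0.4, 'Search': 0.1, 'Email': 0.1, 'Paid Ad': 0.4}
```

Every model produces a tidy allocation from the same journey; none of them says anything about causation.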

Attribution Challenges

  • Cross-device: Phone, laptop, desktop = three "users" unless login-based identity
  • ITP (Safari): First-party JS cookies capped at 7 days; 24 hours if from a known tracker
  • Walled gardens: Facebook + Google + email each claim full credit → total exceeds actual conversions by 30–60%
  • Third-party cookie death: Safari/Firefox already block; Chrome restricting. Alternatives: Privacy Sandbox, server-side tracking, contextual ads
Incrementality testing is the gold standard: withhold ads from a holdout group, compare conversion rates. Almost nobody does it — because the results are sobering.

Sections 6–7: Google Analytics & Tag Managers

The ~85% monopoly and the script injection layer that feeds it.

Google Analytics: History & Conflict

  • Urchin (pre-2005): Software on your server — you owned the data
  • GA Classic (2005): SaaS — Google hosts your data
  • Universal Analytics (2012): Session-based, deep Ads integration
  • GA4 (2020): Event-based, ML, even deeper Ads integration

Google is simultaneously the largest ad seller, the dominant analytics platform, and the dominant search engine. Each role creates conflicts with the others.

"Free" doesn't mean free. Every conversion you track enriches the system that sells ads back to you. You are simultaneously the customer and the product.

Tag Managers: Dev-Configured vs. TMS

Developer-Configured            Tag Management System
========================        ========================
analytics.track(                <script> GTM </script>
  'add_to_cart',                        |
  { sku, price })                       v
        |                       GTM Container (Web UI)
        v                       +------------------+
Your /collect endpoint          | GA4 pageview     |
  - validate                    | FB pixel         |
  - sessionize                  | Hotjar, LinkedIn |
  - store                       | ...15 more tags  |
                                +------------------+
Pros: full control,             Pros: marketing autonomy
  minimal JS, no 3P             Cons: performance tax
Cons: requires dev work           (500KB-2MB JS), security
  for every change                risk, "tag soup"
Treat GTM publish access like production deploy access. Anyone with GTM access can inject arbitrary JS into your production site — XSS, data exfiltration, and broken functionality are all one careless publish away.

Section 8: A/B Testing & Optimization

The scientific method applied to web design.

A/B Test Lifecycle

1. HYPOTHESIS   "Changing CTA to 'Get Started' will increase signups by 10%"
      |
2. DESIGN       Control (A): "Submit"  |  Treatment (B): "Get Started"
      |
3. RANDOMIZE    Cookie/user-ID bucketing, 50/50 split
      |
4. RUN          Collect data until statistical significance
      |
5. ANALYZE      Compare rates. p < 0.05? Confidence interval?
      |
6. DECIDE       Significant → ship winner
                Not significant → inconclusive (not "A wins")

Client-Side Split

JS modifies DOM after load. Flicker problem: original flashes before variant appears.

Server-Side Split

Server renders variant before sending HTML. No flicker, full control, requires dev integration.

Statistical Rigor

Statistical significance is not optional. Most tests need thousands of conversions (not visits). Peeking early and declaring a winner invalidates the statistics. If your site has low traffic, you probably cannot run valid A/B tests — and invalid ones are worse than not testing.
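The usual check is a two-proportion z-test. A self-contained sketch using only the standard library (the traffic numbers are illustrative): even a 12% relative lift on 10,000 visitors per arm fails to clear p < 0.05.

```python
from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 5.0% vs 5.6% conversion, 10,000 visitors per arm
z, p = two_proportion_z(500, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p ≈ 0.058: not significant at 0.05
```

This is also why peeking is dangerous: recomputing the p-value after every batch of visitors multiplies the chances of crossing 0.05 by luck.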

Multivariate testing (MVT): Test multiple elements simultaneously (3 headlines × 2 images × 2 buttons = 12 combinations). Requires enormous traffic. Impractical for most sites.

Sections 9–10: Session Replay & Usability

Watching what users do (DOM reconstruction, not video) and measuring UX at scale.

Session Replay Pipeline

1. CAPTURE (Browser)        2. TRANSMIT          3. RECONSTRUCT
+-----------------------+   +----------------+   +---------------------+
| Initial DOM snapshot  |-->| Compressed     |-->| Rebuild initial DOM |
| MutationObserver      |   | event stream   |   | Apply mutations     |
|   (DOM changes)       |   | ~50-200 KB/min |   |   in sequence       |
| Mouse/scroll/input    |   | per session    |   | Overlay mouse/scroll|
| CSS snapshots         |   |                |   |                     |
| Redact PII            |   |                |   |                     |
+-----------------------+   +----------------+   +---------------------+

Fidelity challenges: CSS-in-JS (runtime classes), Shadow DOM (invisible to MutationObserver), Canvas/WebGL (no DOM to capture), cross-origin iframes (sandboxed).

Nobody watches replays at scale. 10,000 daily sessions = 10,000 recordings. The value is automated analysis: frustration scoring, error correlation, aggregated heatmaps. Watch replays to investigate hypotheses, not to fish for insights.

The HEART Framework

Google's structured approach to measuring UX at scale:

Dimension     What It Measures               Example Metric
------------  -----------------------------  ----------------------------
Happiness     Satisfaction, attitudes        CSAT score after task
Engagement    Depth & frequency of use       7-day active users / total
Adoption      New users picking up features  % tried feature X in 7 days
Retention     Users coming back              Day-30 retention rate
Task Success  Can users accomplish goals?    Checkout completion rate
Analytics is the smoke detector, not the investigator. It tells you that something changed. Session replay, interviews, and usability testing tell you why.

Sections 11–12: Voice of Customer & Interpretation Pitfalls

The qualitative "why" that analytics cannot provide — and the ways we misread the data we do have.

The Quant/Qual Matrix

              Quantitative             Qualitative
              (how many, how much)     (why, how it feels)
            +------------------------+-------------------------+
Behavioral  | Analytics, A/B tests,  | Session replay,         |
(what       | heatmaps, funnels      | usability testing,      |
users do)   |                        | field/diary studies     |
            +------------------------+-------------------------+
Attitudinal | Surveys (CSAT, NPS),   | User interviews, focus  |
(what       | card sorting           | groups, open-ended      |
users say)  |                        | feedback, contextual    |
            |                        | inquiry                 |
            +------------------------+-------------------------+

Strongest insights come from combining quadrants. Analytics alone knows what but not why.

NPS has become a cult metric. It collapses a rich distribution into 3 buckets, is culturally biased, and "would you recommend?" doesn't apply to many products. A single number cannot capture user sentiment.

Data Interpretation Pitfalls

  • Unique visitor illusion: Cookie-based counts inflated 20–40% by cross-device/browser
  • Inferring intent from clicks: Clicks ≠ interest — accidental, comparison, Back button
  • Single-metric thinking: Optimizing bounce rate with interstitials → satisfaction drops
  • Averages hiding distributions: 2:30 avg session = bimodal (60% at 10s, 40% at 5min) — describes nobody
  • Denominator problem: 3% conversion of all visitors vs. 65% of cart additions = same event, wildly different rates

Simpson's Paradox

A metric can go up in every segment but down overall:

Segment                          Week 1       Week 2       Trend
-------------------------------  -----------  -----------  -----
Mobile (small → large volume)    2% of 1,000  3% of 8,000  Up
Desktop (large → small volume)   8% of 9,000  9% of 2,000  Up
Overall                          7.4%         4.2%         Down
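The paradox is pure arithmetic: every segment improves while traffic shifts toward the lower-converting mobile segment, dragging the blended rate down. Reproducing the segment numbers:

```python
def overall_rate(segments):
    """Blended conversion rate from per-segment (conversions, visits)."""
    conversions = sum(c for c, n in segments)
    visits = sum(n for c, n in segments)
    return conversions / visits

week1 = [(20, 1_000), (720, 9_000)]   # mobile 2%, desktop 8%
week2 = [(240, 8_000), (180, 2_000)]  # mobile 3%, desktop 9%

print(f"{overall_rate(week1):.1%} -> {overall_rate(week2):.1%}")
# -> 7.4% -> 4.2%  (every segment improved, yet the overall rate fell)
```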
Bounce rate is the most misunderstood metric in web analytics. 90% bounce on a blog post that answers the user's question in 10 seconds is a success. Bounce rate without context is noise.

Sections 13–14: JS Availability & Data Quality

The blind spots baked into every behavioral analytics system.

JavaScript Availability

A GOV.UK study found that ~1.1% of users did not receive JS-enhanced pages. The causes:

Cause                                           Nature
----------------------------------------------  ------------------------------
Network interruption (JS failed to download)    Delivery failure (most common)
Corporate proxy/firewall stripping scripts      Delivery failure
Prior script error breaking subsequent scripts  Cascading failure
Browser extension blocking (ad blocker)         Deliberate (~15–30% desktop)
User deliberately disabled JS                   Negligible (<0.1%)
The "2% have JS disabled" talking point is wrong about the cause. Almost nobody deliberately disables JS. The issue is delivery failure. Your analytics script is subject to the same failures — systematic blind spot for users having the worst experience.

Behavioral Data Quality

  • Identity resolution: Cookie-based (fails on clear/ITP), login-based (only when authenticated), fingerprinting (unstable, privacy-hostile)
  • Session stitching: Connecting pre-login anonymous activity to authenticated identity — requires rewriting historical records
  • Consent-biased sampling: Users who accept cookies differ systematically from those who reject. Analytics measures a filtered population.
  • Survivorship bias: Funnel analysis only shows users who entered — can't show those blocked before step 1
  • Client clock drift: User device clocks can be wrong by minutes or hours — affects duration and ordering
Consent bias is systematic sampling error. If 60% consent, you measure a biased 60%, not a random 60%. Consent rates: 90%+ in some markets, 40% in Germany.
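A toy model of the bias, assuming consenters and refusers convert at different true rates (all numbers illustrative):

```python
def measured_vs_population(consent_rate, rate_consenters, rate_refusers):
    """Analytics only sees consenting users; the population rate differs."""
    measured = rate_consenters  # the filtered population you can observe
    population = (
        consent_rate * rate_consenters
        + (1 - consent_rate) * rate_refusers
    )
    return measured, population

# 60% consent; consenters convert at 4%, refusers at 2%
measured, population = measured_vs_population(0.60, 0.04, 0.02)
print(f"measured {measured:.1%} vs population {population:.1%}")
# -> measured 4.0% vs population 3.2%
```

No amount of extra volume fixes this: the error is systematic, not random, so it does not shrink as traffic grows.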

Sections 15–17: Privacy, Infrastructure & Abuse

From consent banners to data warehouses to surveillance profiles.

GDPR & Consent Reality

Dark pattern consent banners:

  • "Accept All" prominent and colored; "Reject" hidden or absent
  • Pre-checked boxes (explicitly illegal under GDPR)
  • Cookie walls: "accept tracking or leave"
  • Consent theater: user clicks "reject," tracking fires anyway

Data Minimization Tension

Goal                    Data Needed              Tension
----------------------  -----------------------  ---------
Page popularity         URL + count              Low
User journeys           Session-level sequence   Medium
Cross-session behavior  Persistent user ID       High
Session replay          Full DOM + interactions  Very high

Data Infrastructure at Scale

COLLECTION           INGESTION       STORAGE            ACTIVATION
+-----------+        +----------+    +--------------+   +-----------+
| Web beacon| --->   |          |--> | Data Lake    |   | Dashboards|
| Mobile SDK| --->   | ETL/ELT  |--> | (S3/GCS)     |   | ML Models |
| Server log| --->   | Fivetran |--> | raw, cheap   |   +-----------+
| CRM       | --->   | Airbyte  |    +------+-------+   +-----------+
| Ad data   | --->   |          |           v           | Ad sync   |
+-----------+        +----------+    +--------------+   +-----------+
                                     | Warehouse    |   +-----------+
                                     | (BigQuery,   |-->| Email/CRM |
                                     |  Snowflake)  |   +-----------+
                                     +------+-------+
                                            v
                                     +--------------+
                                     | CDP (Segment)|
                                     | Identity +   |
                                     | activation   |
                                     +--------------+

The Privacy Abuse Trajectory

Level 1: Anonymous Pageviews    Server logs. Harmless.
   |
Level 2: Behavioral Tracking    Session replay, clicks. Still first-party.
   |
Level 3: Persistent Identity    Cookies, fingerprinting. Profile grows.
   |
Level 4: Cross-Site Tracking    3P cookies, ad networks. Off-site history.
   |
Level 5: Data Enrichment        Brokers: income, health, politics. Matched.
   |
Level 6: Advertising Sync       Facebook/Google audience sync. Retargeting.
   |
Level 7: Surveillance Profile   Identity graphs. 1000+ attributes. Sold.
The gentle slope from analytics to surveillance. Each step is individually justifiable. The cumulative result is a profile compiled without meaningful consent. No single act of malice — this is what makes it insidious.

Sections 18–19: Choosing a Stack & The Zero-Click Future

Practical choices today — and why the entire model may be changing.

Choosing Your Analytics Stack

Factor        Simple / Privacy         Product Analytics            Enterprise
------------  -----------------------  ---------------------------  -------------------------
Tools         Plausible, Fathom        PostHog, Amplitude           Adobe, GA4 + BigQuery
Budget        $0–$20/mo                $0–$2K/mo                    $10K+/mo
Setup         Single script tag        Event taxonomy + SDK         Warehouse + ETL + team
GDPR          Cookie-free, no consent  Consent required             DPA + legal + CMP
Key strength  Pageviews, referrers     Funnels, cohorts, retention  Attribution, segmentation
The course project exposes what commercial tools hide. Building a collector teaches you sendBeacon() silently drops oversized payloads. Building a sessionizer exposes the arbitrary 30-minute timeout. Building a dashboard shows how easy it is to present misleading aggregations.

LLMs & the Zero-Click Future

Traditional Model             LLM-Mediated Model
=========================     =========================
User has question             User has question
      |                             |
      v                             v
Search engine results         LLM synthesizes answer
      |                       from your content + others
      v                             |
User visits your site         User reads answer
      |  <-- measurable             |  <-- invisible
      v                             v
Analytics captures it         Analytics captures nothing
  • No visit = no data. No pageview, no event, no session. Your analytics shows nothing.
  • Funnel top vanishes. Awareness/interest stages are most affected — answered by AI before discovery.
  • Content ROI unmeasurable. Your content has impact via LLM citations, but no tool can measure it.
  • Precedent: Google News already did this to journalism — summaries replace visits. LLMs do it at larger scale.
Analytics built for the click-through era may not survive the answer era. ~60% of Google searches already result in zero clicks. The users you can measure are increasingly an unrepresentative sample.

Summary

Key takeaways from behavioral analytics.

Key Takeaways

Topic               Takeaway
------------------  -----------------------------------------------------------------------------
Visitation Model    Hits → pageviews → events → sessions → users → segments. "Unique user" is a fiction.
Engagement          Scroll depth, time, clicks measure quality — but high engagement can mean frustration
Outcomes            KPIs connect behavior to value; vanity metrics and dark patterns corrupt the feedback loop
Attribution         All models are simplifications. Walled gardens double-count. Incrementality testing is the only causal answer.
GA & Tag Managers   ~85% monopoly; "free" = you train ad models. TMS adds performance/security risk.
A/B Testing         Scientific method for the web — but requires statistical rigor most sites lack
Replay & Usability  DOM reconstruction, not video. HEART framework. Analytics detects; qualitative research diagnoses.
VoC & Pitfalls      Combine quant + qual. Single-metric thinking, Simpson's paradox, averages — the default is to misread data.
Data Quality        JS delivery failures, consent bias, identity resolution — every source has systematic blind spots
Privacy & Abuse     Gentle slope from analytics to surveillance. Each step individually justifiable; cumulative result is insidious.
LLMs & Zero-Click   When answers happen off-site, visit-based analytics has a structural blind spot for your most valuable audience