Making Decisions with Data

You have built the entire pipeline — collector, ingestion, storage, API, and dashboard. Now comes the part that justifies all of it: using the data to make better engineering decisions. Analytics is not about collecting data; it is about acting on it.

1. The Purpose of Analytics

Analytics exists to answer questions and drive action. If no one looks at the dashboard or changes behavior because of it, the entire pipeline is wasted engineering effort. Every metric should connect to a decision.

Before you collect a single data point, you should be able to answer: what will I do differently if this number goes up? What will I do if it goes down? If you cannot answer that question, you do not need that metric.

Consider the pipeline you built:

  - A collector script (collector.js) capturing pageviews, performance metrics, errors, and custom events
  - A server-side ingestion endpoint that validates, enriches, and sessionizes every payload
  - A MySQL schema that stores the data with indexes, partitions, and a retention policy
  - A reporting API that serves aggregated JSON, with authentication and rate limits
  - A dashboard that turns those aggregates into charts and tables

All of that infrastructure exists to support one thing: a human making a better decision than they would have made without the data. The decision might be "we need to optimize images on the landing page" or "we should fix the TypeError that is hitting 50 users per day" or "the redesign improved bounce rate by 12%, keep it." Without data, these are guesses. With data, they are informed engineering choices.

2. Actionable vs Vanity Metrics

Vanity metrics look impressive in reports but do not drive decisions. They go up and to the right, which feels good, but they do not tell you what to change. Actionable metrics connect directly to specific actions you can take.

Examples of Actionable Metrics

Vanity vs Actionable

Vanity Metric                | Actionable Alternative
-----------------------------|------------------------------------------------------
Total pageviews (all time)   | Pageviews per session trend (week over week)
Total users ever             | New sessions this week vs last week
Total errors logged          | Error rate per 1,000 pageviews (trending up or down?)
Average load time (all time) | p75 load time this week vs last week
Total pages on the site      | Pages with zero views in the last 30 days
Number of API endpoints      | API error rate by endpoint (which ones are failing?)

The pattern is clear: actionable metrics have a time dimension and a comparison. "Total pageviews" is a number. "Pageviews per session, this week vs last week" is a signal that tells you whether engagement is improving or declining. One is a trophy; the other is a compass.

The test for actionable metrics: If the number changes, do you know what to do next? If the answer is "celebrate" or "worry" but not a specific action, it is a vanity metric.
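
To make this concrete, here is a sketch of how one actionable metric, error rate per 1,000 pageviews this week vs last week, could be computed against the tables from the storage phase. It assumes errors and pageviews tables that both carry a server_timestamp column and uses the mysql2 driver; the table, column, and connection details are placeholders to adapt to your own schema.

// error-rate-trend.js - sketch: errors per 1,000 pageviews, this week vs last week.
// Assumes `errors` and `pageviews` tables that both carry a server_timestamp
// column, and the mysql2 driver. Adjust names and credentials to your setup.
const mysql = require('mysql2/promise');

// Count rows in the last 7 days and in the 7 days before that.
async function weekOverWeek(db, table) {
    const [[row]] = await db.execute(
        `SELECT SUM(server_timestamp >= NOW() - INTERVAL 7 DAY) AS this_week,
                SUM(server_timestamp <  NOW() - INTERVAL 7 DAY) AS last_week
           FROM ${table}
          WHERE server_timestamp >= NOW() - INTERVAL 14 DAY`
    );
    return { thisWeek: Number(row.this_week) || 0, lastWeek: Number(row.last_week) || 0 };
}

async function main() {
    const db = await mysql.createConnection({ host: 'localhost', user: 'analytics', database: 'analytics_db' });
    const errors = await weekOverWeek(db, 'errors');
    const views  = await weekOverWeek(db, 'pageviews');
    await db.end();

    const rate = (e, v) => (v === 0 ? 0 : (1000 * e) / v);
    console.log('Errors per 1,000 pageviews');
    console.log('  this week:', rate(errors.thisWeek, views.thisWeek).toFixed(1));
    console.log('  last week:', rate(errors.lastWeek, views.lastWeek).toFixed(1));
}

main();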

3. Performance Budgets

A performance budget is a set of quantitative thresholds that your site must not exceed. Use your analytics data to set realistic budgets and your dashboard to monitor compliance. The Performance Budget Calculator can help you define these targets.

Setting Budgets from Real Data

Do not invent budgets from thin air. Use your analytics baseline:

  1. Query your performance table for the current p75 values of each metric (see the query sketch after this list)
  2. Compare against Google's Web Vitals thresholds (LCP < 2500ms, CLS < 0.1, INP < 200ms)
  3. Set your budget at the better of: your current p75 or the "good" threshold
  4. Track progress weekly on the dashboard
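
A minimal sketch of step 1, assuming a performance table with one column per metric (lcp, cls, inp, load_time) plus server_timestamp, and the mysql2 driver. Your column names and connection settings will differ; treat this as a starting point rather than a drop-in script.

// p75-baseline.js - sketch: pull the current p75 for each metric to seed budgets.
// Assumes a `performance` table with lcp, cls, inp, and load_time columns plus
// server_timestamp, and the mysql2 driver. Adjust names to your real schema.
const mysql = require('mysql2/promise');

// Nearest-rank percentile over an array of numbers.
function percentile(values, p) {
    if (values.length === 0) return null;
    const sorted = [...values].sort((a, b) => a - b);
    return sorted[Math.max(0, Math.ceil(p * sorted.length) - 1)];
}

async function p75Baseline() {
    const db = await mysql.createConnection({ host: 'localhost', user: 'analytics', database: 'analytics_db' });
    const [rows] = await db.execute(
        `SELECT lcp, cls, inp, load_time
           FROM performance
          WHERE server_timestamp > NOW() - INTERVAL 7 DAY`
    );
    await db.end();

    const baseline = {};
    for (const metric of ['lcp', 'cls', 'inp', 'load_time']) {
        baseline[metric] = percentile(rows.map((r) => Number(r[metric])), 0.75);
    }
    return baseline;  // e.g. { lcp: 3200, cls: 0.05, inp: 180, load_time: 4100 }
}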

Example

"Our p75 LCP is 3200ms. The budget is 2500ms. We need to optimize images on the landing page." That is a complete decision chain: metric (LCP), current value (3200ms), target (2500ms), action (optimize images), and the page to focus on (landing page). The dashboard shows whether you are hitting your targets after each change.

Metric    | Current p75 | Budget | Status
----------|-------------|--------|------------------------------------
LCP       | 3200ms      | 2500ms | Over budget — action needed
CLS       | 0.05        | 0.1    | Under budget — healthy
INP       | 180ms       | 200ms  | Under budget — but close, monitor
Load Time | 4100ms      | 3000ms | Over budget — investigate

Budgets are not aspirational goals. They are hard limits. When you exceed a budget, it triggers an investigation the same way a failing test triggers a bug fix. If a deployment pushes LCP above the budget, you either optimize until it is back under, or you explicitly raise the budget with documented justification.
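
Treating the budget as a hard limit is easier when a script enforces it. The sketch below compares current p75 values to the budgets and reports any violation, so a cron job or CI step can fail loudly, the same way a failing test does. The budget values are examples, and p75Baseline() in the usage comment refers to the hypothetical helper sketched above.

// budget-check.js - sketch: report when any p75 metric is over budget,
// the same way a failing test fails a build. Budget values are examples.
const BUDGETS = { lcp: 2500, cls: 0.1, inp: 200, load_time: 3000 };

function checkBudgets(currentP75) {
    let overBudget = false;
    for (const [metric, budget] of Object.entries(BUDGETS)) {
        const over = currentP75[metric] > budget;
        if (over) overBudget = true;
        console.log(`${metric}: p75=${currentP75[metric]} budget=${budget} ${over ? 'OVER BUDGET' : 'ok'}`);
    }
    return overBudget;
}

// Wire it to the p75Baseline() sketch above, e.g. in a nightly cron job:
// p75Baseline().then((current) => { if (checkBudgets(current)) process.exit(1); });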

4. Error Triage

Not all errors are equal. A TypeError that fires once a month on an obscure browser is not worth the same attention as a ReferenceError that fires 200 times per day on the checkout page. Triage is the process of prioritizing errors by their impact.

Triage Dimensions

  - Frequency: how often the error fires (occurrences per day)
  - Impact: which pages are affected, how many users hit it, and whether it blocks a critical flow
  - Severity: does it break functionality outright, or is it cosmetic noise?

Triage Workflow

Error Report (Dashboard)
  Rank errors by: frequency x impact x severity
        |
        v
For each top error:
  1. Read error_message and stack_trace
  2. Identify affected pages (url column)
  3. Check browser distribution (user_agent column)
  4. Find first/last occurrence (server_timestamp)
  5. Correlate with recent deployments
        |
        v
Decide:
  Critical --> Fix immediately, deploy hotfix
  High     --> Schedule for current sprint
  Medium   --> Add to backlog, fix when time allows
  Low      --> Log and ignore (or suppress in report)
        |
        v
After fix:
  Monitor error count on dashboard
  Verify count drops to 0 within 24 hours
  If error persists, fix was incomplete
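
The ranked list at the top of this workflow can come straight from the errors table. A query sketch, assuming the column names referenced in the workflow (error_message, url, server_timestamp) and the mysql2 driver; adjust to your schema.

// error-triage.js - sketch: rank errors by frequency over the last 7 days.
// Assumes an `errors` table with the columns named in the workflow above
// (error_message, url, server_timestamp) and the mysql2 driver.
const mysql = require('mysql2/promise');

async function topErrors() {
    const db = await mysql.createConnection({ host: 'localhost', user: 'analytics', database: 'analytics_db' });
    const [rows] = await db.execute(
        `SELECT error_message,
                COUNT(*)              AS occurrences,
                COUNT(DISTINCT url)   AS affected_pages,
                MIN(server_timestamp) AS first_seen,
                MAX(server_timestamp) AS last_seen
           FROM errors
          WHERE server_timestamp > NOW() - INTERVAL 7 DAY
          GROUP BY error_message
          ORDER BY occurrences DESC
          LIMIT 10`
    );
    await db.end();
    return rows;  // the input for the triage table below
}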

Example Triage Table

Error                                                        | Count/Day | Page      | Blocks User? | Priority
-------------------------------------------------------------|-----------|-----------|--------------|---------------------
TypeError: Cannot read properties of null (reading 'submit') | 200       | /checkout | Yes          | Critical
ReferenceError: gtag is not defined                          | 85        | All pages | No           | Medium
ResizeObserver loop completed with undelivered notifications | 340       | All pages | No           | Low (browser noise)
SyntaxError: Unexpected token '<'                            | 12        | /app      | Yes          | High

Do not chase every error. Some errors are browser noise that cannot be fixed (ResizeObserver warnings, third-party script failures, browser extensions injecting errors). Learn to recognize these and filter them out so your triage focuses on errors you can actually fix.
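
One lightweight way to handle known noise is a suppression list applied before ranking. The patterns below are examples only; grow the list as you confirm that a given error is browser noise rather than a real bug.

// Known-noise patterns to drop before triage. These are examples; add to the
// list only after confirming an error is noise rather than a real bug.
const NOISE_PATTERNS = [
    /^ResizeObserver loop (limit exceeded|completed with undelivered notifications)/,
    /^Script error\.?$/,  // opaque cross-origin script errors
];

function isNoise(errorMessage) {
    return NOISE_PATTERNS.some((pattern) => pattern.test(errorMessage));
}

// Usage: filter the ranked list before reviewing it.
// const actionable = (await topErrors()).filter((e) => !isNoise(e.error_message));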

5. A/B Testing Concepts

Analytics data can power simple A/B tests. The idea is straightforward: serve variant A to some visitors and variant B to others, then compare the metrics between the two groups to determine which performs better.

How It Works

  1. Split traffic: Use the session_id to deterministically assign visitors to groups. A simple approach: hash the session_id and check if the result is even or odd. Even = Group A, Odd = Group B.
  2. Serve variants: Group A sees the original page. Group B sees the modified version (different headline, different layout, different CTA button).
  3. Measure: Both groups generate analytics data (pageviews, performance, errors, events). Your dashboard already tracks all of this.
  4. Compare: After enough traffic, query the reporting API filtered by group: "What is the bounce rate for Group A vs Group B?" or "What is the average pages-per-session for each group?"

// Simple A/B assignment using session_id hash
function getVariant(sessionId) {
    let hash = 0;
    for (let i = 0; i < sessionId.length; i++) {
        hash = ((hash << 5) - hash) + sessionId.charCodeAt(i);
        hash |= 0;  // Convert to 32-bit integer
    }
    return (Math.abs(hash) % 2 === 0) ? 'A' : 'B';
}

// Send variant as a custom event so it appears in analytics
const variant = getVariant(sessionId);
collector.trackEvent('ab_test', {
    test_name: 'pricing_page_redesign',
    variant: variant
});
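
For the comparison step, the variant travels with the rest of your analytics data, so a per-group query is mostly a join. The sketch below computes pages per session for each variant, assuming an events table with event_name, session_id, and a JSON event_data column, plus a pageviews table keyed by session_id; these column names are assumptions to adapt to your schema.

// ab-compare.js - sketch: pages per session for each variant of one test.
// Assumes an `events` table (event_name, session_id, event_data JSON) and a
// `pageviews` table with session_id. Column names are assumptions.
const mysql = require('mysql2/promise');

async function pagesPerSessionByVariant(testName) {
    const db = await mysql.createConnection({ host: 'localhost', user: 'analytics', database: 'analytics_db' });
    const [rows] = await db.execute(
        `SELECT v.variant,
                COUNT(*) / COUNT(DISTINCT p.session_id) AS pages_per_session
           FROM (SELECT DISTINCT session_id,
                        JSON_UNQUOTE(JSON_EXTRACT(event_data, '$.variant')) AS variant
                   FROM events
                  WHERE event_name = 'ab_test'
                    AND JSON_UNQUOTE(JSON_EXTRACT(event_data, '$.test_name')) = ?) v
           JOIN pageviews p ON p.session_id = v.session_id
          GROUP BY v.variant`,
        [testName]
    );
    await db.end();
    return rows;  // e.g. [ { variant: 'A', pages_per_session: '2.4000' }, ... ]
}

// pagesPerSessionByVariant('pricing_page_redesign').then(console.log);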

Caution: Statistical Significance

Comparing two numbers is not enough. If Group A has a 4.2% conversion rate and Group B has a 4.5%, is that a real difference or random noise? Statistical significance requires a minimum sample size that depends on the baseline conversion rate and the minimum detectable effect you care about. For most web experiments, you need thousands of visitors per variant to reach confidence.
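
To get a feel for the numbers, you can estimate the required per-variant sample size with Lehr's approximation, which targets roughly 80% power at a 5% two-sided significance level. This is a back-of-the-envelope check, not a substitute for a proper power analysis.

// Rough per-variant sample size via Lehr's approximation:
//   n per group ~ 16 * p(1-p) / delta^2   (~80% power, 5% two-sided alpha)
function sampleSizePerVariant(baselineRate, expectedRate) {
    const pBar = (baselineRate + expectedRate) / 2;      // pooled conversion rate
    const delta = Math.abs(expectedRate - baselineRate); // minimum detectable effect
    return Math.ceil((16 * pBar * (1 - pBar)) / (delta * delta));
}

// The 4.2% vs 4.5% example above:
console.log(sampleSizePerVariant(0.042, 0.045));  // ~74,000 visitors per variant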

Full A/B testing is a course in itself. Topics like statistical power, multi-armed bandits, experiment duration, novelty effects, and Simpson's paradox are beyond the scope of this project. The point here is simpler: you already have the data infrastructure to support experiments. The collector captures events, the storage layer holds them, the API serves them, and the dashboard displays them. Adding proper A/B testing is an extension of the pipeline you built, not a separate system.

6. Alerting

Do not wait for someone to check the dashboard. Set thresholds, and when those thresholds are crossed, send a notification automatically. A dashboard that nobody checks is the same as no dashboard at all.

What to Alert On

  - Error spikes: the error count in a recent window crosses a threshold
  - Performance degradation: average or p75 load time crosses a threshold
  - Performance budget violations, so a regression triggers an investigation immediately

Implementation: Cron + Summary Query

The simplest alerting system is a cron job that runs a summary query against the database every N minutes and sends a notification if a threshold is exceeded:

#!/bin/bash
# alert-check.sh - run via cron every 15 minutes
# crontab: */15 * * * * /path/to/alert-check.sh

THRESHOLD_ERRORS=50
THRESHOLD_LOAD_MS=5000

# Count errors in the last 15 minutes
ERROR_COUNT=$(mysql -N -e "
    SELECT COUNT(*) FROM errors
    WHERE server_timestamp > NOW() - INTERVAL 15 MINUTE
" analytics_db)

# Average load time in the last 15 minutes
AVG_LOAD=$(mysql -N -e "
    SELECT COALESCE(ROUND(AVG(load_time)), 0) FROM performance
    WHERE server_timestamp > NOW() - INTERVAL 15 MINUTE
" analytics_db)

# Check thresholds and alert
if [ "$ERROR_COUNT" -gt "$THRESHOLD_ERRORS" ]; then
    curl -X POST https://hooks.slack.com/your-webhook \
        -H "Content-type: application/json" \
        -d "{\"text\": \"ALERT: $ERROR_COUNT errors in last 15 min (threshold: $THRESHOLD_ERRORS)\"}"
fi

if [ "$AVG_LOAD" -gt "$THRESHOLD_LOAD_MS" ]; then
    curl -X POST https://hooks.slack.com/your-webhook \
        -H "Content-type: application/json" \
        -d "{\"text\": \"ALERT: Avg load time ${AVG_LOAD}ms in last 15 min (threshold: ${THRESHOLD_LOAD_MS}ms)\"}"
fi

Start simple. A cron job plus a Slack webhook is a legitimate alerting system. Do not over-engineer this with Prometheus, Grafana Alerting, and PagerDuty before you have defined what you actually want to alert on. Get the thresholds right first with a simple script, then migrate to a more robust tool when the simple approach hits its limits.

7. Continuous Improvement Cycle

Analytics is not a one-time setup. It is an ongoing feedback loop. Each sprint, each release, each deployment should include a check against the dashboard: did performance improve? Did error rates drop? Are users engaging differently?

MEASURE : Collect data via the pipeline
    |
    v
ANALYZE : Review dashboard trends, reports
    |
    v
DECIDE  : Set priority, choose action
    |
    v
ACT     : Deploy changes, ship fixes
    |
    v
back to MEASURE

Measure --> Analyze --> Decide --> Act --> Measure. The cycle never stops.

What to Check After Each Release

  - Did the p75 performance metrics move, and are they still within budget?
  - Did the error rate change, and are there new error types tied to the deployment?
  - Are users engaging differently (pages per session, bounce rate, key events)?

The improvement cycle turns analytics from a passive monitoring tool into an active engineering practice. It is the difference between having a dashboard and using a dashboard.

8. Future Direction: OpenTelemetry

The custom pipeline you built maps directly to industry standards. OpenTelemetry (OTel) is an open-source observability framework that provides standardized collector SDKs, a transport protocol (OTLP), and integrations with popular backends. It is becoming the industry standard for collecting and transmitting telemetry data.

Your Pipeline vs OpenTelemetry

Your Component                             | OTel Equivalent
-------------------------------------------|---------------------------------------------------
collector.js (browser SDK)                 | OTel Browser SDK (@opentelemetry/sdk-trace-web)
POST /collect (ingestion endpoint)         | OTLP/HTTP (standardized transport protocol)
Server processing (validation, enrichment) | OTel Collector (processors, exporters, pipelines)
MySQL (storage)                            | Tempo (traces), Prometheus (metrics), Loki (logs)
Reporting API (JSON endpoints)             | PromQL / TraceQL (query languages for backends)
Dashboard (charts and tables)              | Grafana (open-source dashboarding)

OTel is the future of observability, but understanding the fundamentals makes you effective with any tool. You know what a collector does because you built one. You know what server-side enrichment means because you wrote the middleware. You understand schema design because you created the tables. When you encounter OpenTelemetry, Datadog, or New Relic in a production environment, you will understand what they are doing under the hood — and why.

9. What You Built

Over six phases, you built a complete, production-architecture analytics system from scratch. Here is the full pipeline, end to end:

Phase 1: Collector (collector.js) - pageviews, performance, errors, events
    |
    v
Phase 2: Server Processing (Node / PHP) - validate, enrich, sessionize, store
    |
    v
Phase 3: Storage (MySQL) - 6 tables, indexes, partitions, retention
    |
    v
Phase 4: Reporting API - auth, roles, JSON endpoints, rate limits
    |
    v
Phase 5: Dashboard - SPA with charts, tables, filters
    |
    v
Phase 6: Decisions - performance budgets, error triage, alerting, improvement

What Each Phase Produced

  - Phase 1: collector.js, a browser script that captures pageviews, performance metrics, errors, and custom events
  - Phase 2: server-side processing (Node / PHP) that validates, enriches, sessionizes, and stores incoming data
  - Phase 3: a MySQL storage layer with 6 tables, indexes, partitions, and a retention policy
  - Phase 4: a reporting API with authentication, roles, JSON endpoints, and rate limits
  - Phase 5: a dashboard SPA with charts, tables, and filters
  - Phase 6: the decision layer, with performance budgets, error triage, alerting, and a continuous improvement cycle

You now understand every layer of a production analytics system. Whether you use Google Analytics, Plausible, PostHog, or build your own, you know what is happening under the hood. You know why the collector batches requests, why the server enriches data instead of trusting the client, why the schema uses DATETIME instead of TIMESTAMP, why the API returns JSON and never HTML, and why the dashboard uses textContent instead of innerHTML. These are not tool-specific skills. They are engineering fundamentals that transfer to any analytics platform, any observability stack, and any data pipeline you will encounter in your career.