Making Decisions with Data

You have built the entire pipeline — collector, ingestion, storage, API, and dashboard. Now comes the part that justifies all of it: using the data to make better engineering decisions. Analytics is not about collecting data; it is about acting on it.

1. The Purpose of Analytics

Analytics exists to answer questions and drive action. If no one looks at the dashboard or changes behavior because of it, the entire pipeline is wasted engineering effort. Every metric should connect to a decision.

Before you collect a single data point, you should be able to answer: what will I do differently if this number goes up? What will I do if it goes down? If you cannot answer that question, you do not need that metric.

Consider the pipeline you built:

  - A collector script (collector.js) capturing pageviews, performance metrics, errors, and custom events
  - A server-side ingestion endpoint that validates, enriches, and sessionizes every payload
  - A MySQL schema that stores the data with indexes, partitions, and a retention policy
  - A reporting API that serves aggregated JSON, with authentication and rate limits
  - A dashboard that turns those aggregates into charts and tables

All of that infrastructure exists to support one thing: a human making a better decision than they would have made without the data. The decision might be "we need to optimize images on the landing page" or "we should fix the TypeError that is hitting 50 users per day" or "the redesign improved bounce rate by 12%, keep it." Without data, these are guesses. With data, they are informed engineering choices.

2. Actionable vs Vanity Metrics

Vanity metrics look impressive in reports but do not drive decisions. They go up and to the right, which feels good, but they do not tell you what to change. Actionable metrics connect directly to specific actions you can take.

Examples of Actionable Metrics

Vanity vs Actionable

Vanity Metric                | Actionable Alternative
-----------------------------|------------------------------------------------------
Total pageviews (all time)   | Pageviews per session trend (week over week)
Total users ever             | New sessions this week vs last week
Total errors logged          | Error rate per 1,000 pageviews (trending up or down?)
Average load time (all time) | p75 load time this week vs last week
Total pages on the site      | Pages with zero views in the last 30 days
Number of API endpoints      | API error rate by endpoint (which ones are failing?)

The pattern is clear: actionable metrics have a time dimension and a comparison. "Total pageviews" is a number. "Pageviews per session, this week vs last week" is a signal that tells you whether engagement is improving or declining. One is a trophy; the other is a compass.

The test for actionable metrics: If the number changes, do you know what to do next? If the answer is "celebrate" or "worry" but not a specific action, it is a vanity metric.
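
To make this concrete, here is a sketch of how one actionable metric, error rate per 1,000 pageviews this week vs last week, could be computed against the tables from the storage phase. It assumes errors and pageviews tables that both carry a server_timestamp column and uses the mysql2 driver; the table, column, and connection details are placeholders to adapt to your own schema.

// error-rate-trend.js - sketch: errors per 1,000 pageviews, this week vs last week.
// Assumes `errors` and `pageviews` tables that both carry a server_timestamp
// column, and the mysql2 driver. Adjust names and credentials to your setup.
const mysql = require('mysql2/promise');

// Count rows in the last 7 days and in the 7 days before that.
async function weekOverWeek(db, table) {
    const [[row]] = await db.execute(
        `SELECT SUM(server_timestamp >= NOW() - INTERVAL 7 DAY) AS this_week,
                SUM(server_timestamp <  NOW() - INTERVAL 7 DAY) AS last_week
           FROM ${table}
          WHERE server_timestamp >= NOW() - INTERVAL 14 DAY`
    );
    return { thisWeek: Number(row.this_week) || 0, lastWeek: Number(row.last_week) || 0 };
}

async function main() {
    const db = await mysql.createConnection({ host: 'localhost', user: 'analytics', database: 'analytics_db' });
    const errors = await weekOverWeek(db, 'errors');
    const views  = await weekOverWeek(db, 'pageviews');
    await db.end();

    const rate = (e, v) => (v === 0 ? 0 : (1000 * e) / v);
    console.log('Errors per 1,000 pageviews');
    console.log('  this week:', rate(errors.thisWeek, views.thisWeek).toFixed(1));
    console.log('  last week:', rate(errors.lastWeek, views.lastWeek).toFixed(1));
}

main();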

3. Performance Budgets

A performance budget is a set of quantitative thresholds that your site must not exceed. Use your analytics data to set realistic budgets and your dashboard to monitor compliance. The Performance Budget Calculator can help you define these targets.

Setting Budgets from Real Data

Do not invent budgets from thin air. Use your analytics baseline:

  1. Query your performance table for the current p75 values of each metric (see the query sketch after this list)
  2. Compare against Google's Web Vitals thresholds (LCP < 2500ms, CLS < 0.1, INP < 200ms)
  3. Set your budget at the better of: your current p75 or the "good" threshold
  4. Track progress weekly on the dashboard
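
A minimal sketch of step 1, assuming a performance table with one column per metric (lcp, cls, inp, load_time) plus server_timestamp, and the mysql2 driver. Your column names and connection settings will differ; treat this as a starting point rather than a drop-in script.

// p75-baseline.js - sketch: pull the current p75 for each metric to seed budgets.
// Assumes a `performance` table with lcp, cls, inp, and load_time columns plus
// server_timestamp, and the mysql2 driver. Adjust names to your real schema.
const mysql = require('mysql2/promise');

// Nearest-rank percentile over an array of numbers.
function percentile(values, p) {
    if (values.length === 0) return null;
    const sorted = [...values].sort((a, b) => a - b);
    return sorted[Math.max(0, Math.ceil(p * sorted.length) - 1)];
}

async function p75Baseline() {
    const db = await mysql.createConnection({ host: 'localhost', user: 'analytics', database: 'analytics_db' });
    const [rows] = await db.execute(
        `SELECT lcp, cls, inp, load_time
           FROM performance
          WHERE server_timestamp > NOW() - INTERVAL 7 DAY`
    );
    await db.end();

    const baseline = {};
    for (const metric of ['lcp', 'cls', 'inp', 'load_time']) {
        baseline[metric] = percentile(rows.map((r) => Number(r[metric])), 0.75);
    }
    return baseline;  // e.g. { lcp: 3200, cls: 0.05, inp: 180, load_time: 4100 }
}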

Example

"Our p75 LCP is 3200ms. The budget is 2500ms. We need to optimize images on the landing page." That is a complete decision chain: metric (LCP), current value (3200ms), target (2500ms), action (optimize images), and the page to focus on (landing page). The dashboard shows whether you are hitting your targets after each change.

Metric    | Current p75 | Budget | Status
----------|-------------|--------|------------------------------------
LCP       | 3200ms      | 2500ms | Over budget — action needed
CLS       | 0.05        | 0.1    | Under budget — healthy
INP       | 180ms       | 200ms  | Under budget — but close, monitor
Load Time | 4100ms      | 3000ms | Over budget — investigate

Budgets are not aspirational goals. They are hard limits. When you exceed a budget, it triggers an investigation the same way a failing test triggers a bug fix. If a deployment pushes LCP above the budget, you either optimize until it is back under, or you explicitly raise the budget with documented justification.
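
Treating the budget as a hard limit is easier when a script enforces it. The sketch below compares current p75 values to the budgets and reports any violation, so a cron job or CI step can fail loudly, the same way a failing test does. The budget values are examples, and p75Baseline() in the usage comment refers to the hypothetical helper sketched above.

// budget-check.js - sketch: report when any p75 metric is over budget,
// the same way a failing test fails a build. Budget values are examples.
const BUDGETS = { lcp: 2500, cls: 0.1, inp: 200, load_time: 3000 };

function checkBudgets(currentP75) {
    let overBudget = false;
    for (const [metric, budget] of Object.entries(BUDGETS)) {
        const over = currentP75[metric] > budget;
        if (over) overBudget = true;
        console.log(`${metric}: p75=${currentP75[metric]} budget=${budget} ${over ? 'OVER BUDGET' : 'ok'}`);
    }
    return overBudget;
}

// Wire it to the p75Baseline() sketch above, e.g. in a nightly cron job:
// p75Baseline().then((current) => { if (checkBudgets(current)) process.exit(1); });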

4. Error Triage

Not all errors are equal. A TypeError that fires once a month on an obscure browser is not worth the same attention as a ReferenceError that fires 200 times per day on the checkout page. Triage is the process of prioritizing errors by their impact.

Triage Dimensions

  - Frequency: how often the error fires (occurrences per day)
  - Impact: which pages are affected, how many users hit it, and whether it blocks a critical flow
  - Severity: does it break functionality outright, or is it cosmetic noise?

Triage Workflow

Error Report (Dashboard)
  Rank errors by: frequency x impact x severity
        |
        v
For each top error:
  1. Read error_message and stack_trace
  2. Identify affected pages (url column)
  3. Check browser distribution (user_agent column)
  4. Find first/last occurrence (server_timestamp)
  5. Correlate with recent deployments
        |
        v
Decide:
  Critical --> Fix immediately, deploy hotfix
  High     --> Schedule for current sprint
  Medium   --> Add to backlog, fix when time allows
  Low      --> Log and ignore (or suppress in report)
        |
        v
After fix:
  Monitor error count on dashboard
  Verify count drops to 0 within 24 hours
  If error persists, fix was incomplete
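
The ranked list at the top of this workflow can come straight from the errors table. A query sketch, assuming the column names referenced in the workflow (error_message, url, server_timestamp) and the mysql2 driver; adjust to your schema.

// error-triage.js - sketch: rank errors by frequency over the last 7 days.
// Assumes an `errors` table with the columns named in the workflow above
// (error_message, url, server_timestamp) and the mysql2 driver.
const mysql = require('mysql2/promise');

async function topErrors() {
    const db = await mysql.createConnection({ host: 'localhost', user: 'analytics', database: 'analytics_db' });
    const [rows] = await db.execute(
        `SELECT error_message,
                COUNT(*)              AS occurrences,
                COUNT(DISTINCT url)   AS affected_pages,
                MIN(server_timestamp) AS first_seen,
                MAX(server_timestamp) AS last_seen
           FROM errors
          WHERE server_timestamp > NOW() - INTERVAL 7 DAY
          GROUP BY error_message
          ORDER BY occurrences DESC
          LIMIT 10`
    );
    await db.end();
    return rows;  // the input for the triage table below
}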

Example Triage Table

Error                                                        | Count/Day | Page      | Blocks User? | Priority
-------------------------------------------------------------|-----------|-----------|--------------|---------------------
TypeError: Cannot read properties of null (reading 'submit') | 200       | /checkout | Yes          | Critical
ReferenceError: gtag is not defined                          | 85        | All pages | No           | Medium
ResizeObserver loop completed with undelivered notifications | 340       | All pages | No           | Low (browser noise)
SyntaxError: Unexpected token '<'                            | 12        | /app      | Yes          | High

Do not chase every error. Some errors are browser noise that cannot be fixed (ResizeObserver warnings, third-party script failures, browser extensions injecting errors). Learn to recognize these and filter them out so your triage focuses on errors you can actually fix.
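
One lightweight way to handle known noise is a suppression list applied before ranking. The patterns below are examples only; grow the list as you confirm that a given error is browser noise rather than a real bug.

// Known-noise patterns to drop before triage. These are examples; add to the
// list only after confirming an error is noise rather than a real bug.
const NOISE_PATTERNS = [
    /^ResizeObserver loop (limit exceeded|completed with undelivered notifications)/,
    /^Script error\.?$/,  // opaque cross-origin script errors
];

function isNoise(errorMessage) {
    return NOISE_PATTERNS.some((pattern) => pattern.test(errorMessage));
}

// Usage: filter the ranked list before reviewing it.
// const actionable = (await topErrors()).filter((e) => !isNoise(e.error_message));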

5. A/B Testing Concepts

Analytics data can power simple A/B tests. The idea is straightforward: serve variant A to some visitors and variant B to others, then compare the metrics between the two groups to determine which performs better.

How It Works

  1. Split traffic: Use the session_id to deterministically assign visitors to groups. A simple approach: hash the session_id and check if the result is even or odd. Even = Group A, Odd = Group B.
  2. Serve variants: Group A sees the original page. Group B sees the modified version (different headline, different layout, different CTA button).
  3. Measure: Both groups generate analytics data (pageviews, performance, errors, events). Your dashboard already tracks all of this.
  4. Compare: After enough traffic, query the reporting API filtered by group: "What is the bounce rate for Group A vs Group B?" or "What is the average pages-per-session for each group?"

// Simple A/B assignment using session_id hash
function getVariant(sessionId) {
    let hash = 0;
    for (let i = 0; i < sessionId.length; i++) {
        hash = ((hash << 5) - hash) + sessionId.charCodeAt(i);
        hash |= 0;  // Convert to 32-bit integer
    }
    return (Math.abs(hash) % 2 === 0) ? 'A' : 'B';
}

// Send variant as a custom event so it appears in analytics
const variant = getVariant(sessionId);
collector.trackEvent('ab_test', {
    test_name: 'pricing_page_redesign',
    variant: variant
});
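
For the comparison step, the variant travels with the rest of your analytics data, so a per-group query is mostly a join. The sketch below computes pages per session for each variant, assuming an events table with event_name, session_id, and a JSON event_data column, plus a pageviews table keyed by session_id; these column names are assumptions to adapt to your schema.

// ab-compare.js - sketch: pages per session for each variant of one test.
// Assumes an `events` table (event_name, session_id, event_data JSON) and a
// `pageviews` table with session_id. Column names are assumptions.
const mysql = require('mysql2/promise');

async function pagesPerSessionByVariant(testName) {
    const db = await mysql.createConnection({ host: 'localhost', user: 'analytics', database: 'analytics_db' });
    const [rows] = await db.execute(
        `SELECT v.variant,
                COUNT(*) / COUNT(DISTINCT p.session_id) AS pages_per_session
           FROM (SELECT DISTINCT session_id,
                        JSON_UNQUOTE(JSON_EXTRACT(event_data, '$.variant')) AS variant
                   FROM events
                  WHERE event_name = 'ab_test'
                    AND JSON_UNQUOTE(JSON_EXTRACT(event_data, '$.test_name')) = ?) v
           JOIN pageviews p ON p.session_id = v.session_id
          GROUP BY v.variant`,
        [testName]
    );
    await db.end();
    return rows;  // e.g. [ { variant: 'A', pages_per_session: '2.4000' }, ... ]
}

// pagesPerSessionByVariant('pricing_page_redesign').then(console.log);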

Caution: Statistical Significance

Comparing two numbers is not enough. If Group A has a 4.2% conversion rate and Group B has a 4.5%, is that a real difference or random noise? Statistical significance requires a minimum sample size that depends on the baseline conversion rate and the minimum detectable effect you care about. For most web experiments, you need thousands of visitors per variant to reach confidence.
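
To get a feel for the numbers, you can estimate the required per-variant sample size with Lehr's approximation, which targets roughly 80% power at a 5% two-sided significance level. This is a back-of-the-envelope check, not a substitute for a proper power analysis.

// Rough per-variant sample size via Lehr's approximation:
//   n per group ~ 16 * p(1-p) / delta^2   (~80% power, 5% two-sided alpha)
function sampleSizePerVariant(baselineRate, expectedRate) {
    const pBar = (baselineRate + expectedRate) / 2;      // pooled conversion rate
    const delta = Math.abs(expectedRate - baselineRate); // minimum detectable effect
    return Math.ceil((16 * pBar * (1 - pBar)) / (delta * delta));
}

// The 4.2% vs 4.5% example above:
console.log(sampleSizePerVariant(0.042, 0.045));  // ~74,000 visitors per variant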

Full A/B testing is a course in itself. Topics like statistical power, multi-armed bandits, experiment duration, novelty effects, and Simpson's paradox are beyond the scope of this project. The point here is simpler: you already have the data infrastructure to support experiments. The collector captures events, the storage layer holds them, the API serves them, and the dashboard displays them. Adding proper A/B testing is an extension of the pipeline you built, not a separate system.

6. Alerting

Do not wait for someone to check the dashboard. Set thresholds, and when those thresholds are crossed, send a notification automatically. A dashboard that nobody checks is the same as no dashboard at all.

What to Alert On

  - Error spikes: the error count in a recent window crosses a threshold
  - Performance degradation: average or p75 load time crosses a threshold
  - Performance budget violations, so a regression triggers an investigation immediately

Implementation: Cron + Summary Query

The simplest alerting system is a cron job that runs a summary query against the database every N minutes and sends a notification if a threshold is exceeded:

#!/bin/bash
# alert-check.sh - run via cron every 15 minutes
# crontab: */15 * * * * /path/to/alert-check.sh

THRESHOLD_ERRORS=50
THRESHOLD_LOAD_MS=5000

# Count errors in the last 15 minutes
ERROR_COUNT=$(mysql -N -e "
    SELECT COUNT(*) FROM errors
    WHERE server_timestamp > NOW() - INTERVAL 15 MINUTE
" analytics_db)

# Average load time in the last 15 minutes
AVG_LOAD=$(mysql -N -e "
    SELECT COALESCE(ROUND(AVG(load_time)), 0) FROM performance
    WHERE server_timestamp > NOW() - INTERVAL 15 MINUTE
" analytics_db)

# Check thresholds and alert
if [ "$ERROR_COUNT" -gt "$THRESHOLD_ERRORS" ]; then
    curl -X POST https://hooks.slack.com/your-webhook \
        -H "Content-type: application/json" \
        -d "{\"text\": \"ALERT: $ERROR_COUNT errors in last 15 min (threshold: $THRESHOLD_ERRORS)\"}"
fi

if [ "$AVG_LOAD" -gt "$THRESHOLD_LOAD_MS" ]; then
    curl -X POST https://hooks.slack.com/your-webhook \
        -H "Content-type: application/json" \
        -d "{\"text\": \"ALERT: Avg load time ${AVG_LOAD}ms in last 15 min (threshold: ${THRESHOLD_LOAD_MS}ms)\"}"
fi

Start simple. A cron job plus a Slack webhook is a legitimate alerting system. Do not over-engineer this with Prometheus, Grafana Alerting, and PagerDuty before you have defined what you actually want to alert on. Get the thresholds right first with a simple script, then migrate to a more robust tool when the simple approach hits its limits.

7. Continuous Improvement Cycle

Analytics is not a one-time setup. It is an ongoing feedback loop. Each sprint, each release, each deployment should include a check against the dashboard: did performance improve? Did error rates drop? Are users engaging differently?

MEASURE : Collect data via the pipeline
    |
    v
ANALYZE : Review dashboard trends, reports
    |
    v
DECIDE  : Set priority, choose action
    |
    v
ACT     : Deploy changes, ship fixes
    |
    v
back to MEASURE

Measure --> Analyze --> Decide --> Act --> Measure. The cycle never stops.

What to Check After Each Release

  - Did the p75 performance metrics move, and are they still within budget?
  - Did the error rate change, and are there new error types tied to the deployment?
  - Are users engaging differently (pages per session, bounce rate, key events)?

The improvement cycle turns analytics from a passive monitoring tool into an active engineering practice. It is the difference between having a dashboard and using a dashboard.

8. Future Direction: OpenTelemetry

The custom pipeline you built maps directly to industry standards. OpenTelemetry (OTel) is an open-source observability framework that provides standardized collector SDKs, a transport protocol (OTLP), and integrations with popular backends. It is becoming the industry standard for collecting and transmitting telemetry data.

Your Pipeline vs OpenTelemetry

Your Component                             | OTel Equivalent
-------------------------------------------|---------------------------------------------------
collector.js (browser SDK)                 | OTel Browser SDK (@opentelemetry/sdk-trace-web)
POST /collect (ingestion endpoint)         | OTLP/HTTP (standardized transport protocol)
Server processing (validation, enrichment) | OTel Collector (processors, exporters, pipelines)
MySQL (storage)                            | Tempo (traces), Prometheus (metrics), Loki (logs)
Reporting API (JSON endpoints)             | PromQL / TraceQL (query languages for backends)
Dashboard (charts and tables)              | Grafana (open-source dashboarding)

OTel is the future of observability, but understanding the fundamentals makes you effective with any tool. You know what a collector does because you built one. You know what server-side enrichment means because you wrote the middleware. You understand schema design because you created the tables. When you encounter OpenTelemetry, Datadog, or New Relic in a production environment, you will understand what they are doing under the hood — and why.

9. What You Built

Over six phases, you built a complete, production-architecture analytics system from scratch. Here is the full pipeline, end to end:

Phase 1: Collector (collector.js) - pageviews, performance, errors, events
    |
    v
Phase 2: Server Processing (Node / PHP) - validate, enrich, sessionize, store
    |
    v
Phase 3: Storage (MySQL) - 6 tables, indexes, partitions, retention
    |
    v
Phase 4: Reporting API - auth, roles, JSON endpoints, rate limits
    |
    v
Phase 5: Dashboard - SPA with charts, tables, filters
    |
    v
Phase 6: Decisions - performance budgets, error triage, alerting, improvement

What Each Phase Produced

  - Phase 1: collector.js, a browser script that captures pageviews, performance metrics, errors, and custom events
  - Phase 2: server-side processing (Node / PHP) that validates, enriches, sessionizes, and stores incoming data
  - Phase 3: a MySQL storage layer with 6 tables, indexes, partitions, and a retention policy
  - Phase 4: a reporting API with authentication, roles, JSON endpoints, and rate limits
  - Phase 5: a dashboard SPA with charts, tables, and filters
  - Phase 6: the decision layer, with performance budgets, error triage, alerting, and a continuous improvement cycle

You now understand every layer of a production analytics system. Whether you use Google Analytics, Plausible, PostHog, or build your own, you know what is happening under the hood. You know why the collector batches requests, why the server enriches data instead of trusting the client, why the schema uses DATETIME instead of TIMESTAMP, why the API returns JSON and never HTML, and why the dashboard uses textContent instead of innerHTML. These are not tool-specific skills. They are engineering fundamentals that transfer to any analytics platform, any observability stack, and any data pipeline you will encounter in your career.