Module 08: Configuration API

In Modules 01–07, the collector was a script that just runs — load it with a <script> tag, and it automatically collects data and sends beacons. That works for simple use cases, but real analytics libraries need configuration: Which endpoint should I send to? Which features should I enable? What sampling rate? Should I log to the console for debugging?

This module transforms the collector from a script that runs into a library that is configured. The public API gives the page author control over what gets collected, how it gets sent, and what custom data is attached.

Demo Files

collector-v7.js — IIFE-based collector with configuration API
test.html — Full API demo: init, track, set, identify

Run: Open test.html and experiment with the API. Use the buttons to call track(), set(), and identify(). Toggle debug mode to see payloads logged to the console instead of sent over the network.

From Script to Library

In every previous module, the collector followed the same pattern: an IIFE that runs immediately, collects data, and sends it. The page author has no say in what happens — they include the script, and it does its thing. This is fine for a learning exercise, but it is a dead end for production use.

Most production analytics tools (Google Analytics, Segment, Amplitude, Plausible) follow broadly the same model: load the library, then configure it. In the version built in this module, the library does nothing until you call init() with your settings. After initialization, you have methods to send custom events (track), attach persistent properties (set), and link sessions to users (identify).

Before (Modules 01-07):         After (Module 08):
+--------------------------+    +--------------------------+
| <script src=             |    | <script src=             |
|   "collector.js">        |    |   "collector.js">        |
|                          |    |                          |
| // runs immediately      |    | collector.init({         |
| // collects everything   |    |   endpoint: '/collect',  |
| // sends automatically   |    |   sampleRate: 0.5,       |
|                          |    |   debug: true            |
+--------------------------+    | });                      |
                                |                          |
No control. No custom events.   | collector.track(         |
No user identity. No sampling.  |   'signup', { plan: 1 }  |
                                | );                       |
                                |                          |
                                | collector.identify(      |
                                |   'user-abc-123'         |
                                | );                       |
                                +--------------------------+

The IIFE Pattern (Revealing Module)

The collector is wrapped in an Immediately Invoked Function Expression (IIFE), the usual way to implement the Revealing Module pattern. The idea is simple: wrap all your code in a function that executes once, keep everything private inside that function's scope, and return only the public API methods as an object.

This is the structural skeleton of the entire collector:

const collector = (function() {
  'use strict';

  // Private state -- not accessible outside
  let config = {};
  let initialized = false;
  const globalProps = {};

  // Private functions
  function send(payload) { /* ... */ }
  function buildPayload(eventName) { /* ... */ }
  function getSessionId() { /* ... */ }
  function getTechnographics() { /* ... */ }
  function getNavigationTiming() { /* ... */ }
  // ... all the internal machinery

  // Public API -- returned at the end
  return {
    init:     function(options) { /* ... */ },
    track:    function(event, data) { /* ... */ },
    set:      function(key, value) { /* ... */ },
    identify: function(userId) { /* ... */ }
  };
})();

The outer function runs once (the () at the end invokes it immediately). The const collector = captures whatever the function returns — in this case, an object with four methods. Everything else (the config, the state, the helper functions) lives in the function's closure and is completely inaccessible from outside. You cannot read collector.config or call collector.send() from the page — only init, track, set, and identify are exposed.

Why does this matter? Two reasons:

Encapsulation. Nothing can accidentally (or maliciously) modify the collector's internal state. The session ID, the config, the beacon queue — all protected.
No global pollution. The only global variable is collector. All the internal function names (send, buildPayload, round, etc.) are scoped to the IIFE and will never collide with other scripts on the page.
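
The same closure mechanics can be seen in isolation with a much smaller module. This is a minimal sketch, not part of the collector: the counter module and its clamp helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical example module; `counter` and `clamp` are illustrative names.
const counter = (function () {
  'use strict';

  // Private state: lives only in this closure
  let count = 0;

  // Private helper: not returned, so not reachable from outside
  function clamp(n) {
    return Math.max(0, n);
  }

  // Public API: only these two methods are exposed
  return {
    increment: function () {
      count = clamp(count + 1);
      return count;
    },
    current: function () {
      return count;
    }
  };
})();

counter.increment();
console.log(counter.current());    // 1
console.log(counter.count);        // undefined (private state is not a property)
console.log(typeof counter.clamp); // "undefined" (private helper is not exposed)
```

The returned object is the only handle the outside world gets. Anything not placed on it stays reachable only from the functions defined inside the IIFE.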

The init() Method

The init() method is the entry point. It accepts an options object, merges it with sensible defaults, performs sampling, and starts automatic collection based on the enabled feature flags.

const defaults = {
  endpoint: '/collect',
  enableTechnographics: true,
  enableTiming: true,
  enableVitals: true,
  enableErrors: true,
  sampleRate: 1.0,      // 1.0 = 100% of sessions
  debug: false           // true = log to console instead of sending
};

function init(options) {
  if (initialized) {
    warn('collector.init() called more than once');
    return;
  }

  // Merge user options with defaults
  config = {};
  for (const key of Object.keys(defaults)) {
    config[key] = (options && options[key] !== undefined)
      ? options[key]
      : defaults[key];
  }

  // Sampling: decide once per session whether to collect
  if (!shouldSample()) {
    log(`Session not sampled (rate: ${config.sampleRate})`);
    return;
  }

  initialized = true;

  // Start automatic collection based on config
  if (config.enableErrors) initErrorTracking();
  if (config.enableVitals) initVitalsObservers();

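  // Build and send the pageview beacon
  const sendPageview = () => {
    const payload = buildPayload('pageview');
    if (config.enableTiming) {
      payload.timing = getNavigationTiming();
      payload.resources = getResourceSummary();
    }
    if (config.enableTechnographics) payload.technographics = getTechnographics();
    send(payload);
  };

  // Fire after load. If init() runs after the load event has already fired
  // (for example from an async script), the listener would never be called,
  // so send on the next tick instead.
  if (document.readyState === 'complete') {
    setTimeout(sendPageview, 0);
  } else {
    window.addEventListener('load', () => setTimeout(sendPageview, 0));
  }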

  log('Collector initialized', config);
}

Key design decisions:

Single initialization. Calling init() twice logs a warning and returns. This prevents accidental double-initialization from multiple script tags or SPAs re-rendering.
Defaults-first merge. Every config key has a default. The page author only needs to override the values they care about. If they pass { debug: true }, everything else stays at the default.
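Early exit on sampling. If the session is not sampled, init() returns without setting initialized = true, so no observers or listeners are registered and track() sends nothing for this session. Note that set() and identify(), as written later in this module, do not check initialized: they still update the global properties object, but because no beacons are sent, those values never leave the page.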
Feature flags gate features. Each subsystem (errors, vitals, timing, technographics) is only started if its flag is true. This lets you deploy the collector with some features disabled while you build confidence.
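
The defaults-first merge can be pulled out and exercised on its own. The sketch below copies the loop from init() into a standalone function; mergeConfig is a hypothetical helper name, not something the collector exports.

```javascript
// Standalone sketch of the merge loop in init().
// `mergeConfig` is a hypothetical helper name used only for this example.
const defaults = {
  endpoint: '/collect',
  enableTechnographics: true,
  enableTiming: true,
  enableVitals: true,
  enableErrors: true,
  sampleRate: 1.0,
  debug: false
};

function mergeConfig(defaultValues, options) {
  const merged = {};
  // Only keys that exist in the defaults are copied; anything else is ignored
  for (const key of Object.keys(defaultValues)) {
    merged[key] = (options && options[key] !== undefined)
      ? options[key]
      : defaultValues[key];
  }
  return merged;
}

// Overriding one key leaves the rest at their defaults
const debugOnly = mergeConfig(defaults, { debug: true });
console.log(debugOnly.debug, debugOnly.endpoint); // true /collect

// The `!== undefined` check keeps falsy overrides such as 0 or false
const noSampling = mergeConfig(defaults, { sampleRate: 0, enableErrors: false });
console.log(noSampling.sampleRate, noSampling.enableErrors); // 0 false

// A misspelled key is silently dropped, so the default endpoint is used
const typo = mergeConfig(defaults, { endpont: '/other' });
console.log(typo.endpoint, 'endpont' in typo); // /collect false
```

Two consequences of this design are worth knowing: a check like options[key] || defaults[key] would have discarded the sampleRate of 0, and a misspelled option name fails quietly rather than raising an error.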

Sampling

Sampling controls what percentage of sessions actually collect data. At high traffic volumes, you do not need (or want) 100% of sessions reporting — the storage costs add up and the marginal value of each additional session shrinks.

function shouldSample() {
  // Check session storage first -- sampling is per-session, not per-page
  const sampled = sessionStorage.getItem('_collector_sampled');
  if (sampled !== null) return sampled === 'true';

  // Roll the dice once per session
  const result = Math.random() < config.sampleRate;
  sessionStorage.setItem('_collector_sampled', String(result));
  return result;
}

A sampleRate of 0.5 means 50% of sessions collect data. The critical detail is that the sampling decision is made once per session and stored in sessionStorage. If a user navigates across 10 pages in a single session, they are either tracked on all 10 pages or none of them. You never get half-sessions — that would create misleading funnel data and broken session replays.

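The decision is stored in sessionStorage rather than localStorage because sampling should reset when the user starts a new session (closes the tab and comes back later). Keep in mind that sessionStorage is scoped to a single tab, so a tab opened separately gets its own storage and its own sampling decision. Because the decision is made once per session, your expected data volume is roughly total sessions × sampleRate.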
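
To see the "decide once, then stick" behavior without a browser, the sampling logic can be written with its dependencies passed in. This is a sketch under stated assumptions: createSampler, createMemoryStorage, and the scripted random values are hypothetical stand-ins for sessionStorage and Math.random.

```javascript
// Sketch with injected dependencies so it runs outside a browser.
// `createSampler` and `createMemoryStorage` are hypothetical names.
function createSampler(storage, random, sampleRate) {
  return function shouldSample() {
    // Reuse an earlier decision if this session already has one
    const stored = storage.getItem('_collector_sampled');
    if (stored !== null) return stored === 'true';

    // Otherwise roll once and remember the outcome
    const result = random() < sampleRate;
    storage.setItem('_collector_sampled', String(result));
    return result;
  };
}

// Minimal in-memory stand-in for sessionStorage
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => { items.set(key, String(value)); }
  };
}

// Scripted "random" values make the outcome predictable
const rolls = [0.3, 0.9];
const sessionA = createMemoryStorage();
const sampleA = createSampler(sessionA, () => rolls.shift(), 0.5);

console.log(sampleA());    // true (0.3 < 0.5, decision stored)
console.log(sampleA());    // true (read from storage, no new roll)
console.log(rolls.length); // 1 (the 0.9 was never drawn)

// Expected volume for illustrative numbers: 200,000 sessions at a 25% rate
const expectedSampled = 200000 * 0.25;
console.log(expectedSampled); // 50000
```

The second call never touches the random source, which is exactly what keeps a ten-page session from being tracked on some pages and not others.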

The track() Method

While the collector automatically sends a pageview beacon on load, most interesting analytics come from custom events — things the user does after the page loads. The track() method sends a named event with optional data:

function track(eventName, data) {
  if (!initialized) {
    warn('collector.track() called before init()');
    return;
  }
  const payload = buildPayload(eventName);
  if (data) payload.data = data;
  send(payload);
}

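The guard on initialized covers two cases: the page author forgot to call init(), or this session was not sampled (in which case initialized stays false). Either way, track() sends nothing and logs a console warning. One side effect is that unsampled sessions also see this warning on every track() call, even though nothing is actually wrong.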

Usage examples:

collector.track('button_click', { id: 'signup-cta', text: 'Sign Up Free' });
collector.track('form_submit', { form: 'newsletter', email_provided: true });
collector.track('video_play', { id: 'intro', duration: 120 });
collector.track('search', { query: 'pricing', results: 12 });

The event name is a string you define. The data object can contain any JSON-serializable properties. Together, they let you instrument any user interaction without modifying the collector itself.
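
"JSON-serializable" has some sharp edges. Since send() passes the payload through JSON.stringify, the short sketch below shows what standard JSON.stringify does with a few common value types. The sample object is made up for illustration.

```javascript
// What JSON.stringify does to values that are not plain JSON.
// The `sample` object is an illustrative payload, not from the collector.
const sample = {
  id: 'signup-cta',
  clickedAt: new Date(0),  // Date becomes an ISO 8601 string
  note: undefined,         // undefined properties are dropped
  onDone: function () {},  // functions are dropped
  ratio: NaN               // NaN becomes null
};

console.log(JSON.stringify(sample));
// {"id":"signup-cta","clickedAt":"1970-01-01T00:00:00.000Z","ratio":null}

// Circular references cannot be serialized at all
const circular = { name: 'loop' };
circular.self = circular;

let circularError = null;
try {
  JSON.stringify(circular);
} catch (err) {
  circularError = err;
}
console.log(circularError instanceof TypeError); // true
```

In practice this means that passing a DOM node or any object with circular references as event data would make the non-debug send() shown later in this module throw, so it is safest to pass small, flat objects of strings, numbers, and booleans.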

The set() Method

Sometimes you want a property to be included in every beacon, not just one event. The set() method adds key-value pairs to a global properties object that gets merged into every payload:

function set(key, value) {
  globalProps[key] = value;
}

Global properties are merged into the payload by buildPayload():

function buildPayload(eventName) {
  const payload = {
    url: window.location.href,
    title: document.title,
    referrer: document.referrer,
    timestamp: new Date().toISOString(),
    type: eventName,
    session: getSessionId()
  };
  // Merge global properties
  for (const k of Object.keys(globalProps)) {
    payload[k] = globalProps[k];
  }
  return payload;
}

Usage:

collector.set('environment', 'production');
collector.set('version', '2.1.0');
collector.set('plan', 'enterprise');

// Now EVERY beacon includes environment, version, and plan
// -- the automatic pageview, every track() call, everything

This is useful for segmentation. When you analyze your data later, you can filter beacons by environment, app version, pricing plan, A/B test group, or any other dimension you set globally.
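
One detail of the merge loop in buildPayload() is that global properties are written last, so a key such as type or url set through set() overwrites the core field of the same name. The sketch below demonstrates that and shows one possible variation that avoids the clash. Both mergeGlobals and mergeGlobalsNested are hypothetical helper names, and the nested props layout is an assumption, not what the collector does.

```javascript
// `mergeGlobals` mirrors the merge loop in buildPayload();
// the name is hypothetical and used only for this example.
function mergeGlobals(core, globalProps) {
  const payload = Object.assign({}, core);
  for (const k of Object.keys(globalProps)) {
    payload[k] = globalProps[k]; // a matching key overwrites the core field
  }
  return payload;
}

const core = { type: 'pageview', url: 'https://example.com/pricing' };
const globals = { plan: 'enterprise', type: 'oops' };

const clash = mergeGlobals(core, globals);
console.log(clash.plan); // "enterprise"
console.log(clash.type); // "oops" (the core event type was overwritten)

// One possible variation (an assumption, not the collector's behavior):
// keep global properties under their own key so they cannot collide.
function mergeGlobalsNested(coreFields, globalProps) {
  return Object.assign({}, coreFields, { props: Object.assign({}, globalProps) });
}

const nested = mergeGlobalsNested(core, globals);
console.log(nested.type);       // "pageview"
console.log(nested.props.type); // "oops"
```

Nesting changes the payload shape, so the receiving endpoint would need to expect it. A lighter alternative is simply to avoid the core field names (url, title, referrer, timestamp, type, session) when calling set().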

The identify() Method

The identify() method links the current analytics session to an authenticated user:

function identify(userId) {
  globalProps.userId = userId;
  log('User identified:', userId);
}

Why its own method? Functionally, identify(userId) is just a convenience wrapper around set('userId', userId). But it is important enough to warrant its own method. Linking analytics sessions to authenticated users is one of the most valuable things you can do — it lets you answer questions like "How many pages did user X visit before converting?" and "What features do paying users actually use?" Every major analytics platform (Segment, Amplitude, Mixpanel) has an identify() method for exactly this reason.

Typical usage pattern:

// On page load (before login)
collector.init({ endpoint: '/collect' });

// After the user logs in
collector.identify('user-abc-123');

// From this point on, every beacon includes userId: 'user-abc-123'
collector.track('dashboard_view', { widgets: 5 });
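
"From this point on" is worth taking literally: beacons sent before identify() do not carry a userId. The sketch below uses a hypothetical in-memory stand-in (createFakeCollector, which records payloads in a sent array instead of sending them) to make the ordering visible.

```javascript
// Stand-in collector that records payloads instead of sending them.
// `createFakeCollector` and `sent` are hypothetical names for this sketch.
function createFakeCollector() {
  const globalProps = {};
  const sent = [];

  function buildPayload(type) {
    const payload = { type: type };
    for (const k of Object.keys(globalProps)) {
      payload[k] = globalProps[k];
    }
    return payload;
  }

  return {
    sent: sent,
    track: function (name, data) {
      const payload = buildPayload(name);
      if (data) payload.data = data;
      sent.push(payload);
    },
    identify: function (userId) {
      globalProps.userId = userId;
    }
  };
}

const fake = createFakeCollector();
fake.track('landing_view');                   // before login
fake.identify('user-abc-123');                // user logs in
fake.track('dashboard_view', { widgets: 5 }); // after login

console.log(fake.sent[0].userId); // undefined
console.log(fake.sent[1].userId); // "user-abc-123"
```

Because every real payload also includes the session ID, earlier anonymous beacons could still be attributed to the user later by joining on session during analysis, assuming your backend stores both fields.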

Debug Mode

During development, you want to see what the collector is doing without sending real data to your analytics endpoint. Debug mode logs everything to the console instead of sending network requests:

function log(...args) {
  if (config.debug) {
    console.log('[Collector]', ...args);
  }
}

function warn(...args) {
  console.warn('[Collector]', ...args);
}

function send(payload) {
  if (config.debug) {
    console.log('[Collector] Would send:', payload);
    return; // Don't actually send in debug mode
  }

  // ... actual sendBeacon/fetch logic
  const blob = new Blob(
    [JSON.stringify(payload)],
    { type: 'application/json' }
  );
  if (navigator.sendBeacon) {
    navigator.sendBeacon(config.endpoint, blob);
  } else {
    fetch(config.endpoint, {
      method: 'POST',
      body: blob,
      keepalive: true
    }).catch((err) => {
      warn('Send failed:', err.message);
    });
  }
}

The log() function only outputs when config.debug is true. The warn() function always outputs — warnings indicate bugs that the developer should fix regardless of mode. The send() function checks debug mode and returns early, so no network requests are made during development.
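
The branching in send() can be tested outside a browser by injecting the transports. The sketch below is a variation on the function above, not a replacement for it: createSend and the env object are hypothetical, the body is a plain string rather than a Blob so the example runs anywhere, and it additionally falls back to fetch when sendBeacon returns false (which sendBeacon does when the browser cannot queue the request).

```javascript
// `createSend` and `env` are hypothetical. In a browser, env.sendBeacon would
// wrap navigator.sendBeacon and env.fetch would wrap window.fetch.
function createSend(config, env) {
  return function send(payload) {
    if (config.debug) {
      env.log('[Collector] Would send:', payload);
      return 'logged'; // nothing leaves the page in debug mode
    }

    // String body keeps the sketch runnable without the Blob API
    const body = JSON.stringify(payload);

    // sendBeacon returns false if the request could not be queued
    if (env.sendBeacon && env.sendBeacon(config.endpoint, body)) {
      return 'beacon';
    }

    env.fetch(config.endpoint, { method: 'POST', body: body, keepalive: true });
    return 'fetch';
  };
}

// Fake environment that records which path was taken
const calls = [];
const fakeEnv = {
  log: (prefix, payload) => { calls.push(['log', payload.type]); },
  sendBeacon: () => false, // simulate a beacon that could not be queued
  fetch: (url) => { calls.push(['fetch', url]); }
};

const debugSend = createSend({ debug: true, endpoint: '/collect' }, fakeEnv);
const liveSend = createSend({ debug: false, endpoint: '/collect' }, fakeEnv);

console.log(debugSend({ type: 'pageview' })); // "logged"
console.log(liveSend({ type: 'pageview' }));  // "fetch"
console.log(calls.length);                    // 2
```

Returning a label from send() is purely for demonstration. The useful part is that debug mode short-circuits before any transport is touched, which is the same guarantee the module's send() provides.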

Enable debug mode in your init() call:

collector.init({
  endpoint: '/collect',
  debug: true    // Logs to console, no network requests
});

Cross-Reference: The Reference Collector

Reference Implementation: The reference collector.js uses a config object (line 18) with feature flags (isResourceTiming, isElementTiming, maxTime). Our version extends this concept into a full public API with init(), track(), set(), and identify(). The reference collector's reportPerf() function (line 436) and pushTask() (line 425) handle the sending — our send() function serves the same purpose but with the cascading sendBeacon → fetch fallback we built in Module 04. The key difference: our version makes configuration explicit and gives the page author control, while the reference collector hard-codes its configuration internally.

Summary

IIFE pattern keeps everything private except the four public API methods
init() configures the collector with sensible defaults and starts automatic collection
track() sends custom events with arbitrary data attached
set() attaches properties to every future beacon (useful for segmentation)
identify() links analytics sessions to authenticated users
Sampling is per-session (decided once, stored in sessionStorage) — no half-sessions
Debug mode logs payloads to the console instead of sending them over the network
This transforms the collector from a script that runs into a configurable library with a clean public interface
