In Modules 01–07, the collector was a script that just runs — load it with a <script> tag, and it automatically collects data and sends beacons. That works for simple use cases, but real analytics libraries need configuration: Which endpoint should I send to? Which features should I enable? What sampling rate? Should I log to the console for debugging?
This module transforms the collector from a script that runs into a library that is configured. The public API gives the page author control over what gets collected, how it gets sent, and what custom data is attached.
Run: Open test.html and experiment with the API. Use the buttons to call track(), set(), and identify(). Toggle debug mode to see payloads logged to the console instead of sent over the network.
In every previous module, the collector followed the same pattern: an IIFE that runs immediately, collects data, and sends it. The page author has no say in what happens — they include the script, and it does its thing. This is fine for a learning exercise, but it is a dead end for production use.
Real analytics tools (Google Analytics, Segment, Amplitude, Plausible) broadly follow the same model: load the library, then configure it. The library does little or nothing until you initialize it with your settings, whether through an init() call or an equivalent configuration step. After initialization, you have methods to send custom events (track), attach persistent properties (set), and link sessions to users (identify).
The collector uses the Revealing Module pattern, which is built on an Immediately Invoked Function Expression (IIFE). The idea is simple: wrap all your code in a function that executes once, keep everything private inside that function scope, and return only the public API methods as an object.
This is the structural skeleton of the entire collector:
```javascript
const collector = (function() {
  'use strict';

  // Private state -- not accessible outside
  let config = {};
  let initialized = false;
  const globalProps = {};

  // Private functions
  function send(payload) { /* ... */ }
  function buildPayload(eventName) { /* ... */ }
  function getSessionId() { /* ... */ }
  function getTechnographics() { /* ... */ }
  function getNavigationTiming() { /* ... */ }
  // ... all the internal machinery

  // Public API functions, defined like any other private function
  function init(options) { /* ... */ }
  function track(eventName, data) { /* ... */ }
  function set(key, value) { /* ... */ }
  function identify(userId) { /* ... */ }

  // Reveal only the public API -- returned at the end
  return { init, track, set, identify };
})();
```
The outer function runs once (the () at the end invokes it immediately). The const collector = captures whatever the function returns — in this case, an object with four methods. Everything else (the config, the state, the helper functions) lives in the function's closure and is completely inaccessible from outside. You cannot read collector.config or call collector.send() from the page — only init, track, set, and identify are exposed.
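To see the closure at work outside a browser, here is a stripped-down sketch with the same shape as the skeleton. The name miniCollector, the in-memory sent array (standing in for the network), and the sentCount() method are hypothetical, added only so the example can run in Node.js and show its effect.

```javascript
// Hypothetical miniature collector with the same revealing-module shape
const miniCollector = (function () {
  'use strict';

  // Private state: reachable only through the closure
  let initialized = false;
  const globalProps = {};
  const sent = []; // stands in for the network in this sketch

  // Private helper: never exposed on the returned object
  function send(payload) {
    sent.push(payload);
  }

  function init() {
    initialized = true;
  }

  function track(eventName, data) {
    if (!initialized) return; // ignored until init() runs
    const payload = Object.assign({ type: eventName }, globalProps);
    if (data) payload.data = data;
    send(payload);
  }

  function set(key, value) {
    globalProps[key] = value;
  }

  // Hypothetical accessor, only for demonstrating what was sent
  function sentCount() {
    return sent.length;
  }

  return { init, track, set, sentCount };
})();

miniCollector.track('too_early');        // dropped: init() not called yet
miniCollector.init();
miniCollector.set('environment', 'test');
miniCollector.track('button_click', { id: 'signup-cta' });

console.log(Object.keys(miniCollector)); // [ 'init', 'track', 'set', 'sentCount' ]
console.log(typeof miniCollector.send);  // undefined
console.log(miniCollector.sentCount());  // 1
```

Only the four revealed names exist on the object; send, sent, and globalProps are unreachable from outside, yet the public methods can still use them.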
Why does this matter? Two reasons:
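1. Encapsulation: code outside the IIFE cannot read or modify config, initialized, or globalProps, so other scripts cannot put the collector into an inconsistent state. The only way in is through the four public methods.
2. No global namespace pollution: the only global name the script creates is collector. All the internal function names (send, buildPayload, round, etc.) are scoped to the IIFE and will never collide with other scripts on the page.

The init() method is the entry point. It accepts an options object, merges it with sensible defaults, performs sampling, and starts automatic collection based on the enabled feature flags.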
```javascript
const defaults = {
  endpoint: '/collect',
  enableTechnographics: true,
  enableTiming: true,
  enableVitals: true,
  enableErrors: true,
  sampleRate: 1.0,  // 1.0 = 100% of sessions
  debug: false      // true = log to console instead of sending
};

function init(options) {
  if (initialized) {
    warn('collector.init() called more than once');
    return;
  }

  // Merge user options with defaults (keys not in defaults are ignored)
  config = {};
  for (const key of Object.keys(defaults)) {
    config[key] = (options && options[key] !== undefined)
      ? options[key]
      : defaults[key];
  }

  // Sampling: decide once per session whether to collect
  if (!shouldSample()) {
    log(`Session not sampled (rate: ${config.sampleRate})`);
    return;
  }

  initialized = true;

  // Start automatic collection based on config
  if (config.enableErrors) initErrorTracking();
  if (config.enableVitals) initVitalsObservers();

  // Build and send the pageview beacon
  function sendPageview() {
    const payload = buildPayload('pageview');
    if (config.enableTiming) {
      payload.timing = getNavigationTiming();
      payload.resources = getResourceSummary();
    }
    if (config.enableTechnographics) payload.technographics = getTechnographics();
    send(payload);
  }

  // Defer one tick after load so loadEventEnd is populated in navigation timing.
  // If init() runs after the load event has already fired, send right away.
  if (document.readyState === 'complete') {
    setTimeout(sendPageview, 0);
  } else {
    window.addEventListener('load', () => setTimeout(sendPageview, 0));
  }

  log('Collector initialized', config);
}
```
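The merge loop is the part of init() most worth testing in isolation. The sketch below lifts it into a standalone function; the name mergeConfig() is hypothetical, but the loop body matches the one in init().

```javascript
const defaults = {
  endpoint: '/collect',
  enableTechnographics: true,
  enableTiming: true,
  enableVitals: true,
  enableErrors: true,
  sampleRate: 1.0,
  debug: false
};

// Hypothetical helper: the same merge loop as init(), returning a new object
function mergeConfig(base, options) {
  const config = {};
  for (const key of Object.keys(base)) {
    config[key] = (options && options[key] !== undefined)
      ? options[key]
      : base[key];
  }
  return config;
}

const a = mergeConfig(defaults, { debug: true });
console.log(a.debug, a.endpoint, a.sampleRate); // true /collect 1

// Falsy values survive because the check is !== undefined, not ||
const b = mergeConfig(defaults, { sampleRate: 0, enableVitals: false });
console.log(b.sampleRate, b.enableVitals);      // 0 false

// Keys that are not in defaults (here, a typo) are dropped
const c = mergeConfig(defaults, { endpont: '/typo' });
console.log('endpont' in c, c.endpoint);        // false /collect

// No options at all: a copy of the defaults
console.log(mergeConfig(defaults).enableTiming); // true
```

A shorter spread-based merge such as { ...defaults, ...options } would behave differently: it copies unknown keys through and lets an explicit undefined overwrite a default.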
Key design decisions:
- Calling init() twice logs a warning and returns. This prevents accidental double-initialization from multiple script tags or SPAs re-rendering.
- Options are merged key by key with the defaults: if you pass only { debug: true }, everything else stays at the default.
- If the session is not sampled, init() returns without setting initialized = true. track() then does nothing beyond logging a warning, and although set() and identify() still record properties in memory, no beacon is ever sent for this session, so the overhead is close to zero.
- Every feature flag defaults to true. This lets you deploy the collector with some features disabled (by passing false) while you build confidence.

Sampling controls what percentage of sessions actually collect data. At high traffic volumes, you do not need (or want) 100% of sessions reporting: the storage costs add up, and the marginal value of each additional session shrinks.
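```javascript
function shouldSample() {
  try {
    // Check session storage first -- sampling is per-session, not per-page
    const sampled = sessionStorage.getItem('_collector_sampled');
    if (sampled !== null) return sampled === 'true';
    // Roll the dice once per session
    const result = Math.random() < config.sampleRate;
    sessionStorage.setItem('_collector_sampled', String(result));
    return result;
  } catch (err) {
    // sessionStorage can throw when storage is blocked. The decision cannot
    // persist, so collect only if every session is sampled (no half-sessions)
    return config.sampleRate >= 1;
  }
}
```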
A sampleRate of 0.5 means that, on average, 50% of sessions collect data. The critical detail is that the sampling decision is made once per session and stored in sessionStorage. If a user navigates across 10 pages in a single session, they are either tracked on all 10 pages or none of them. You never get half-sessions, which would create misleading funnel data and broken session replays.
The decision is stored in sessionStorage rather than localStorage because sampling should reset when the user starts a new session. sessionStorage is scoped to a single tab and cleared when that tab is closed, so a user who closes the tab and comes back later gets a fresh roll (a new tab typically starts a new session as well). Because each session is either fully in or fully out, your expected data volume is: total sessions × sampleRate.
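The per-session guarantee is easy to check once storage and randomness are injected. In this sketch, createMemoryStorage() is a hypothetical in-memory stand-in for the getItem/setItem subset of sessionStorage, and the extra storage and random parameters on shouldSample() are assumptions added for testability; the core decision logic matches the collector's.

```javascript
// Hypothetical in-memory stand-in for the sessionStorage methods used here
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); }
  };
}

// Same decision logic as shouldSample(), with storage and randomness injected
function shouldSample(storage, sampleRate, random = Math.random) {
  const sampled = storage.getItem('_collector_sampled');
  if (sampled !== null) return sampled === 'true';

  const result = random() < sampleRate;
  storage.setItem('_collector_sampled', String(result));
  return result;
}

// Fake random values, consumed in order (one per dice roll)
const rolls = [0.3, 0.9];
const fakeRandom = () => rolls.shift();

// Session 1: three page loads sharing one storage object
const session1 = createMemoryStorage();
const decisions1 = [1, 2, 3].map(() => shouldSample(session1, 0.5, fakeRandom));

// Session 2: a fresh storage object, so a fresh roll
const session2 = createMemoryStorage();
const decisions2 = [1, 2, 3].map(() => shouldSample(session2, 0.5, fakeRandom));

console.log(decisions1);   // [ true, true, true ]
console.log(decisions2);   // [ false, false, false ]
console.log(rolls.length); // 0 -- exactly one roll per session
```

Scaled up, the same rule gives the volume formula above: 1,000,000 sessions at a sampleRate of 0.1 yield roughly 100,000 sampled sessions, each reported in full.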
While the collector automatically sends a pageview beacon on load, most interesting analytics come from custom events — things the user does after the page loads. The track() method sends a named event with optional data:
```javascript
function track(eventName, data) {
  if (!initialized) {
    warn('collector.track() called before init()');
    return;
  }
  const payload = buildPayload(eventName);
  if (data) payload.data = data;
  send(payload);
}
```
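The guard check on initialized protects against two cases: the page author forgot to call init(), or this session was not sampled (in which case initialized is false). Either way, track() sends nothing and only logs a console warning. Note that the warning text assumes the first case, so in an unsampled session it appears even though init() was called.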
Usage examples:
```javascript
collector.track('button_click', { id: 'signup-cta', text: 'Sign Up Free' });
collector.track('form_submit', { form: 'newsletter', email_provided: true });
collector.track('video_play', { id: 'intro', duration: 120 });
collector.track('search', { query: 'pricing', results: 12 });
```
The event name is a string you define. The data object can contain any JSON-serializable properties. Together, they let you instrument any user interaction without modifying the collector itself.
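Because send() runs the whole payload through JSON.stringify(), "JSON-serializable" has concrete consequences for what you pass as data. This sketch uses only standard JavaScript behavior; the values are made up for illustration.

```javascript
// Values that are not plain JSON are converted or silently dropped
const data = {
  count: 3,
  when: new Date(0),   // Date -> ISO string via toJSON()
  missing: undefined,  // dropped
  callback: () => {}   // dropped
};
const json = JSON.stringify(data);
console.log(json); // {"count":3,"when":"1970-01-01T00:00:00.000Z"}

// Some values make JSON.stringify() throw instead
function tryStringify(value) {
  try {
    JSON.stringify(value);
    return 'ok';
  } catch (err) {
    return err.name;
  }
}

const circular = { id: 'node' };
circular.self = circular;

console.log(tryStringify(circular));     // TypeError
console.log(tryStringify({ big: 10n })); // TypeError
console.log(tryStringify({ ok: true })); // ok
```

Outside debug mode, the collector's send() does not catch that exception, so it would propagate out of track() into the calling code. Keeping data to strings, numbers, booleans, null, arrays, and plain objects avoids both surprises.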
Sometimes you want a property to be included in every beacon, not just one event. The set() method adds key-value pairs to a global properties object that gets merged into every payload:
```javascript
function set(key, value) {
  globalProps[key] = value;
}
```
Global properties are merged into the payload by buildPayload():
```javascript
function buildPayload(eventName) {
  const payload = {
    url: window.location.href,
    title: document.title,
    referrer: document.referrer,
    timestamp: new Date().toISOString(),
    type: eventName,
    session: getSessionId()
  };

  // Merge global properties
  for (const k of Object.keys(globalProps)) {
    payload[k] = globalProps[k];
  }

  return payload;
}
```
Usage:
```javascript
collector.set('environment', 'production');
collector.set('version', '2.1.0');
collector.set('plan', 'enterprise');
// From now on, EVERY beacon includes environment, version, and plan:
// every track() call, and the automatic pageview if it has not fired yet
```
This is useful for segmentation. When you analyze your data later, you can filter beacons by environment, app version, pricing plan, A/B test group, or any other dimension you set globally.
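Because buildPayload() copies globalProps after the built-in fields, a global property whose name matches a built-in field (url, title, referrer, timestamp, type, session) silently overwrites it in every beacon. The sketch below reproduces the merge with browser values passed in as a plain page object (an assumption so it runs in Node.js) and contrasts it with a hypothetical variant, buildPayloadSafe(), in which built-in fields win. Which behavior you want is a design choice; the collector as written lets globals win.

```javascript
// Hypothetical standalone buildPayload(): browser values are passed in as `page`
function buildPayload(eventName, globalProps, page) {
  const payload = {
    url: page.url,
    title: page.title,
    referrer: page.referrer,
    timestamp: new Date().toISOString(),
    type: eventName,
    session: page.session
  };
  // Same merge order as the collector: globals are copied last, so they win
  for (const k of Object.keys(globalProps)) {
    payload[k] = globalProps[k];
  }
  return payload;
}

// Hypothetical variant: globals first, built-in fields copied over them
function buildPayloadSafe(eventName, globalProps, page) {
  const builtIns = buildPayload(eventName, {}, page);
  return Object.assign({}, globalProps, builtIns);
}

const page = { url: 'https://example.com/', title: 'Home', referrer: '', session: 's-1' };
const globals = { environment: 'production', type: 'oops' };

console.log(buildPayload('pageview', globals, page).type);            // oops
console.log(buildPayloadSafe('pageview', globals, page).type);        // pageview
console.log(buildPayloadSafe('pageview', globals, page).environment); // production
```

Another option is to nest global properties under their own key so they can never collide with built-in fields, at the cost of a slightly deeper payload shape on the server.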
The identify() method links the current analytics session to an authenticated user:
```javascript
function identify(userId) {
  globalProps.userId = userId;
  log('User identified:', userId);
}
```
identify(userId) is essentially set('userId', userId) plus a debug log. But it is important enough to warrant its own method. Linking analytics sessions to authenticated users is one of the most valuable things you can do: it lets you answer questions like "How many pages did user X visit before converting?" and "What features do paying users actually use?" Major analytics platforms expose an equivalent call for exactly this reason (identify() in Segment and Mixpanel, setUserId() in Amplitude).
Typical usage pattern:
```javascript
// On page load (before login)
collector.init({ endpoint: '/collect' });

// After the user logs in
collector.identify('user-abc-123');

// From this point on, every beacon includes userId: 'user-abc-123'
collector.track('dashboard_view', { widgets: 5 });
```
During development, you want to see what the collector is doing without sending real data to your analytics endpoint. Debug mode logs everything to the console instead of sending network requests:
```javascript
function log(...args) {
  if (config.debug) {
    console.log('[Collector]', ...args);
  }
}

function warn(...args) {
  console.warn('[Collector]', ...args);
}

function send(payload) {
  if (config.debug) {
    console.log('[Collector] Would send:', payload);
    return; // Don't actually send in debug mode
  }

  const blob = new Blob(
    [JSON.stringify(payload)],
    { type: 'application/json' }
  );

  // sendBeacon returns false when the browser refuses to queue the data
  // (for example, an oversized payload), so fall back to fetch then too
  if (navigator.sendBeacon && navigator.sendBeacon(config.endpoint, blob)) {
    return;
  }

  fetch(config.endpoint, {
    method: 'POST',
    body: blob,
    keepalive: true
  }).catch((err) => {
    warn('Send failed:', err.message);
  });
}
```
The log() function only outputs when config.debug is true. The warn() function always outputs — warnings indicate bugs that the developer should fix regardless of mode. The send() function checks debug mode and returns early, so no network requests are made during development.
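The branch order inside send() can be exercised without a browser by injecting the transports. In this sketch, createSender() and its transports argument are hypothetical, and the string return values exist only so the result can be checked; the real send() returns nothing. The sketch falls back to fetch() when sendBeacon() is missing or returns false, since sendBeacon() returns false when the browser declines to queue the data.

```javascript
// Hypothetical factory mirroring send()'s branch order, with transports injected
function createSender(config, transports) {
  return function send(payload) {
    if (config.debug) {
      console.log('[Collector] Would send:', payload);
      return 'debug'; // no network call in debug mode
    }
    const body = JSON.stringify(payload);
    // Use the beacon only if it exists and actually queued the data
    if (transports.sendBeacon && transports.sendBeacon(config.endpoint, body)) {
      return 'beacon';
    }
    transports.fetch(config.endpoint, { method: 'POST', body, keepalive: true })
      .catch((err) => console.warn('[Collector] Send failed:', err.message));
    return 'fetch';
  };
}

// Fake fetch that records calls instead of touching the network
const calls = [];
const fakeFetch = (url, init) => {
  calls.push([url, init.keepalive]);
  return Promise.resolve();
};

const live = { debug: false, endpoint: '/collect' };
const debugSend = createSender({ debug: true, endpoint: '/collect' }, { fetch: fakeFetch });
const beaconOk = createSender(live, { sendBeacon: () => true, fetch: fakeFetch });
const beaconRefused = createSender(live, { sendBeacon: () => false, fetch: fakeFetch });
const noBeacon = createSender(live, { fetch: fakeFetch });

const results = [debugSend, beaconOk, beaconRefused, noBeacon]
  .map((send) => send({ type: 'pageview' }));

console.log(results);      // [ 'debug', 'beacon', 'fetch', 'fetch' ]
console.log(calls.length); // 2
```

Only the refused-beacon and no-beacon cases reach fetch(); debug mode never touches either transport.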
Enable debug mode in your init() call:
```javascript
collector.init({
  endpoint: '/collect',
  debug: true  // Logs to console, no network requests
});
```
The reference collector keeps its settings in internal flags (isResourceTiming, isElementTiming, maxTime). Our version extends this concept into a full public API with init(), track(), set(), and identify(). The reference collector's reportPerf() function (line 436) and pushTask() (line 425) handle the sending; our send() function serves the same purpose but with the cascading sendBeacon → fetch fallback we built in Module 04. The key difference: our version makes configuration explicit and gives the page author control, while the reference collector hard-codes its configuration internally.