The collector fires JSON payloads at your server. Now what? This phase receives those raw beacons and transforms them into clean, enriched, sessionized records ready for storage. Every beacon must be parsed, validated against an expected schema, enriched with server-side data the client cannot provide (authoritative timestamps, IP addresses, geolocation), linked into sessions, and finally inserted into the database.
Get this phase wrong and your analytics data is unreliable — or worse, an attack vector. Get it right and you have a pipeline that produces trustworthy data at scale.
The browser sends a POST /collect request containing a JSON body. On the server side, that request kicks off a multi-step pipeline before the data ever reaches the database. Each step has a specific job, and if any step fails, the pipeline must handle the failure gracefully without leaking information back to the client.
The 204 No Content response is intentional. The server confirms receipt without sending a response body. This is the standard pattern for analytics endpoints — there is nothing useful to return to the client, and a smaller response means less bandwidth and faster completion.
A 200 OK implies there is a response body worth reading. A 204 No Content explicitly says "I received your data; there is nothing to send back." This is semantically correct for analytics collection and pairs naturally with navigator.sendBeacon(), which never exposes the response to the page anyway.
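To make the flow concrete, here is a minimal sketch of such an endpoint, assuming Express; processBeacon() is a hypothetical placeholder for the validate, enrich, sessionize, and store steps covered in the rest of this section:

```js
// Minimal sketch of the /collect endpoint (Express assumed).
// processBeacon() is a hypothetical stand-in for the steps described below.
const express = require('express');
const app = express();

app.use(express.json({ limit: '10kb' })); // cap payload size early

async function processBeacon(beacon, req) {
  // ...validate, enrich, sessionize, store (covered step by step below)
}

app.post('/collect', (req, res) => {
  // Acknowledge immediately: there is nothing useful to return to the client.
  res.status(204).end();

  // Keep processing after the response; failures are logged, never sent back.
  processBeacon(req.body, req).catch((err) => console.error('beacon failed:', err));
});

app.listen(3000);
```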
Analytics data can theoretically be sent via either HTTP method, but the choice has real consequences for security, capacity, and correctness.
| Concern | GET | POST |
|---|---|---|
| Data location | Query string (?key=val) | Request body (JSON) |
| Size limit | ~2,000–8,000 characters (browser/server dependent) | Effectively unlimited (server config) |
| Visibility | URL visible in server logs, browser history, referer headers | Body not logged by default, not in history |
| HTTP semantics | GET = retrieve a resource (idempotent, safe) | POST = submit data for processing (correct for analytics) |
| Caching | Browsers and CDNs may cache GET responses | POST responses are not cached by default |
| sendBeacon | Not supported | The only method sendBeacon uses |
GET is the legacy approach. The classic tracking pixel pattern (<img src="/track.gif?page=/about">) uses GET because it requires no JavaScript. It still works and has its place (email tracking, noscript fallbacks), but it cannot carry rich payloads.
POST is the modern standard. It supports structured JSON bodies, carries no size constraints from the URL, keeps data out of server access logs, and is semantically correct — you are submitting data, not requesting a resource. navigator.sendBeacon() uses POST exclusively.
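Since the collector itself runs in the browser, here is an illustrative sketch of how the POST path is typically issued from the client (the payload fields are illustrative, not a fixed schema):

```js
// Illustrative client-side snippet: send the beacon via POST.
// sendBeacon queues the request even if the page is unloading;
// the fetch fallback covers browsers without sendBeacon support.
const payload = JSON.stringify({
  event_type: 'pageview',
  url: location.href,
  referrer: document.referrer,
  timestamp: Date.now(),
});

if (navigator.sendBeacon) {
  navigator.sendBeacon('/collect', payload); // always a POST
} else {
  fetch('/collect', { method: 'POST', body: payload, keepalive: true });
}
```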
In a typical analytics setup, the collector script runs on yoursite.com but sends beacons to analytics.yoursite.com or a completely different domain. The browser's Same-Origin Policy blocks these cross-origin requests unless the server explicitly allows them via CORS (Cross-Origin Resource Sharing) headers.
- Access-Control-Allow-Origin: https://yoursite.com
- Access-Control-Allow-Methods: POST, OPTIONS
- Access-Control-Allow-Headers: Content-Type
- Access-Control-Max-Age: 86400
When the browser sends a POST with Content-Type: application/json, it is a "non-simple" request. The browser automatically sends an OPTIONS preflight request first to ask the server for permission. Your endpoint must handle both: answer the OPTIONS preflight with the headers above, and include the same headers on the POST response itself.
Never use Access-Control-Allow-Origin: * in production. A wildcard allows any website to send data to your endpoint. An attacker could flood your database with fake analytics data from their own page. Always whitelist the specific origins you expect.
sendBeacon and preflights: If you send a beacon with Content-Type: text/plain instead of application/json, the browser treats it as a "simple" request and skips the preflight. This halves the number of requests but means you must parse the body manually on the server. Many production collectors use this trick for performance.
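Continuing the Express sketch, manual CORS handling could look roughly like this (the origin whitelist is illustrative; the cors package achieves the same result):

```js
// Manual CORS sketch: whitelist specific origins, answer the preflight, never use '*'.
// Register this before the /collect route handler.
const ALLOWED_ORIGINS = ['https://yoursite.com']; // illustrative whitelist

app.use('/collect', (req, res, next) => {
  const origin = req.headers.origin;
  if (origin && ALLOWED_ORIGINS.includes(origin)) {
    res.set('Access-Control-Allow-Origin', origin);
    res.set('Access-Control-Allow-Methods', 'POST, OPTIONS');
    res.set('Access-Control-Allow-Headers', 'Content-Type');
    res.set('Access-Control-Max-Age', '86400');
  }
  if (req.method === 'OPTIONS') {
    return res.status(204).end(); // preflight answered, pipeline skipped
  }
  next();
});
```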
Analytics beacons arrive as user-supplied input. An attacker can craft any payload they want using curl, browser DevTools, or a script. If you blindly store this data and later display it in a dashboard, you have a stored XSS vulnerability.
| Field | Type | Validation Rule |
|---|---|---|
| url | string | Must be a valid URL, max 2048 chars |
| referrer | string | Valid URL or empty string, max 2048 chars |
| event_type | string | Must be one of: pageview, click, error, performance |
| timestamp | number | Unix epoch in ms, must be within reasonable range |
| viewport_w | integer | Positive integer, max 10000 |
| viewport_h | integer | Positive integer, max 10000 |
| user_agent | string | Max 512 chars, strip HTML |
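A sketch of what validation against this schema could look like, assuming the Express pipeline above (helper names and the "reasonable range" cutoff are illustrative choices):

```js
// Validation sketch mirroring the table above; returns null for anything that fails.
const EVENT_TYPES = new Set(['pageview', 'click', 'error', 'performance']);

function isValidUrl(value, allowEmpty = false) {
  if (allowEmpty && value === '') return true;
  if (typeof value !== 'string' || value.length > 2048) return false;
  try { new URL(value); return true; } catch { return false; }
}

function validateBeacon(body) {
  if (!body || typeof body !== 'object') return null;

  const ts = Number(body.timestamp);
  // "Reasonable range" is interpreted here as within one week of server time (illustrative).
  const timestampOk = Number.isFinite(ts) && Math.abs(Date.now() - ts) < 7 * 24 * 3600 * 1000;

  const ok =
    isValidUrl(body.url) &&
    isValidUrl(body.referrer ?? '', true) &&
    EVENT_TYPES.has(body.event_type) &&
    timestampOk &&
    Number.isInteger(body.viewport_w) && body.viewport_w > 0 && body.viewport_w <= 10000 &&
    Number.isInteger(body.viewport_h) && body.viewport_h > 0 && body.viewport_h <= 10000;

  if (!ok) return null;

  // Cap and strip HTML from the user agent (stored XSS defense, per the table).
  const user_agent = String(body.user_agent ?? '').replace(/<[^>]*>/g, '').slice(0, 512);
  return { ...body, user_agent };
}
```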
The client can tell you what page the user is on and what their viewport size is. But some data can only come from the server, and some data the server should override because the client cannot be trusted to provide it accurately.
| Field | Source | Why Server-Side |
|---|---|---|
| server_timestamp | Date.now() / time() | Client clocks can be wrong; server time is authoritative |
| client_ip | Request IP / X-Forwarded-For | Only visible to the server; clients cannot self-report IPs |
| geo_country | IP geolocation lookup | Derived from IP; placeholder until a GeoIP database is integrated |
| user_hash | Hash of IP + User-Agent | Privacy-preserving user identifier without cookies |
| user_agent | User-Agent header | Server reads the header directly; client-reported value is ignored |
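Extracting the client IP behind a reverse proxy is a common stumbling block. A sketch, assuming your own proxy appends to X-Forwarded-For (otherwise the header cannot be trusted):

```js
// Client IP sketch. X-Forwarded-For can hold a chain ("client, proxy1, proxy2");
// the leftmost entry is the original client. Only trust the header when your
// own proxy sets or appends it, since clients can send it themselves.
function getClientIp(req) {
  const forwarded = req.headers['x-forwarded-for'];
  if (forwarded) {
    return forwarded.split(',')[0].trim();
  }
  return req.socket.remoteAddress; // direct connection, no proxy in front
}
```

Express can also do this for you via req.ip once app.set('trust proxy', true) is configured.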
Storing raw IP addresses raises privacy concerns (GDPR, CCPA). A common approach is to hash the IP with a daily rotating salt, producing a consistent identifier for sessionization without storing the actual IP:
// Node.js
const crypto = require('crypto');
const dailySalt = getDailySalt(); // rotates every 24 hours
const userHash = crypto
.createHash('sha256')
.update(clientIP + userAgent + dailySalt)
.digest('hex')
.substring(0, 16);
// PHP
$dailySalt = getDailySalt(); // rotates every 24 hours
$userHash = substr(
hash('sha256', $clientIP . $userAgent . $dailySalt),
0, 16
);
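Both snippets assume a getDailySalt() helper without defining it. One possible implementation, an assumption rather than a prescribed approach, derives the salt from the current UTC date and a server-only secret so it rotates automatically at midnight:

```js
// One possible getDailySalt(): HMAC of today's date with a server-only secret.
// SALT_SECRET is an assumed environment variable; any secret kept off the client works.
const crypto = require('crypto');

function getDailySalt() {
  const day = new Date().toISOString().slice(0, 10); // e.g. "2024-05-01"
  return crypto
    .createHmac('sha256', process.env.SALT_SECRET || 'change-me')
    .update(day)
    .digest('hex');
}
```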
A session groups related page views and events from a single user visit. Without sessions, your analytics data is just a flat list of unconnected events. Sessions let you answer questions like "how many pages does a typical user view?" and "where do users enter and exit?"
The industry standard (used by Google Analytics, Adobe Analytics, and others) defines a session as ending after 30 minutes of inactivity. If a user views a page, leaves for 25 minutes, and comes back, it is the same session. If they leave for 35 minutes, a new session starts.
- Compute the user_hash from IP + User-Agent (see Enrichment above)
- Look up the most recent event stored for that user_hash
- If that event is less than 30 minutes old, reuse its session_id
- Otherwise, generate a new session_id (UUID v4 or similar)
- Store the session_id with the current event (this linking logic is sketched in code below)

The first event in a session carries special significance. It records the entry page and the original referrer, the data that first-touch attribution is built on.
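A sketch of that linking logic, assuming mysql2/promise and illustrative table and column names (events, user_hash, session_id, server_timestamp):

```js
// Sessionization sketch: reuse the session if the user's last event is under 30 minutes old.
const { randomUUID } = require('crypto');

const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

async function resolveSessionId(db, userHash, now = Date.now()) {
  const [rows] = await db.execute(
    'SELECT session_id, server_timestamp FROM events WHERE user_hash = ? ORDER BY server_timestamp DESC LIMIT 1',
    [userHash]
  );
  const last = rows[0];
  if (last && now - new Date(last.server_timestamp).getTime() < SESSION_TIMEOUT_MS) {
    return { sessionId: last.session_id, isNewSession: false };
  }
  // No recent activity: start a new session with a fresh UUID v4.
  return { sessionId: randomUUID(), isNewSession: true };
}
```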
Once beacons are validated, enriched, and sessionized, they need to be inserted into the database. There are two fundamental strategies, each with distinct tradeoffs.
Each incoming beacon triggers an immediate INSERT statement. Data is available in the database within milliseconds of the event occurring.
| Advantage | Disadvantage |
|---|---|
| Data available immediately | One DB connection per beacon |
| Simple implementation | Higher DB load under traffic spikes |
| No data loss from buffer crashes | Slower individual inserts vs. bulk |
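A per-event insert is a single prepared statement per beacon; a sketch with mysql2/promise and an illustrative column list:

```js
// Per-event insert sketch: one prepared INSERT per beacon (mysql2/promise assumed).
// The events table and its columns are illustrative.
async function insertBeacon(db, e) {
  await db.execute(
    `INSERT INTO events
       (event_type, url, referrer, user_hash, session_id, server_timestamp)
     VALUES (?, ?, ?, ?, ?, ?)`,
    [e.event_type, e.url, e.referrer, e.user_hash, e.session_id, new Date(e.server_timestamp)]
  );
}
```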
Beacons are accumulated in an in-memory buffer (or a file queue) and flushed to the database periodically (e.g., every 5 seconds or every 100 records).
| Advantage | Disadvantage |
|---|---|
| Far fewer DB connections | Data is delayed by the flush interval |
| Bulk INSERT is much faster | Buffer loss on server crash |
| Handles traffic spikes gracefully | More complex implementation |
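A minimal batched-write sketch; the thresholds, the table and columns, and the db handle (a mysql2/promise pool) are illustrative, and as the table notes, anything still buffered is lost if the process crashes:

```js
// Batched insert sketch: buffer beacons in memory, flush every 5 s or every 100 records.
// `db` is an assumed mysql2/promise pool; thresholds and columns are illustrative.
const buffer = [];
const FLUSH_INTERVAL_MS = 5000;
const FLUSH_SIZE = 100;

function enqueue(event) {
  buffer.push(event);
  if (buffer.length >= FLUSH_SIZE) flush();
}

async function flush() {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length); // take everything buffered so far
  const rows = batch.map((e) => [e.event_type, e.url, e.user_hash, e.session_id, new Date(e.server_timestamp)]);
  try {
    // Bulk INSERT: one round trip for the whole batch (mysql2 expands the nested array).
    await db.query(
      'INSERT INTO events (event_type, url, user_hash, session_id, server_timestamp) VALUES ?',
      [rows]
    );
  } catch (err) {
    console.error('flush failed:', err);
    buffer.unshift(...batch); // put the batch back and retry on the next interval
  }
}

setInterval(flush, FLUSH_INTERVAL_MS);
```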
Your analytics endpoint will encounter errors: malformed JSON, unexpected field types, database connection failures, disk full conditions. The way you handle these errors matters for both reliability and security.
Regardless of what happens internally, your /collect endpoint should return 204 No Content to the client. Never return error details.
| Scenario | Internal Action | Client Response |
|---|---|---|
| Valid beacon, stored successfully | INSERT into database | 204 |
| Invalid JSON body | Log error, discard payload | 204 |
| Validation failure | Log violation, discard payload | 204 |
| Database connection down | Log error, queue for retry or discard | 204 |
| Unknown/unexpected error | Log stack trace, discard payload | 204 |
- A 400 Bad Request tells an attacker their payload was rejected, helping them refine their attack. A 500 Internal Server Error reveals your server is struggling.
- sendBeacon does not expose the response status code to JavaScript, and even fetch-based collectors typically ignore the response. There is no point returning errors the client will never read.
- If you return 500 and the client retries, a database outage becomes a traffic multiplier. Returning 204 prevents cascading failures.
- Returning 204 to the client does not mean you ignore errors. Log validation failures, database errors, and malformed payloads to a server-side log file or monitoring system. You need this data to debug pipeline issues.
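Putting the error discipline into the sketch from earlier: every path answers 204, every failure lands in a server-side log (validateBeacon and processBeacon are the hypothetical helpers sketched above):

```js
// Error-handling sketch: the client always gets 204; details only go to the server log.
app.post('/collect', async (req, res) => {
  res.status(204).end(); // acknowledge first, the client never sees errors

  try {
    const beacon = validateBeacon(req.body);    // sketched in the validation section
    if (!beacon) {
      console.warn('discarded invalid beacon'); // log the violation, drop the payload
      return;
    }
    await processBeacon(beacon, req);           // enrich, sessionize, store
  } catch (err) {
    console.error('pipeline error:', err);      // log the stack trace, discard the payload
  }
});

// express.json() rejects malformed JSON before the handler runs; this error
// middleware catches that case and still answers 204.
app.use((err, req, res, next) => {
  console.error('bad request body:', err.message);
  res.status(204).end();
});
```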
The tutorials in this section provide both a Node.js (Express) and a PHP (PDO) implementation of the server-side processing pipeline. Both cover the same logic: receive, validate, enrich, sessionize, and store analytics beacons.
| Aspect | Node.js (Express) | PHP (PDO) |
|---|---|---|
| HTTP handling | express middleware | $_SERVER, php://input |
| JSON parsing | express.json() middleware | json_decode(file_get_contents('php://input')) |
| CORS | cors package or manual headers | header() calls |
| Database | mysql2/promise | PDO with prepared statements |
| Hashing | crypto.createHash('sha256') | hash('sha256', ...) |
| Deployment | Runs as a persistent process (PM2, systemd) | Runs per-request behind Apache/Nginx |
You do not need to build both. Choose the language that matches your stack — or build both to compare the execution model differences firsthand. The Node.js endpoint demonstrates the event-loop / long-running process model, while the PHP endpoint demonstrates the start-process-die model. Both are valid architectures for analytics ingestion.
- Build an Express endpoint that receives, validates, enriches, and stores analytics beacons in MySQL.
- Build a PHP endpoint using PDO that mirrors the Node.js pipeline with the start-process-die execution model.
- Harden your endpoint with schema validation, input sanitization, and defense against stored XSS.
- Implement session linking with 30-minute timeout, user hashing, and first-touch attribution.