Analytics endpoints are public — anyone can POST data to them. Without validation, your database becomes a vector for stored XSS, SQL injection, and garbage data. This module builds reusable validation middleware for both Node.js and PHP.
Your analytics endpoint accepts untrusted input and stores it. Later, your dashboard reads that data and renders it in HTML. This creates a classic stored XSS pipeline: an attacker sends a <script> tag inside a URL or user-agent field, your endpoint saves it as-is, and your dashboard injects it into the page without escaping.
The fix is to validate and sanitize at ingest time, before data ever reaches storage. Here are the common attack vectors for analytics fields:
| Field | Attack Type | Mitigation |
|---|---|---|
| url | Stored XSS via <script> in URL string | Validate URL format, HTML-encode entities |
| referrer | Stored XSS, open redirect payloads | Validate URL format, HTML-encode entities |
| userAgent | Stored XSS via crafted UA string | HTML-encode, truncate to max length |
| sessionId | SQL injection, NoSQL injection | Restrict to alphanumeric + hyphens only |
| type | Arbitrary string injection | Allowlist of known values |
| viewportWidth/Height | Type confusion, overflow | Parse as integer, clamp to range |
| timestamp | Format confusion, injection via date strings | Validate ISO 8601 format |
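Even fields that look harmless (such as viewportWidth) can carry unexpected types. A number field set to the string "999999999999999999999" can overflow a fixed-width integer column or lose precision when parsed as a JavaScript number. A field set to "__proto__" can trigger prototype pollution in careless JavaScript code that uses the value as an object key.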
Every field in your analytics payload needs a specific validation rule. Here is the complete specification for the beacon format used in this tutorial series:
| Field | Type | Rule | Example |
|---|---|---|---|
| url | string | Valid URL format (http:// or https://), max 2048 chars | https://example.com/about |
| type | string | Enum allowlist: pageview, event, error, performance | pageview |
| timestamp | string | ISO 8601 format | 2026-02-17T10:30:00.000Z |
| viewportWidth | number | Integer, range 0–32767 | 1920 |
| viewportHeight | number | Integer, range 0–32767 | 1080 |
| userAgent | string | Max 512 chars | Mozilla/5.0 (Windows NT 10.0; Win64; x64)... |
| referrer | string | Valid URL format, max 2048 chars | https://google.com/search?q=example |
| sessionId | string | Alphanumeric + hyphens/underscores, max 64 chars | a1b2c3d4-e5f6-7890 |
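One way to keep the specification and the code in sync is to express the table as data. The sketch below is only an illustration under stated assumptions, not the tutorial's validate.js: the names RULES and checkBeacon are hypothetical, the URL pattern checks only the scheme prefix, the timestamp pattern accepts just one common ISO 8601 form, and the function reports violations instead of clamping or defaulting values. It expects a plain object as input.

```javascript
// Hypothetical rule table mirroring the specification above.
const RULES = {
  url:            { type: 'string', maxLen: 2048, pattern: /^https?:\/\//i },
  type:           { type: 'string', oneOf: ['pageview', 'event', 'error', 'performance'] },
  timestamp:      { type: 'string', pattern: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/ },
  viewportWidth:  { type: 'number', integer: true, min: 0, max: 32767 },
  viewportHeight: { type: 'number', integer: true, min: 0, max: 32767 },
  userAgent:      { type: 'string', maxLen: 512 },
  referrer:       { type: 'string', maxLen: 2048, pattern: /^https?:\/\//i },
  sessionId:      { type: 'string', maxLen: 64, pattern: /^[A-Za-z0-9_-]+$/ },
};

// Returns a list of violations for the fields that are present in `data`.
function checkBeacon(data) {
  const errors = [];
  for (const [field, rule] of Object.entries(RULES)) {
    // Look only at own properties so inherited keys are never inspected.
    if (!Object.prototype.hasOwnProperty.call(data, field)) continue;
    const value = data[field];
    if (typeof value !== rule.type) {
      errors.push(`${field}: expected ${rule.type}`);
      continue; // the remaining checks assume the right type
    }
    if (rule.maxLen !== undefined && value.length > rule.maxLen) errors.push(`${field}: too long`);
    if (rule.pattern && !rule.pattern.test(value)) errors.push(`${field}: bad format`);
    if (rule.oneOf && !rule.oneOf.includes(value)) errors.push(`${field}: not allowed`);
    if (rule.integer && !Number.isInteger(value)) errors.push(`${field}: not an integer`);
    if (rule.min !== undefined && value < rule.min) errors.push(`${field}: below ${rule.min}`);
    if (rule.max !== undefined && value > rule.max) errors.push(`${field}: above ${rule.max}`);
  }
  return errors;
}

console.log(checkBeacon({ type: 'bogus', viewportWidth: '1920' }));
// [ 'type: not allowed', 'viewportWidth: expected number' ]
```

A report-style checker like this is mostly useful in tests and logging; the endpoint itself can still clamp and default as described in the rest of this module.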
Validation rejects bad data. Sanitization cleans data that passes validation but might still contain dangerous characters. The three core sanitization operations:
Replace characters that have special meaning in HTML with their entity equivalents. This prevents stored XSS when the data is later rendered in a dashboard:
| Character | Entity | Why |
|---|---|---|
| & | &amp; | Starts an HTML entity |
| < | &lt; | Opens an HTML tag |
| > | &gt; | Closes an HTML tag |
| " | &quot; | Breaks out of attribute values |
| ' | &#39; | Breaks out of single-quoted attributes |
After encoding, <script>alert(1)</script> becomes the harmless string &lt;script&gt;alert(1)&lt;/script&gt;, which renders as visible text, not executable code.
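The encoding step can be sketched as a small lookup-based function. This is a minimal illustration, not the tutorial's own code: the names HTML_ENTITIES and escapeHtml are assumptions, and &#39; is just one of several valid ways to encode a single quote.

```javascript
// Map each HTML-significant character to its entity.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Replace every special character in one regex pass over the string.
function escapeHtml(str) {
  return String(str).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml('<script>alert(1)</script>'));
// &lt;script&gt;alert(1)&lt;/script&gt;
```

Because all five characters are handled in a single replace call, an ampersand produced by one substitution is never re-encoded by another.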
Control characters (ASCII 0x00 through 0x1F and 0x7F) have no place in analytics data. They can break parsers, corrupt log files, and serve as payloads for certain injection attacks:
// JavaScript: strip control characters
str.replace(/[\x00-\x1f\x7f]/g, '');
// PHP: strip control characters
preg_replace('/[\x00-\x1f\x7f]/', '', $str);
Even after encoding, enforce maximum lengths. A 1MB URL string will pass URL validation but should not be stored:
// JavaScript
str.substring(0, maxLen);
// PHP (multibyte-safe)
mb_substr($str, 0, $maxLen, 'UTF-8');
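One difference is worth illustrating: JavaScript's substring counts UTF-16 code units, so it can cut a character outside the Basic Multilingual Plane (an emoji, for example) in half, whereas mb_substr counts characters. The sketch below shows one way to truncate by code point in JavaScript; the name truncateCodePoints is hypothetical and not part of the tutorial's validate.js.

```javascript
// Truncate to at most maxLen Unicode code points without splitting a surrogate pair.
function truncateCodePoints(str, maxLen) {
  // A code point occupies at most two UTF-16 code units, so the first
  // maxLen code points always lie within the first 2 * maxLen code units.
  // Pre-slicing keeps Array.from from expanding a very large input.
  const head = String(str).slice(0, maxLen * 2);
  // Array.from iterates a string by code point, not by code unit.
  return Array.from(head).slice(0, maxLen).join('');
}

const sample = 'ab\u{1F600}cd'; // "ab", an emoji, then "cd"

// substring cuts between the two halves of the emoji's surrogate pair:
console.log(sample.substring(0, 3) === 'ab\uD83D'); // true

// The code-point version keeps the emoji whole:
console.log(truncateCodePoints(sample, 3) === 'ab\u{1F600}'); // true
```

Note that the limit then counts code points rather than code units or bytes, so a storage limit defined in bytes still needs its own check.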
Encode exactly once. Encoding data that is already encoded produces &amp;amp; instead of &amp;, which corrupts the output.
The validate.js module exports a single function: validateBeacon(data). It takes a raw parsed JSON object and returns either a sanitized object or null if the data is invalid.
const { validateBeacon } = require('./validate');
// In your Express handler:
app.post('/collect', (req, res) => {
  const clean = validateBeacon(req.body);
  if (!clean) {
    return res.status(400).json({ error: 'Invalid beacon data' });
  }
  // clean is safe to store
  fs.appendFile(LOG_FILE, JSON.stringify(clean) + '\n', (err) => {
    if (err) return res.sendStatus(500);
    res.sendStatus(204);
  });
});
Walk through the key functions in validate.js:
validateBeacon(data)
The main entry point. Returns null immediately if data is not an object. Checks that url is present and matches the URL pattern. Then builds a clean output object by running each field through its specific validator:
function validateBeacon(data) {
  if (!data || typeof data !== 'object') return null;
  const allowedTypes = ['pageview', 'event', 'error', 'performance'];
  const urlPattern = /^https?:\/\/.{1,2048}$/;
  // URL is the only required field
  if (typeof data.url !== 'string' || !urlPattern.test(data.url)) return null;
  return {
    url: sanitize(data.url, 2048),
    type: allowedTypes.includes(data.type) ? data.type : 'pageview',
    userAgent: sanitize(data.userAgent || '', 512),
    viewportWidth: clampInt(data.viewportWidth, 0, 32767),
    viewportHeight: clampInt(data.viewportHeight, 0, 32767),
    referrer: sanitize(data.referrer || '', 2048),
    timestamp: isISO8601(data.timestamp) ? data.timestamp : null,
    sessionId: sanitizeId(data.sessionId || '', 64),
    // NOTE: payload is passed through unsanitized; validate or encode it before rendering
    payload: data.payload || null,
  };
}
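validateBeacon calls isISO8601, which is not shown in this walkthrough. The following is a hypothetical implementation, offered only as a sketch: it assumes timestamps arrive in the extended date-time form with a time zone designator (the shape that Date.prototype.toISOString produces, plus numeric offsets), and rejects the many other forms that ISO 8601 permits.

```javascript
// Shape check: YYYY-MM-DDTHH:MM:SS, optional milliseconds, then Z or a +hh:mm / -hh:mm offset.
const ISO_8601 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{1,3})?(?:Z|[+-]\d{2}:\d{2})$/;

function isISO8601(value) {
  // Non-strings (numbers, objects, undefined) are rejected outright.
  if (typeof value !== 'string' || !ISO_8601.test(value)) return false;
  // The regex checks shape only; Date.parse rejects out-of-range parts such as month 13.
  return !Number.isNaN(Date.parse(value));
}

console.log(isISO8601('2026-02-17T10:30:00.000Z')); // true
console.log(isISO8601('yesterday'));                // false
```

Checking the type first matters here because the field comes straight from parsed JSON and may not be a string at all.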
sanitize(str, maxLen)
Truncates, HTML-encodes, and strips control characters in one chained expression. Because truncation runs before encoding, the encoded result can be longer than maxLen:
function sanitize(str, maxLen) {
  return String(str)
    .substring(0, maxLen)
    .replace(/[<>&"']/g, c => ({
      '<': '&lt;', '>': '&gt;', '&': '&amp;',
      '"': '&quot;', "'": '&#39;'
    }[c]))
    .replace(/[\x00-\x1f\x7f]/g, '');
}
clampInt(val, min, max)
Parses a value as an integer and clamps it to a range. Returns null if parseInt cannot extract a number from the value:
function clampInt(val, min, max) {
  const n = parseInt(val, 10);
  if (isNaN(n)) return null;
  return Math.max(min, Math.min(max, n));
}
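parseInt is lenient: it reads leading digits and ignores whatever follows, so some malformed values are accepted rather than rejected. If that is not the behavior you want, a stricter variant is possible. The sketch below is an assumption, not part of the tutorial's validate.js, and the name clampIntStrict is hypothetical.

```javascript
// How parseInt treats some awkward inputs:
console.log(parseInt('1920px', 10)); // 1920 (trailing text ignored)
console.log(parseInt('12.9', 10));   // 12   (fraction dropped)
console.log(parseInt(1e21, 10));     // 1    (the number is stringified to "1e+21" first)

// Stricter variant: accept only integers, or strings made solely of an optional sign and digits.
function clampIntStrict(val, min, max) {
  const n = typeof val === 'string' && /^[+-]?\d+$/.test(val) ? Number(val) : val;
  // Rejects non-numbers, NaN, Infinity, fractions, and values beyond the safe integer range.
  if (typeof n !== 'number' || !Number.isSafeInteger(n)) return null;
  return Math.max(min, Math.min(max, n));
}

console.log(clampIntStrict('1920px', 0, 32767));      // null
console.log(clampIntStrict(99999999999, 0, 32767));   // 32767
```

Whether to reject or to clamp is a design choice: rejecting surfaces broken clients early, while clamping keeps more beacons at the cost of silently altering them.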
sanitizeId(str, maxLen)
Strips everything except alphanumeric characters, hyphens, and underscores. This is the strictest sanitizer: session IDs should never contain HTML, SQL, or shell metacharacters:
function sanitizeId(str, maxLen) {
  return String(str).substring(0, maxLen).replace(/[^a-zA-Z0-9\-_]/g, '');
}
The validate.php library mirrors the interface of the Node.js version. PHP has built-in functions that make validation more concise in some cases:
require_once 'validate.php';
// In your endpoint:
$raw = json_decode(file_get_contents('php://input'), true);
$clean = validateBeacon($raw ?? []);
if ($clean === null) {
    http_response_code(400);
    echo json_encode(['error' => 'Invalid beacon data']);
    exit;
}
// $clean is safe to store
Key differences from the Node.js version:
| Operation | Node.js | PHP |
|---|---|---|
| URL validation | Regex pattern | filter_var($url, FILTER_VALIDATE_URL) |
| HTML encoding | Manual .replace() | htmlspecialchars() |
| Integer validation | parseInt() + isNaN() | filter_var($val, FILTER_VALIDATE_INT) |
| String truncation | .substring() | mb_substr() (multibyte-safe) |
| Regex | /pattern/.test(str) | preg_match('/pattern/', $str) |
PHP's filter_var() with FILTER_VALIDATE_URL is stricter than a simple regex: it checks that the scheme, host, and path components are well formed. It does not restrict the scheme to http or https, so check that separately. PHP's htmlspecialchars() with ENT_QUOTES | ENT_HTML5 handles all five dangerous characters in one call, whereas the Node.js version uses a manual replacement map.
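Node.js can get closer to parser-based URL validation by using the built-in WHATWG URL class, which is available as a global. The sketch below is an alternative to the regex shown earlier, not what the tutorial's validate.js does; the function name isHttpUrl and its default limit are assumptions.

```javascript
// Parser-based check: the value must parse as a URL and use http or https.
function isHttpUrl(value, maxLen = 2048) {
  if (typeof value !== 'string' || value.length > maxLen) return false;
  let parsed;
  try {
    parsed = new URL(value); // throws a TypeError when the input cannot be parsed
  } catch {
    return false;
  }
  // Restrict the scheme explicitly; the parser alone accepts ftp:, javascript:, and others.
  return parsed.protocol === 'http:' || parsed.protocol === 'https:';
}

console.log(isHttpUrl('https://example.com/about')); // true
console.log(isHttpUrl('javascript:alert(1)'));       // false
console.log(isHttpUrl('not a url'));                 // false
```

A URL that passes this check can still contain characters such as < and >, so the HTML-encoding step remains necessary.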
Your validation code is only as good as the payloads you test it against. Here are examples of malicious inputs and what each implementation should do with them:
{
  "url": "https://example.com/<script>alert('xss')</script>",
  "type": "pageview"
}
Result: The url field is sanitized. The < and > characters are encoded to &lt; and &gt;, and the single quotes to &#39;. The script tag becomes inert text.
{
  "url": "https://example.com/",
  "type": "pageview",
  "sessionId": "'; DROP TABLE beacons; --"
}
Result: The sanitizeId function strips everything except [a-zA-Z0-9\-_]. The output is DROPTABLEbeacons-- (the two hyphens survive because hyphens are allowed), which is harmless gibberish.
{
  "url": "https://example.com/",
  "type": "pageview",
  "viewportWidth": 99999999999
}
Result: clampInt clamps the value to the maximum of 32767. Absurd values are silently normalized.
{
  "url": "https://example.com/",
  "type": "malicious_custom_type"
}
Result: The type is not in the allowlist, so it defaults to "pageview". Unknown types are never stored.
{
  "url": "https://example.com/",
  "type": "pageview",
  "userAgent": "Mozilla/5.0\u0000\u0001\u0002 injected"
}
Result: Control characters (0x00 through 0x1F, and 0x7F) are stripped. The output is Mozilla/5.0 injected.
{
  "type": "pageview",
  "viewportWidth": 1920
}
Result: validateBeacon returns null. The endpoint responds with 400 Bad Request. No data is stored.
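The payloads above can be turned into a small regression script. The sketch below inlines a copy of the functions from this module so it runs on its own. The isISO8601 stand-in, the cases and results names, and the choice of &#39; for single quotes are assumptions rather than the tutorial's actual code.

```javascript
const ENTITIES = { '<': '&lt;', '>': '&gt;', '&': '&amp;', '"': '&quot;', "'": '&#39;' };

function sanitize(str, maxLen) {
  return String(str)
    .substring(0, maxLen)
    .replace(/[<>&"']/g, (c) => ENTITIES[c])
    .replace(/[\x00-\x1f\x7f]/g, '');
}

function sanitizeId(str, maxLen) {
  return String(str).substring(0, maxLen).replace(/[^a-zA-Z0-9\-_]/g, '');
}

function clampInt(val, min, max) {
  const n = parseInt(val, 10);
  if (isNaN(n)) return null;
  return Math.max(min, Math.min(max, n));
}

// Assumed stand-in: the tutorial does not show isISO8601.
function isISO8601(v) {
  return typeof v === 'string' && /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/.test(v);
}

function validateBeacon(data) {
  if (!data || typeof data !== 'object') return null;
  const allowedTypes = ['pageview', 'event', 'error', 'performance'];
  if (typeof data.url !== 'string' || !/^https?:\/\/.{1,2048}$/.test(data.url)) return null;
  return {
    url: sanitize(data.url, 2048),
    type: allowedTypes.includes(data.type) ? data.type : 'pageview',
    userAgent: sanitize(data.userAgent || '', 512),
    viewportWidth: clampInt(data.viewportWidth, 0, 32767),
    viewportHeight: clampInt(data.viewportHeight, 0, 32767),
    referrer: sanitize(data.referrer || '', 2048),
    timestamp: isISO8601(data.timestamp) ? data.timestamp : null,
    sessionId: sanitizeId(data.sessionId || '', 64),
    payload: data.payload || null, // passed through as in the module; not sanitized
  };
}

// Each case pairs a malicious payload with a check on the sanitized result.
const cases = [
  { name: 'XSS in url',
    input: { url: "https://example.com/<script>alert('xss')</script>", type: 'pageview' },
    check: (out) => out !== null &&
      out.url === 'https://example.com/&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;' },
  { name: 'SQL injection in sessionId',
    input: { url: 'https://example.com/', sessionId: "'; DROP TABLE beacons; --" },
    check: (out) => out !== null && out.sessionId === 'DROPTABLEbeacons--' },
  { name: 'oversized viewportWidth',
    input: { url: 'https://example.com/', viewportWidth: 99999999999 },
    check: (out) => out !== null && out.viewportWidth === 32767 },
  { name: 'unknown type',
    input: { url: 'https://example.com/', type: 'malicious_custom_type' },
    check: (out) => out !== null && out.type === 'pageview' },
  { name: 'control characters in userAgent',
    input: { url: 'https://example.com/', userAgent: 'Mozilla/5.0\u0000\u0001\u0002 injected' },
    check: (out) => out !== null && out.userAgent === 'Mozilla/5.0 injected' },
  { name: 'missing url',
    input: { type: 'pageview', viewportWidth: 1920 },
    check: (out) => out === null },
];

const results = cases.map((c) => ({ name: c.name, pass: c.check(validateBeacon(c.input)) }));
for (const r of results) console.log(`${r.pass ? 'PASS' : 'FAIL'}  ${r.name}`);
```

In a real project the inlined functions would be replaced by a require of the validate module, and the cases could be moved into whichever test runner the project already uses.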
sqlmap and Burp Suite can also test your endpoint for injection vulnerabilities.