Module 02: PHP Collection Endpoint

Build the same collection endpoint in PHP using PDO prepared statements. The validation and enrichment logic is identical to the Node.js version — same fields, same 204 response, same security considerations — but the runtime model is fundamentally different.

1. PHP vs Node.js for Collection

The collection logic is the same in both languages: receive a POST, parse the JSON body, validate the fields, enrich with server-side data, insert into the database, and return 204. The differences are in how the runtime handles requests.

| Aspect | PHP | Node.js |
|---|---|---|
| Execution model | Per-request process: script starts, runs, and exits for every request | Long-running process: single event loop handles all requests |
| State between requests | None by default: no shared memory between requests | Shared: variables persist across requests in the same process |
| Database driver | PDO (PHP Data Objects) | mysql2 / pg / sqlite3 |
| Connection pooling | Handled by the web server (e.g., persistent connections via PDO::ATTR_PERSISTENT) | Handled in application code (e.g., mysql2/promise pool) |
| Crash isolation | A fatal error kills only one request | An unhandled exception can crash the entire server |
| Deployment | Drop files into the web server directory | Run a process, manage with pm2 or systemd |

PHP's per-request model means every request gets a clean slate. Script memory is released when the request ends, so leaks in your code cannot accumulate over time, and a bug in one request cannot corrupt state for another. The tradeoff is that PHP must re-establish database connections and re-initialize state on every request (though persistent connections mitigate the connection cost).

Browser                        Apache + PHP
┌──────────────┐               ┌──────────────────────┐
│ collector.js │               │ Apache               │
│              │ POST /collect │  ├─ mod_rewrite      │
│ sendBeacon() ├──────────────>│  │   /collect        │
│ or fetch()   │               │  │   → collect.php   │
│              │               │  └─ PHP process      │
│              │<── 204 ───────│     1. Parse JSON    │
└──────────────┘  No Content   │     2. Validate      │
                               │     3. PDO INSERT    │
                               │     4. Exit          │
                               └──────────────────────┘

2. URL Rewriting with .htaccess

The collector script sends beacons to /collect, but the actual PHP file is collect.php. Apache's mod_rewrite bridges this gap:

RewriteEngine On
RewriteRule ^collect$ collect.php [L,QSA]

This rule tells Apache: when a request comes in for /collect (no file extension), internally rewrite it to collect.php. The client never sees the rewrite — the URL stays as /collect.

Note: mod_rewrite must be enabled on the server. On Ubuntu/Debian, run sudo a2enmod rewrite and restart Apache. Also ensure the directory's AllowOverride directive includes FileInfo or All so that .htaccess files are respected.

In the Node.js version, routing is handled in application code (app.post('/collect', ...)). In PHP, routing is typically delegated to the web server via rewrite rules. This is a fundamental architectural difference — PHP separates the routing concern from the application logic.

3. The PHP Endpoint

Here is collect.php broken down step by step.

Step 1: CORS Headers

Just like the Node.js version, the PHP endpoint must send CORS headers to allow cross-origin beacons:

header('Access-Control-Allow-Origin: *');
header('Access-Control-Allow-Methods: POST, OPTIONS');
header('Access-Control-Allow-Headers: Content-Type');

if ($_SERVER['REQUEST_METHOD'] === 'OPTIONS') {
    http_response_code(204);
    exit;
}

PHP's header() function sets HTTP response headers. The exit after handling OPTIONS ensures no further code runs for preflight requests.

Step 2: Method Check

Reject anything that is not a POST:

if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    http_response_code(405);
    exit;
}

In Node.js/Express, this is implicit — app.post('/collect', ...) only matches POST requests. In PHP, you must check explicitly because the script runs for any HTTP method.

Step 3: Read and Parse the JSON Body

PHP does not automatically parse JSON request bodies the way Express middleware does. You must read the raw input stream and decode it manually:

$raw = file_get_contents('php://input');
$data = json_decode($raw, true);

php://input is a read-only stream that gives you the raw request body. The true parameter to json_decode returns an associative array instead of an object, which is more convenient for field access with the ?? null coalescing operator.
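To make the decoding behavior concrete, here is a minimal sketch. The helper name parseBeacon() is hypothetical (it does not appear in collect.php); it only wraps the json_decode() call above and treats anything that does not decode to an array as unusable.

```php
<?php
// Hypothetical helper: decode a raw request body into an associative array,
// or return null when the body is not a JSON object or array.
function parseBeacon(string $raw): ?array
{
    $data = json_decode($raw, true);
    // json_decode() returns null for malformed JSON and a scalar for bodies like "42".
    return is_array($data) ? $data : null;
}

$data = parseBeacon('{"url":"https://example.com","type":"pageview"}');
echo $data['url'] ?? '(missing)', PHP_EOL;       // https://example.com
echo $data['referrer'] ?? '(missing)', PHP_EOL;  // (missing)
var_dump(parseBeacon('not json'));               // NULL
var_dump(parseBeacon('42'));                     // NULL
```

Because the result is an associative array, a missing field falls through to the ?? default instead of raising a warning.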

Step 4: Validate

Silently reject invalid data with a 204 — do not leak validation details to potential attackers:

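// Discard anything that is not a JSON object with a non-empty string url.
if (!is_array($data) || !is_string($data['url'] ?? null) || $data['url'] === '') {
    http_response_code(204);
    exit;
}

$allowedTypes = ['pageview', 'event', 'error', 'performance'];
// Strict comparison (third argument): without it, a JSON true would loosely match 'pageview'.
$type = in_array($data['type'] ?? '', $allowedTypes, true) ? $data['type'] : 'pageview';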

Note the difference from the Node.js version: here we return 204 even for invalid data instead of 400. This is a deliberate security choice — returning different status codes for valid vs. invalid data tells an attacker which fields are required and which values are accepted. A uniform 204 reveals nothing.
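The validation rules can also be sketched as a pure function, which makes them easy to exercise without sending HTTP requests. The names validateBeacon() and ALLOWED_TYPES are hypothetical, and the is_string() check and strict in_array() comparison are assumptions about how a hardened version might look, not necessarily what the demo file contains.

```php
<?php
const ALLOWED_TYPES = ['pageview', 'event', 'error', 'performance'];

// Hypothetical helper: returns the beacon with a normalized 'type',
// or null when the endpoint should silently discard it.
function validateBeacon($data): ?array
{
    if (!is_array($data) || !is_string($data['url'] ?? null) || $data['url'] === '') {
        return null;
    }
    $type = $data['type'] ?? '';
    // Strict in_array(): a JSON true or 0 must not match an allowed string.
    $data['type'] = in_array($type, ALLOWED_TYPES, true) ? $type : 'pageview';
    return $data;
}

var_dump(validateBeacon(['type' => 'event']));                              // NULL (no url)
echo validateBeacon(['url' => '/a', 'type' => 'error'])['type'], PHP_EOL;   // error
echo validateBeacon(['url' => '/a', 'type' => 'bogus'])['type'], PHP_EOL;   // pageview
echo validateBeacon(['url' => '/a'])['type'], PHP_EOL;                      // pageview
```

In the endpoint, a null result would map to the same silent 204 as every other outcome.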

Step 5: Enrich with Server-Side Data

$serverTimestamp = date('Y-m-d H:i:s');
$clientIp = $_SERVER['REMOTE_ADDR'] ?? '';

Same enrichment as the Node.js version: a server-side timestamp (because client clocks cannot be trusted) and the client IP address (for geolocation and bot detection).

Step 6: PDO Prepared INSERT

This is where PHP and Node.js differ most at the code level. PHP uses PDO with positional placeholders (?):

$pdo = new PDO(
    'mysql:host=localhost;dbname=analytics;charset=utf8mb4',
    'root', '',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

$stmt = $pdo->prepare(
    'INSERT INTO pageviews (url, type, user_agent, viewport_width,
     viewport_height, referrer, client_timestamp, server_timestamp,
     client_ip, session_id, payload)
     VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'
);

$stmt->execute([
    substr($data['url'], 0, 2048),
    $type,
    substr($data['userAgent'] ?? '', 0, 512),
    isset($data['viewportWidth']) ? (int)$data['viewportWidth'] : null,
    isset($data['viewportHeight']) ? (int)$data['viewportHeight'] : null,
    substr($data['referrer'] ?? '', 0, 2048),
    $data['timestamp'] ?? null,
    $serverTimestamp,
    $clientIp,
    substr($data['sessionId'] ?? '', 0, 64),
    isset($data['payload']) ? json_encode($data['payload']) : null,
]);

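Key points about the PDO approach: the ? placeholders are bound by position, so the array passed to execute() must list its values in the same order as the columns in the INSERT. Each string field is truncated with substr() to a maximum length, the viewport dimensions are cast to integers (or stored as NULL when absent), and the optional payload is re-encoded with json_encode() so it can be stored as a JSON string. PDO::ERRMODE_EXCEPTION makes any database failure throw a PDOException, which Step 7 catches.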
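The truncation and casting in the execute() call can be checked without a database by moving the parameter list into a function. This is only a sketch: buildParams() is a hypothetical name, and the sample IP and timestamp are placeholder values.

```php
<?php
// Hypothetical helper: build the positional parameter list for the
// 11-column INSERT above. The order must match the column list exactly.
function buildParams(array $data, string $type, string $serverTimestamp, string $clientIp): array
{
    return [
        substr($data['url'], 0, 2048),
        $type,
        substr($data['userAgent'] ?? '', 0, 512),
        isset($data['viewportWidth']) ? (int) $data['viewportWidth'] : null,
        isset($data['viewportHeight']) ? (int) $data['viewportHeight'] : null,
        substr($data['referrer'] ?? '', 0, 2048),
        $data['timestamp'] ?? null,
        $serverTimestamp,
        $clientIp,
        substr($data['sessionId'] ?? '', 0, 64),
        isset($data['payload']) ? json_encode($data['payload']) : null,
    ];
}

$params = buildParams(
    ['url' => 'https://example.com/' . str_repeat('a', 3000), 'viewportWidth' => '1280'],
    'pageview',
    '2024-01-01 00:00:00',
    '203.0.113.7'
);
echo count($params), PHP_EOL;      // 11
echo strlen($params[0]), PHP_EOL;  // 2048
var_dump($params[3]);              // int(1280)
var_dump($params[4]);              // NULL
```

The array could then be passed straight to $stmt->execute($params).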

Step 7: Error Handling and Response

try {
    // ... PDO code above ...
} catch (PDOException $e) {
    error_log('Analytics collect error: ' . $e->getMessage());
}

http_response_code(204);

Errors are logged server-side but never exposed to the client. The endpoint always returns 204, whether the INSERT succeeded or failed. This prevents information leakage and ensures the collector script is not affected by database issues.
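A minimal sketch of this log-and-continue pattern follows. The wrapper name storeQuietly() is hypothetical, and the database failure is simulated by throwing a PDOException directly so the example runs without a MySQL server.

```php
<?php
// Hypothetical wrapper: run a database operation, log any PDOException,
// and report success as a boolean instead of letting the error escape.
function storeQuietly(callable $insert): bool
{
    try {
        $insert();
        return true;
    } catch (PDOException $e) {
        error_log('Analytics collect error: ' . $e->getMessage());
        return false;
    }
}

// Simulate a database outage without needing a real connection.
$ok = storeQuietly(function () {
    throw new PDOException('SQLSTATE[HY000] [2002] Connection refused');
});
var_dump($ok);  // bool(false); the message goes to the error log, not the client

// In collect.php the response would follow unconditionally:
// http_response_code(204);
```

Note that this catches only PDOException. Other failures, such as a TypeError from an unexpected field type, would still escape unless the input is validated first or the catch is widened to Throwable.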

4. Security Notes

SQL Injection Prevention

PDO prepared statements are the primary defense against SQL injection. Compare the unsafe approach with the safe one:

// UNSAFE - string concatenation allows injection
$pdo->query("INSERT INTO pageviews (url) VALUES ('$url')");

// SAFE - prepared statement separates SQL from data
$stmt = $pdo->prepare("INSERT INTO pageviews (url) VALUES (?)");
$stmt->execute([$url]);

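With prepared statements, the SQL text and the data never mix. The driver either sends the values to the database separately from the query or, when PDO emulates prepares (the default for the MySQL driver), escapes them itself before substitution. Either way, SQL metacharacters in $url such as ' or ; are treated as literal data, not SQL syntax, so the value cannot break out of the placeholder.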
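The effect can be observed directly with a small experiment. This sketch assumes the pdo_sqlite extension is installed and uses an in-memory SQLite database as a stand-in for MySQL, with a simplified one-column table rather than the real pageviews schema.

```php
<?php
// Assumption: pdo_sqlite is available; SQLite stands in for MySQL here.
$pdo = new PDO('sqlite::memory:', null, null, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$pdo->exec('CREATE TABLE pageviews (id INTEGER PRIMARY KEY, url TEXT)');

// A classic injection payload, supplied as the "url" field of a beacon.
$url = "x'); DROP TABLE pageviews; --";

$stmt = $pdo->prepare('INSERT INTO pageviews (url) VALUES (?)');
$stmt->execute([$url]);

// The table still exists and holds the payload as plain text.
$stored = $pdo->query('SELECT url FROM pageviews')->fetchColumn();
$count = (int) $pdo->query('SELECT COUNT(*) FROM pageviews')->fetchColumn();

var_dump($stored === $url);  // bool(true)
echo $count, PHP_EOL;        // 1
```

The payload is stored byte for byte and the DROP TABLE never executes, because the value was bound as data rather than spliced into the SQL string.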

XSS Prevention for Stored Data

When you later display the collected data in a dashboard, use htmlspecialchars() to prevent stored XSS:

// When displaying collected data in HTML
echo htmlspecialchars($row['url'], ENT_QUOTES, 'UTF-8');

An attacker could send a beacon with a URL like <script>alert('xss')</script>. If you render this directly in your dashboard HTML without escaping, the script executes. htmlspecialchars() converts < to &lt;, neutralizing the attack.
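As a quick illustration of what the escaped output looks like, the sketch below wraps the call in a short helper. The name e() is hypothetical; the flags and charset are the same ones used above.

```php
<?php
// Hypothetical shorthand for escaping a value before printing it into HTML.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$url = "<script>alert('xss')</script>";
echo e($url), PHP_EOL;
// &lt;script&gt;alert(&#039;xss&#039;)&lt;/script&gt;
```

With ENT_QUOTES, single and double quotes are encoded as well, which matters when the value is placed inside an HTML attribute.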

Defense in depth: Sanitize at both input and output. Truncate and validate at collection time (the endpoint), and escape at render time (the dashboard). Neither alone is sufficient.

5. Testing

Use test.html to send a test beacon to the PHP endpoint. Open the page, click the button, and check your browser's Network tab for the 204 response.

You can also test with curl from the command line:

curl -X POST http://localhost/collect \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","type":"pageview","userAgent":"curl/test"}' \
  -v

Look for HTTP/1.1 204 No Content in the response. Then verify the data was inserted:

mysql -u root analytics -e "SELECT * FROM pageviews ORDER BY id DESC LIMIT 5;"

The test page sends a beacon with all the standard fields (URL, type, user agent, viewport dimensions, referrer, session ID) and displays the result. It uses the same approach as the Node.js test page — a simple fetch POST with a JSON body.

Summary