Module 03: Server-Log Collection

In this module, you will learn how to collect analytics data using the web server itself — no custom JavaScript collector required. Using tracking pixels and custom log formats, the server becomes the data collector.

Demo Files

Note: tracking-pixel.php requires a PHP-capable server (Apache with mod_php, or php -S localhost:8000).

A Different Approach: Server-Side Collection

Modules 01 and 02 used JavaScript to collect data and send beacons. This module takes a completely different approach — using the web server itself as the data collector. No JavaScript required.

This is the oldest analytics technique on the web, predating Google Analytics by a decade. In the mid-1990s, tools like Analog and Webalizer parsed server access logs to produce traffic reports. The approach fell out of fashion as JavaScript-based analytics offered richer client-side data, but it remains useful as a fallback and for capturing traffic that JavaScript cannot see.

Browser                                   Web Server
┌──────────┐                              ┌─────────────┐
│ <img>    │ ── GET /pixel.php ────────▶  │ Serves 1x1  │
│ loads    │    + query params            │ GIF + logs  │
│ pixel    │ ◀── 1x1 GIF ───────────────  │ the request │
└──────────┘                              └─────────────┘
                                                 ↓
                                          Access Log Entry:
                                          IP, UA, time, referer, params

The key insight is that every HTTP request that reaches the web server is recorded in its access log. If you can trigger a request with useful data attached, the server will record it for you automatically.

The Tracking Pixel Pattern

A tracking pixel is a 1x1 transparent GIF image embedded in the page. When the browser renders the page, it requests the image, and the server logs that request. The image is invisible to the user, but the HTTP request it generates carries valuable data:

<!-- Basic tracking pixel -->
<img src="https://analytics.example.com/pixel.php?page=/home&t=pageview"
     width="1" height="1" alt="" style="position:absolute;left:-9999px">

<!-- With cache-busting (prevents browser from caching the pixel) -->
<img src="https://analytics.example.com/pixel.php?page=/home&r=1706123456789"
     width="1" height="1" alt="">

<!-- noscript fallback — works even when JavaScript is disabled -->
<noscript>
  <img src="https://analytics.example.com/pixel.php?page=/home&js=0"
       width="1" height="1" alt="">
</noscript>

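The style="position:absolute;left:-9999px" pushes the pixel off-screen so it does not affect page layout. The cache-busting technique appends a unique value (typically a timestamp) to the query string, ensuring the browser makes a fresh request every time rather than serving the image from cache. The value has to be generated for each page view, by a server-side template or a script; a timestamp hard-coded into static HTML, as in the example, is identical on every load and defeats the purpose.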
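One way to generate the cache-busting value per page view is to build the pixel URL in script. The sketch below is an illustration only: buildPixelUrl is a hypothetical helper, and the parameter names (page, t, r) are taken from the examples above rather than from any fixed convention.

```javascript
// Hypothetical helper: builds a pixel URL with a cache-busting parameter.
// The endpoint and parameter names mirror the <img> examples in this module.
function buildPixelUrl(endpoint, page, type = 'pageview', now = Date.now()) {
  const url = new URL(endpoint);
  url.searchParams.set('page', page);     // URLSearchParams percent-encodes the value
  url.searchParams.set('t', type);
  url.searchParams.set('r', String(now)); // changes between page views, so the cached URL is not reused
  return url.toString();
}

// In a browser you would assign the result to an image, for example:
//   new Image(1, 1).src = buildPixelUrl('https://analytics.example.com/pixel.php', location.pathname);
const exampleUrl = buildPixelUrl(
  'https://analytics.example.com/pixel.php', '/home', 'pageview', 1706123456789
);
console.log(exampleUrl);
```

The now parameter is injectable only so the function can be exercised with a fixed value; in normal use it defaults to the current time in milliseconds.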

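Note: The <noscript> version matters because it keeps working when JavaScript is disabled or blocked, a case in which script-based collection records nothing. Apart from the server's own access log, it is the only technique in these modules with that property. Some vendor snippets ship a <noscript> fallback for the same reason; Google Tag Manager's install code is one example.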

Building the PHP Tracking Pixel

The tracking-pixel.php script does two things: serves a 1x1 transparent GIF to the browser, and logs the request data to a file. Let us walk through it section by section.

Cache-Control Headers

<?php
// tracking-pixel.php — Serves a 1x1 transparent GIF and logs the request

// Prevent caching so every page view generates a new request
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Pragma: no-cache');
header('Expires: 0');

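These headers are essential. If the browser caches the pixel image, subsequent page views will not generate new HTTP requests; the browser will simply reuse the cached copy, and those hits are lost. The three headers cover different caching layers: Cache-Control is the HTTP/1.1 directive that browsers and modern proxies obey, Pragma: no-cache is the HTTP/1.0 equivalent kept for legacy clients and proxies, and Expires: 0 marks the response as already expired for caches that only look at expiry dates.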

Serving the Pixel

// Set content type to GIF image
header('Content-Type: image/gif');

// A 42-byte GIF89a image: a single 1x1 transparent pixel
// Layout: header, 2-color palette, graphic control extension (transparency), 1x1 image data
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');

The base64 string decodes to a 42-byte GIF89a image, one of the smallest transparent GIFs in common use. The browser receives a real image, renders it (invisibly), and the user sees nothing. Meanwhile, the server has captured the request.
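If you want to check what the base64 string actually contains, you can decode it and read the fixed-position fields of the GIF header. The following is a small sketch for Node.js (it relies on Node's Buffer); describeGif is a hypothetical name, and the function inspects only the header and trailer rather than validating the whole file.

```javascript
// The base64 string is copied from the PHP snippet above.
const PIXEL_BASE64 = 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7';

// Decode the image and read the fields that sit at fixed offsets in every GIF.
function describeGif(base64) {
  const bytes = Buffer.from(base64, 'base64');
  return {
    byteLength: bytes.length,
    signature: bytes.toString('latin1', 0, 6),         // "GIF87a" or "GIF89a"
    width: bytes.readUInt16LE(6),                      // logical screen width, little-endian
    height: bytes.readUInt16LE(8),                     // logical screen height, little-endian
    endsWithTrailer: bytes[bytes.length - 1] === 0x3b, // 0x3B marks the end of a GIF stream
  };
}

console.log(describeGif(PIXEL_BASE64));
```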

Logging the Request

// Log the hit (in production, write to a database or log file)
$data = [
    'timestamp' => date('c'),
    'ip'        => $_SERVER['REMOTE_ADDR'] ?? '',
    'ua'        => $_SERVER['HTTP_USER_AGENT'] ?? '',
    'referer'   => $_SERVER['HTTP_REFERER'] ?? '',
    'page'      => $_GET['page'] ?? '',
    'type'      => $_GET['t'] ?? 'pageview',
    'language'  => $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '',
];

// Append to a JSON Lines file
$logFile = __DIR__ . '/pixel-hits.jsonl';
file_put_contents($logFile, json_encode($data) . "\n", FILE_APPEND | LOCK_EX);

The script extracts data from two sources: the $_GET superglobal (query parameters you control) and the $_SERVER superglobal (the client IP address plus the HTTP headers the browser sends automatically). It writes each hit as a single line of JSON to a .jsonl (JSON Lines) file: one JSON object per line, easy to parse with standard tools.

The LOCK_EX flag ensures that concurrent requests do not corrupt the file by writing simultaneously.
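To show how little code the JSON Lines format needs on the reading side, here is a minimal sketch that counts hits per page. The function name countHitsByPage and the sample records are made up for illustration; the field names (page, type) follow the $data array above.

```javascript
// Count hits per page from JSON Lines text (one JSON object per line).
function countHitsByPage(jsonlText) {
  const counts = new Map();
  for (const line of jsonlText.split('\n')) {
    if (line.trim() === '') continue; // skip the empty line after the final newline
    let hit;
    try {
      hit = JSON.parse(line);
    } catch {
      continue;                       // skip a malformed or partially written line
    }
    const page = hit.page || '(unknown)';
    counts.set(page, (counts.get(page) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample data. PHP's json_encode escapes "/" as "\/" by default,
// which JSON.parse decodes back to "/". A real run would read pixel-hits.jsonl
// with fs.readFileSync(path, 'utf8') instead.
const sampleHits = [
  '{"timestamp":"2024-01-24T19:10:56+00:00","page":"\\/home","type":"pageview"}',
  '{"timestamp":"2024-01-24T19:11:02+00:00","page":"\\/pricing","type":"pageview"}',
  '{"timestamp":"2024-01-24T19:11:30+00:00","page":"\\/home","type":"pageview"}',
].join('\n') + '\n';

console.log(Object.fromEntries(countHitsByPage(sampleHits)));
```

Skipping lines that fail to parse is a deliberate choice here: a reader that runs while the log is still being written should tolerate an incomplete last line.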

Custom Server Log Formats

Instead of writing PHP to log data, you can configure the web server itself to capture richer data in its access log. This requires no application code at all — just a configuration change.

Apache Custom LogFormat

# Standard Combined Log Format (what most servers use by default)
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Extended format for analytics: adds response time, Accept-Language, and a session cookie
LogFormat "%h %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\" \"%{Accept-Language}i\" \"%{_session}C\"" analytics

# Using the custom format
CustomLog /var/log/apache2/analytics.log analytics

Each format directive captures a specific piece of data:

Directive      Meaning
%h             Remote host (client IP address)
%t             Timestamp of the request
%r             Request line (e.g., GET /page.html HTTP/1.1)
%>s            Final HTTP status code
%b             Response size in bytes, excluding headers
%D             Time to serve the request in microseconds
%{Header}i     Value of the named input (request) header
%{Name}C       Value of the cookie called Name
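A custom format is only useful if you can read it back. The sketch below parses one line written with the analytics LogFormat shown above. It is an illustration under assumptions: parseAnalyticsLine is a hypothetical name, the sample line is invented, and the regular expression matches only this exact field order.

```javascript
// One quoted field; Apache escapes embedded quotes as \" so allow backslash escapes.
const QUOTED = String.raw`"((?:[^"\\]|\\.)*)"`;

// Field order follows: %h %t "%r" %>s %b %D "Referer" "User-Agent" "Accept-Language" "cookie"
const ANALYTICS_LINE = new RegExp(
  String.raw`^(\S+) \[([^\]]+)\] ${QUOTED} (\d{3}) (\S+) (\d+) ${QUOTED} ${QUOTED} ${QUOTED} ${QUOTED}$`
);

function parseAnalyticsLine(line) {
  const m = ANALYTICS_LINE.exec(line);
  if (!m) return null;
  const [, host, time, request, status, bytes, micros, referer, userAgent, language, session] = m;
  const [method, target = ''] = request.split(' ');
  let query = {};
  try {
    // The base URL is a placeholder so that a relative request target can be parsed.
    query = Object.fromEntries(new URL(target, 'http://placeholder.invalid').searchParams);
  } catch {
    // Leave query empty when the request target is not parseable.
  }
  return {
    host, time, method, target, query,
    status: Number(status),
    bytes: bytes === '-' ? 0 : Number(bytes), // %b logs "-" when no body was sent
    durationMs: Number(micros) / 1000,        // %D is in microseconds
    referer, userAgent, language, session,
  };
}

// Invented sample line for illustration.
const sampleLine = '203.0.113.7 [24/Jan/2024:19:10:56 +0000] ' +
  '"GET /pixel.php?page=/home&t=pageview HTTP/1.1" 200 42 1532 ' +
  '"https://example.com/home" "Mozilla/5.0 (X11; Linux x86_64)" "en-US,en;q=0.9" "abc123"';

console.log(parseAnalyticsLine(sampleLine));
```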

Nginx Custom log_format

# Extended analytics format
log_format analytics '$remote_addr - $remote_user [$time_local] '
                     '"$request" $status $body_bytes_sent '
                     '"$http_referer" "$http_user_agent" '
                     '"$http_accept_language" '
                     '$request_time '
                     '"$http_x_forwarded_for"';

access_log /var/log/nginx/analytics.log analytics;

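Nginx uses variable names instead of percent directives. The $request_time variable gives the request processing time in seconds with millisecond resolution, and $http_x_forwarded_for logs the X-Forwarded-For request header, which carries the original client IP when Nginx sits behind a reverse proxy or load balancer. Because that header is supplied by the client or an upstream proxy and may hold a comma-separated chain of addresses, treat it as a hint unless it comes from a proxy you control.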
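When you process a log that contains both $remote_addr and $http_x_forwarded_for, you have to decide which address to count as the visitor. The sketch below shows one cautious policy, not the only one: resolveClientIp is a hypothetical function, and the set of trusted proxy addresses is an assumption you would fill in from your own infrastructure.

```javascript
// Pick a client IP from X-Forwarded-For, falling back to the socket address.
// The header is trusted only when the direct peer is a known proxy, because
// any client can send an X-Forwarded-For value of its choosing.
function resolveClientIp(remoteAddr, forwardedFor, trustedProxies = new Set()) {
  // Nginx logs "-" for an empty variable.
  if (!forwardedFor || forwardedFor === '-' || !trustedProxies.has(remoteAddr)) {
    return remoteAddr;
  }
  const hops = forwardedFor.split(',').map((hop) => hop.trim()).filter(Boolean);
  // Walk from the right: the rightmost address that is not one of our own
  // proxies is the nearest hop that our infrastructure actually observed.
  for (let i = hops.length - 1; i >= 0; i--) {
    if (!trustedProxies.has(hops[i])) return hops[i];
  }
  return hops[0] || remoteAddr;
}

// Hypothetical addresses for illustration.
const proxies = new Set(['10.0.0.4', '10.0.0.5']);
console.log(resolveClientIp('10.0.0.5', '198.51.100.9, 10.0.0.4', proxies));
console.log(resolveClientIp('203.0.113.7', '192.0.2.1', proxies));
```

Taking the leftmost entry is simpler and common, but that entry is the one a client can forge most easily, which is why this sketch walks from the right.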

Client Hints: Modern Alternative to User-Agent

The User-Agent string has become increasingly unreliable — browsers are freezing and reducing their UA strings to prevent fingerprinting. HTTP Client Hints provide a structured alternative. The server opts in by sending an Accept-CH response header, and the browser then includes the requested hints in subsequent requests.

# Server response header (opt-in)
Accept-CH: Sec-CH-UA, Sec-CH-UA-Platform, Sec-CH-UA-Mobile, ECT, Downlink, RTT

# Subsequent browser request headers
Sec-CH-UA: "Chromium";v="120", "Google Chrome";v="120"
Sec-CH-UA-Platform: "macOS"
Sec-CH-UA-Mobile: ?0
ECT: 4g
Downlink: 10
RTT: 50

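These headers provide structured, machine-readable data instead of a monolithic User-Agent string that must be parsed with complex regex patterns. The network-quality hints (ECT, Downlink, RTT) are particularly valuable for analytics: they report the user's effective connection type, approximate bandwidth in Mbps, and round-trip time in milliseconds. Client Hints are currently implemented mainly in Chromium-based browsers and are sent only over HTTPS, so expect these fields to be empty for a share of your traffic.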

You can capture Client Hints in your server log format:

# Apache — capture Client Hints in log
LogFormat "%h %t \"%r\" %>s \"%{Sec-CH-UA}i\" \"%{Sec-CH-UA-Platform}i\" \"%{ECT}i\" \"%{Downlink}i\"" hints
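Once the hints are in the log, the values still need light parsing, because Sec-CH-UA is a list of brand and version pairs and Sec-CH-UA-Mobile is a structured boolean. The sketch below is a simplified reader under that assumption, not a full Structured Fields parser; parseSecChUa and parseSecChUaMobile are hypothetical names.

```javascript
// Parse a Sec-CH-UA value such as: "Chromium";v="120", "Google Chrome";v="120"
function parseSecChUa(headerValue) {
  const brands = [];
  // A quoted brand (backslash escapes allowed), then ;v="version".
  const item = /"((?:[^"\\]|\\.)*)"\s*;\s*v="([^"]*)"/g;
  for (const match of (headerValue || '').matchAll(item)) {
    brands.push({
      brand: match[1].replace(/\\(.)/g, '$1'), // undo backslash escapes
      version: match[2],
    });
  }
  return brands;
}

// Sec-CH-UA-Mobile uses "?1" for true and "?0" for false.
function parseSecChUaMobile(headerValue) {
  if (headerValue === '?1') return true;
  if (headerValue === '?0') return false;
  return null; // header absent or unrecognized
}

console.log(parseSecChUa('"Chromium";v="120", "Google Chrome";v="120"'));
console.log(parseSecChUaMobile('?0'));
```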

The Script-to-Header Bridge

JavaScript can set cookies, and cookies are sent as HTTP headers with every request. This creates a bridge between client-side data collection and server-side logging: JavaScript collects data that is only available in the browser, stores it in a cookie, and the server logs the cookie value from the HTTP headers.

// Client-side: store viewport size in a cookie
document.cookie = 'vp=' + window.innerWidth + 'x' + window.innerHeight +
                  ';path=/;max-age=1800;SameSite=Lax';

# Server-side: log the cookie value
LogFormat "%h %t \"%r\" %>s \"%{vp}C\"" viewport-log

The first request from a user will not include the cookie (it has not been set yet), but every subsequent request will carry the viewport data in the Cookie header. The server logs it without any application code.
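On the analysis side, the logged cookie value should be validated before use, since any client can set a cookie to arbitrary text. The sketch below assumes the WIDTHxHEIGHT format written by the snippet above; parseViewport is a hypothetical name, and treating "-" as missing reflects how Apache typically logs an absent value.

```javascript
// Parse the logged value of the vp cookie (for example "1280x720") into numbers.
// Returns null for a missing cookie ("-" in the log) or any unexpected value.
function parseViewport(loggedValue) {
  const m = /^(\d{1,5})x(\d{1,5})$/.exec(loggedValue || '');
  if (!m) return null;
  return { width: Number(m[1]), height: Number(m[2]) };
}

console.log(parseViewport('1280x720'));
console.log(parseViewport('-'));
```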

Warning: This technique is limited by cookie size restrictions (browsers cap each cookie at roughly 4 KB and also limit the number of cookies per domain) and adds overhead to every HTTP request. Use it sparingly for high-value data points that cannot be captured in headers natively.

Advantages and Limitations

How does server-log collection compare to the JavaScript beacon approach from Modules 01 and 02?

Aspect             JavaScript Beacons (Modules 01-02)   Server-Log Collection
Works without JS   No                                   Yes (tracking pixel)
Client-side data   Full access (viewport, JS APIs)      Limited (headers, Client Hints)
Implementation     Custom script                        Server config
Data granularity   High (custom events, timing)         Lower (page views, basic metadata)
Bot traffic        Can filter client-side               Captures everything
Privacy            Can check consent in JS              Harder to add consent

Cross-Reference: See analytics-overview.html Section 5 (Enriching Server Logs) for more on this approach. In practice, most analytics systems combine both methods: JavaScript beacons for rich client-side data, server logs for completeness and as a fallback. Module 03 is a parallel track: the scripts from Modules 01-02 do not depend on this module's code.

Summary