Raw pageviews are individual events. Sessionization groups them into coherent user visits using a 30-minute inactivity timeout. This module builds session-stitching logic for both Node.js and PHP.
A session is a sequence of pageviews from the same visitor without a gap longer than 30 minutes. If a user visits three pages within a few minutes, those pageviews belong to one session. If the same user comes back an hour later, that starts a new session.
The session_id comes from the client (typically set via a cookie or generated in JavaScript), but the server controls session boundaries. The client identifies who is visiting; the server decides when one visit ends and another begins.
The 30-minute threshold is an industry convention (used by Google Analytics and most analytics platforms). The idea is simple: if a user has not interacted with your site for half an hour, they have effectively "left" even if they never closed the tab.
The schema-sessions.sql file defines the sessions table. Each row represents one user visit:
| Column | Type | Purpose |
|---|---|---|
id |
INT AUTO_INCREMENT | Internal primary key |
session_id |
VARCHAR(64) | Client-generated session identifier |
first_page |
VARCHAR(2048) | Landing page URL (entry point) |
last_page |
VARCHAR(2048) | Most recent page viewed |
page_count |
INT UNSIGNED | Total pages viewed in this session |
start_time |
DATETIME | When the session began |
last_activity |
DATETIME | Most recent pageview timestamp |
duration_seconds |
INT UNSIGNED | Elapsed time from start to last activity |
referrer |
VARCHAR(2048) | Referring URL from the first pageview |
user_agent |
VARCHAR(512) | Browser user-agent string |
The relationship between sessions and pageviews is one-to-many: each session contains one or more pageviews.
session_id column is the client-provided identifier. A single session_id can map to multiple session rows if the user returns after a 30-minute gap — each return creates a new row with the same session_id but different start_time.
Every time a pageview arrives, the server must decide: does this pageview belong to an existing session, or does it start a new one? Here is the pseudocode:
ON each incoming pageview:
1. Extract session_id from the beacon
2. Query: SELECT id, last_activity FROM sessions
WHERE session_id = ?
ORDER BY last_activity DESC LIMIT 1
3. IF a row is found:
gap = now - last_activity
IF gap <= 30 minutes:
-- Continue existing session
UPDATE sessions SET
last_page = pageview.url,
page_count = page_count + 1,
last_activity = now,
duration_seconds = now - start_time
WHERE id = row.id
ELSE:
-- Gap too large, start new session
INSERT INTO sessions (session_id, first_page, ...)
4. IF no row found:
-- First visit ever for this session_id
INSERT INTO sessions (session_id, first_page, ...)
The key insight is that steps 3 (gap too large) and 4 (no row found) produce the same action: insert a new session row. The only case where we update is when an existing session is still active (gap <= 30 minutes).
ORDER BY last_activity DESC LIMIT 1 query and the row-level UPDATE handle this gracefully — both pageviews will update the same session row. For very high traffic, you might add a database-level lock or use INSERT ... ON DUPLICATE KEY UPDATE, but for a course project the simple approach works fine.
The sessionize.js module exports a single sessionize() function. It takes a MySQL connection pool and a beacon object, then applies the 30-minute timeout logic.
Let us walk through it step by step:
const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes in milliseconds
We define the timeout as a constant at the top of the module. Using milliseconds matches JavaScript's Date arithmetic, where subtracting two Date objects yields a difference in milliseconds.
async function sessionize(pool, beacon) {
const { sessionId, url, referrer, userAgent } = beacon;
if (!sessionId) return;
The function destructures the beacon to extract the fields it needs. If there is no sessionId, we bail out early — we cannot sessionize without an identifier.
const now = new Date();
// Look up existing session
const [rows] = await pool.execute(
'SELECT id, last_activity FROM sessions WHERE session_id = ? ORDER BY last_activity DESC LIMIT 1',
[sessionId]
);
We query for the most recent session row matching this session_id. The ORDER BY last_activity DESC LIMIT 1 ensures we get the latest one if multiple exist.
if (rows.length > 0) {
const lastActivity = new Date(rows[0].last_activity);
const gap = now - lastActivity;
if (gap <= SESSION_TIMEOUT_MS) {
// Update existing session
await pool.execute(
`UPDATE sessions SET
last_page = ?, page_count = page_count + 1,
last_activity = ?, duration_seconds = TIMESTAMPDIFF(SECOND, start_time, ?)
WHERE id = ?`,
[url, now, now, rows[0].id]
);
return;
}
}
If a session exists and the gap is within 30 minutes, we update the existing row: set last_page to the current URL, increment page_count, update last_activity, and recalculate duration_seconds using MySQL's TIMESTAMPDIFF.
// Create new session (either no previous session or gap exceeded)
await pool.execute(
`INSERT INTO sessions (session_id, first_page, last_page, page_count, start_time, last_activity, duration_seconds, referrer, user_agent)
VALUES (?, ?, ?, 1, ?, ?, 0, ?, ?)`,
[sessionId, url, url, now, now, referrer || null, userAgent || null]
);
}
If no session was found, or the gap exceeded 30 minutes, we insert a new session row. The first_page and last_page are both set to the current URL (since this is the first pageview), page_count starts at 1, and duration_seconds starts at 0.
The sessionize.php file implements the same logic using PDO. The structure mirrors the Node.js version closely.
define('SESSION_TIMEOUT_SECONDS', 1800); // 30 minutes
PHP uses seconds natively (via time() and strtotime()), so we define the timeout in seconds rather than milliseconds.
function sessionize(PDO $pdo, array $beacon): void {
$sessionId = $beacon['sessionId'] ?? '';
if ($sessionId === '') return;
The null coalescing operator (??) provides a default empty string if the key is missing. We return early if there is no session ID.
$url = $beacon['url'];
$referrer = $beacon['referrer'] ?? null;
$userAgent = $beacon['userAgent'] ?? null;
$now = date('Y-m-d H:i:s');
We format the current time as a MySQL-compatible datetime string. The referrer and userAgent default to null if not provided.
// Look up existing session
$stmt = $pdo->prepare(
'SELECT id, last_activity FROM sessions WHERE session_id = ? ORDER BY last_activity DESC LIMIT 1'
);
$stmt->execute([$sessionId]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
Same query as the Node.js version. PDO's fetch() returns false if no rows match, which we can use as a boolean check.
if ($row) {
$lastActivity = strtotime($row['last_activity']);
$gap = time() - $lastActivity;
if ($gap <= SESSION_TIMEOUT_SECONDS) {
$update = $pdo->prepare(
'UPDATE sessions SET last_page = ?, page_count = page_count + 1, last_activity = ?, duration_seconds = TIMESTAMPDIFF(SECOND, start_time, ?) WHERE id = ?'
);
$update->execute([$url, $now, $now, $row['id']]);
return;
}
}
We convert last_activity to a Unix timestamp with strtotime(), then compare the gap against our threshold. If within bounds, we update the session row and return.
// Create new session
$insert = $pdo->prepare(
'INSERT INTO sessions (session_id, first_page, last_page, page_count, start_time, last_activity, duration_seconds, referrer, user_agent) VALUES (?, ?, ?, 1, ?, ?, 0, ?, ?)'
);
$insert->execute([$sessionId, $url, $url, $now, $now, $referrer, $userAgent]);
}
The fallthrough case inserts a new session, identical in structure to the Node.js version.
The referrer on the first pageview of a session tells you where the user came from. This is called first-touch attribution — the traffic source that initiated the visit.
We store the referrer on the session row, not on every pageview. Why? Because only the first pageview in a session carries the external referrer. Subsequent pageviews within the same session will have your own site as the referrer (internal navigation), which is not useful for attribution.
https://twitter.com/... (external — this is the one we store)https://yoursite.com/ (internal — not useful for attribution)https://yoursite.com/about (internal — not useful for attribution)In both the Node.js and PHP implementations, the referrer is only written when we INSERT a new session row. The UPDATE path (continuing an existing session) does not touch the referrer column. This ensures the referrer always reflects the traffic source that started the session, not the last internal page visited.
Common referrer categories for analytics dashboards:
| Category | Example Referrers |
|---|---|
| Direct | Empty or null (user typed URL or used bookmark) |
| Search | google.com, bing.com, duckduckgo.com |
| Social | twitter.com, facebook.com, linkedin.com |
| Referral | Any other external site linking to yours |