URL Fundamentals

The Addressing System of the Web

"What makes a cool URI? A cool URI is one which does not change. What sorts of URI change? URIs don't change: people change them."
— Tim Berners-Lee

CSE 135 — Full Overview

Section 1: What is a URL?

URI is the umbrella term. URL is the most common type — it tells you how to get there.

URI / URL / URN

  • URI (Uniform Resource Identifier) — the umbrella term for any string that identifies a resource
  • URL (Uniform Resource Locator) — a URI that tells you how to get the resource (includes a scheme like https://)
  • URN (Uniform Resource Name) — a URI that names a resource without telling you how to access it (e.g., urn:isbn:978-0-13-468599-1)
┌─────────────────────────────────────┐
│                 URI                 │
│    (Uniform Resource Identifier)    │
│                                     │
│  ┌─────────────┐  ┌──────────────┐  │
│  │     URL     │  │     URN      │  │
│  │  (Locator)  │  │    (Name)    │  │
│  │ https://... │  │ urn:isbn:... │  │
│  │ ftp://...   │  │ urn:uuid:... │  │
│  └─────────────┘  └──────────────┘  │
└─────────────────────────────────────┘
In everyday web development, "URI" almost always means "URL." The distinction matters technically, but you'll work with URLs 99% of the time.

Section 2: URL Anatomy: The Five Components

Scheme (how) • Authority (where) • Path (what) • Query (filters) • Fragment (within)

The Five Components

https://www.example.com:443/products/search?category=books&sort=price#results
└─┬─┘   └────────┬────────┘└──────┬───────┘ └───────────┬───────────┘ └──┬──┘
scheme       authority           path                 query          fragment
(how)         (where)          (what)               (filters)        (within)
| Component | Separator | Required? | Example |
|-----------|-----------|-----------|---------|
| Scheme | : (after) | Yes | https |
| Authority | // (before) | Yes (for web URLs) | www.example.com:443 |
| Path | / (segments) | Yes (at least /) | /products/search |
| Query | ? (before), & (pairs) | No | category=books&sort=price |
| Fragment | # (before) | No | results |

Section 3: Scheme (Protocol)

The scheme tells the client how to retrieve the resource — what protocol to use.

Web & Communication Schemes

Web Schemes

| Scheme | Purpose |
|--------|---------|
| https | Secure HTTP (encrypted) |
| http | Unencrypted HTTP |
| wss | Secure WebSocket |

Communication

| Scheme | Purpose |
|--------|---------|
| mailto | Email composition |
| tel | Phone call |
| sms | SMS message |

Other Schemes & HTTPS Warning

| Scheme | Purpose | Example |
|--------|---------|---------|
| ftp | File Transfer (legacy) | ftp://files.example.com/pub/ |
| file | Local filesystem | file:///Users/jane/index.html |
| data | Inline data (see Section 10) | data:text/plain,Hello%20World |
| geo | Geographic coordinates | geo:37.7749,-122.4194 |

App-specific schemes: slack://, spotify://, vscode://, zoommtg:// — deep linking into native apps.

Always use HTTPS. HTTP sends everything in plaintext. HTTPS encrypts the connection, protects privacy, and is required for modern browser APIs (geolocation, camera, service workers). There is no reason to use http:// for new websites.

Section 4: Authority: Host and Port

Which server to connect to — domain or IP, plus optional port number.

Host, Subdomains & Ports

  • Host — domain name (example.com) or IP address (93.184.216.34). DNS translates between them.
  • Subdomains add hierarchy: www., api., blog., mail.
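| Scheme | Default Port | With Port | Without (same thing) |
|--------|--------------|-----------|----------------------|
| http | 80 | http://example.com:80/page | http://example.com/page |
| https | 443 | https://example.com:443/page | https://example.com/page |
| ftp | 21 | ftp://files.example.com:21/ | ftp://files.example.com/ |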
DNS resolves domain names to IP addresses. When you type example.com, your browser first asks DNS for the IP address, then connects. Only specify the port when it's not the default (e.g., localhost:3000).
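
The equivalence in the table above can be checked with the built-in URL class (covered in Section 15). This is a minimal sketch; the variable names are illustrative only:

```javascript
// Parse the same page with and without an explicit default port.
const withPort = new URL('https://example.com:443/page');
const withoutPort = new URL('https://example.com/page');

// The parser drops a port that matches the scheme's default.
console.log(withPort.port);                      // ""
console.log(withPort.href);                      // "https://example.com/page"
console.log(withPort.href === withoutPort.href); // true

// A non-default port is preserved.
const dev = new URL('http://localhost:3000/');
console.log(dev.port); // "3000"
console.log(dev.host); // "localhost:3000"
```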

Section 5: Path

The hierarchical address of the resource on the server — like a filesystem directory structure.

Path: Hierarchy, Case & Conventions

    /                          ← root (home page)
    /about                     ← about page
    /products/shoes            ← shoes within products
    /products/shoes/running    ← running shoes within shoes
    /api/v2/users/42           ← user 42 in API version 2
  • Case sensitivity: Linux servers treat /About and /about as different resources. Windows (IIS) treats them the same. Always use lowercase.
  • Trailing slashes: /products/ traditionally implies a collection; /products implies a resource. Be consistent.
  • File extensions: Modern best practice is to omit them. They expose implementation details and make URLs fragile if you change technologies.
Paths are case-sensitive on Linux. Developing on macOS (whose default filesystem is case-insensitive) and deploying to Linux (case-sensitive) is a common source of 404 bugs.

Section 6: Query String

Key-value pairs after ? — search, filtering, sorting, pagination, tracking.

Query String Patterns

| Use Case | Example |
|----------|---------|
| Search | ?q=search+terms |
| Filtering | ?category=books&author=Doe |
| Pagination | ?page=2&per_page=25 |
| Sorting | ?sort=date&order=desc |
| Tracking | ?utm_source=google&utm_medium=cpc |
| Multiple values | ?color=red&color=blue |
Query strings are visible everywhere: browser bar, server logs, Referer header, browser history. Never put secrets (passwords, tokens, API keys) or PII in query strings. Use POST bodies or HTTP headers for sensitive data.
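
As a rough illustration of how these patterns are read in code, the sketch below parses a query string with the built-in URLSearchParams class. The query string itself is a made-up example combining rows from the table above:

```javascript
// A hypothetical query string combining repeated keys and pagination.
const params = new URLSearchParams('?color=red&color=blue&page=2&per_page=25');

// get() returns only the first value for a repeated key...
console.log(params.get('color'));    // "red"
// ...while getAll() returns every value, in order.
console.log(params.getAll('color')); // ["red", "blue"]

// Values are always strings; convert explicitly when a number is needed.
const page = Number(params.get('page'));
console.log(page);                   // 2

// A missing key yields null rather than throwing.
console.log(params.get('sort'));     // null
```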

Section 7: Fragment Identifier

Starts with # — handled entirely by the browser, never sent to the server.

Fragment: Client-Side Only

Browser sends:                            Browser keeps:
┌─────────────────────────────────────┐   ┌──────────┐
│ GET /docs/guide?version=2 HTTP/1.1  │   │ #chapter3│
│ Host: example.com                   │   │          │
│                                     │   │ (client- │
│ ← Everything before the #           │   │  side    │
│   gets sent to the server           │   │  only)   │
└─────────────────────────────────────┘   └──────────┘

URL: https://example.com/docs/guide?version=2#chapter3
     └─────────── sent to server ───────────┘└─ not sent ─┘
  • Page sections: #introduction, #chapter3 — scrolls to an element with that id
  • Tab content: #settings, #profile — show a specific tab
  • SPA routing: #/users/42 — hash-based routing (changing the hash doesn't trigger a page reload)
Changing the fragment doesn't cause a new server request. This is why early SPAs used hash-based routing — it updated the URL without reloading the page.
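
The split between what is sent and what stays in the browser can be sketched with the URL class. The helper name splitForRequest is an assumption made for this example, not a built-in:

```javascript
// Hypothetical helper: separate the parts of a URL that end up in the
// HTTP request from the fragment, which stays in the browser.
function splitForRequest(urlString) {
  const url = new URL(urlString);
  return {
    host: url.host,                           // goes in the Host header
    requestTarget: url.pathname + url.search, // goes in the request line
    clientOnly: url.hash,                     // never sent to the server
  };
}

const parts = splitForRequest('https://example.com/docs/guide?version=2#chapter3');
console.log(parts.requestTarget); // "/docs/guide?version=2"
console.log(parts.clientOnly);    // "#chapter3"
```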

Section 8: Absolute vs Relative URLs

Full addresses vs partial addresses resolved against the current document.

Types of Relative URLs

| Type | Example | Meaning |
|------|---------|---------|
| Same directory | page.html | File in the current directory |
| Child directory | images/photo.jpg | File in a subdirectory |
| Parent directory | ../styles/main.css | Go up one level, then into styles/ |
| Root-relative | /images/logo.png | Relative to the site root |

Given base URL https://example.com/blog/posts/article.html:

| Relative URL | Resolved To |
|--------------|-------------|
| other.html | https://example.com/blog/posts/other.html |
| ../about.html | https://example.com/blog/about.html |
| /styles/main.css | https://example.com/styles/main.css |
| //cdn.example.com/lib.js | https://cdn.example.com/lib.js |
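
The resolutions in the table above can be reproduced by passing a base as the second argument to the URL constructor. A minimal sketch; the resolve helper is a name chosen for this example:

```javascript
const base = 'https://example.com/blog/posts/article.html';

// Hypothetical helper: resolve a relative reference against the base URL.
function resolve(relative) {
  return new URL(relative, base).href;
}

console.log(resolve('other.html'));               // "https://example.com/blog/posts/other.html"
console.log(resolve('../about.html'));            // "https://example.com/blog/about.html"
console.log(resolve('/styles/main.css'));         // "https://example.com/styles/main.css"
console.log(resolve('//cdn.example.com/lib.js')); // "https://cdn.example.com/lib.js"
```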

Base Tag & Best Practices

The <base> tag changes the base URL for all relative URLs on the page:

    <head>
      <base href="https://cdn.example.com/assets/">
    </head>
    <body>
      <!-- Resolves to https://cdn.example.com/assets/logo.png -->
      <img src="logo.png">
    </body>

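Gotcha: <base> affects all relative URLs, including #section fragment links. With the base above, <a href="#top"> resolves to https://cdn.example.com/assets/#top and navigates away from the current page instead of scrolling within it.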

Best Practices

  • Root-relative (/images/logo.png) for site-wide assets — works from any page depth
  • Document-relative (../styles/main.css) for resources that move with the document
  • Absolute (https://cdn.example.com/lib.js) for external resources
Don't use protocol-relative URLs (//cdn.example.com/...). Just use https://. They were a transitional pattern and break with file://.

Section 9: URL Encoding (Percent-Encoding)

URLs can only contain a limited set of ASCII characters — everything else must be encoded as %XX.

Character Categories & Common Encodings

    Character → UTF-8 bytes    → Percent-encoded
    "A"       → 0x41           → A  (unreserved, no encoding needed)
    " "       → 0x20           → %20
    "&"       → 0x26           → %26
    "/"       → 0x2F           → %2F
    "é"       → 0xC3 0xA9      → %C3%A9     (2 bytes)
    "中"      → 0xE4 0xB8 0xAD → %E4%B8%AD  (3 bytes)
  • Unreserved (never need encoding): A-Z a-z 0-9 - _ . ~
  • Reserved (special meaning; encode when used as data): : / ? # [ ] @ ! $ & ' ( ) * + , ; =
  • Everything else (must always be encoded): spaces, < > { } | \ ^ `, all non-ASCII

JS Functions & the Space Problem

| Function | Purpose | Use For |
|----------|---------|---------|
| encodeURIComponent() | Encode a single value | Query parameter values, path segments |
| encodeURI() | Encode a full URL | Complete URLs (preserves structure) |
    // encodeURIComponent: for individual values
    const query = encodeURIComponent('cats & dogs');
    // → "cats%20%26%20dogs"

    // encodeURI: for complete URLs
    const fullUrl = encodeURI('https://example.com/path with spaces/page');
    // → "https://example.com/path%20with%20spaces/page"

%20 vs +

  • %20 — standard percent-encoding (used in paths and most contexts)
  • + — means a space only in application/x-www-form-urlencoded data (HTML form submissions and query strings parsed that way); in a path, + is a literal plus sign
  • When in doubt, use %20. It's always correct.
Always use encodeURIComponent() for values. Never build URLs by string concatenation. Use the URL API (section 15) for building URLs programmatically.
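
The difference between %20 and + shows up when the same value passes through the two built-in encoders. The sketch below is illustrative only; the sample value is arbitrary:

```javascript
const value = 'cats & dogs';

// Percent-encoding: spaces become %20.
const percentEncoded = encodeURIComponent(value);
console.log(percentEncoded); // "cats%20%26%20dogs"

// Form encoding (URLSearchParams): spaces become +.
const formEncoded = new URLSearchParams({ q: value }).toString();
console.log(formEncoded);    // "q=cats+%26+dogs"

// A form-aware parser turns + back into a space, and also accepts %20.
console.log(new URLSearchParams('q=cats+%26+dogs').get('q'));     // "cats & dogs"
console.log(new URLSearchParams('q=cats%20%26%20dogs').get('q')); // "cats & dogs"

// decodeURIComponent() does not treat + as a space.
const plainDecoded = decodeURIComponent('cats+%26+dogs');
console.log(plainDecoded);   // "cats+&+dogs"
```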

Section 10: Data URIs

Embed resource content directly in the URL itself — no HTTP request needed.

Syntax & When to Use

    data:[mediatype][;base64],data

    data:text/plain,Hello%20World
    data:text/html,<h1>Hello</h1>
    data:image/svg+xml,<svg xmlns="...">...</svg>
    data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...
| MIME Type | Typical Use |
|-----------|-------------|
| text/plain | Simple text content |
| text/html | Inline HTML documents, iframes |
| image/svg+xml | Inline vector graphics (icons, logos) |
| image/png | Small raster images (base64) |
  • Use for: small icons, critical above-the-fold images, CSS backgrounds for tiny images
  • Don't use for: large images (base64 adds ~33% size), cacheable resources, resources shared across pages
Security: Data URIs can be XSS vectors. data:text/html can contain JavaScript. CSP headers can restrict them. Be cautious with user input.
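
To make the syntax and the base64 size overhead concrete, here is a small sketch that builds both forms of a data URI. The helper names are hypothetical, and btoa() is assumed to be available as a global (as it is in browsers and recent Node.js versions); it only accepts Latin-1 input:

```javascript
// Hypothetical helper: percent-encoded text data URI.
function textDataUri(text) {
  return 'data:text/plain,' + encodeURIComponent(text);
}

// Hypothetical helper: base64 data URI from a Latin-1 string.
function base64DataUri(mediaType, latin1String) {
  return `data:${mediaType};base64,${btoa(latin1String)}`;
}

console.log(textDataUri('Hello World'));
// "data:text/plain,Hello%20World"

console.log(base64DataUri('text/plain', 'Hello World'));
// "data:text/plain;base64,SGVsbG8gV29ybGQ="

// Base64 emits 4 output characters for every 3 input bytes (plus padding),
// which is where the roughly one-third size increase comes from.
console.log('Hello World'.length);       // 11
console.log(btoa('Hello World').length); // 16
```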

Section 11: URLs and State Management

URLs make state shareable and bookmarkable — one of the oldest state mechanisms on the web.

History API & Hash vs pushState

The History API updates the URL without triggering a page reload:

    // Adds a new entry to the browser history
    history.pushState({ page: 2 }, '', '/products?page=2');

    // Replaces the current entry (no new history entry)
    history.replaceState({}, '', '/products?sort=price');
| Aspect | Hash Routing (#/path) | History API (/path) |
|--------|-----------------------|---------------------|
| URL appearance | example.com/#/users/42 | example.com/users/42 |
| Server config | None needed | Requires catch-all route |
| SEO | Poor (fragments not sent to server) | Good (real URLs, crawlable) |
| Modern usage | Legacy SPAs | Standard for modern frameworks |
The shareability test: If a user should be able to copy the URL and have a colleague see the same view — that state belongs in the URL. Search results, filtered lists, and specific views are prime examples.
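
One possible way to apply the shareability test is to serialize view state into the query string and read it back on load. The sketch below assumes a flat state object; stateToUrl and urlToState are hypothetical helper names, not part of any API:

```javascript
// Hypothetical helper: write a flat state object into a URL's query string.
function stateToUrl(base, state) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(state)) {
    // Skip empty values so the URL stays short.
    if (value !== undefined && value !== null && value !== '') {
      url.searchParams.set(key, String(value));
    }
  }
  return url;
}

// Hypothetical helper: read the state back (all values come back as strings).
function urlToState(urlString) {
  return Object.fromEntries(new URL(urlString).searchParams);
}

const shared = stateToUrl('https://example.com/products', {
  category: 'books',
  sort: 'price',
  page: 2,
});
console.log(shared.href);
// "https://example.com/products?category=books&sort=price&page=2"

console.log(urlToState(shared.href));
// { category: 'books', sort: 'price', page: '2' }

// In a browser, the result could then be handed to the History API, e.g.
// history.pushState({}, '', shared.pathname + shared.search);
```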

Section 12: URL Security

URLs are user-controllable input. Any part can be manipulated by an attacker.

Common URL-Based Attacks

| Attack | How It Works | Prevention |
|--------|--------------|------------|
| Parameter manipulation | ?user_id=42 → ?user_id=1 | Server-side authorization |
| Open redirect | /login?redirect=https://evil.com | Whitelist domains; relative URLs only |
| Path traversal | /files/../../etc/passwd | Normalize paths; reject .. |
| XSS via URLs | /search?q=<script>alert(1)</script> | HTML-escape; use CSP |
| javascript: scheme | <a href="javascript:..."> | Only allow http:/https: |
| URL phishing | Lookalike domains, homograph attacks | Awareness; browser warnings |

Validation & Sensitive Data Exposure

    // Safe URL validation: accept only http: and https: URLs
    function isSafeUrl(input) {
      try {
        const url = new URL(input);
        return ['http:', 'https:'].includes(url.protocol);
      } catch {
        return false; // not parseable as an absolute URL
      }
    }

    // Safe redirect validation: accept only same-origin targets
    function isSafeRedirect(redirectUrl) {
      try {
        const url = new URL(redirectUrl, window.location.origin);
        return url.origin === window.location.origin;
      } catch {
        return false;
      }
    }

URLs are exposed in: browser history, server logs, Referer header, proxy logs, and bookmarks.

Never put passwords, tokens, session IDs, or PII in URLs. Use HTTP headers (Authorization, Cookie) or request bodies. If a token must be in a URL temporarily (e.g., password reset), make it single-use and short-lived.

Section 13: Cool URIs Don't Change

Good URLs should last forever. ~25% of links in academic papers are dead within 7 years.

Design Principles for Lasting URLs

| Bad URL (Why) | Good URL (Why) |
|---------------|----------------|
| /cgi-bin/display.pl?id=42 (exposes technology) | /articles/42 (technology-independent) |
| /~smith/papers/paper1.html (tied to a person) | /research/machine-learning (topic-based) |
| /docs/v2.3.1/api.aspx (version + extension) | /docs/api (stable, versionless) |
  • 301 Moved Permanently — the URL has a new home. Browsers follow the redirect automatically and search engines update their index, so existing bookmarks and links keep working.
  • 410 Gone — the resource is permanently removed. More honest than a 404.
  • Archive rather than delete — add a notice rather than removing content.
A 301 redirect preserves bookmarks, SEO value, and external links. The cost of a redirect is trivial; the cost of a broken link (lost traffic, broken references) is not. Maintain redirects forever.
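
A long-lived redirect table can be as simple as a lookup consulted before normal routing. The sketch below is framework-agnostic and purely illustrative: the resolveLegacy helper and the /old-campaign path are assumptions, and the moved entries reuse example URLs from the table above:

```javascript
// Hypothetical redirect table: old path (with query) → new path.
const moved = new Map([
  ['/cgi-bin/display.pl?id=42', '/articles/42'],
  ['/docs/v2.3.1/api.aspx', '/docs/api'],
]);

// Hypothetical list of resources that were removed on purpose.
const gone = new Set(['/old-campaign']);

// Decide how to answer a request for a legacy URL.
// Returns null when the path is not a known legacy URL.
function resolveLegacy(pathAndQuery) {
  if (moved.has(pathAndQuery)) {
    return { status: 301, location: moved.get(pathAndQuery) };
  }
  if (gone.has(pathAndQuery)) {
    return { status: 410 };
  }
  return null;
}

console.log(resolveLegacy('/docs/v2.3.1/api.aspx')); // { status: 301, location: '/docs/api' }
console.log(resolveLegacy('/old-campaign'));         // { status: 410 }
console.log(resolveLegacy('/articles/42'));          // null
```

A server would call something like this first and fall through to its regular routes when the result is null.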

Section 14: URLs as Interface

Users read, edit, share, and judge URLs. Good URLs are readable, predictable, and hackable.

Readable, Predictable, Hackable

Readable

  • Use real words: /products/running-shoes
  • Use hyphens: /blog/url-design-tips
  • Use lowercase: /about-us
  • Keep URLs concise
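
The readability rules above are often applied by generating a "slug" from a title. The function below is one possible sketch, not a standard API; it assumes ASCII-only output is wanted and simply drops characters it cannot map:

```javascript
// Hypothetical helper: turn a human-readable title into a URL path segment.
function slugify(title) {
  return title
    .normalize('NFKD')                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()                    // lowercase only
    .replace(/[^a-z0-9]+/g, '-')      // runs of anything else become one hyphen
    .replace(/^-+|-+$/g, '');         // no leading or trailing hyphens
}

console.log(slugify('URL Design Tips'));   // "url-design-tips"
console.log(slugify('  Running Shoes! ')); // "running-shoes"
console.log(slugify('Café Menu'));         // "cafe-menu"
```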

Hackable

  • Remove the last segment to go "up":
    /products/shoes/running → /products/shoes
  • Change a parameter to explore:
    /search?q=python → /search?q=javascript
| Pattern | Example |
|---------|---------|
| REST API | /api/users, /api/users/42, /api/users/42/posts |
| Blog | /blog, /blog/2024/url-design |
| Documentation | /docs/getting-started, /docs/api/authentication |
The Phone Test: Can you read this URL aloud to someone over the phone and have them type it correctly? If yes, it's a good URL.

Section 15: The URL API in JavaScript

Built-in URL and URLSearchParams classes — always prefer these over string manipulation.

URL Constructor & Properties

    const url = new URL('https://example.com:8080/api/users?role=admin&active=true#table');

    url.protocol  // "https:"
    url.hostname  // "example.com"
    url.port      // "8080"
    url.pathname  // "/api/users"
    url.search    // "?role=admin&active=true"
    url.hash      // "#table"
    url.origin    // "https://example.com:8080"

    // Parse a relative URL against a base
    const rel = new URL('/api/books', 'https://example.com');
    // → "https://example.com/api/books"

URLSearchParams

    const url = new URL('https://example.com/search');
    url.searchParams.set('q', 'javascript tutorials');
    url.searchParams.set('page', '1');
    // → "https://example.com/search?q=javascript+tutorials&page=1"

    url.searchParams.get('q');       // "javascript tutorials"
    url.searchParams.has('page');    // true
    url.searchParams.delete('page');

    // Validation
    URL.canParse('https://example.com'); // true
    URL.canParse('not a url');           // false
Always prefer the URL API over string manipulation. It handles encoding, validation, and edge cases correctly.
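
To illustrate the kind of edge case string manipulation gets wrong, the sketch below builds the same search URL both ways. The search term is an arbitrary example:

```javascript
const term = 'cats & dogs';

// String concatenation: the raw "&" ends the q parameter early,
// so the value is silently split into two parameters.
const naive = new URL('https://example.com/search?q=' + term);
console.log(naive.searchParams.get('q'));    // "cats "
console.log([...naive.searchParams.keys()]); // ["q", " dogs"]

// URL API: the value is encoded on the way in and decoded on the way out.
const safe = new URL('https://example.com/search');
safe.searchParams.set('q', term);
console.log(safe.href);                      // "https://example.com/search?q=cats+%26+dogs"
console.log(safe.searchParams.get('q'));     // "cats & dogs"
```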

Section 16: Summary

Key takeaways from all 16 sections.

Key Takeaways

| Concept | Key Points |
|---------|------------|
| What is a URL | URI is the umbrella; URL tells you how to get there |
| Anatomy | Scheme (how), Authority (where), Path (what), Query (filters), Fragment (within) |
| Scheme | Always use HTTPS. App-specific schemes enable deep linking. |
| Authority | Host + port. DNS resolves domains to IPs. Default ports: HTTP=80, HTTPS=443. |
| Path | Hierarchical, case-sensitive on Linux. Omit file extensions. |
| Query | Key-value pairs for search/filter/sort. Never put secrets in them. |
| Fragment | Never sent to the server. Used for page sections and SPA routing. |
| Relative URLs | Root-relative for assets, absolute for external, document-relative for co-located resources. |
| Encoding | encodeURIComponent() for values. %20 over + when in doubt. |
| Data URIs | Inline resources. Good for small icons; bad for large or cacheable content. |
| State | URL state = shareable/bookmarkable. History API updates without reload. |
| Security | URLs are user input. Validate. Never put secrets in URLs. |
| Cool URIs | Good URLs last forever. 301 redirects when they must change. |
| Interface | Readable, predictable, hackable. The Phone Test. |
| URL API | new URL(), URLSearchParams, URL.canParse(). Always prefer over string manipulation. |