Web Performance

The Performance Golden Rule

"Send less data, less often, from nearby, when it is needed."

Performance is not a single technique — it is a progression of techniques, each building on the last. The golden rule unfolds into eight stages:

  1. Content Selection — Do you even need to send it?
  2. Minimization — Make what you send smaller (minification, tree-shaking, dead code elimination)
  3. Compression — Encode it efficiently (gzip, brotli, image optimization)
  4. Caching — Avoid re-sending it (HTTP cache headers, service workers)
  5. Latency Reduction — Shorten the path (CDNs, DNS optimization, preconnect)
  6. Installation and Preload — Be ready before the user asks (service workers, prefetch, preload)
  7. Demand-Driven Loading — Send it when the user actually needs it (lazy loading, code splitting)
  8. Monitoring and Analytics — Measure everything with RUM (Real User Monitoring) as you aim to hit RAIL-like performance targets

This page walks through each stage, from fundamental principles to implementation techniques.

1. Why Performance Matters

Speed and User Behavior

The data is in, it is irrefutable, and it is consistent across every study: speed deeply matters.

The specifics vary — Amazon's "100ms = 1% sales," Google's "500ms delay = 20% fewer searches" — but the direction is always the same. Speed is not a feature. Speed is the baseline expectation.

It Is Not Just a Web Thing

Perception of time passing is subjective and context-dependent. The ancient Greeks distinguished two concepts of time: chronos (quantitative, clock time) and kairos (qualitative, experiential time). The qualitative aspects can overwhelm human perceptions — a minute spent waiting for a page to load feels longer than a minute spent reading interesting content.

Tolerances for delay are not specific to computer interfaces. Sticky doors, slow restaurant service, boring lectures (you've never heard one of those!) — humans have always been impatient. What changes is the expectation floor. We went from waiting in bank lines on prescribed days and hours (now you know what "banker's hours" means), to drive-through pneumatic tubes, to ATMs at any hour, to swiping a card, to touching a phone to a terminal. Each leap collapsed the expectation to a new perceptual floor. Once users experience faster, they cannot un-experience it. Today, a delay of a few seconds has people exclaiming "Wow, it's slow today!"

User tolerance strongly correlates with needs and wants. Would you still deal with something you normally wouldn't if you had to get the thing done? (Example: the DMV line.) How long would you tolerate a page load if you really wanted something? (Example: the possibility of a prize, or getting something you really want.) As much as we want to make software about technical things, at its heart the ultimate "grader" of quality is a human, shaped by their experience and their particular needs and wants!

The Clock Starts Before Your Site

The user's total time includes unlocking their device, opening a browser or app, typing a query, DNS lookup, TCP handshake — all before your server sees a single byte. Your site is not the user's reason for being there; it is the mechanism for what they are trying to do.

Think about the user in three phases:

  1. before they arrive
  2. during their visit
  3. after they leave

Be very mindful of the before phase. This phase is where expected response time is set by what people experience doing the same things elsewhere. For example, if Amazon can transact in a second or two, users bring that expectation to your site — fair or not. This is a form of the "99% rule": 99% of the time, people are elsewhere, and that shapes their perception of your 1%.

The Impact of Each Phase

We usually remember the first and last phases more; the middle is more of a montage. Consider that the arrival sets the tone for the entire performance perception. How users arrive matters: direct type-in (high intent), link follow (volatile referral vs. stable bookmark), search engine results (earned organic vs. paid placement). Each arrival path carries different expectations and tolerances. If the arrival doesn't go well, you really don't have a chance. The ending, of course, sets the takeaway, and that is ultimately what the user remembers, if only you can get to that positive outcome!

UX/DX Tension

Performance is about users and how they feel using our code — not about us and our code. More effort spent on user benefit often means less efficient development. This creates a fundamental economic tension: delivery is a variable cost (every user pays the byte tax), while development is a more fixed cost (write once, ship many times). Framework convenience saves developer hours but costs bytes in every download. Whether that trade-off is acceptable depends on your growth stage and user count. This is just the whole DX and UX tension showing itself once again.

Mobile Reality

Mobile phones are not desktop computers with small screens. Even high-end phone performance is an order of magnitude slower than laptop performance. Benchmarks lie: they don't account for power constraints, true core utilization, or the gap between simulators and real devices. Developers use flagships; real users are on median Android devices.

"The web has moved to relatively underpowered mobile devices with connections that are often slow, flaky, or both." — Addy Osmani, Google

Performance is fundamentally a user-experience discipline, not solely a technical one. The Foundations Overview covers the client-server model and the UX vs. DX tension in more detail. It's my strong opinion that in the modern space of LLM-driven development, performance is likely to suffer, and your ability to guide LLMs or refine their output will be a differentiating aspect for you as a developer.

2. Key Definitions: Bandwidth and Latency

Bandwidth

Bandwidth is the amount of data that can be sent in a given time period. The common analogy: data is water, bandwidth is the diameter of the pipe. A wider pipe carries more water per second — but it doesn't make a drop of water arrive sooner no matter how big the pipe is!

Latency

Latency is the time delay required for information to travel across a network. It is analogous to the length of the pipe plus any delays switching between pipes. Latency includes processing delays, queuing delays, and transmission delays at every hop.

Research consistently shows: upgrading bandwidth from 5 Mbps to 10 Mbps yields only about a 5% load-time improvement, but each 20ms reduction in round-trip time yields a proportional (linear) load-time improvement. The implication is clear:

"Latency is the real enemy."

Fallacies of Hope

Common misconceptions that waste time and money:

Scale ≠ Speed

"Scale ≠ Speed."

Adding servers doesn't make a site faster unless you are already overloaded. If you need more servers to handle traffic, that is a capacity problem, not a speed problem. Scale issues suggest you don't know your capacity and tolerance thresholds. Know your max concurrent connections, requests per second, and resource ceilings. Scale equals cost — so act accordingly.

Scale & Speed Confusion: An easy way to think about this: when a store has long lines because too few registers are open, consider what adding a checker does to the throughput of customers. Scale is different from speed, but they are related.

Key Definitions

Term Definition
Speed Both perceptual (how fast it feels) and actual (measured milliseconds). Both matter.
Bandwidth Data capacity per unit time. Measured in Mbps. Determines throughput, not responsiveness.
Latency Network delay / travel time. Measured in ms. The dominant factor in page load performance.
TTFB Time to First Byte. From request sent to the first byte of the response arriving at the client.
TTLB Time to Last Byte. From request sent to the final byte of the response arriving.
FCP First Contentful Paint. When the browser renders the first piece of DOM content (text, image, SVG).
TTI Time to Interactive. When the page is visually rendered AND reliably responsive to user input.
LCP Largest Contentful Paint. When the largest visible content element finishes rendering. A Core Web Vital.
CLS Cumulative Layout Shift. Measures visual stability — how much content jumps around during load. A Core Web Vital.
INP Interaction to Next Paint. Measures responsiveness to user interactions throughout the page lifecycle. A Core Web Vital (replaced FID).
Bandwidth gets the marketing ("100 Gbps!", "unlimited bandwidth!", etc.); latency gets the blame. Most performance problems are latency problems. See the HTTP Overview for how request-response round trips amplify latency costs. Be careful with bandwidth marketing: there are often serious catches, and "unlimited" is never unlimited in the real sense.

3. RAIL: The Performance Model

Time Delay Buckets

Humans perceive delays in roughly these ranges (the classic usability thresholds):

  1. Under ~100ms — feels instantaneous
  2. ~100ms to 1 second — noticeable, but the user's flow of thought is maintained
  3. 1 to 10 seconds — attention wanders; the user feels at the machine's mercy
  4. Over ~10 seconds — the user mentally switches tasks or abandons

RAIL Breakdown

Phase Budget What It Means
Response < 100ms Actions and input must produce a visual reaction within 100ms. Tap/click to visible feedback must feel instant.
Animation ~10ms per frame Target 60 fps = ~16.66ms per frame. Aim for ~10ms of work to leave room for browser rendering. Failure produces jank.
Idle 0–50ms chunks Background work should complete in ≤ 50ms chunks. You share the main thread with the UI; anything longer sets up jank for later interactions.
Load < 1000ms Get critical above-the-fold content on screen in under 1 second. Avoid the White Screen of Death (WSOD).
Jank Busting: Addressing page jank can be quite tricky because as we know we don't control the client. However, it is far worse than that because when you consider the browser we often don't acknowledge that our client-side JavaScript code shares the execution thread with the browser trying to paint the UI. What we do literally can stall the browser itself!
3rd Party Lack of Control: When you rely on external services and linked scripts, you have given away control of your performance outcome. You really can't control the services you link to, so keep an eye on them. Guarantees of performance are possible (these are called SLAs, Service Level Agreements), but to get one you will pay a fair amount of money.
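The Idle budget in the RAIL table can be honored by slicing background work into small, time-boxed chunks. Here is a minimal sketch in plain JavaScript; expensiveStep is a hypothetical stand-in for real work, and in a browser each chunk would be rescheduled with setTimeout or requestIdleCallback rather than driven by a loop:

```javascript
// Hypothetical unit of background work; squaring stands in for real effort.
const expensiveStep = (n) => n * n;

// Drain items from the queue until the time budget (RAIL's ~50ms) is spent.
// Returns true if work remains, i.e. another chunk should be scheduled.
function processChunk(queue, results, budgetMs = 50, now = Date.now) {
  const start = now();
  while (queue.length > 0 && now() - start < budgetMs) {
    results.push(expensiveStep(queue.shift()));
  }
  return queue.length > 0;
}

// Demo driver: a real page would schedule each iteration with
// setTimeout(..., 0) or requestIdleCallback so the UI can paint in between.
function processAll(queue) {
  const results = [];
  while (processChunk(queue, results)) { /* yield point in a real page */ }
  return results;
}
```

The point is the contract: never hold the main thread longer than the budget, and always leave a way to resume.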

The Challenge Summarized

Deliver rich experiences over a public network worldwide, to slow devices on unreliable connections, in under 1 second for first load, then run at 100ms response times, often using services and dependencies we don't control. Maintaining high performance under these conditions is insanely difficult, and that is why it should never be an afterthought.

The 90% Problem

"~90% of user-response time issues are client-side." — Steve Souders

Start with client-side optimizations for the best gains. They are simpler to implement, easier to measure, and affect the largest portion of the user's experience. Server-side improvements matter, but the biggest wins are in what happens after the response leaves the server. In some ways, getting the server side wrong (unless you operate at huge data scale) reveals a lack of engineering maturity rather than actual difficulty. Client-side truly is harder than server-side if you are honest with yourself.

RAIL Is Not Binary

You cannot hit RAIL numbers in a simple pass/fail fashion. Real User Monitoring (RUM) reveals a spectrum: some users meet goals comfortably, some tolerate near-misses, and some fail entirely. The analytical framework should categorize user experiences into meeting, tolerating, and failing to reach performance goals.

RAIL provides the targets; monitoring reveals the reality. The OpenTelemetry Overview covers how to build an analytical framework that captures this spectrum instead of reducing performance to a single number.

4. Content Selection and Payload Reduction

The Payload

HTTP responses consist of a request line, headers, and payload. Most performance trouble lives in the payload. Payloads come in two flavors: text (HTML, CSS, JavaScript) and binary (images, video, audio, PDF, fonts). Each requires different optimization strategies.

Do You Even Need It?

The first question is always: do we really need this object? Bytes seem "freeish," so this thought often hasn't happened. This problem is exacerbated by the "localhost" effect: the developer's perception of performance on a fast local machine simply isn't the user's reality on a slow phone over a cellular connection.

Focus on large objects first — images are often the biggest and most unoptimized payload. But textual content matters too, because of how the browser's rendering pipeline works.

The (Un)clarity of (In)tangibility: Other engineering disciplines do not suffer, in some ways, like software engineering does. The nature of software in many cases is too abstract. We don't acknowledge the nature of our materials the way, say, a structural engineer acknowledges the characteristics of materials and the forces placed upon them. In many ways those forces are still there, but they are often distant from us, in the cloud or on a client's device. Studying analytics hopefully opened your eyes to this, but I truly believe the distance, plus the gains of Moore's Law making hardware and performance cheap and plentiful, has created a bit of a hangover from reality. The truth is we don't need racks of computers to deliver most things unless we have massive scale, but our industry and its economic concerns, plus our lack of seeing the humming data centers, have blinded us to the truth of our "technical" material use. At the end of the day every byte is an atom in terms of power and has effects on the world. Poor development resulting in poor performance has costs, just like more obvious things such as single-use plastic bottles.
Not all bytes are the same. JavaScript costs far more than images because JS must be downloaded, parsed, AND compiled before it can execute. 200 KB of JavaScript is significantly more expensive than 200 KB of images. This is the single most important thing to understand about payload optimization.

Content You Might Not Need

Unused Code and Framework Bloat

A persistent problem: using complete frameworks for a small subset of features. Importing all of Bootstrap CSS just to center a few elements. Including all of some JavaScript library just to use one utility function. The entire framework ships to every user, every time.

Solutions include custom bundle generation, tree-shaking (eliminating unreachable code at build time), and careful dependency auditing. Note the tension: a single framework <link> tag is high DX (easy for developers), while custom-building with care is lower DX but better UX (fewer bytes for users). In some sense, this is a savings in creation that may be far exceeded, cost-wise, in use! Economics folks might talk about this in terms of fixed and variable costs!

The fastest byte is the one you never send. Before optimizing delivery, question whether each resource is necessary. Every <script>, <link>, and <img> should justify its existence.

5. Minification: HTML

Many "best practices" for performance are waved away because of an assumption that they will make coding harder. The complaints are often a bit suspect, on the level of suggesting that one must look at their compiled assembly code and maintain it directly. Instead of employing such motivated thinking, adopt a balanced philosophy: minification (the reduction of code size) is the first application of that style of thinking.

Dev-Performance Pragmatism

"Code for maintenance, but prepare for delivery."

Minification should be automated — part of a build/publish pipeline, not a manual authoring effort. Your source code stays readable; your delivered code is optimized. Minification is NOT security; it is slight obfuscation easily reversed with pretty-printing.

The principle: minify first, then compress. They address different things and compound — minification removes structural redundancy, compression removes statistical redundancy.

HTML Minification Techniques

You likely aren't going to perform these minification techniques by hand; rather, your tooling will do that. But it is useful to understand the techniques in case you want to do them yourself or encourage your GenAI tool to do so.

Technique Example Caveats
Whitespace removal Collapse multiple spaces/newlines to a single space Preserve <pre>, <textarea>, &nbsp;, the CSS white-space property, <script> contents without semicolons
Optional quote removal <p id="foo"> → <p id=foo> Unsafe for non-ordinal values (class lists, JS code in attributes)
Optional close tag removal </p>, </li> omitted Relies on content model inference; works until a parser changes
Comment removal Strip <!-- ... --> Preserve IE conditionals, template comments, DOM-trickery comments
Boolean attribute shortening <hr noshade="noshade"> → <hr noshade> Standard HTML allows this; safe
Self-closing tag cleanup <br /> → <br> Not applicable if serving as XHTML. While XHTML is considered old, it is re-emerging somewhat because it can catch, at the syntax stage, the drift that LLMs introduce into generated markup.
Entity remapping &#174; → &reg; (or the reverse, whichever is shorter) Encoding-dependent; ensure consistency
Color shortening #FF0000 → red Only for named-color equivalents
Dangerous territory: element elimination (<p></p> removal), letting browsers infer <html>, <body>, <thead>. These work until a parser or spec changes on you. Do note that you will see some of these techniques used by the largest sites, most notably Google, which is aware of things like packet fragmentation at the TCP level. It is a hyperscale optimization.
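To make the whitespace and comment techniques concrete, here is a deliberately naive sketch in JavaScript. It protects <pre> blocks but ignores many of the caveats in the table above (IE conditionals, <textarea>, inline scripts), which is exactly why real tools like html-minifier exist:

```javascript
// Naive HTML minifier sketch: collapses whitespace and strips comments,
// shielding <pre> blocks behind placeholders. Illustration only.
function miniHtml(html) {
  const pres = [];
  // 1. Pull out <pre> blocks so their whitespace survives untouched.
  html = html.replace(/<pre[\s\S]*?<\/pre>/gi, (m) => {
    pres.push(m);
    return `__PRE${pres.length - 1}__`;
  });
  html = html
    .replace(/<!--[\s\S]*?-->/g, "") // 2. strip comments (unsafe for IE conditionals!)
    .replace(/\s+/g, " ")            // 3. collapse runs of whitespace
    .replace(/> </g, "><");          // 4. drop the space between adjacent tags
  // 5. Restore the protected <pre> blocks.
  return html.replace(/__PRE(\d+)__/g, (_, i) => pres[i]).trim();
}
```

Even this toy shows the core shape of the technique: carve out the regions where whitespace is significant, squeeze everything else.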

Fact: HTML Absolutely Is the Base Object

In almost every web situation, the final product is HTML — it is the atoms of web content and producing it is our ultimate mission. HTML doesn't represent huge byte counts on its own, but as the root document that triggers all other fetches, any delay in its delivery adds small delays to everything downstream: CSS, JavaScript, images, fonts.

Don't cache base HTML objects aggressively. If you cache the root HTML page, invalidating its dependent objects (CSS, JS with versioned names) becomes impossible. The HTML must be fresh so it can reference updated asset URLs.

Markup Quality

Valid markup may streamline the parse process (likely, though definitive data is scarce). Semantic markup over "div-itis" reduces bytes and improves document structure. Tools like the W3C Validator, HTML Tidy, HTMLHint, and html-minifier can help ensure quality markup, but you should want to do it anyway: quality markup improves accessibility (a11y) and even helps with bots. Doing your development work right shouldn't be a function of your perception of technical importance; a seasoned engineer knows by experience that in a complex system even the smallest thing can have an outsized impact! Don't wait to discover this deep truth through a negative experience.

6. Minification: CSS

The Growing Problem

CSS file sizes have grown significantly over the years, as the Web Almanac has validated. CSS requests per page went from 1–3 to 6+ over a decade (at the 90th percentile: 9 to 18 requests). But the impact of CSS is not just about delivery size; CSS blocks rendering, so its cost is paid before anything paints.

CSS Optimization Techniques

Like HTML you may not employ these techniques by hand, rather a tool will do them (recall dev for maintenance, prepare for delivery).

Technique Example
Unused CSS removal Tools: PurifyCSS, PurgeCSS. Strip rules that match no elements in your HTML.
Rule shorthands margin-left; margin-right; margin-top; margin-bottom → margin
Value recasting #ff0000 → red, rgb(255,0,0) → red, bold → 700
Unit elimination 0px → 0
Rule merging h1 {color: red;} h2 {color: red; font-size: larger;} → h1, h2 {color: red;} h2 {font-size: larger;}
Empty rule elimination Remove selectors with no declarations
Comment/whitespace removal Strip all comments, collapse whitespace
Selector optimization Remove unnecessary * selectors, reduce overly specific selectors

Class-itis

Over-use of verbose class names — usa-button-secondary-inverse usa-button-active — adds up across a site. Automated solutions rewrite to short names (u-b-s-i), but watch for JavaScript code that references class names by string. It's quite clear that class-itis is a significant problem, and there simply is no need to approach web dev this way. We sadly are suffering with sub-optimal solutions because of poor web dev education and one person's ability to become a popular CSS influencer and push utility classes as the one true way. This is yet another example of the outsized effects of market forces over actually measured, engineered solutions.

Critical CSS

Extract critical above-the-fold CSS and inline it directly in the HTML <head>. This eliminates a render-blocking network request for the initial viewport. Load the full stylesheet asynchronously afterward. Tool: Addy Osmani's critical.
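In practice the pattern looks something like this (file names are placeholders; the media="print" plus onload swap is a common non-blocking load trick, with <noscript> as the fallback):

```html
<head>
  <!-- Inlined critical rules for the initial viewport (tool-generated) -->
  <style>
    header, .hero { /* ...extracted above-the-fold rules... */ }
  </style>
  <!-- Full stylesheet loads without blocking first paint: it starts life as
       a low-priority "print" sheet, then is promoted once it arrives -->
  <link rel="stylesheet" href="/css/site.css" media="print" onload="this.media='all'">
  <noscript><link rel="stylesheet" href="/css/site.css"></noscript>
</head>
```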

Inlining and CSP Tension

Inlining CSS can be blocked by Content Security Policy (CSP). CSP exists to prevent injection attacks by disallowing inline styles and scripts. This creates a genuine performance vs. security tension: inlining is a useful performance technique that directly conflicts with a useful security mechanism. Solutions include CSP nonces and hashes, but they add complexity. Yikes, we have a three-way trade-off going on here! Hopefully with this one you are seeing that judgment is key, and letting code just get spit out or copied in without applying such judgment is likely to have very negative impacts once the trade-offs implied in the solution are felt.

Tooling: cssnano, clean-css, CSSO, PurgeCSS, crass. Linting: CSSLint, stylelint.

CSS optimization is not just about bytes — it's about render blocking. A large unoptimized CSS file delays First Contentful Paint even if it compresses well, because the browser must parse it entirely before rendering. This is one reason the Professor, echoing the opinions of many others, considers excessive HTML and CSS in JavaScript to be one of the most troubling development practices in terms of performance, as it literally gets in the way of what the browser has been optimized to do! This is a deep example of the site-versus-app tension and of whether you should build up from HTML or emit it wholly. Personally, from a performance point of view, emission of HTML and CSS should be the exception, not the norm, but observation of modern practices shows that simply is not the case, despite the performance traces!

7. Minification: JavaScript

JS Costs More Than You Think

JavaScript must be downloaded, parsed, and compiled before it can execute. It is not just a download problem. More JS bytes is significantly worse than equivalent image bytes because of this triple cost.

JavaScript abuse is arguably the worst performance issue on the modern web. Even when you mitigate JavaScript delivery, you can hit the uncanny valley: a page that is visually loaded but not usable. This is measured by Time to Interactive (TTI) and can be severe on low-powered devices like median Android phones. Developer perceptions skewed by desktop or flagship phones cause avoidable problems that are caught later through user suffering. This is starting to have social impacts, which has led some folks to describe modern web practices as focused more on a Wealthy Western Web than a World Wide Web.

JS Minification Techniques

Like the other minification sections, this is an overview of approaches taken by tools; you are not encouraged to pursue these techniques by hand.

Technique Example Savings
Whitespace reduction Remove newlines, indentation, extra spaces Moderate. Watch for ASI (Automatic Semicolon Insertion) issues.
Variable name rewriting var myLongVariableName → var x Significant. Repeated long names are common in unminified code.
Code optimizations i = i + 1 → i++ Small per-instance, adds up.
Repetition rewrites document.write() × 10 → d = document; d.write() × 10 Moderate. Reduces repeated long property chains.
Dead code elimination Remove unreachable returns, unused functions Variable. Can be very large in framework code.
Statement combining var x; var y; var z; → var x,y,z; Small. Eliminates repeated keywords.
Conditional shortening if/else → ternary ?: where appropriate Small. Watch for readability in source; minifier handles delivery.

JS Minification Risks

The primary worry: did you break the code? Risks come from the global nature of JavaScript, interactions between JS and HTML/CSS, and third-party scripts you don't control. Postel's Law applies: "Be conservative in what you do, liberal in what you accept" — be conservative in optimizations unless you control all included code.

Bundling and Code Splitting

Separate JS files are a developer value, not a delivery value. Bundle for delivery. Given JavaScript's shared global namespace, there is no delivery reason to keep files separate.

But a single monolithic bundle is also wrong. Code splitting by route (the Webpack pattern) sends only what the current page needs. The homepage gets homepage.js; the checkout page gets checkout.js; shared utilities load on demand.
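A minimal sketch of the demand-driven idea: wrap a loader so the module is fetched only on first use and cached afterward. In a real app the loader would be () => import("./checkout.js") and the bundler would split that file into its own chunk; here a hypothetical stub stands in so the sketch is self-contained:

```javascript
// Memoize the loading promise so the (possibly expensive) fetch/parse of the
// chunk happens at most once, no matter how many callers ask for it.
function lazy(loader) {
  let pending; // cached promise after the first call
  return () => (pending ??= loader());
}

// Hypothetical stub in place of () => import("./checkout.js")
let loads = 0;
const getCheckout = lazy(async () => {
  loads += 1; // the real cost would be network + parse + compile
  return { renderCheckout: () => "checkout rendered" };
});
```

Calling getCheckout() twice triggers the loader only once, and on a route-split site the checkout chunk never ships to users who never check out.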

Future: WebAssembly

Will JavaScript eventually become a high-level language with a compile target? WebAssembly offers near-native performance but is currently limited: it cannot manipulate the DOM directly (a limitation shared with Web Workers). This constrains its applicability for typical web application code, though it excels for compute-intensive tasks like image processing, encryption, and gaming.

JavaScript optimization is the highest-leverage performance work you can do. The Steve Souders 90% rule points here: most user-facing delay comes from client-side JS processing. Reducing JS payload reduces download, parse, and compile time simultaneously.
Your KB Mileage May Vary: A very disturbing aspect of how JavaScript dependencies are promoted is a bit of sleight of hand. For example, if a core library is 40 KB gzipped, what is it ungzipped? Hint: it's probably 200 KB or more. Even if things are small and quick to deliver, what happens then? Very often these svelte libraries then go make fetches for all manner of plug-ins and extensions. Now we are getting hit with more latency! There isn't anything wrong with putting your best foot forward when promoting something, but many of the claims made are somewhat problematic. The Prof's firms often talk about something we dub "The Demo Illusion": the Getting Started demo is very small and very easy. The sad part, though, is that it is an illusion. The performance reality often comes crashing into your load traces once it is too late to remove the dependency. This lived experience is one reason I make such a big point about knowing what a dependency does, and how well it does it, BEFORE you adopt it, because it is often difficult to get rid of later on. You are somewhat "married to" it unless you have abstracted it away so it is easy to replace. With LLMs we may see that it becomes easier to move away from poorly-thought-out decisions, but of course using these tools in software has other issues. As usual, it's trade-offs all the way down!

8. Fonts, URI Paths, and Finishing Touches

Font Optimization

Web fonts have a real cost. The first question: is the font worth it? Most users cannot tell the difference between Sans-Serif A and Sans-Serif B. Consider using system fonts if you can; they are free, delivery-wise!

If you must use a specialized font, be reasonable about it. Do you need all the versions of it? Semi-bold, bold, light, extra-bold, regular?

Possible Solution: Use a variable font. Variable fonts are fonts which are "smart" and have variables you can tune. These font formats are rarely used, though they are broadly supported. A single file in such a format can represent many variations of the typeface.

If you are using a Unicode font with thousands of glyphs for an English page, you are sending glyphs you will never use.

Possible Solution: subset fonts — generate a font file containing only the glyphs your content actually needs. A full Google Fonts file might be 150 KB; the Latin subset might be 15 KB. The unicode-range CSS descriptor lets the browser download only the subsets it needs.
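A sketch of what subsetting looks like in CSS (file paths are placeholders, ranges are illustrative): each @font-face declares a unicode-range, and the browser downloads a subset file only if the page actually uses codepoints in that range:

```css
@font-face {
  font-family: "SiteFont";
  src: url("/fonts/sitefont-latin.woff2") format("woff2");
  unicode-range: U+0000-00FF; /* Basic Latin + Latin-1 Supplement */
}
@font-face {
  font-family: "SiteFont";
  src: url("/fonts/sitefont-greek.woff2") format("woff2");
  unicode-range: U+0370-03FF; /* fetched only if Greek text appears */
}
```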

URI Path Reduction

Dependent objects (images, CSS backgrounds, JS files, fonts) that are not directly linked by users can have their paths shortened, made more cache-friendly, and obfuscated:

<!-- Before -->
<img src="/images/UCSD_logo_big.png">
<link rel="stylesheet" href="/css/main-styles.css">

<!-- After -->
<img src="/d/i/u0.png">
<link rel="stylesheet" href="/d/c/m0.css">

Use short 2-letter combinations (upper/lower case plus digits) — this gives thousands of unique paths. Content/entry-point URLs should NOT be rewritten (user-facing paths stay readable and bookmarkable). Only rewrite dependent resource paths.

Repeated site objects (nav elements, background images) propagate savings across every HTML and CSS file. A side benefit: anti-scraping/hotlinking, since these private-interface paths can be changed at will. For more robust hotlinking protection, use Referer header checking and CSP.

These are fine polish and belong mostly in the realm of large-scale site optimizations. At scale, every byte truly matters — high-traffic sites may even focus on HTTP header size reduction. For smaller sites, these techniques yield diminishing returns.

9. Compression

HTTP Compression

Compression is transparent to the application layer — negotiated via the Accept-Encoding request header and Content-Encoding response header. The browser says what it supports; the server picks the best match. Most servers and CDNs implement compression automatically.

Client                                   Server
  |                                         |
  |--- GET /app.js ----------------------->|
  |    Accept-Encoding: gzip, br           |
  |                                         |
  |                              [compress] |
  |                                         |
  |<-- 200 OK -----------------------------|
  |    Content-Encoding: br                |
  |    Content-Length: 48,291              |
  |    (original: 167,422 bytes)           |
  |    Savings: ~71%                       |

Compression Formats

Format Identifier Notes
gzip gzip Universal support. The safe default.
Brotli br Better compression ratios than gzip. Increasingly supported. Requires HTTPS.
Deflate deflate Older, less common. Largely superseded by gzip.
Dictionary compression Emerging A form of delta encoding that leverages the "sameness" of web pages across requests.

Compression and Minification Together

Gzipping captures much of minification's gains on its own, but you should minify first, then compress. They are complementary: minification removes structural redundancy (comments, whitespace, long names), and compression removes statistical redundancy (repeated byte patterns). Beware of recompression from acceleration devices/services — double encoding can cause corruption.

Image Compression

Images are the biggest and most obvious byte savings opportunity. Compressing images feels like it should be an obvious build pipeline activity — and yet it is consistently neglected. Modern formats (WebP, AVIF) offer significant improvements over JPEG and PNG.

Image Format Choice Is Also Contextual: Choosing an image format like WebP or AVIF might save you delivery bytes, but if a user is expected to (or wants to) use that image locally, they may be frustrated. Don't just think about bytes with images; think about use as well.
Don't be "packet stupid." Once you are down to a few kilobytes (1–2 KB), making a resource smaller doesn't even help — the TCP packet envelope has a minimum size. Also beware decompression time with large physical-dimension images: paint delay may outweigh the delivery savings. Consider replacing raster images with inline SVG where the visual is simple enough.
Compression is one of the highest-ROI performance techniques. It is usually a server configuration change (enable gzip/brotli), not a code change. See the HTTP Overview for details on content negotiation headers.

10. Caching

"Why do you keep sending me that logo?!" — Signed, your browser.

Caches store previously fetched objects for later reuse, avoiding network round trips entirely. HTTP cache headers govern the behavior: what gets cached, for how long, by whom, and when to revalidate.

Cache-Control Directives

This table gives an overview of caching headers used with HTTP, but sadly the topic is so complex there is a multi-hundred page book on the subject (Web Caching by Duane Wessels). A useful overview of caching is Mark Nottingham's Caching Tutorial for Web Authors and Webmasters. It's a classic!

Directive Meaning Use Case
public Any cache (browser, proxy, CDN) can store this Static assets: images, CSS, JS, fonts
private Only the user's browser may cache this Personalized content, authenticated responses
max-age=N Cache is fresh for N seconds max-age=31536000 (1 year) for versioned assets
no-cache Must revalidate with server before using cached copy Content that changes but benefits from conditional requests
no-store Do not cache at all Sensitive data (banking, health records)
immutable Content will never change; don't even revalidate Content-hashed assets (app.3f2a1b.js)
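One way to keep these directives consistent is a small helper that maps asset types to headers. This is a sketch: the filename convention (a run of hex digits before the extension marking content-hashed assets) and the `/api/account` path are assumptions for illustration, not standards.

```javascript
// Sketch: choosing a Cache-Control header by asset type.
function cacheControlFor(path) {
  if (/\.[0-9a-f]{6,}\.(js|css|woff2|png|jpg|svg)$/.test(path)) {
    // Content-hashed: the URL changes whenever the content changes,
    // so the cached copy can live forever.
    return 'public, max-age=31536000, immutable';
  }
  if (/\.(js|css|woff2|png|jpg|svg)$/.test(path)) {
    // Static but not versioned: cache briefly, then revalidate.
    return 'public, max-age=3600';
  }
  if (path.startsWith('/api/account')) {
    // Personalized/authenticated responses: never store.
    return 'private, no-store';
  }
  // Base HTML: always revalidate so deploys show up immediately.
  return 'no-cache';
}

console.log(cacheControlFor('/assets/app.3f2a1b.js'));
console.log(cacheControlFor('/index.html'));
```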

Validation Mechanisms

Conditional requests pair a validator (an ETag or Last-Modified header) with If-None-Match or If-Modified-Since; when the cached copy is still valid, the server answers 304 Not Modified with no body. A 304 saves bandwidth but still incurs a network round-trip. Use max-age with versioned filenames to avoid even the round-trip.

Cache Busting

The URL is the cache key. To force invalidation, change the URL:

Useful and Ignorant? Busting caches with query strings is super useful sometimes, but often it points to developers who don't know we have headers for this as well. When you need a quick and dirty check to see if a page you changed really changed, add that query string; but if this is your go-to strategy, you likely need to study caching a bit more.

Cache Variation (Vary Header)

A single URL can represent multiple versions: compressed vs. uncompressed, different image formats (WebP vs. JPEG), different languages. The Vary header tells caches which request headers create distinct cached versions. Vary: Accept-Encoding is the most common. To remember this idea, consider a page /perf-tutorial. That URL could have an English and a Spanish version, a compressed and an uncompressed version, and so on. If the URL is the key to the object in the cache, how do we know which one you want? In this case the cache uses the URL plus the Vary-listed headers as the key. You could of course solve this problem with /perf-tutorial-es and /perf-tutorial (for en), but now you have other issues. Personally I think multi-format URLs make sense and are a sign of a very skilled web practitioner.
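Conceptually, a Vary-aware cache builds its key like the sketch below. Real caches are more involved (header normalization, `Vary: *`, and so on); this just shows the URL-plus-varied-headers idea.

```javascript
// Sketch: a cache key is the URL plus the value of every header the
// response's Vary header names.
function cacheKey(url, varyHeader, requestHeaders) {
  const varied = varyHeader
    .split(',')
    .map(h => h.trim().toLowerCase())
    .sort() // header order must not change the key
    .map(h => `${h}=${requestHeaders[h] ?? ''}`);
  return [url, ...varied].join('|');
}

// Same URL, two distinct cache entries:
console.log(cacheKey('/perf-tutorial', 'Accept-Encoding', { 'accept-encoding': 'br' }));
console.log(cacheKey('/perf-tutorial', 'Accept-Encoding', { 'accept-encoding': 'gzip' }));
```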

Caching Layers

Caches exist at multiple levels: browser cache, proxy cache, CDN edge cache, and reverse proxy cache at the origin. Private vs. public caching is critical for sensitive data — private ensures only the user's browser caches the response, not shared proxies.

Dos and Don'ts

Reverse Proxy Caching

"Static dynamic" pages — like /pressrelease.php?id=5 — often return the same content for every user but rebuild on every visit. Solutions: self-generate into static HTML files, or place a reverse proxy cache (Varnish, Nginx) in front of the application server. What can't easily go in a proxy cache: truly dynamic content and personalized content.

Caching is the most effective latency elimination technique. A cached resource has zero network latency. See the HTTP Overview for cache header mechanics and the Web Servers Overview for reverse proxy configuration. The State Management page covers the overlap between caching and application state.

11. Latency Reduction: CDNs and DNS

The Latency Problem

Even with all optimizations — minified, compressed, cached — point-source web serving has inherent latency issues for distant users. A server in San Diego cannot serve a user in Mumbai in under 50ms no matter how fast the server is. The speed of light imposes a floor.

Solution: move content closer to users.

Content Distribution Networks (CDNs)

CDNs replicate content to edge servers distributed across the globe. When a user requests a resource, the CDN routes them to the nearest edge server. This improves reliability, scalability, and performance simultaneously.

User (Tokyo)
     |
[CDN Edge - Tokyo]  ← Cache hit: 15ms
     |
     | Cache miss?
     v
[CDN Edge - Singapore]
     |
     v
[Origin Server - San Diego]  ← Full round-trip: 180ms

CDNs use DNS-based or URL-rewriting-based redirection to route users to the optimal edge cache. They still have last-mile issues, and dynamic content at the edge is complex — edge caches may need to become intelligent edge compute servers. There is also a cost issue: this replication isn't free! Some cloud vendors include it and some don't; be careful.

Global Web Farms

An alternative to CDNs: deploy multiple web farms worldwide with global load balancing. Redirect users based on server availability, network distance, geography, or a mix. Downside: increased data center and hardware costs — which is exactly why CDNs exist as a service.

DNS Optimization

DNS is both fragile and robust. Recommendations:

DNS centralization is a double-edged sword. Centralized DNS services (Cloudflare 1.1.1.1, Google 8.8.8.8) provide speed, but concentration creates single points of failure. The 2021 Cloudflare and Fastly outages took large portions of the internet offline. Some attacks on DNS infrastructure have been carried out by state-level actors (often behind hacking groups). DNS in particular is so critical that it is a running joke among many web techies that when something has a problem, it's probably DNS.

Server Capacity and Bottlenecks

Common bottlenecks: incoming bandwidth saturation, hardware/software limits, flash traffic events, too many concurrent connections, slow downloaders holding connections open, and long server-side processing times.

Solutions include DNS round-robin (simple but suboptimal), hardware load balancers (F5, Citrix), and traffic segmentation (images.example.com, store.example.com). Connection offload (TCP termination and SSL offloading at the load balancer) reduces origin server load. Static content bottlenecks are typically disk I/O — memory caching helps. Dynamic content bottlenecks are typically CPU — reverse proxy caching helps.

Pay for Performance The truth is things are easier now than they were before. You can utilize cloud vendors and get resiliency and latency improvements at a cost. The faster you want to go, the more you are likely to pay, just like with a car! Be careful though: do not pick these things based upon brand. Assuming Amazon is the best is more brand loyalty than measured reality at times, especially when you consider cost for performance. Even if that "bestness" were true, everyone piling into the Amazon US-East region is unwise. Observably, people seem to do it assuming it is somehow the best, and when it goes down so do many services online!

12. Preloading, Prefetch, and Demand-Driven Loading

The "When You Need It" Principle

"Try to send the bytes when you actually need to send them, no earlier or later than when absolutely appropriate."

This is the intersection of performance engineering and user experience design. Send too early and you waste bandwidth on resources that may never be used. Send too late and the user waits.

Preloading and Prefetch

Interestingly these techniques and the ones that follow are mostly done in HTML and JavaScript. It's very little work for potentially quite a lot of gain.

Hint Syntax Purpose When to Use
preload <link rel="preload" href="font.woff2" as="font"> Fetch a resource needed for the current page but discovered late by the browser Critical fonts, hero images, key scripts referenced deep in CSS
prefetch <link rel="prefetch" href="/next-page.js"> Fetch a resource likely needed for the next navigation Next-page assets, predicted user flow (e.g., search → results page)
preconnect <link rel="preconnect" href="https://cdn.example.com"> Establish TCP + TLS connection to an origin early Third-party origins you know you'll need (CDN, analytics, fonts)
dns-prefetch <link rel="dns-prefetch" href="https://cdn.example.com"> Resolve DNS only (lighter than preconnect) Domains you might need; low cost, broad browser support
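When hints are generated from data (a template helper or build step), a tiny function keeps them consistent. The helper name here is invented for the sketch; the tags it emits are the standard ones from the table above.

```javascript
// Sketch: emit resource-hint <link> tags from data.
function hintTag(rel, href, attrs = {}) {
  const extra = Object.entries(attrs)
    .map(([k, v]) => (v === true ? ` ${k}` : ` ${k}="${v}"`))
    .join('');
  return `<link rel="${rel}" href="${href}"${extra}>`;
}

console.log(hintTag('preload', '/fonts/brand.woff2', { as: 'font', crossorigin: true }));
console.log(hintTag('preconnect', 'https://cdn.example.com'));
```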

Predictive prefetch: Use real user analytics data to predict which pages users visit next (e.g., guess.js). If 80% of users on the product page go to checkout, prefetch checkout assets while they browse products.
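The decision logic behind predictive prefetch can be sketched as a pure function. The data shape and the 0.5 probability threshold below are assumptions for illustration, not guess.js's actual API.

```javascript
// Sketch: given analytics on "from page X, users go to page Y"
// probabilities, pick the next pages worth prefetching.
function pagesToPrefetch(currentPage, transitions, threshold = 0.5) {
  return (transitions[currentPage] ?? [])
    .filter(t => t.probability >= threshold)
    .map(t => t.page);
}

const analytics = {
  '/product': [
    { page: '/checkout', probability: 0.8 },
    { page: '/reviews',  probability: 0.15 },
  ],
};

console.log(pagesToPrefetch('/product', analytics)); // likely next page(s)
// In the browser you would then emit <link rel="prefetch"> for each result.
```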

Lazy Loading

Load below-the-fold content only when it is about to enter the viewport. Native lazy loading is now a standard:

<!-- Native lazy loading — no JavaScript needed -->
<img src="photo.jpg" loading="lazy" alt="Below-the-fold photo">

<!-- Eager loading for above-the-fold hero image -->
<img src="hero.jpg" loading="eager" alt="Hero image">

This pattern has been a native standard for a long time, and sadly I still see people using JavaScript for it today. It is a good idea to prototype performance improvements with JavaScript, but once browsers adopt a feature, you should go native.

Code Splitting

Don't send a monolithic bundle. Split by route/page. The homepage gets homepage.css and homepage.js; load global resources in the background. Same principle applies to JavaScript bundles via Webpack, Rollup, or Vite code splitting.
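At its core, code splitting is a loader that fetches a chunk at most once, on demand. Bundlers generate this machinery around dynamic import(); the sketch below simulates the loader with a plain function so the idea runs anywhere. The names are invented for illustration.

```javascript
// Sketch: load a chunk on first use, reuse it afterwards. In a real
// app the loader would be () => import('./checkout.chunk.js').
function lazy(loader) {
  let mod; // undefined until first use
  return () => (mod ??= loader());
}

let networkLoads = 0;
const loadCheckout = lazy(() => {
  networkLoads++; // stands in for the actual chunk fetch
  return { render: () => 'checkout page' };
});

// The chunk is only fetched when the route is actually visited:
console.log(loadCheckout().render(), `(loads: ${networkLoads})`);
loadCheckout(); // second visit: no new network load
```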

HTTP/2 and HTTP/3 Implications

Many older HTTP/1.1 specific optimizations — bundling, image sprites, domain sharding — become anti-patterns with HTTP/2. HTTP/2 multiplexes many streams over a single TCP connection, making many small files potentially more efficient than fewer large bundles. Just as modem-era patterns made the 2010s web worse, 2010s patterns can make the 2020s web worse.

Optimization techniques have a shelf life. Bundling was essential under HTTP/1.1's 6-connection-per-domain limit. Under HTTP/2, it can actually hurt performance by preventing fine-grained caching. Always consider the protocol your users are actually on. I think this is a more generalized warning, because I am noticing many things are being done more out of "lore" and belief than actual value today. I worry that, with LLMs having indexed this stuff so heavily, you'll be hard-pressed not to accidentally adopt expired "best practices" without even knowing it.
Loading strategy is where performance engineering meets UX design. See the Connections Overview for HTTP/2 multiplexing and WebSocket considerations.

13. Service Workers and Progressive Web Apps

The Network Problem

After doing everything right — minifying, compressing, caching, using a CDN — you still can't meet RAIL goals predictably with the network in play. Network latency is variable, connections are unreliable, and mobile users routinely go through tunnels and dead zones. The network is the last uncontrollable variable.

Progressive Web Apps (PWAs) using Service Workers make the network a controlled and even optional component.

Service Worker Role in Performance

A Service Worker is a JavaScript proxy that sits between your web page and the network. It can intercept every network request the page makes and decide how to handle it: serve from cache, fetch from network, or combine both strategies.

Cache Strategies

Strategy How It Works Best For
Cache-first Serve from cache; if not cached, fetch from network and cache for next time Static assets (CSS, JS, images, fonts) that change infrequently
Network-first Try network; if network fails or is slow, fall back to cache Dynamic content (API responses, user data) where freshness matters
Stale-while-revalidate Serve stale cache immediately, fetch updated version in background for next time Content where some staleness is acceptable (news feeds, dashboards)
Page                 Service Worker            Cache               Network
 |                         |                     |                    |
 |--- fetch(/api) -------->|                     |                    |
 |                         |--- check cache ---->|                    |
 |                         |<-- cache hit -------|                    |
 |<-- cached data ---------|                     |                    |
 |                         |--- fetch (background) ----------------->|
 |                         |<-- fresh data --------------------------|
 |                         |--- update cache --->|                    |
 |                         |                     |                    |

[stale-while-revalidate: user sees instant response, cache updates silently]
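The stale-while-revalidate flow can be sketched with plain Maps standing in for the Cache Storage API and the network. A real Service Worker would use caches.open() and fetch() inside a 'fetch' event handler; the function here is a simplified model of the strategy.

```javascript
// Sketch: serve stale immediately, refresh the cache in the background.
async function staleWhileRevalidate(url, cache, network) {
  const cached = cache.get(url);
  const refresh = network(url).then(fresh => {
    cache.set(url, fresh); // update silently for next time
    return fresh;
  });
  // Instant response from cache when possible; otherwise wait for network.
  return cached !== undefined ? cached : refresh;
}

// Demo with a fake network:
const cache = new Map([['/feed', 'feed v1']]);
const network = async () => 'feed v2';

staleWhileRevalidate('/feed', cache, network).then(body => {
  console.log(body); // the user sees 'feed v1' instantly
  setImmediate(() => console.log(cache.get('/feed'))); // cache now holds 'feed v2'
});
```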
Service Workers are the logical endpoint of the performance progression. Content selection reduces what you send. Compression and caching reduce how much and how often. CDNs reduce the distance. Service Workers eliminate the network dependency entirely for cached resources. In many ways, Service Workers expose that the network should be considered a progressive enhancement target, an observation that is behind the local-first software pattern.

14. Monitoring, Analytics, and RAIL Conformance

You Must Measure

Performance optimization without measurement is guessing. You need to measure three things:

Real User Monitoring (RUM) reveals how things really are — and it will humble you. Lab tests on developer machines don't capture the 90th percentile user on a median Android phone over a congested cellular connection.

What to Monitor

Metric What It Measures Target
LCP (Largest Contentful Paint) When the largest visible element renders ≤ 2.5 seconds
INP (Interaction to Next Paint) Responsiveness to user interactions ≤ 200ms
CLS (Cumulative Layout Shift) Visual stability during load ≤ 0.1
TTFB Server + network responsiveness ≤ 800ms
FCP First visible content ≤ 1.8 seconds
TTI When the page is reliably interactive ≤ 3.8 seconds

Beyond timing metrics, monitor your users' connection speed distribution, device capability distribution, and geographic distribution to understand how infrastructure and demographics affect your performance reality.

RAIL Is Not Binary (Revisited)

RUM data reveals a distribution, not a single number. The analytical framework should categorize user experiences into three buckets:
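One way to implement such bucketing is sketched below, using LCP as the example metric. The good / needs improvement / poor labels and the 2.5s / 4s thresholds follow the Core Web Vitals convention; treating that convention as the RAIL-conformance bucketing is this sketch's assumption.

```javascript
// Sketch: turn raw RUM samples into a distribution across experience buckets.
function bucketLCP(lcpMs) {
  if (lcpMs <= 2500) return 'good';
  if (lcpMs <= 4000) return 'needs improvement';
  return 'poor';
}

function distribution(samplesMs) {
  const counts = { good: 0, 'needs improvement': 0, poor: 0 };
  for (const ms of samplesMs) counts[bucketLCP(ms)]++;
  return counts;
}

console.log(distribution([1200, 2300, 2600, 5100, 900]));
// A spread across buckets, not a single pass/fail number.
```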

Testing Tools

Measurement closes the feedback loop. Without RUM, you are optimizing blind. The OpenTelemetry Overview covers how to build a telemetry pipeline that captures these metrics at scale and categorizes the user experience spectrum.

15. Interface Illusion and the Perception Stack

When Engineering Runs Out

After reduction, compression, caching, CDN deployment, and preloading have run their course, there is one more lever: interface illusion. Satisfying speed goals may involve code changes, delivery modifications, AND visual/design changes. Keeping people busy and occupied is a pretty important skill IRL, and everyone from waiters bringing your drinks early to Disneyland keeping you engaged in a long line knows this. The web is no different and has its own techniques, such as:

The Perception Stack

Perceived speed and actual speed are different things. The user's experience is the final metric, not raw numbers. A page that loads in 3 seconds with a skeleton screen feels faster than a page that loads in 2.5 seconds with a blank white screen. Design changes that make waiting feel shorter are a legitimate and potent performance tool. To be fair, if the time has a real cost attached, in that more bytes = more money, no amount of UI/UX stagecraft will change that.

Actual Load Time (ms):  0 ---- 500 ---- 1000 ---- 1500 ---- 2000 ---- 2500

Approach A (WSOD):      [           blank white screen           ] [content!]
                        User perceives: "slow, broken?"

Approach B (skeleton):  [skeleton] [partial] [   content fills in   ]
                        User perceives: "fast, responsive"

Same total load time. Different user experience.

Performance and Security Tensions

Performance and security often pull in opposite directions:

Privacy Sidebar

Browser private mode isn't as private as users think. The browser itself forgets, but DNS caches, ISP logs, transit proxy records, and origin server logs do not. Entering incognito mode may itself be a detectable signal. Device fingerprinting (canvas fingerprinting, WebGL fingerprinting, font enumeration) goes beyond cookies. The technology isn't the problem — it's what is done with the data.

Browser caches can reveal browsing history to anyone with device access. Cache probing (timing how long a resource takes to load) can detect whether a user has visited a specific site. This is a concrete intersection of performance infrastructure (caching) and privacy.

The best performance work addresses both the actual speed and the perceived speed. Engineering handles the first; design handles the second. Neither alone is sufficient.

16. Summary

Concept Key Takeaway
Why It Matters Speed is not a feature — it is the baseline expectation. Users abandon slow sites. Mobile devices are underpowered. Developer perceptions lie.
Bandwidth vs. Latency Latency is the real enemy. More bandwidth doesn't help past a threshold. Scale ≠ Speed.
RAIL Model Response < 100ms, Animation < 10ms/frame, Idle < 50ms chunks, Load < 1s. ~90% of the problem is client-side.
Content Selection The fastest byte is the one you never send. Not all bytes are equal — JS costs far more than images.
Minification Code for maintenance, prepare for delivery. Automate minification in the build pipeline. Minify first, then compress.
Compression gzip/Brotli provide up to 70% savings on text. A server config change, not a code change. Complements minification.
Caching Zero-latency for cached resources. Use versioned filenames and long max-age. Don't cache base HTML aggressively.
CDNs & DNS Move content closer to users. CDNs solve global latency. DNS optimization and capacity planning prevent bottlenecks.
Loading Strategy Preload what's needed now, prefetch what's needed next, lazy-load what's below the fold. HTTP/2 changes the calculus.
Service Workers Make the network optional. Cache-first, network-first, and stale-while-revalidate cover most scenarios.
Monitoring RUM reveals reality. Core Web Vitals (LCP, INP, CLS) are the standard metrics. RAIL conformance is a spectrum, not binary.
Interface Illusion Perceived speed ≠ actual speed. Skeleton screens, optimistic UI, and progress indicators are legitimate tools.
"Send less data, less often, from nearby, when it is needed."

Performance is a progression: select content carefully, minimize what you send, compress it efficiently, cache to avoid resending, reduce latency with CDNs, preload and lazy-load strategically, and measure everything with real user data.

This page covers the engineering techniques for performance, but many other points were assumed to be understood. The Foundations Overview provides the architectural context (client-server model, UX vs. DX). The HTTP Overview covers the protocol mechanics (headers, content negotiation, caching). The Connections Overview covers HTTP/2, HTTP/3, and real-time techniques. The OpenTelemetry Overview covers the monitoring and observability framework.