"Send less data, less often, from nearby, when it is needed."
Performance is not a single technique — it is a progression of techniques, each building on the last. The golden rule unfolds into eight stages:
This page walks through each stage, from fundamental principles to implementation techniques.
The data is in, it is consistent across every study, and it is hard to refute: speed deeply matters.
The specifics vary — Amazon's "100ms = 1% sales," Google's "500ms delay = 20% fewer searches" — but the direction is always the same. Speed is not a feature. Speed is the baseline expectation.
Perception of time passing is subjective and context-dependent. The ancient Greeks distinguished two concepts of time: chronos (quantitative, clock time) and kairos (qualitative, experiential time). The qualitative aspects can overwhelm human perceptions — a minute spent waiting for a page to load feels longer than a minute spent reading interesting content.
Tolerances for delay are not specific to computer interfaces. Sticky doors, slow restaurant service, boring lectures (you've never heard one of those!) — humans have always been impatient. What changes is the expectation floor. We went from waiting in bank lines on prescribed days and hours (now you know what "banker's hours" means), to drive-through pneumatic tubes, to ATMs at any hour, to swiping a card, to touching a phone to a terminal. Each leap collapsed the expectation to a new perceptual floor. Once users experience faster, they cannot un-experience it. Today, a delay of a few seconds has people exclaiming, "Wow, it's slow today!"
The user's total time includes unlocking their device, opening a browser or app, typing a query, DNS lookup, TCP handshake — all before your server sees a single byte. Your site is not the user's reason for being there; it is the mechanism for what they are trying to do.
Think about the user in three phases:
Be very mindful of the before phase. This phase is where expected response time is set by what people experience doing the same things elsewhere. For example, if Amazon can transact in a second or two, users bring that expectation to your site — fair or not. This is a form of the "99% rule": 99% of the time, people are elsewhere, and that shapes their perception of your 1%.
We usually remember the first and the last phases more; the middle is more of a montage. Consider that the arrival sets the tone for the entire performance perception. How users arrive matters: direct type-in (high intent), link follow (volatile referral vs. stable bookmark), search engine results (earned organic vs. paid placement). Each arrival path carries different expectations and tolerances. If the arrival doesn't go well, you really don't have a chance. The ending, of course, sets the takeaway, and that is ultimately what the user remembers, if only you can get to that positive outcome!
Performance is about users and how they feel using our code — not about us and our code. More effort spent on user benefit often means less efficient development. This creates a fundamental economic tension: delivery is a variable cost (every user pays the byte tax), while development is a more fixed cost (write once, ship many times). Framework convenience saves developer hours but costs bytes in every download. Whether that trade-off is acceptable depends on your growth stage and user count. This is the DX-versus-UX tension showing itself once again.
Mobile phones are not desktop computers with small screens. Even high-end phone performance is an order of magnitude slower than laptop performance. Benchmarks lie: they don't account for power constraints, true core utilization, or the gap between simulators and real devices. Developers use flagships; real users are on median Android devices.
"The web has moved to relatively underpowered mobile devices with connections that are often slow, flaky, or both." — Addy Osmani, Google
Bandwidth is the amount of data that can be sent in a given time period. The common analogy: data is water, bandwidth is the diameter of the pipe. A wider pipe carries more water per second — but it doesn't make a drop of water arrive sooner no matter how big the pipe is!
Latency is the time delay required for information to travel across a network. It is analogous to the length of the pipe plus any delays switching between pipes. Latency includes processing delays, queuing delays, and transmission delays at every hop.
Research consistently shows that upgrading bandwidth from 5 Mbps to 10 Mbps yields only about a 5% load-time improvement, while reducing round-trip time yields a linear load-time improvement: every 20ms of RTT you remove comes straight off the total. The implication is clear:
"Latency is the real enemy."
"Scale ≠ Speed."
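A back-of-the-envelope model makes the point concrete. This is a deliberately simplified sketch (serial round trips plus raw transfer time; real pages overlap requests), and all the numbers below are illustrative assumptions, not measurements:

```javascript
// Toy load-time model: serial round trips plus raw transfer time.
function loadTimeMs(rttMs, roundTrips, pageBytes, bandwidthMbps) {
  const handshakesAndRequests = rttMs * roundTrips;          // latency-bound part
  const transfer = (pageBytes * 8) / (bandwidthMbps * 1000); // ms to push the bytes
  return handshakesAndRequests + transfer;
}

// 100ms RTT, 30 round trips, a 2 MB page, 10 Mbps link
const base = loadTimeMs(100, 30, 2_000_000, 10);            // 4600ms
const doubledBandwidth = loadTimeMs(100, 30, 2_000_000, 20); // 3800ms: saves 800ms
const halvedLatency = loadTimeMs(50, 30, 2_000_000, 10);     // 3100ms: saves 1500ms

console.log({ base, doubledBandwidth, halvedLatency });
```

Doubling bandwidth only shrinks the transfer term, which is already the smaller term for most pages; halving RTT shrinks the dominant term. That asymmetry is the whole "latency is the real enemy" argument in three numbers.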
Adding servers doesn't make a site faster unless you are already overloaded. If you need more servers to handle traffic, that is a capacity problem, not a speed problem. Scale issues suggest you don't know your capacity and tolerance thresholds. Know your max concurrent connections, requests per second, and resource ceilings. Scale equals cost — so act accordingly.
| Term | Definition |
|---|---|
| Speed | Both perceptual (how fast it feels) and actual (measured milliseconds). Both matter. |
| Bandwidth | Data capacity per unit time. Measured in Mbps. Determines throughput, not responsiveness. |
| Latency | Network delay / travel time. Measured in ms. The dominant factor in page load performance. |
| TTFB | Time to First Byte. From request sent to the first byte of the response arriving at the client. |
| TTLB | Time to Last Byte. From request sent to the final byte of the response arriving. |
| FCP | First Contentful Paint. When the browser renders the first piece of DOM content (text, image, SVG). |
| TTI | Time to Interactive. When the page is visually rendered AND reliably responsive to user input. |
| LCP | Largest Contentful Paint. When the largest visible content element finishes rendering. A Core Web Vital. |
| CLS | Cumulative Layout Shift. Measures visual stability — how much content jumps around during load. A Core Web Vital. |
| INP | Interaction to Next Paint. Measures responsiveness to user interactions throughout the page lifecycle. A Core Web Vital (replaced FID). |
Humans perceive delays in roughly these ranges:
| Phase | Budget | What It Means |
|---|---|---|
| Response | < 100ms | Actions and input must produce a visual reaction within 100ms. Tap/click to visible feedback must feel instant. |
| Animation | ~10ms per frame | Target 60 fps = ~16.66ms per frame. Aim for ~10ms of work to leave room for browser rendering. Failure produces jank. |
| Idle | 0–50ms chunks | Background work should complete in ≤ 50ms chunks. You share the main thread with the UI; anything longer sets up jank for later interactions. |
| Load | < 1000ms | Get critical above-the-fold content on screen in under 1 second. Avoid the White Screen of Death (WSOD). |
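The Idle budget above can be respected by batching background work so no single slice exceeds the budget. A minimal sketch, assuming you can estimate each task's cost in milliseconds (the task costs below are made up; in a browser each chunk would then run inside a requestIdleCallback):

```javascript
// Group tasks into chunks whose estimated total stays within the idle budget,
// so each chunk can run in one idle period without blocking the main thread.
function chunkByBudget(taskCostsMs, budgetMs = 50) {
  const chunks = [];
  let current = [];
  let spent = 0;
  for (const cost of taskCostsMs) {
    if (current.length > 0 && spent + cost > budgetMs) {
      chunks.push(current); // close the chunk before it would overrun
      current = [];
      spent = 0;
    }
    current.push(cost); // an over-budget single task still gets its own chunk
    spent += cost;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

console.log(chunkByBudget([20, 20, 20, 5, 60, 10]));
// [[20, 20], [20, 5], [60], [10]]
```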
Deliver rich experiences over a public network worldwide, to slow devices on unreliable connections, in under 1 second for first load, then run at 100ms response times, often using services and dependencies we don't control. Maintaining high performance under these conditions is insanely difficult, and that is why it should never be an afterthought.
"~90% of user-response time issues are client-side." — Steve Souders
Start with client-side optimizations for the best gains. They are simpler to implement, easier to measure, and affect the largest portion of the user's experience. Server-side improvements matter, but the biggest wins are in what happens after the response leaves the server. In some ways, unless you operate at huge data scale, getting the server side wrong reveals a lack of engineering maturity rather than genuine difficulty. Client-side truly is harder than server-side if you are honest with yourself.
You cannot hit RAIL numbers in a simple pass/fail fashion. Real User Monitoring (RUM) reveals a spectrum: some users meet goals comfortably, some tolerate near-misses, and some fail entirely. The analytical framework should categorize user experiences into meeting, tolerating, and failing to reach performance goals.
HTTP responses consist of a request line, headers, and payload. Most performance trouble lives in the payload. Payloads come in two flavors: text (HTML, CSS, JavaScript) and binary (images, video, audio, PDF, fonts). Each requires different optimization strategies.
The first question is always: do we really need this object? Bytes seem "freeish," so this question often goes unasked. The problem is exacerbated by the "localhost" effect: the developer's perception of performance on a fast local machine simply isn't the user's reality on a slow phone over a cellular connection.
Focus on large objects first — images are often the biggest and most unoptimized payload. But textual content matters too, because of how the browser's rendering pipeline works.
<meta> tags proclaiming your site is proudly powered by something add no user-facing value. A persistent problem: using complete frameworks for a small subset of features. Importing all of Bootstrap CSS just to center a few elements. Including all of some JavaScript library just to use one utility function. The entire framework ships to every user, every time.
Solutions include custom bundle generation, tree-shaking (eliminating unreachable code at build time), and careful dependency auditing. Note the tension: a single framework <link> tag is high DX (easy for developers), while custom-building with care is lower DX but better UX (fewer bytes for users). In some sense, this is a savings in creation that may be far exceeded, cost-wise, in use! Economics folks would frame this as fixed versus variable costs.
Every <script>, <link>, and <img> should justify its existence.
Many performance "best practices" are waved away on the assumption that they will make coding harder. The complaints are often suspect, as if anyone were suggesting you read the generated assembly code and maintain it directly. Instead of employing such motivated thinking, adopt a balanced philosophy: minification (the reduction of code size) is the first application of that philosophy.
"Code for maintenance, but prepare for delivery."
Minification should be automated — part of a build/publish pipeline, not a manual authoring effort. Your source code stays readable; your delivered code is optimized. Minification is NOT security; it is slight obfuscation easily reversed with pretty-printing.
The principle: minify first, then compress. They address different things and compound — minification removes structural redundancy, compression removes statistical redundancy.
You likely aren't going to perform these minification techniques by hand; rather, your tooling will. But it is useful to understand the techniques in case you want to apply them yourself or encourage your GenAI tool to do so.
| Technique | Example | Caveats |
|---|---|---|
| Whitespace removal | Collapse multiple spaces/newlines to a single space | Preserve <pre>, <textarea>, the CSS white-space property, and <script> tags without semicolons |
| Optional quote removal | <p id="foo"> → <p id=foo> | Unsafe for non-ordinal values (class lists, JS code in attributes) |
| Optional close tag removal | </p>, </li> omitted | Relies on content-model inference; works until a parser changes |
| Comment removal | Strip <!-- ... --> | Preserve IE conditionals, template comments, DOM-trickery comments |
| Boolean attribute shortening | <hr noshade="noshade"> → <hr noshade> | Standard HTML allows this; safe |
| Self-closing tag cleanup | <br /> → <br> | Not applicable if serving as XHTML. While XHTML is considered old, it is re-emerging somewhat because it can catch the drift that LLMs introduce into generated markup at the syntax stage. |
| Entity remapping | &reg; → ® (or the reverse, whichever is shorter) | Encoding-dependent; ensure consistency |
| Color shortening | #FF0000 → red | Only for named-color equivalents |
Dangerous territory: element elimination (<p></p> removal), letting browsers infer <html>, <body>, <thead>. These work until a parser or spec changes on you. Do note that you will see some of these techniques used by the largest sites, most notably Google, which is aware of things like packet fragmentation at the TCP level. This is a hyperscale optimization.
In almost every web situation, the final product is HTML — it is the atoms of web content and producing it is our ultimate mission. HTML doesn't represent huge byte counts on its own, but as the root document that triggers all other fetches, any delay in its delivery adds small delays to everything downstream: CSS, JavaScript, images, fonts.
Valid markup may streamline the parse process (likely, though definitive data is scarce). Semantic markup over "div-itis" reduces bytes and improves document structure. Tools like the W3C Validator, HTML Tidy, HTMLHint, and html-minifier can help ensure quality markup, but you should want quality markup anyway: it improves accessibility (a11y) and even helps with bots. Doing your development work right shouldn't be a function of your perception of technical importance; a seasoned engineer knows by experience that in a complex system even the smallest thing can have an outsized impact. Don't wait to discover this deep truth through a negative experience.
CSS file sizes have grown significantly over the years, as the Web Almanac has validated. CSS requests per page went from 1–3 to 6+ over a decade (at the 90th percentile: 9 to 18 requests). But the impact of CSS is not just about delivery size:
As with HTML, you may not apply these techniques by hand; rather, a tool will do them (recall: develop for maintenance, prepare for delivery).
| Technique | Example |
|---|---|
| Unused CSS removal | Tools: PurifyCSS, PurgeCSS. Strip rules that match no elements in your HTML. |
| Rule shorthands | margin-left; margin-right; margin-top; margin-bottom → margin |
| Value recasting | #ff0000 → red, rgb(255,0,0) → red, bold → 700 |
| Unit elimination | 0px → 0 |
| Rule merging | h1 {color: red;} h2 {color: red; font-size: larger;} → h1, h2 {color: red;} h2 {font-size: larger;} |
| Empty rule elimination | Remove selectors with no declarations |
| Comment/whitespace removal | Strip all comments, collapse whitespace |
| Selector optimization | Remove unnecessary * selectors, reduce overly specific selectors |
Over-use of verbose class names — usa-button-secondary-inverse usa-button-active — adds up across a site. Automated solutions rewrite to short names (u-b-s-i), but watch for JavaScript code that references class names by string. It's quite clear that class-name bloat is significant, and there simply is no need to approach web development this way. We are sadly stuck with sub-optimal solutions because of poor web-dev education and one person's ability to become a popular CSS influencer and push utility classes as the one true way. This is yet another example of the outsized effect of market forces over actually measured, engineered solutions.
Extract critical above-the-fold CSS and inline it directly in the HTML <head>. This eliminates a render-blocking network request for the initial viewport. Load the full stylesheet asynchronously afterward. Tool: Addy Osmani's critical.
Inlining CSS can be blocked by Content Security Policy (CSP). CSP exists to prevent injection attacks by disallowing inline styles and scripts. This creates a genuine performance vs. security tension: inlining is a useful performance technique that directly conflicts with a useful security mechanism. Solutions include CSP nonces and hashes, but they add complexity. Yikes, we have a three-way trade-off among performance, security, and complexity! Hopefully with this one you are seeing that judgment is key: letting code get spit out or copied in without applying such judgment is likely to have very negative impacts once the trade-offs implied in the solution are felt.
Tooling: cssnano, clean-css, CSSO, PurgeCSS, crass. Linting: CSSLint, stylelint.
JavaScript must be downloaded, parsed, and compiled before it can execute; it is not just a download problem. Because of this triple cost, JS bytes are significantly more expensive than equivalent image bytes.
Like the other minification techniques this is an overview of approaches taken by tools and you are not encouraged to pursue these things by hand.
| Technique | Example | Savings |
|---|---|---|
| Whitespace reduction | Remove newlines, indentation, extra spaces | Moderate. Watch for ASI (Automatic Semicolon Insertion) issues. |
| Variable name rewriting | var myLongVariableName → var x | Significant. Repeated long names are common in unminified code. |
| Code optimizations | i = i + 1 → i++ | Small per instance; adds up. |
| Repetition rewrites | document.write() × 10 → d = document; d.write() × 10 | Moderate. Reduces repeated long property chains. |
| Dead code elimination | Remove unreachable returns, unused functions | Variable. Can be very large in framework code. |
| Statement combining | var x; var y; var z; → var x,y,z; | Small. Eliminates repeated keywords. |
| Conditional shortening | if/else → ternary ?: where appropriate | Small. Watch for readability in source; the minifier handles delivery. |
The primary worry: did you break the code? Risks come from the global nature of JavaScript, interactions between JS and HTML/CSS, and third-party scripts you don't control. Postel's Law applies: "Be conservative in what you do, liberal in what you accept" — be conservative in optimizations unless you control all included code.
Separate JS files are a developer value, not a delivery value. Bundle for delivery. Given JavaScript's shared global namespace, there is no delivery reason to keep files separate.
But a single monolithic bundle is also wrong. Code splitting by route (the Webpack pattern) sends only what the current page needs. The homepage gets homepage.js; the checkout page gets checkout.js; shared utilities load on demand.
Will JavaScript eventually become a high-level language with a compile target? WebAssembly offers near-native performance but is currently limited: it cannot manipulate the DOM directly (a limitation shared with Web Workers). This constrains its applicability for typical web application code, though it excels for compute-intensive tasks like image processing, encryption, and gaming.
Web fonts have a real cost. The first question: is the font worth it? Most users cannot tell the difference between SansSerif A and SansSerif B. Consider using system fonts if you can: they are free, delivery-wise!
If you must use a specialized font, be reasonable about it. Do you need all the versions of it: semi-bold, bold, light, extra-bold, regular?
Possible Solution: Use a variable font. Variable fonts are fonts which are "smart" and have variables you can tune. These font formats are rarely used, though they are broadly supported. A single file in such a format can represent many variations of the typeface.
If you are using a Unicode font with thousands of glyphs for an English page, you are sending glyphs you will never use.
Possible Solution: subset fonts — generate a font file containing only the glyphs your content actually needs. A full Google Fonts file might be 150 KB; the Latin subset might be 15 KB. The unicode-range CSS descriptor lets the browser download only the subsets it needs.
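The first step of subsetting, figuring out which code points your content actually uses, is easy to sketch. The unicode-range string built here follows the CSS descriptor's U+XXXX syntax; an actual subsetting tool would then cut the font file itself, and would also merge adjacent points into U+XX-YY ranges (omitted here for brevity):

```javascript
// Collect the unique code points a piece of content needs.
function neededCodePoints(text) {
  return [...new Set(text)].map((ch) => ch.codePointAt(0)).sort((a, b) => a - b);
}

// Express them in CSS unicode-range syntax (one value per point, no range merging).
function toUnicodeRange(codePoints) {
  return codePoints.map((cp) => "U+" + cp.toString(16).toUpperCase()).join(", ");
}

const points = neededCodePoints("Hello");
console.log(points);                 // [72, 101, 108, 111]
console.log(toUnicodeRange(points)); // "U+48, U+65, U+6C, U+6F"
```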
Dependent objects (images, CSS backgrounds, JS files, fonts) that are not directly linked by users can have their paths shortened, made more cache-friendly, and obfuscated:
```html
<!-- Before -->
<img src="/images/UCSD_logo_big.png">
<link rel="stylesheet" href="/css/main-styles.css">

<!-- After -->
<img src="/d/i/u0.png">
<link rel="stylesheet" href="/d/c/m0.css">
```
Use short 2-letter combinations (upper/lower case plus digits) — this gives thousands of unique paths. Content/entry-point URLs should NOT be rewritten (user-facing paths stay readable and bookmarkable). Only rewrite dependent resource paths.
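A sketch of generating those short dependent-resource paths, assuming a simple per-type counter. The /d/ prefix and the type letters mirror the example above but are otherwise arbitrary choices:

```javascript
const ALPHABET =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; // 62 chars

// Map a counter to a two-character name: 62 * 62 = 3844 unique paths.
function shortName(n) {
  if (n < 0 || n >= 62 * 62) throw new RangeError("counter out of range");
  return ALPHABET[Math.floor(n / 62)] + ALPHABET[n % 62];
}

// e.g. images under /d/i/, stylesheets under /d/c/ as in the rewrite example
function shortPath(typeLetter, n) {
  return `/d/${typeLetter}/${shortName(n)}`;
}

console.log(shortPath("i", 0));  // "/d/i/aa"
console.log(shortPath("c", 62)); // "/d/c/ba"
```

A build step would maintain the counter-to-original-path mapping and rewrite references in HTML and CSS accordingly.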
Repeated site objects (nav elements, background images) propagate savings across every HTML and CSS file. A side benefit: anti-scraping/hotlinking, since these private-interface paths can be changed at will. For more robust hotlinking protection, use Referer header checking and CSP.
Compression is transparent to the application layer — negotiated via the Accept-Encoding request header and Content-Encoding response header. The browser says what it supports; the server picks the best match. Most servers and CDNs implement compression automatically.
| Format | Identifier | Notes |
|---|---|---|
| gzip | gzip | Universal support. The safe default. |
| Brotli | br | Better compression ratios than gzip. Increasingly supported. Requires HTTPS. |
| Deflate | deflate | Older, less common. Largely superseded by gzip. |
| Dictionary compression | Emerging | A form of delta encoding that leverages the "sameness" of web pages across requests. |
Gzipping captures much of minification's gains on its own, but you should minify first, then compress. They are complementary: minification removes structural redundancy (comments, whitespace, long names), and compression removes statistical redundancy (repeated byte patterns). Beware of recompression from acceleration devices/services — double encoding can cause corruption.
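The negotiation itself can be sketched as a tiny server-side chooser. This ignores q-values for brevity (real servers honor them per the HTTP spec), and the preference order, Brotli first, is our own choice:

```javascript
// Pick the best Content-Encoding the client supports.
function pickEncoding(acceptEncodingHeader, preferred = ["br", "gzip"]) {
  const offered = (acceptEncodingHeader || "")
    .split(",")
    .map((token) => token.split(";")[0].trim().toLowerCase()) // drop q-values
    .filter(Boolean);
  return preferred.find((enc) => offered.includes(enc)) || "identity";
}

console.log(pickEncoding("gzip, deflate, br")); // "br"
console.log(pickEncoding("gzip;q=1.0"));        // "gzip"
console.log(pickEncoding(""));                  // "identity" (send uncompressed)
```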
Images are the biggest and most obvious byte savings opportunity. Compressing images feels like it should be an obvious build pipeline activity — and yet it is consistently neglected. Modern formats (WebP, AVIF) offer significant improvements over JPEG and PNG.
"Why do you keep sending me that logo?!" — Signed, your browser.
Caches store previously fetched objects for later reuse, avoiding network round trips entirely. HTTP cache headers govern the behavior: what gets cached, for how long, by whom, and when to revalidate.
This table gives an overview of caching headers used with HTTP, but sadly the topic is so complex there is a multi-hundred page book on the subject (Web Caching by Duane Wessels). A useful overview of caching is Mark Nottingham's Caching Tutorial for Web Authors and Webmasters. It's a classic!
| Directive | Meaning | Use Case |
|---|---|---|
| public | Any cache (browser, proxy, CDN) can store this | Static assets: images, CSS, JS, fonts |
| private | Only the user's browser may cache this | Personalized content, authenticated responses |
| max-age=N | Cache is fresh for N seconds | max-age=31536000 (1 year) for versioned assets |
| no-cache | Must revalidate with server before using cached copy | Content that changes but benefits from conditional requests |
| no-store | Do not cache at all | Sensitive data (banking, health records) |
| immutable | Content will never change; don't even revalidate | Content-hashed assets (app.3f2a1b.js) |
Two freshness models govern cache behavior:

- Expires and max-age headers: the browser doesn't ask the server at all until the cache expires.
- ETag and If-None-Match for conditional requests: the browser asks, "Has this changed?" The server responds with 304 Not Modified (no body) if unchanged.

A 304 saves bandwidth but still incurs a network round-trip. Use max-age with versioned filenames to avoid even the round-trip.
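The conditional-request handshake can be sketched as a tiny handler. The If-None-Match and ETag header names come from HTTP; the resource object and its contents are illustrative:

```javascript
// Answer a request for a resource: 304 if the client's cached ETag still matches.
function respond(requestHeaders, resource) {
  if (requestHeaders["if-none-match"] === resource.etag) {
    // No body: the client reuses its cached copy, saving bandwidth.
    return { status: 304, headers: { ETag: resource.etag }, body: null };
  }
  return { status: 200, headers: { ETag: resource.etag }, body: resource.body };
}

const logo = { etag: '"abc123"', body: "<png bytes>" }; // hypothetical resource
console.log(respond({}, logo).status);                              // 200 (first visit)
console.log(respond({ "if-none-match": '"abc123"' }, logo).status); // 304 (revalidation)
```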
The URL is the cache key. To force invalidation, change the URL:
- app.js → app.3f2a1b.js (best practice: the filename changes only when the content changes)
- logo.gif → logo-v2.gif
- logo.jpg?ts=32424324 (works but less clean; some proxies don't cache query-string URLs)

A single URL can represent multiple versions: compressed vs. uncompressed, different image formats (WebP vs. JPEG), different languages. The Vary header tells caches which request headers create distinct cached versions. Vary: Accept-Encoding is the most common. To remember this idea, consider a page /perf-tutorial. That URL could have an English and a Spanish version, a compressed and an uncompressed version, and so on. If the URL is the key to the object in the cache, how do we know which version you want? In this case we use the URL plus the Vary header values as the key. You could of course solve this problem with /perf-tutorial-es and /perf-tutorial (for en), but then you have other issues. Personally I think multi-format URLs make sense and are a sign of a very skilled web practitioner.
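How a cache might build its key from URL plus Vary can be sketched as follows. Header handling is simplified (a real cache normalizes and combines header values per the HTTP caching rules), and the lowercase header keys are an assumption of this sketch:

```javascript
// Build a cache key: the URL plus the value of every request header named by Vary.
function cacheKey(url, varyHeaderValue, requestHeaders) {
  const varied = (varyHeaderValue || "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .map((name) => `${name}=${requestHeaders[name] || ""}`);
  return [url, ...varied].join("|");
}

const a = cacheKey("/perf-tutorial", "Accept-Encoding", { "accept-encoding": "gzip" });
const b = cacheKey("/perf-tutorial", "Accept-Encoding", { "accept-encoding": "br" });
console.log(a);       // "/perf-tutorial|accept-encoding=gzip"
console.log(a === b); // false: two distinct cache entries for one URL
```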
Caches exist at multiple levels: browser cache, proxy cache, CDN edge cache, and reverse proxy cache at the origin. Private vs. public caching is critical for sensitive data — private ensures only the user's browser caches the response, not shared proxies.
"Static dynamic" pages — like /pressrelease.php?id=5 — often return the same content for every user but rebuild on every visit. Solutions: self-generate into static HTML files, or place a reverse proxy cache (Varnish, Nginx) in front of the application server. What can't easily go in a proxy cache: truly dynamic content and personalized content.
Even with all optimizations — minified, compressed, cached — point-source web serving has inherent latency issues for distant users. A server in San Diego cannot serve a user in Mumbai in under 50ms no matter how fast the server is. The speed of light imposes a floor.
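The floor is easy to compute. Assuming light in fiber travels at roughly 200,000 km/s (about two-thirds of c) and a San Diego-to-Mumbai distance of roughly 14,000 km, both rounded assumptions:

```javascript
const FIBER_KM_PER_SEC = 200_000; // speed of light in fiber, a rounded assumption

// Best-case round-trip time over a straight fiber run,
// ignoring routing detours, switching, and queuing delays.
function minRttMs(distanceKm) {
  return (2 * distanceKm / FIBER_KM_PER_SEC) * 1000;
}

console.log(minRttMs(14000)); // 140 ms, already far above a 50ms budget
```

Real paths are longer and add processing delay at every hop, so actual RTTs sit well above this physical floor.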
Solution: move content closer to users.
CDNs replicate content to edge servers distributed across the globe. When a user requests a resource, the CDN routes them to the nearest edge server. This improves reliability, scalability, and performance simultaneously.
CDNs use DNS-based or URL-rewriting-based redirection to route users to the optimal edge cache. They still have last-mile issues, and dynamic content at the edge is complex — edge caches may need to become intelligent edge compute servers. There is also a cost issue: this replication isn't free! Some cloud vendors include it and some don't; be careful.
An alternative to CDNs: deploy multiple web farms worldwide with global load balancing. Redirect users based on server availability, network distance, geography, or a mix. Downside: increased data center and hardware costs — which is exactly why CDNs exist as a service.
DNS is both fragile and robust. Recommendations:
- Expect user typos in the hostname (e.g., wwwamazon.com)
- Answer for common host variations: w/, ww/, www/, wwww/, and the bare domain

Common bottlenecks: incoming bandwidth saturation, hardware/software limits, flash traffic events, too many concurrent connections, slow downloaders holding connections open, and long server-side processing times.
Solutions include DNS round-robin (simple but suboptimal), hardware load balancers (F5, Citrix), and traffic segmentation (images.example.com, store.example.com). Connection offload (TCP termination and SSL offloading at the load balancer) reduces origin server load. Static content bottlenecks are typically disk I/O — memory caching helps. Dynamic content bottlenecks are typically CPU — reverse proxy caching helps.
"Try to send the bytes when you actually need to send them, no earlier or later than when absolutely appropriate."
This is the intersection of performance engineering and user experience design. Send too early and you waste bandwidth on resources that may never be used. Send too late and the user waits.
Interestingly these techniques and the ones that follow are mostly done in HTML and JavaScript. It's very little work for potentially quite a lot of gain.
| Hint | Syntax | Purpose | When to Use |
|---|---|---|---|
| preload | <link rel="preload" href="font.woff2" as="font"> | Fetch a resource needed for the current page but discovered late by the browser | Critical fonts, hero images, key scripts referenced deep in CSS |
| prefetch | <link rel="prefetch" href="/next-page.js"> | Fetch a resource likely needed for the next navigation | Next-page assets, predicted user flow (e.g., search → results page) |
| preconnect | <link rel="preconnect" href="https://cdn.example.com"> | Establish TCP + TLS connection to an origin early | Third-party origins you know you'll need (CDN, analytics, fonts) |
| dns-prefetch | <link rel="dns-prefetch" href="https://cdn.example.com"> | Resolve DNS only (lighter than preconnect) | Domains you might need; low cost, broad browser support |
Predictive prefetch: Use real user analytics data to predict which pages users visit next (e.g., guess.js). If 80% of users on the product page go to checkout, prefetch checkout assets while they browse products.
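The decision logic behind predictive prefetch can be sketched as a threshold over observed navigation probabilities. The URLs and numbers below are made up; a tool like guess.js derives the real probabilities from analytics data:

```javascript
// Given P(next page | current page) from analytics, pick pages worth prefetching.
function pagesToPrefetch(transitionProbabilities, threshold = 0.5) {
  return Object.entries(transitionProbabilities)
    .filter(([, probability]) => probability >= threshold)
    .map(([url]) => url);
}

const fromProductPage = { "/checkout": 0.8, "/reviews": 0.15, "/help": 0.05 };
console.log(pagesToPrefetch(fromProductPage)); // ["/checkout"]
// Each result would become a <link rel="prefetch" href="..."> tag.
```

The threshold is the tuning knob: too low and you waste bandwidth on speculative fetches, too high and you miss real wins.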
Load below-the-fold content only when it is about to enter the viewport. Native lazy loading is now a standard:
```html
<!-- Native lazy loading — no JavaScript needed -->
<img src="photo.jpg" loading="lazy" alt="Below-the-fold photo">

<!-- Eager loading for above-the-fold hero image -->
<img src="hero.jpg" loading="eager" alt="Hero image">
```
This pattern has been a native standard for a long time, and sadly I still see people using JavaScript for it today. It is a good idea to prototype performance improvements with JavaScript, but once browsers adopt the capability, you should go native.
Don't send a monolithic bundle. Split by route/page. The homepage gets homepage.css and homepage.js; load global resources in the background. Same principle applies to JavaScript bundles via Webpack, Rollup, or Vite code splitting.
Many older HTTP/1.1 specific optimizations — bundling, image sprites, domain sharding — become anti-patterns with HTTP/2. HTTP/2 multiplexes many streams over a single TCP connection, making many small files potentially more efficient than fewer large bundles. Just as modem-era patterns made the 2010s web worse, 2010s patterns can make the 2020s web worse.
After doing everything right — minifying, compressing, caching, using a CDN — you still can't meet RAIL goals predictably with the network in play. Network latency is variable, connections are unreliable, and mobile users routinely go through tunnels and dead zones. The network is the last uncontrollable variable.
Progressive Web Apps (PWAs) using Service Workers make the network a controlled and even optional component.
A Service Worker is a JavaScript proxy that sits between your web page and the network. It can intercept every network request the page makes and decide how to handle it: serve from cache, fetch from network, or combine both strategies.
| Strategy | How It Works | Best For |
|---|---|---|
| Cache-first | Serve from cache; if not cached, fetch from network and cache for next time | Static assets (CSS, JS, images, fonts) that change infrequently |
| Network-first | Try network; if network fails or is slow, fall back to cache | Dynamic content (API responses, user data) where freshness matters |
| Stale-while-revalidate | Serve stale cache immediately, fetch updated version in background for next time | Content where some staleness is acceptable (news feeds, dashboards) |
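The stale-while-revalidate row, reduced to a synchronous sketch. Real Service Workers do this asynchronously against the caches API and fetch, but the control flow is the same; the cache and fetchFn here are injected stand-ins:

```javascript
// Serve the cached copy immediately (if any) while storing a fresh copy for next time.
function staleWhileRevalidate(key, cache, fetchFn) {
  const cached = cache.get(key); // possibly stale
  const fresh = fetchFn(key);    // "background" refresh, synchronous in this sketch
  cache.set(key, fresh);
  return cached !== undefined ? cached : fresh; // stale wins if it exists
}

const cache = new Map();
let version = 0;
const fakeNetwork = () => `feed-v${++version}`; // pretend server content changes each fetch

console.log(staleWhileRevalidate("/feed", cache, fakeNetwork)); // "feed-v1" (nothing cached yet)
console.log(staleWhileRevalidate("/feed", cache, fakeNetwork)); // "feed-v1" (stale served, v2 cached)
console.log(staleWhileRevalidate("/feed", cache, fakeNetwork)); // "feed-v2"
```

The user always gets an instant response; the content is at most one fetch behind, which is exactly the "some staleness is acceptable" trade the table describes.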
Performance optimization without measurement is guessing. You need to measure three things:
Real User Monitoring (RUM) reveals how things really are — and it will humble you. Lab tests on developer machines don't capture the 90th percentile user on a median Android phone over a congested cellular connection.
| Metric | What It Measures | Target |
|---|---|---|
| LCP (Largest Contentful Paint) | When the largest visible element renders | ≤ 2.5 seconds |
| INP (Interaction to Next Paint) | Responsiveness to user interactions | ≤ 200ms |
| CLS (Cumulative Layout Shift) | Visual stability during load | ≤ 0.1 |
| TTFB | Server + network responsiveness | ≤ 800ms |
| FCP | First visible content | ≤ 1.8 seconds |
| TTI | When the page is reliably interactive | ≤ 3.8 seconds |
Beyond timing metrics, monitor your users' connection speed distribution, device capability distribution, and geographic distribution to understand how infrastructure and demographics affect your performance reality.
RUM data reveals a distribution, not a single number. The analytical framework should categorize user experiences into three buckets:
After reduction, compression, caching, CDN deployment, and preloading have run their course, there is one more lever: interface illusion. Satisfying speed goals may involve code changes, delivery modifications, AND visual/design changes. Keeping people busy and occupied is a pretty important skill in real life, and everyone from waiters bringing your drinks early to Disneyland keeping you engaged in a long line knows this. The web is no different and has its own techniques, such as skeleton screens, optimistic UI, and progress indicators.
Perceived speed and actual speed are different things. The user's experience is the final metric, not raw numbers. A page that loads in 3 seconds with a skeleton screen feels faster than a page that loads in 2.5 seconds with a blank white screen. Design changes that make waiting feel shorter are a legitimate and potent performance tool. To be fair, if the time has a monetary cost (more bytes = more money), no amount of UI/UX stagecraft will change that.
Performance and security often pull in opposite directions:
Caching sensitive data safely requires the private and no-store directives, trading cache performance for confidentiality. Browser caches can reveal browsing history to anyone with device access. Cache probing (timing how long a resource takes to load) can detect whether a user has visited a specific site. This is a concrete intersection of performance infrastructure (caching) and privacy.
| Concept | Key Takeaway |
|---|---|
| Why It Matters | Speed is not a feature — it is the baseline expectation. Users abandon slow sites. Mobile devices are underpowered. Developer perceptions lie. |
| Bandwidth vs. Latency | Latency is the real enemy. More bandwidth doesn't help past a threshold. Scale ≠ Speed. |
| RAIL Model | Response < 100ms, Animation < 10ms/frame, Idle < 50ms chunks, Load < 1s. ~90% of the problem is client-side. |
| Content Selection | The fastest byte is the one you never send. Not all bytes are equal — JS costs far more than images. |
| Minification | Code for maintenance, prepare for delivery. Automate minification in the build pipeline. Minify first, then compress. |
| Compression | gzip/Brotli provide up to 70% savings on text. A server config change, not a code change. Complements minification. |
| Caching | Zero-latency for cached resources. Use versioned filenames and long max-age. Don't cache base HTML aggressively. |
| CDNs & DNS | Move content closer to users. CDNs solve global latency. DNS optimization and capacity planning prevent bottlenecks. |
| Loading Strategy | Preload what's needed now, prefetch what's needed next, lazy-load what's below the fold. HTTP/2 changes the calculus. |
| Service Workers | Make the network optional. Cache-first, network-first, and stale-while-revalidate cover most scenarios. |
| Monitoring | RUM reveals reality. Core Web Vitals (LCP, INP, CLS) are the standard metrics. RAIL conformance is a spectrum, not binary. |
| Interface Illusion | Perceived speed ≠ actual speed. Skeleton screens, optimistic UI, and progress indicators are legitimate tools. |
"Send less data, less often, from nearby, when it is needed."
Performance is a progression: select content carefully, minimize what you send, compress it efficiently, cache to avoid resending, reduce latency with CDNs, preload and lazy-load strategically, and measure everything with real user data.
This page covers the engineering techniques for performance, but many other points were assumed to be understood. The Foundations Overview provides the architectural context (client-server model, UX vs. DX). The HTTP Overview covers the protocol mechanics (headers, content negotiation, caching). The Connections Overview covers HTTP/2, HTTP/3, and real-time techniques. The OpenTelemetry Overview covers the monitoring and observability framework.