I’ve now spent the better part of four months looking at HTTP/2 (or H2, as all the cool kids are calling it), and at Server Push in particular.

Server Push is super exciting to me. Paired with multiplexing (the ability to perform multiple HTTP transactions on a single TCP connection), it has the potential, I believe, to change how we deliver the web, by allowing the server to proactively push content to the client (specifically, into the client cache).

How Server Push Works

To explain Server Push, you must first understand HTTP/2.

HTTP/2 requests are sent via streams. A stream is the unique channel through which a single request/response exchange happens within the TCP connection. Each stream has a weight, which determines what proportion of resources it receives relative to its sibling streams, and each can be dependent on another stream.
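To make weights and dependencies more concrete, here is a minimal Python sketch loosely following the prioritization model described in RFC 7540 (Section 5.3): a stream only receives resources when the streams it depends on have nothing to send, and siblings split their parent’s share in proportion to their weights (1 to 256). The stream IDs, weights, and the allocate and has_work helpers are hypothetical, and real servers are free to schedule streams differently.

```python
from fractions import Fraction

def has_work(tree, ready, stream):
    """True if the stream, or anything that depends on it, has data to send."""
    return stream in ready or any(has_work(tree, ready, c) for c in tree.get(stream, []))

def allocate(tree, weights, ready, parent=0, share=Fraction(1)):
    """Return {stream_id: fraction of bandwidth} for streams that can send now.

    tree:    parent stream ID -> list of dependent stream IDs (0 is the root)
    weights: stream ID -> weight (1-256)
    ready:   set of stream IDs that currently have data to send
    """
    active = [c for c in tree.get(parent, []) if has_work(tree, ready, c)]
    total = sum(weights[c] for c in active)
    result = {}
    for child in active:
        child_share = share * Fraction(weights[child], total)
        if child in ready:
            # A ready stream uses its whole share; its dependents must wait.
            result[child] = child_share
        else:
            # A stream with nothing to send hands its share to its dependents.
            result.update(allocate(tree, weights, ready, child, child_share))
    return result

# Hypothetical page: HTML on stream 1; critical CSS (3) and other JS (5) depend
# on it; a webfont (7) depends on the CSS stream.
tree = {0: [1], 1: [3, 5], 3: [7]}
weights = {1: 16, 3: 192, 5: 64, 7: 16}
print(allocate(tree, weights, ready={3, 5, 7}))  # CSS gets 3/4, JS gets 1/4; font waits
print(allocate(tree, weights, ready={5, 7}))     # CSS done: font inherits CSS's 3/4
```

Passing a finished stream’s share down to its dependents is what lets a font “wait behind” the stylesheet that uses it without wasting bandwidth.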

Streams are made up of frames. Each frame has a length, a type, type-specific flags, and a stream identifier that explicitly associates it with a stream.
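Every frame starts with the same fixed 9-octet header (RFC 7540, Section 4.1): a 24-bit length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. Here is a minimal Python sketch of encoding and decoding it; the FrameHeader, pack_header, and unpack_header names are illustrative only.

```python
import struct
from typing import NamedTuple

class FrameHeader(NamedTuple):
    length: int      # payload length in octets (24 bits)
    frame_type: int  # e.g. 0x0 DATA, 0x3 RST_STREAM, 0x4 SETTINGS, 0x5 PUSH_PROMISE
    flags: int       # type-specific flags (8 bits)
    stream_id: int   # 31-bit stream identifier; 0 refers to the whole connection

def pack_header(h: FrameHeader) -> bytes:
    """Encode the fixed 9-octet frame header that precedes every payload."""
    if not (0 <= h.length < 2**24 and 0 <= h.stream_id < 2**31):
        raise ValueError("length or stream_id out of range")
    return h.length.to_bytes(3, "big") + struct.pack(">BBI", h.frame_type, h.flags, h.stream_id)

def unpack_header(data: bytes) -> FrameHeader:
    """Decode the first 9 octets of a frame."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, raw_id = struct.unpack(">BBI", data[3:9])
    # The most significant bit is reserved and ignored on receipt.
    return FrameHeader(length, frame_type, flags, raw_id & 0x7FFFFFFF)

# A DATA frame header: 5-byte payload, END_STREAM flag (0x1), stream 3.
header = pack_header(FrameHeader(5, 0x0, 0x1, 3))
print(header.hex())  # 000005000100000003
print(unpack_header(header))
```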

The ability to interleave these frames, and the fact they can belong to any stream, is the basis of multiplexing.
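A sketch of the idea, assuming only DATA frames and ignoring everything else a real HTTP/2 implementation must handle (HEADERS, flow control, HPACK): two responses are split into frames, interleaved on one connection, and reassembled by stream identifier. The data_frame and demultiplex helpers are hypothetical.

```python
from collections import defaultdict

def data_frame(stream_id: int, payload: bytes, end_stream: bool = False) -> bytes:
    """Build a DATA frame (type 0x0); flag 0x1 is END_STREAM."""
    flags = 0x1 if end_stream else 0x0
    return (len(payload).to_bytes(3, "big") + bytes([0x0, flags])
            + stream_id.to_bytes(4, "big") + payload)

def demultiplex(wire: bytes) -> dict:
    """Split a byte stream of frames back into per-stream bodies."""
    bodies = defaultdict(bytes)
    pos = 0
    while pos < len(wire):
        length = int.from_bytes(wire[pos:pos + 3], "big")
        stream_id = int.from_bytes(wire[pos + 5:pos + 9], "big") & 0x7FFFFFFF
        bodies[stream_id] += wire[pos + 9:pos + 9 + length]
        pos += 9 + length
    return dict(bodies)

# Two responses interleaved on one connection: stream 1 (HTML) and stream 3 (CSS).
wire = b"".join([
    data_frame(1, b"<html>"),
    data_frame(3, b"body {"),
    data_frame(1, b"</html>", end_stream=True),
    data_frame(3, b"}", end_stream=True),
])
print(demultiplex(wire))  # {1: b'<html></html>', 3: b'body {}'}
```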

One of those frame types is the SETTINGS frame, which is how the client can control whether or not to use Server Push.

Server Push is enabled by default; a client disables it by sending a SETTINGS frame with the SETTINGS_ENABLE_PUSH parameter (identifier 0x2) set to 0.
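As an illustration, here is a minimal Python sketch of the SETTINGS frame a client would send to opt out of push, and of how a server might track the setting. SETTINGS uses frame type 0x4 on stream 0, and each setting is a 16-bit identifier followed by a 32-bit value; the helper names are made up for this example.

```python
import struct

SETTINGS = 0x4              # frame type
SETTINGS_ENABLE_PUSH = 0x2  # setting identifier

def settings_frame(settings: dict) -> bytes:
    """Build a SETTINGS frame; each setting is a 16-bit ID plus a 32-bit value."""
    payload = b"".join(struct.pack(">HI", ident, value) for ident, value in settings.items())
    # SETTINGS always applies to the whole connection, so the stream ID is 0.
    return len(payload).to_bytes(3, "big") + struct.pack(">BBI", SETTINGS, 0x0, 0) + payload

def push_allowed(frame: bytes, currently_allowed: bool = True) -> bool:
    """Server side: update the push flag from a received SETTINGS frame.

    Push starts out enabled; a SETTINGS frame that omits SETTINGS_ENABLE_PUSH
    leaves the current value unchanged.
    """
    length = int.from_bytes(frame[0:3], "big")
    payload = frame[9:9 + length]
    for offset in range(0, length, 6):
        ident, value = struct.unpack(">HI", payload[offset:offset + 6])
        if ident == SETTINGS_ENABLE_PUSH:
            if value not in (0, 1):
                raise ValueError("SETTINGS_ENABLE_PUSH must be 0 or 1")
            currently_allowed = value == 1
    return currently_allowed

disable_push = settings_frame({SETTINGS_ENABLE_PUSH: 0})
print(disable_push.hex())          # 000006040000000000000200000000
print(push_allowed(disable_push))  # False
```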

When push is disabled, the server must not send any pushes. When it is enabled, the server starts a push by sending another type of frame, known as PUSH_PROMISE.

The purpose of this frame is to inform the client that the server wants to push a resource, and to give the client the option to reject the pushed stream (by sending back a RST_STREAM frame). Each pushed resource is then sent to the client in its own stream and should be stored in the client cache; when the client later requests that resource, it is retrieved from the cache rather than fetched from the server.
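A sketch of the two frames involved, assuming a client whose HTML request is on stream 1: the server sends PUSH_PROMISE (type 0x5) on that stream, reserving an even-numbered stream for the push, and a client that doesn’t want the resource resets the promised stream with RST_STREAM (type 0x3) and an error code such as CANCEL or REFUSED_STREAM. The header block below is a placeholder byte string, not real HPACK output, and the helper names are hypothetical.

```python
import struct

RST_STREAM, PUSH_PROMISE = 0x3, 0x5  # frame types
END_HEADERS = 0x4                    # PUSH_PROMISE flag
REFUSED_STREAM, CANCEL = 0x7, 0x8    # error codes a client may use to reject a push

def frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    return len(payload).to_bytes(3, "big") + struct.pack(">BBI", frame_type, flags, stream_id) + payload

def push_promise(associated_stream: int, promised_stream: int, header_block: bytes) -> bytes:
    """Server -> client: sent on the stream of the request the push relates to.

    The payload reserves the (even-numbered) promised stream ID, followed by the
    header block describing the request the server is answering in advance.
    """
    payload = struct.pack(">I", promised_stream & 0x7FFFFFFF) + header_block
    return frame(PUSH_PROMISE, END_HEADERS, associated_stream, payload)

def reject_push(promised_stream: int, error_code: int = CANCEL) -> bytes:
    """Client -> server: decline a push by resetting the promised stream."""
    return frame(RST_STREAM, 0x0, promised_stream, struct.pack(">I", error_code))

# The client requested the HTML on stream 1; the server promises /style.css on stream 2.
# b"<hpack:GET /style.css>" stands in for a real HPACK-encoded header block.
promise = push_promise(1, 2, b"<hpack:GET /style.css>")
promised_id = int.from_bytes(promise[9:13], "big") & 0x7FFFFFFF
print(promised_id)                     # 2
print(reject_push(promised_id).hex())  # 00000403000000000200000008
```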

[Figure: HTTP/2 visualization]

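Pushed resources must be cacheable, because the pushed response has to still be fresh when the client makes the actual request. The promised request must also be safe and must not include a request body, which in practice means pushes should be the result of GET (or HEAD) requests.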
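These rules can be expressed as two small checks. This is a simplified sketch: can_promise follows the method and body requirements in RFC 7540 (Section 8.2), while still_fresh only looks at the no-store and max-age Cache-Control directives, which is far less than a real HTTP cache considers. Both function names are hypothetical.

```python
import re

SAFE_CACHEABLE_METHODS = {"GET", "HEAD"}

def can_promise(method: str, has_body: bool) -> bool:
    """A promised request must be safe, cacheable and have no request body."""
    return method.upper() in SAFE_CACHEABLE_METHODS and not has_body

def still_fresh(cache_control: str, seconds_since_push: float) -> bool:
    """Simplified freshness check for a pushed response.

    Only no-store and max-age are considered; real HTTP caching (Expires,
    s-maxage, heuristic freshness, revalidation) is ignored in this sketch.
    """
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives:
        return False
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return seconds_since_push < int(match.group(1))
    return False  # no explicit lifetime: treated as stale here

print(can_promise("GET", has_body=False))       # True
print(can_promise("POST", has_body=True))       # False
print(still_fresh("public, max-age=300", 2.5))  # True
print(still_fresh("public, max-age=300", 600))  # False
```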

That’s Cool and All, but it’s Not Revolutionary…

As I said, I think Server Push and Multiplexing can change the web.

In the near term, we can start to simplify our web setups. Multiplexing obsoletes domain sharding (in fact, under HTTP/2 sharding is often, though not always, detrimental), as well as a number of frontend performance-tuning strategies, such as inlining resources for above-the-fold rendering, image sprites, and CSS/JS concatenation. Minification still pays off, though, since it reduces the number of bytes sent rather than the number of requests.

Thinking longer term, we will start to see new strategies emerge, such as pushing the above-the-fold JS/CSS as separate resources with high priority along with the requested HTML, followed by the rest of the CSS/JS with a lower priority.

Or making webfonts dependent on the CSS file in which they are used.

But The Web Isn’t Just Websites…

Another casualty of HTTP/1.1 is APIs. APIs often have to choose between embedding sub-resources in the parent resource (sending more data than necessary when they are not wanted, which slows down the response) and requiring clients to make many more requests for those sub-resources.

With Server Push and Multiplexing, the server can push those sub-resources along with the response to the parent resource, and the client can reject any it doesn’t want.
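A toy simulation of that flow, with no real networking: the server promises every sub-resource of /articles/42, and the client keeps only the ones it wants. The API paths, SUB_RESOURCES, Client, and serve are all hypothetical. On a real connection, a rejected push may already have sent some data before the server sees the RST_STREAM, so rejecting is not entirely free.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical API: each parent resource lists the sub-resources it links to.
SUB_RESOURCES = {
    "/articles/42": ["/articles/42/author", "/articles/42/comments"],
}

@dataclass
class Client:
    wanted: set                                   # sub-resources this client needs
    cache: dict = field(default_factory=dict)     # stands in for the client cache
    rejected: list = field(default_factory=list)  # pushes this client declined

    def accept_push(self, path: str) -> bool:
        """Decide on a PUSH_PROMISE; declining corresponds to sending RST_STREAM."""
        if path in self.wanted:
            return True
        self.rejected.append(path)
        return False

def serve(path: str, client: Client, load: Callable[[str], str]) -> str:
    """Promise each sub-resource, push the accepted ones, then answer the request."""
    for sub in SUB_RESOURCES.get(path, []):
        if client.accept_push(sub):
            client.cache[sub] = load(sub)
    return load(path)

client = Client(wanted={"/articles/42/author"})
body = serve("/articles/42", client, load=lambda p: f"representation of {p}")
print(sorted(client.cache))  # ['/articles/42/author']
print(client.rejected)       # ['/articles/42/comments']
```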

Alright, but what do you mean we’re doing it wrong?

Currently, the most popular way to do server push is for your application to send Link: </resource>; rel=preload headers, which instruct the HTTP server to push the resource in question. However, this header format is defined by the new W3C Preload Specification (Working Draft), which is not intended for this purpose (although there is some disagreement on that point).
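To show what such a server looks for, here is a simplified Python sketch that extracts rel=preload targets from a Link header. It is an assumption about typical behaviour rather than any particular server’s implementation, and the regex-based parser ignores edge cases such as quoted values containing commas.

```python
import re

# <uri> followed by any number of ";param" segments, up to the next comma.
LINK_RE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]*)*)")

def parse_link_header(value: str) -> list:
    """Parse a Link header into (uri, {param: value}) pairs.

    Simplified: quoted parameter values containing ',' or ';' are not handled.
    """
    links = []
    for match in LINK_RE.finditer(value):
        params = {}
        for part in match.group(2).split(";"):
            if part.strip():
                name, _, val = part.partition("=")
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((match.group(1), params))
    return links

def preload_targets(value: str) -> list:
    """URIs a push-capable server would treat as push candidates."""
    return [uri for uri, params in parse_link_header(value)
            if "preload" in params.get("rel", "").lower().split()]

header = "</css/site.css>; rel=preload; as=style, </js/app.js>; rel=preload; as=script"
print(preload_targets(header))  # ['/css/site.css', '/js/app.js']
```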

The preload Link is intended for browsers; in the words of the specification, it “provides a declarative fetch primitive that initiates an early fetch and separates fetching from resource execution.”

It is related to the so-called prebrowsing features, which let you instruct the browser to do a number of things ahead of time to improve performance: everything from pre-emptively doing a DNS lookup or opening a TCP connection, to fetching and even completely pre-rendering a page in the background.

A Proposal

I like the solution of using headers to initiate pushes. It makes pushing easy to do from languages that are not typically async, parallel, or threaded (e.g. PHP or Ruby), with zero language changes necessary, and it pushes the responsibility up to the HTTP layer.

Unfortunately, you run into a potential issue: you can’t distinguish between preloading and server push, and you may wish to use both for different assets. For example, you might want to preload your stylesheet, which, when retrieved, could have its fonts and images pushed. Furthermore, using preload for pushes could introduce a race condition between a push and a preload for the same resource.

We don’t want to clobber the Preload specification, so why not just change it to Link: </resource>; rel=push?

By doing this, we add enough granularity to distinguish between the two and avoid a potential race condition. The header would be stripped by the httpd when it handles the push. If the client does not accept pushes (which the server knows thanks to the SETTINGS frame), the header should be passed through as-is (or can be changed to rel=preload), and the browser can then handle it as a preload instead.
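The proposed behaviour could look roughly like this on the server side. Note that rel=push is the proposal made in this post, not a registered link relation, and this sketch picks the option of rewriting to rel=preload when the client has disabled push. The handle_links function and its regex-based parsing are hypothetical and simplified.

```python
import re

LINK_RE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]*)*)")
REL_RE = re.compile(r';\s*rel="?([^";,]*)"?')

def handle_links(value: str, client_accepts_push: bool):
    """Apply the proposed rel=push semantics to an outgoing Link header.

    Returns (paths_to_push, header_to_forward). An empty header means the
    server should drop the Link header entirely.
    """
    to_push, forwarded = [], []
    for match in LINK_RE.finditer(value):
        uri, params = match.group(1), match.group(2).strip()
        rel = REL_RE.search(params)
        if rel and rel.group(1).strip().lower() == "push":
            if client_accepts_push:
                to_push.append(uri)  # pushed by the server, so the link is stripped
                continue
            # Push disabled via SETTINGS: let the browser preload it instead.
            params = params.replace(rel.group(0), "; rel=preload", 1)
        forwarded.append(f"<{uri}>{params}")
    return to_push, ", ".join(forwarded)

header = "</css/site.css>; rel=push; as=style, </js/app.js>; rel=preload; as=script"
print(handle_links(header, client_accepts_push=True))
# (['/css/site.css'], '</js/app.js>; rel=preload; as=script')
print(handle_links(header, client_accepts_push=False))
# ([], '</css/site.css>; rel=preload; as=style, </js/app.js>; rel=preload; as=script')
```

Links that are neither pushed nor rewritten pass through untouched, so anything the browser cannot preload still falls back to being requested the traditional way.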

If neither preload nor push is supported, the asset is requested using the traditional methods (e.g. link, script, and img tags, or url() within stylesheets). This allows for a simple, robust, progressive-enhancement mechanism for loading assets.

I’d like to thank my colleagues Mark Nottingham and Colin Bendell for their feedback on early revisions of this post.