Responsive Images: What’s the Problem, and How Do We Fix It?

Introduction

Responsive images is a surprisingly complicated topic, and one that's been steadily gaining attention over the last year as more developers discover they need them and then discover there's no good solution yet. This article aims to give an overview of the problem itself, and show the different proposals in the works to address it.

So, grab a brew, buckle up, and set your brain to concentrate — there's only so much condensing of a year's effort and thought an author can do...

Responsive image context

First of all we need to establish some context: what is it we want to do, and why? Responsive images are a small part of the responsive design methodology, which aims to adapt a website so it works optimally within known environmental constraints. Those constraints include:

Display dimensions
Display quality (pixel density, colour capability)
Connectivity (network conditions)
Input types (touch, mouse, keyboard)

By adapting a design to these conditions we aim to provide the best clarity of content, ease of use, load times, and device performance for any given user. Practically this means a designer gives consideration to each constraint and in response alters the design by adjusting some aspect of it. Examples include:

Adjusting multi-column layouts to single columns to avoid columns that are too narrow
Changing font sizes to maintain legibility on different screens
Loading smaller assets for devices that do not require large assets

How this is achieved differs between the three languages of the web.

CSS controls the presentational aspects of web pages and is equipped with Media Queries to act as environment sensors. We can then write rules to adjust the visual design according to the results of each environment test.

JavaScript controls the behaviour of a page and has a number of methods available to detect the environment properties we're interested in adapting to. By reacting to the same conditions as the CSS we can adjust the behaviour at each breakpoint, for example turning long navigation menus into compact drop-downs and adjusting multi-slide carousels into single slide carousels.
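As a rough sketch of the JavaScript side, the function below maps a viewport width to a named breakpoint and adjusts a behaviour accordingly. The breakpoint names and pixel values are invented for illustration. In a browser the width would typically come from `window.innerWidth`; here it is a plain argument so the example runs anywhere.

```javascript
// Hypothetical breakpoints, smallest first; names and widths are illustrative only.
const BREAKPOINTS = [
  { name: "small", minWidth: 0 },
  { name: "medium", minWidth: 600 },
  { name: "large", minWidth: 900 },
];

// Return the name of the widest breakpoint whose minWidth the viewport satisfies.
function activeBreakpoint(viewportWidth, breakpoints = BREAKPOINTS) {
  let active = breakpoints[0].name;
  for (const bp of breakpoints) {
    if (viewportWidth >= bp.minWidth) active = bp.name;
  }
  return active;
}

// Behaviour can then branch on the result, e.g. carousel slides per view.
function slidesPerView(viewportWidth) {
  return activeBreakpoint(viewportWidth) === "small" ? 1 : 3;
}

console.log(activeBreakpoint(480), slidesPerView(480));   // small 1
console.log(activeBreakpoint(1024), slidesPerView(1024)); // large 3
```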

HTML offers no mechanism to sense environmental conditions, and no method of adjusting to them. Traditionally this has not been a problem as a fundamental consideration of any web page is that the design should never change the core content of the page. Any visitor should obtain the same understanding of the content as any other visitor, regardless of the device they use.

The problem with responsive images

This leads us on to the problem with responsive images. With responsive designs we need to adjust some <img> resources to accommodate the design changes that differing environments call for. For example:

We can't expect that an image intended for a 27" display will be suitable for a 3.5" display; we'll need to provide a different image that has the same semantic meaning. A good example is a photograph of an author on a biography page: at large sizes we can use a photo of the author standing in a book store, surrounded by their books, and not lose the detail of who that author is. On a small screen the author would become unrecognisable, so we'd want to switch to a driver's-licence-style head shot in order to keep the same meaning for the user.
If a small device is only 480px at its maximum dimension, there is a tremendous amount of wasted bandwidth, device memory, and processing involved in delivering the oversized 27" version and having the device scale the image down to fit on its display.
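To put a rough number on that waste, the sketch below compares pixel counts. The 2560×1440 source size is an assumed resolution for an image prepared for a 27" display; it is not a figure from the examples above.

```javascript
// Scale an image to a target width, preserving aspect ratio.
function scaledSize(width, height, targetWidth) {
  return { width: targetWidth, height: Math.round(height * (targetWidth / width)) };
}

// How many times more pixels the source carries than the device will show.
function oversizeFactor(srcWidth, srcHeight, targetWidth) {
  const shown = scaledSize(srcWidth, srcHeight, targetWidth);
  return (srcWidth * srcHeight) / (shown.width * shown.height);
}

const factor = oversizeFactor(2560, 1440, 480);
console.log(scaledSize(2560, 1440, 480)); // { width: 480, height: 270 }
console.log(factor.toFixed(1));           // 28.4
```

Under these assumptions the small device downloads and decodes roughly 28 times more pixels than it can display.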

To adapt the <img>s in such cases, we need to make changes to the mark-up as well as the CSS and JS, but there is currently no native way to do this. There are a number of hacks around the problem (I've made a popular one myself, Adaptive Images), but each potential solution has its share of drawbacks and no existing solution is right for all occasions. The community is well aware of this, and realised a perfect hack could never be made with the tools that exist at the moment. So over six months ago members of the community began looking to the future, and to a native way of addressing the issue.

A designer's methodology

Before we look at the potential solutions, we need to build up a picture of the problems we are dealing with. First of all, it is worth outlining a 'best practice' process for creating a responsive design, because it has an impact on our problems and how they might be solved:

1. We have all of the content for the page we are designing up-front, and we'll start at the smallest layout. This often calls for a pure typography, minimal graphic, one column design.
2. We increase the browser width until the previous design begins to break (line lengths too long, etc). This marks a new breakpoint, and a new round of design adjustments.
3. We adjust the layout, font sizes, CSS background images, JavaScript behaviour, etc. to optimise the design at this new breakpoint size.
4. We go back to step 2 and continue repeating steps until we reach our maximum width. It's important to note that the breakpoints are design breakpoints for the page overall. If individual parts of content need adapting that's done with 'minor breakpoints' within the major breakpoint.

The limits of our tools

Now let's look in more detail at the limitations of the current toolset we have for building web sites.

You will notice the methodology above is all about dealing with browser size. We can do that, but we are somewhat stuck on the bandwidth question. This is a serious issue, but we have to infer information about connectivity; we have no capability to measure it directly. Instead, we assume a small screen means low bandwidth, and a big screen means high bandwidth. It's a poor assumption, but all we have to work with, and it works in the majority of cases (for now).

CSS is not likely to get bandwidth media queries as there is no practical way for CSS to do what we need in this regard — they simply can't work as we'd like. JavaScript is getting a Network API but the current implementations are effectively useless.

And HTML of course has nothing at all built in to adapt to anything.

Our tools are not only inadequate in terms of features, but they are not built to work with our methodology either; the content adaptations (CSS, JS) are dependent on an identical result of the same test — the width of the browser. And it is these design breakpoints that dictate when the CSS/JS/<img>s should change. But our technologies offer no central sensor mechanism that each of our languages can inherit from: there is no way to set breakpoints once and have disparate languages react to them. Instead we have to write specific tests in each language. We test for the same thing multiple times in CSS, and again in JS. That's not optimal in terms of performance, or for authors to work with and manage.

Web content in the real world

There are other considerations to take into account before we consider what an ideal solution to responsive images looks like, namely how a website is used once it's built.

It's easy to forget that the vast bulk of web content is not made by web designers building pages and populating content. It's done by website owners, and that means people who are not web experts, working inside of some CMS, and adding content to pages which are actually instances of a template that web designers have built. The huge majority of web pages are 'dumb' in that no-one with expertise in HTML/CSS/JS laid a finger on them directly. That means any solution we come up with must be simple to automate, because most people using it will never actually code directly.

Additionally, real world websites have legacy content, stuff that was there from the previous design. That content is simply re-imported into new templates that inherit the new designs. This has knock-on effects for us thinking about responsive images: how can we make responsive images that work with legacy content? How can we make sure that any mark-up we write now does not hinder us when it has become legacy content? We have a duty to come up with a solution that is future friendly and not just for the here-and-now.

Process improvements

For responsive design in general it would be ideal if we could perform our environment tests in one place and re-use the results throughout our languages - rather than declaring the tests repeatedly whenever we need to create an element that adapts. This would benefit us in a couple of ways:

It is programmatically more efficient to reference a variable value than to perform multiple tests.
It'd be much easier to adjust a design or add a new breakpoint if it only needed to be done in one place.

Right now, this is not possible, although there is a clever hack to achieve something similar for CSS and JS.
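One way to picture the single-definition idea, purely as a sketch: declare the breakpoints once and derive both the CSS media query strings and the JavaScript test from that one list. This is not the hack mentioned above, and the names and widths are invented for illustration.

```javascript
// Single source of truth; hypothetical names and minimum widths in px.
const breakpoints = { narrow: 0, medium: 600, wide: 900 };

// Derive a CSS media query string for each breakpoint.
function toMediaQueries(bps) {
  return Object.entries(bps).map(
    ([name, min]) => ({ name, query: `(min-width: ${min}px)` })
  );
}

// Derive the JavaScript test from the same definition:
// the matching breakpoint with the largest minimum width wins.
function matchBreakpoint(bps, viewportWidth) {
  return Object.entries(bps)
    .filter(([, min]) => viewportWidth >= min)
    .sort((a, b) => b[1] - a[1])[0][0];
}

console.log(toMediaQueries(breakpoints)[1].query); // (min-width: 600px)
console.log(matchBreakpoint(breakpoints, 750));    // medium
```

With this arrangement, adding or moving a breakpoint means editing one object rather than hunting through every stylesheet and script.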

It would also be ideal if we could avoid putting any presentationally based properties or tests directly into HTML markup. When a redesign happens the markup will still be there, and any presentational aspects will no longer apply to the design correctly, necessitating editing the mark-up. Not polluting our HTML with design properties is an old and important idea that we do not want to throw away.

Pre-fetch; a spanner in the works

Now we come to the main pain point of responsive images! All of this would be trivial to fix reliably (although still hack-ish) with JavaScript, if only it wasn't for pre-fetch. Pre-fetch is a relatively new behaviour that browsers employ to attempt to load a page faster. Before the HTML has finished loading, a look-ahead pre-parser scans for any <img> elements, and as soon as it finds one it immediately requests the resource named in its src. This happens before anything else on the page can do anything, including JavaScript. It didn't use to be this way, and indeed there was a solution in use by Filament Group that worked, right up until the point where the browser vendors turned to pre-fetch. And then it didn't work at all. The ever brilliant Jason Grigsby has a long write-up about why pre-fetch is the real problem and it's well worth reading in its entirety, but the take-away is this:

It seems implausible that there can ever be a solution to adaptive images that can work alongside pre-fetch unless vendors make pre-fetch smarter — which would make it slower, which negates the advantage of pre-fetch. The two technologies are both trying to do the same job (ensure a page loads as fast as possible) but they are seemingly mutually exclusive.
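The ordering problem can be illustrated with a toy model. The function below stands in for a look-ahead pre-parser: it collects every `<img>` `src` from the raw HTML before any script has had a chance to run. Real pre-parsers are far more sophisticated; the regular expression and the function name are simplifications invented for this sketch.

```javascript
// Toy stand-in for a look-ahead pre-parser: find img src values in raw HTML.
function preScan(html) {
  const requests = [];
  const imgSrc = /<img\b[^>]*\bsrc=["']([^"']+)["']/gi;
  let match;
  while ((match = imgSrc.exec(html)) !== null) {
    requests.push(match[1]);
  }
  return requests;
}

const page = `
  <img src="fullsize.jpg" alt="">
  <script>/* would swap fullsize.jpg for mobile.jpg on small screens */</script>
`;

// The request list is fixed before the script above would ever execute.
const alreadyRequested = preScan(page);
console.log(alreadyRequested); // [ 'fullsize.jpg' ]
```

By the time a script could rewrite the `src`, the request for the large file has already gone out, so the best a script can do is trigger a second download.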

Finding a reliable standards solution

So given everything above, what are the options for a standards based solution?

Defining content, not context

In the case of <img> specifically, the ideal process would be to let the browser itself select an appropriate source file to download based on the current connection speed, device DPI, and size of the area into which an image must fit. This is preferable to relying on the author writing appropriate tests and defining appropriate srcs to match.

If we could tell the browser "here is a list of assets applicable for this image, and here are their properties", listing file size and dimensions, the browser would then be able to select which image is best. For example, if the browser is in a position to know that the image is inside a container which is 600px wide, that the connection is 5 Mbps, and that the device has a 300dpi screen, it could then use its own heuristics and user preferences to decide which available version of the image it should pull from the server. The benefits of this are many:

It's simpler than having to author specific tests for all combinations of environment a user may experience (narrow screen but high dpi, narrow screen but low dpi, narrow screen but high bandwidth, narrow screen but low bandwidth, wider screen but... etc).
There are no designed breakpoints to manage.
Because it's not based on any design breakpoint it's future-friendly; it'll always work with any redesigns.
User preferences can be taken into account (e.g., always load low-res images).
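A sketch of the kind of selection a browser could make from such a list. The candidate fields, the `preferLowRes` preference, and the heuristic itself are all assumptions made for illustration; bandwidth is left out because, as discussed, it cannot be measured reliably.

```javascript
// Hypothetical candidate list: URL plus intrinsic width and file size.
const candidates = [
  { url: "photo-400.jpg", width: 400, bytes: 40000 },
  { url: "photo-800.jpg", width: 800, bytes: 120000 },
  { url: "photo-1800.jpg", width: 1800, bytes: 450000 },
];

// Pick the smallest candidate that covers the needed device pixels,
// falling back to the largest available; users may opt for the smallest.
function chooseAsset(list, { containerWidth, pixelRatio, preferLowRes = false }) {
  const sorted = [...list].sort((a, b) => a.width - b.width);
  if (preferLowRes) return sorted[0];
  const needed = containerWidth * pixelRatio;
  return sorted.find((c) => c.width >= needed) || sorted[sorted.length - 1];
}

console.log(chooseAsset(candidates, { containerWidth: 600, pixelRatio: 1 }).url);
// photo-800.jpg
console.log(chooseAsset(candidates, { containerWidth: 600, pixelRatio: 3 }).url);
// photo-1800.jpg
console.log(chooseAsset(candidates, { containerWidth: 600, pixelRatio: 3, preferLowRes: true }).url);
// photo-400.jpg
```

Note that the author supplies only facts about the files; every decision lives in the browser, which is what makes the approach independent of design breakpoints.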

Unfortunately, this is not likely to happen without considerable effort, if at all:

First, our technology does not work that way. In order for that approach to work the markup needs to know what the layout is, for example how big is the space in which the image is sitting? But the browser can't work that out until the CSS has been applied, and that CSS may well rely on the image itself to force the width of the container. It's a chicken and egg situation at that point.

Second, even assuming the container has an explicit size applied via CSS, that CSS is normally in an external file, loaded after the <img> has been found by the browser. That leaves the browser sitting around with an <img> tag that it can't fire a request for because it must:

Wait for the CSS to be applied to the page so that it can...
...find out how big the space is that the image will fit into, so it can...
...figure out which of the available resources it should ask for.

That's a lot of waiting around, and browser vendors aren't keen on anything that causes waiting — to the extent that Google have made movements towards replacing HTTP itself to help prevent such things (see SPDY).

Third, browsers do not currently detect bandwidth and it's looking like meaningful and useful bandwidth detection is some way off, if it ever comes (we all hope it does).

Let's now have a look at the proposed solutions we've had so far.

`<picture>`

Putting aside the politics of the situation (other articles can be read for information on that debacle), one of the earliest proposed solutions was a new element: the <picture> element. A new element was thought to be required because it had been understood that altering <img> itself was off the cards.

After months of debate, research, community engagement, and looking into other possibilities, the Responsive Images Community Group decided <picture> was the most suitable approach and presented it to the WHATWG. <picture> follows the syntax of the existing HTML <video> and <audio> elements, and uses media queries to handle the detection and assignment of which resource to load. Not only is the mark-up familiar, so is the control mechanism; it's just standard CSS. This makes <picture> instantly easy to understand and work with from a web designer's perspective.

It also means that <picture> can adapt to all the same things the design itself does with the media queries in the CSS files. <picture> is designed to be backward compatible, loading the default <img> in non-supporting browsers. An example of <picture> is as follows:

<picture alt="a picture of something">
<!-- Matches by default: -->
<source src="mobile.jpg" />
<source src="medium.jpg" media="(min-width: 600px)" />
<source src="fullsize.jpg" media="(min-width: 900px)" />
<img src="mobile.jpg" /><!-- fallback for non-supporting browsers -->
</picture>
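To make the matching concrete, here is a small sketch of how a browser might pick a `<source>` from the example. It assumes the last matching source wins, which is what the ordering of the example suggests but is not confirmed by the text, and it models only `min-width`.

```javascript
// The sources from the example, with media reduced to a minimum width in px.
const sources = [
  { src: "mobile.jpg", minWidth: 0 },      // matches by default
  { src: "medium.jpg", minWidth: 600 },
  { src: "fullsize.jpg", minWidth: 900 },
];

// Assumption: later matching sources override earlier ones.
function pickSource(list, viewportWidth) {
  let chosen = null;
  for (const source of list) {
    if (viewportWidth >= source.minWidth) chosen = source.src;
  }
  return chosen;
}

console.log(pickSource(sources, 320));  // mobile.jpg
console.log(pickSource(sources, 700));  // medium.jpg
console.log(pickSource(sources, 1280)); // fullsize.jpg
```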

Problems with `<picture>`

There are considerable downsides to this solution:

All that code for a single instance of an image is rather verbose.
It's verbose multiple times because for every image you'd repeat the same design breakpoint tests.
Because <picture>'s sensors are based on CSS, and because CSS can't support bandwidth detection, <picture> cannot adapt to bandwidth.
It bakes the sensor tests into the mark-up, which is poor for performance, tedious to manage, and future unfriendly: it will cause problems come any redesigns that don't share the same breakpoints as the current design.

`srcset`

srcset is a proposed attribute that can be added to an <img> element, specifying alternate src resources that may be applicable. srcset had a bad reception when it was first suggested, due to some unclear communication and unclear specification. It could actually do many of the things <picture> does, but in a much more concise syntax, mainly because it was developed by people who still thought altering <img> itself was possible. The suggestion for srcset is as follows:

<img alt="image description" src="/path/to/fallbackimage.png" srcset="/path/to/image.png 800w, /path/to/otherimage.png 600w">

Here the src would be used by browsers that don't support srcset, and srcset itself is defining two additional image resources that can be applied, along with rules for when each should be applied.

The main confusion here is that srcset brings together a few things into one attribute, and those srcset conditions (e.g., 800w) refer to the viewport width, not the width of the image resources. That's counter-intuitive: attributes usually define properties of the element or linked resource, and srcset's conditions do neither. <picture> has a similar issue, but because it uses media queries we're already familiar with the fact that min-width refers to the viewport and wouldn't make sense as a property of the image itself.

The benefit here is mainly that it's much shorter code to do a similar job to <picture>. It's also possible that it could get around the image-prefetch issue — but only if browser vendors re-engineered their pre-parser with srcset in mind.
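A sketch of parsing the `srcset` value from the example into candidates. Because the proposal's meaning of `800w` is ambiguous, the selection function takes the interpretation as an explicit argument. Both `parseSrcset` and `selectByViewport` are hypothetical helpers written for this illustration, not part of any specification.

```javascript
// Split "url 800w, url 600w" into { url, w } pairs.
function parseSrcset(srcset) {
  return srcset
    .split(",")
    .map((part) => part.trim().split(/\s+/))
    .filter(([url, descriptor]) => url && /^\d+w$/.test(descriptor || ""))
    .map(([url, descriptor]) => ({ url, w: parseInt(descriptor, 10) }));
}

// interpretation: "max" treats w as a maximum viewport width, "min" as a minimum.
function selectByViewport(candidates, viewportWidth, interpretation, fallback) {
  const sorted = [...candidates].sort((a, b) => a.w - b.w);
  const hit =
    interpretation === "max"
      ? sorted.find((c) => viewportWidth <= c.w)
      : sorted.reverse().find((c) => viewportWidth >= c.w);
  return hit ? hit.url : fallback;
}

const parsed = parseSrcset("/path/to/image.png 800w, /path/to/otherimage.png 600w");
console.log(selectByViewport(parsed, 500, "max", "/path/to/fallbackimage.png"));
// /path/to/otherimage.png
console.log(selectByViewport(parsed, 500, "min", "/path/to/fallbackimage.png"));
// /path/to/fallbackimage.png
```

The same mark-up yields different images for the same viewport depending on which reading is applied.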

Problems with `srcset`

The drawbacks of srcset are:

It's hard to understand as it uses unfamiliar syntax.
The solution is stuck with pixels for measuring viewport width, unless further refinement is made — and we don't often use pixel dimensions in responsive designs. This may be impossible to solve; how can the HTML know the size of an EM or % when the CSS has not loaded yet?
It's not clear whether the syntax refers to the min-width or the max-width.
It doesn't offer all the same sensors as a media query does.
It still bakes the sensing into the mark-up as does <picture>, with all the same problems because of it.

A hybrid of `<picture>` and `srcset`

Later, Opera's rep on the CSS Working Group — Florian Rivoal — suggested a hybrid of <picture> and srcset:

<picture>
  <source media="(orientation:landscape)" srcset="long.jpg 1x, long2.jpg 2x">
  <source media="(orientation:portrait)" srcset="tall.jpg 1x, tall2.jpg 2x">
  <img src="fallback.jpg" />
</picture>

This blend addresses a number of subtle issues with the other two approaches. The srcset attribute here is restricted to telling the browser about images available for devices at differing pixel densities. This allows the browser to be smart about which it should download after it's matched a media query. For example, a browser could not only get the raw information about pixel density, but could also make a rough guess at file sizes. Should future innovation in browsers make bandwidth measures possible, the browser can then figure out which is more appropriate to load.
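The two-stage selection can be sketched as follows: first match the media query, then choose by pixel density. Orientation is reduced to a string, and the density rule (the smallest candidate that meets the device's ratio) is an assumed heuristic rather than anything the proposal mandates.

```javascript
// Sources from the hybrid example.
const sources = [
  { orientation: "landscape", set: [{ url: "long.jpg", x: 1 }, { url: "long2.jpg", x: 2 }] },
  { orientation: "portrait", set: [{ url: "tall.jpg", x: 1 }, { url: "tall2.jpg", x: 2 }] },
];

function pickImage(list, { orientation, pixelRatio }, fallback = "fallback.jpg") {
  // Stage 1: the media query decides which <source> applies.
  const source = list.find((s) => s.orientation === orientation);
  if (!source) return fallback;
  // Stage 2: the browser is free to choose among densities (assumed heuristic).
  const byDensity = [...source.set].sort((a, b) => a.x - b.x);
  const match = byDensity.find((c) => c.x >= pixelRatio) || byDensity[byDensity.length - 1];
  return match.url;
}

console.log(pickImage(sources, { orientation: "portrait", pixelRatio: 2 }));  // tall2.jpg
console.log(pickImage(sources, { orientation: "landscape", pixelRatio: 1 })); // long.jpg
```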

Disadvantages of the hybrid solution

This proposal still has some disadvantages:

It's verbose.
It has two attributes that could easily be confused as doing the same job.
It bakes design properties into the mark-up.

`<meta>` variables

To address the problem of <picture>'s repetition and verbosity, and to enable centralisation of breakpoint management between all languages, the idea of <meta> variables was put forward. Later a revised version along similar lines emerged when Denis LeBlanc struck on a smarter implementation of the same meta concept. This looks like so:

<head>
  <meta name='case' data='breakpoint1' media='(min-width:350px)' />
  <meta name='case' data='breakpoint2' media='(min-width:1000px)' />
</head>
<body>
  <img src='/content/images/{case}/photo.jpg' alt='' />
</body>

This has a number of advantages:

We have a single <img> element with no custom properties, which will adapt to any number of breakpoints.
The mark-up does not include design properties, making it much more future-friendly.
Depending on the syntax used to reference the meta variable inside <img> the solution could be backward compatible.
Because <head> is loaded prior to any other HTML, the pre-parser problem can be fixed: <meta> variables will have been loaded before any <img> is seen.
Because <meta> variables can be loaded before any external resource is requested, it's possible to have CSS and JS inspect them reliably — therefore meta variables offer a way to centralise breakpoints.
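A sketch of how the `{case}` placeholder might be resolved. It assumes the last `<meta>` whose media condition matches supplies the value, and that an unmatched placeholder falls back to a default directory; neither rule is stated in the proposal as described here.

```javascript
// The <meta name='case'> entries from the example, media reduced to min-width in px.
const metaCases = [
  { data: "breakpoint1", minWidth: 350 },
  { data: "breakpoint2", minWidth: 1000 },
];

// Assumption: the last matching entry wins; "default" is a hypothetical fallback.
function resolveCase(cases, viewportWidth, fallback = "default") {
  let value = fallback;
  for (const c of cases) {
    if (viewportWidth >= c.minWidth) value = c.data;
  }
  return value;
}

// Substitute the variable into an image path template.
function resolveSrc(template, cases, viewportWidth) {
  return template.replace("{case}", resolveCase(cases, viewportWidth));
}

console.log(resolveSrc("/content/images/{case}/photo.jpg", metaCases, 480));
// /content/images/breakpoint1/photo.jpg
console.log(resolveSrc("/content/images/{case}/photo.jpg", metaCases, 1200));
// /content/images/breakpoint2/photo.jpg
```

Because the substitution needs only the viewport width and the `<head>`, it could in principle run inside the pre-parser without waiting for any stylesheet.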

Meta disadvantages

The <meta> tag solution also has some disadvantages:

It requires access to the <head>, meaning this technique is only useful for site-wide or template-specific breakpoints, not special-case individual images.
It would require significant work for browser vendors, as URI resolution would now have to include looking up variables too.
It restricts image storage to pre-configured paths.

A new image format

With the mark-up approach proving to be so hard, another suggestion was put forward: a new image format with built-in mechanisms to deal with the issue. A good candidate for this is our old friend JPG, in its progressive incarnation, though it would need some changes on the browser side to work as intended. Any solution involving a new format would have to go through the following steps:

The author will compress the progressive JPEG with multiple scans.
The browser would download an initial buffer of each image (10-20K), using the Range request header.
This initial buffer will contain the image's dimensions and (optionally) a "scan info" JPEG comment that will state the byte breakpoints of each one of the JPEG scans (slightly similar to the MP4 video format meta data).
If the image is not a progressive JPEG, the browser will download the rest of the image's byte range.
When the scan info comment is present, the browser will download only the byte range that it actually needs, as soon as it knows the image's presentation size.
When the scan info comment is not present, the browser can rely on dimension based heuristics and the Content-Length header to try and guess how many bytes it needs to really download.

(Solution steps courtesy of Yoav Weiss).
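The byte-range step can be sketched as below. The shape of the scan info (a list of end offsets, one per scan) and the rule for deciding how many scans are needed are assumptions made for this example; only the `bytes=start-end` form of the Range header is standard HTTP.

```javascript
// Hypothetical scan info: the byte offset at which each progressive scan ends.
const scanEnds = [15000, 40000, 90000, 200000];

// Assumed heuristic: the fraction of full width displayed decides the scan count.
function scansNeeded(displayWidth, fullWidth, scanCount) {
  const fraction = Math.min(1, displayWidth / fullWidth);
  return Math.max(1, Math.ceil(fraction * scanCount));
}

// Build the Range header value for the bytes not yet downloaded.
// alreadyHave is the size of the initial buffer in bytes.
function rangeHeader(alreadyHave, ends, needed) {
  const lastByte = ends[needed - 1] - 1; // Range end positions are inclusive
  if (lastByte < alreadyHave) return null; // initial buffer already suffices
  return `bytes=${alreadyHave}-${lastByte}`;
}

const needed = scansNeeded(480, 1920, scanEnds.length);
console.log(needed);                               // 1
console.log(rangeHeader(20000, scanEnds, needed)); // null
console.log(rangeHeader(20000, scanEnds, 3));      // bytes=20000-89999
```

In the first case the 20K initial buffer already contains the whole first scan, so a small display would need no second request at all.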

The big advantage of this approach is that it side-steps any need to write custom mark-up, which removes a lot of problems with the previously mentioned solutions.

Image format disadvantages

The disadvantages are substantial:

Getting new file-format or enhanced format support is traditionally arduous and slow (we waited years to be able to use PNGs with 8-bit alpha).
Those images with their custom byte breakpoints will have to be created somehow, either via some CMS or by hand. That seems non-trivial to do.

New headers

Another potential solution is to take all the management of images to the server, standardising the Adaptive Images method (or similar) by allowing browsers to send new headers to the server. The problem with any current server-side solution that does not use user agent sniffing (which we don't want to do) is reliance on cookies to change the resource loaded via a single URI. This has a knock-on effect of meaning the technique cannot work with proxies or CDNs (neither of which like cookies). But the advantages of server-side content negotiation are numerous:

It requires little to no effort to support from a CMS user or author.
It is future friendly.
It is backward compatible.
Depending on specific implementation it can generate its own down-sampled images, and/or address the "different image at different sizes" problem.

HTTP is not a great protocol for this kind of thing due to the increased latency additional headers demand, but SPDY (and/or HTTP/2) is likely to deal with it a lot more efficiently. SPDY can compress headers, and it has a number of other clever tricks that reduce the number of requests.

A process for this to happen might go something like:

1. Browser asks for spdy://website.com
2. Server responds with content and adds an "I request your bandwidth & device screen size" header.
3. Browser then appends these headers to all future requests on the domain (perhaps qualified via file-type).
4. Server can push any amended content from point 2 over SPDY without another request.
5. Server processing then handles which image to send from any URI request in a similar manner to Adaptive Images.
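Server-side selection under this scheme might look like the sketch below. The header names `x-screen-width` and `x-bandwidth-kbps` are invented for illustration; no such headers are standardised, and the 500 kbps threshold is arbitrary.

```javascript
// Hypothetical variants the server has generated for one image URI.
const variants = [
  { file: "photo-480.jpg", width: 480 },
  { file: "photo-1024.jpg", width: 1024 },
  { file: "photo-2560.jpg", width: 2560 },
];

// Choose a variant from (invented) request headers; same URI, different bytes.
function negotiateImage(headers, list = variants) {
  const screenWidth = Number(headers["x-screen-width"]) || Infinity;
  const kbps = Number(headers["x-bandwidth-kbps"]) || Infinity;
  const sorted = [...list].sort((a, b) => a.width - b.width);
  // On a slow connection (arbitrary threshold), always send the smallest file.
  if (kbps < 500) return sorted[0].file;
  // Otherwise send the smallest variant that covers the screen, else the largest.
  const fit = sorted.find((v) => v.width >= screenWidth);
  return (fit || sorted[sorted.length - 1]).file;
}

console.log(negotiateImage({ "x-screen-width": "480", "x-bandwidth-kbps": "5000" }));
// photo-480.jpg
console.log(negotiateImage({ "x-screen-width": "1280", "x-bandwidth-kbps": "200" }));
// photo-480.jpg
console.log(negotiateImage({}));
// photo-2560.jpg
```

A browser that sends neither header simply receives the full-size file, which is what keeps the approach backward compatible. A real deployment would presumably also need to tell caches which request headers the response depends on, for example through the existing `Vary` response header.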

There are potential gotchas about this when thinking about proxies and content delivery networks. However, should headers such as this be standardised, it's possible proxies and CDNs could be built to deal with them; some CDNs are already smarter than dumb file-caches. And even if not, the majority of sites on the web aren't behemoths running on CDNs. They're on shared servers, and they'd benefit hugely from this technique.

Conclusion

As yet, there is no single perfect method that can answer all of the requirements of adapting a simple <img> to simple constraints like bandwidth or device size. There are no hacks that do a perfect job, and there is no clear way of standardising any such method that does not have its own problems. It's possible that a single grand solution will prove impossible, and we'll have to use a few techniques to achieve the things we want. It's possible that the only true solution will be patience while the constrained bandwidth and poor device capabilities problems slowly fade away — but while that may only be a few years away in privileged countries, the majority of the world will take a lot longer.

If you have any ideas, your feedback and effort would be greatly appreciated at either the WHATWG mailing list or the Responsive Images Community Group.