Running Your Web Applications Offline With HTML5 AppCache

Introduction

Web applications have become a major part of many people’s lives, so much so that many of us use them all the time. Wouldn’t it be great if we could use them even when offline? Until recently, there wasn’t any viable way to do this – however, with the introduction of the W3C HTML5 application cache feature, it is possible to make your web applications run offline as well as online. Let us see how…

Why have your app run offline?

Web applications are becoming more complex and capable every day. There are many examples of web applications doing the same job as desktop applications in various fields (think about Google Docs, Picasa, etc.). However, one major disadvantage is that they cannot work when the user is not connected to the Internet.

This is where HTML5's new offline storage comes in. It tries to remove that disadvantage by defining a way to store files in a cache, so that when the user is offline, the browser still has access to the necessary files. These can be HTML, CSS or JavaScript files, or any other assets the site needs to run.

Saving your files for offline use with the application cache

HTML5 has a feature for offline web applications called application cache, or AppCache for short. Files stored in this AppCache are available to the application even when the user is offline. You can specify which files you want to store in the AppCache using a manifest file.

How is the application cache different from the normal browser cache?

There are a number of ways in which AppCache differs from the browser's normal cache. The first is the intention behind the two: AppCache is intended for proper web applications, whereas the browser's cache is for normal web pages in general. The normal cache will cache pretty much any page, whereas AppCache will only cache pages which are specifically included in the manifest file. In addition, the normal cache is unreliable, as we don't know for sure which pages (and which resources within those pages) will be available.

AppCache is exciting because the developer now has much more programmatic control over the cache, which means much more certainty about how web applications behave offline. Note also that multiple pages can share the same AppCache, and that you can use the API to determine the current state of the AppCache and even have it update itself.

The manifest file

This file resides on the server and dictates which files should be stored client-side in the browser's AppCache, in readiness for the user going offline. Let's delve a bit deeper into how it works.

You can give the manifest file any name you want, but best practice dictates that you give it an extension of .manifest. Every manifest file has to start with CACHE MANIFEST, after which you list the files you want to be stored and made available for offline use. Comments can be made by putting # at the beginning of a line. A very simple manifest file looks like so:

CACHE MANIFEST
#You can also use the CACHE: section header to explicitly declare the following three files.

style.css
script.js
index.htm

The manifest file has to be served with the correct MIME type, which is text/cache-manifest. To ensure this on an Apache server, you could give the manifest file the extension .manifest and add the following to the .htaccess file on your server:

AddType text/cache-manifest .manifest
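
If your server is not Apache, the same MIME type has to be set some other way. As a rough sketch only, assuming a small static file server written for Node.js (the names contentTypeFor and createStaticServer, the extension table and the port in the usage note are illustrative assumptions, not part of the setup described above), it might look like this:

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Hypothetical extension-to-type table; the .manifest entry is the one that matters here.
const MIME_TYPES = {
  '.manifest': 'text/cache-manifest',
  '.htm': 'text/html',
  '.html': 'text/html',
  '.css': 'text/css',
  '.js': 'application/javascript'
};

// Returns the Content-Type header value to send for a given file name.
function contentTypeFor(fileName) {
  const ext = path.extname(fileName).toLowerCase();
  return MIME_TYPES[ext] || 'application/octet-stream';
}

// Creates (but does not start) a server that serves files from rootDir.
function createStaticServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer(function (req, res) {
    // Drop any query string and refuse paths that escape the root directory.
    const urlPath = req.url.split('?')[0];
    const filePath = path.join(root, urlPath);
    if (filePath !== root && !filePath.startsWith(root + path.sep)) {
      res.writeHead(403);
      res.end();
      return;
    }
    fs.readFile(filePath, function (err, data) {
      if (err) {
        res.writeHead(404);
        res.end();
        return;
      }
      res.writeHead(200, { 'Content-Type': contentTypeFor(filePath) });
      res.end(data);
    });
  });
}
```

To try it, you could call createStaticServer(__dirname).listen(8080) and check that demo.manifest comes back with a Content-Type of text/cache-manifest.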

Linking the HTML file with the manifest file

Now that you have created the manifest file telling the browser which files need to be cached in your application, you need to tell the HTML page to use that cache. To do this you have to link the page to the manifest file by including the manifest attribute in the <html> tag of your page. For example:

<html manifest="demo.manifest">

If your web application has more than one page, make sure that all the pages link to the manifest file in this way, otherwise they won't be part of the AppCache, and won't work offline.

Using section headers for better control over the AppCache

So far, we've seen a very basic example of how to use the manifest file. With the use of section headers, we can actually specify exactly how a certain file is to be cached, or not.

Explicitly defining the files to be cached

You can use the CACHE: section header to explicitly declare which files need to be cached. For example, the previous example of the manifest file can be written like so, and it will function in exactly the same way:

CACHE MANIFEST

CACHE:
style.css
script.js
index.htm

The only difference is that in this example we have explicitly declared that all those files will be part of the application cache. Here is a very simple example page that uses the CACHE: section header.

It is important to note that the path mentioned for the files should be relative to the location of the manifest file. In the examples here, we are assuming that the files mentioned are in the same directory as the manifest file. You can use relative as well as absolute URLs when stating the files in the manifest file.

The files specified as part of the CACHE: section will load from AppCache (not the server) even if you are online, provided that there is no change in the manifest file. If, however, the browser finds an updated manifest file, then a new cache will be downloaded according to what the new manifest file says. So AppCache may not be suitable for sites with fast-moving content such as news blogs, but it can be very useful for web applications which do a particular thing and want to work offline (for example, a calendar app or a to-do list).
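
Because a new cache is only downloaded when the manifest file itself changes, editing style.css alone will not trigger an update. A common workaround, which is not something this article prescribes, is to keep a revision comment in the manifest and change it with every release. A minimal sketch, assuming a build step written in JavaScript; the function name stampManifest and the "# rev" comment format are hypothetical:

```javascript
// Inserts or replaces a "# rev ..." comment directly after the CACHE MANIFEST line.
// Changing the comment changes the manifest, which prompts the browser to re-download the cache.
function stampManifest(manifestText, revision) {
  const lines = manifestText.split(/\r?\n/);
  if (lines[0].trim() !== 'CACHE MANIFEST') {
    throw new Error('A manifest must start with CACHE MANIFEST');
  }
  const stamp = '# rev ' + revision;
  if (lines.length > 1 && /^# rev /.test(lines[1])) {
    lines[1] = stamp;          // replace the previous revision comment
  } else {
    lines.splice(1, 0, stamp); // no revision comment yet, so add one
  }
  return lines.join('\n');
}

const original = 'CACHE MANIFEST\n\nCACHE:\nstyle.css\nscript.js\nindex.htm\n';
const stamped = stampManifest(original, '2');
```

Running the same function again with a new revision value replaces the existing comment rather than adding a second one.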

What if I want a file to bypass the cache and load directly from the server?

If a page is associated with a manifest file, then only those files mentioned in the manifest file will try to load regardless of whether the user is online or offline. Now, there may be situations where you might want some file to bypass this cache when the user is online, so that it connects and downloads fresh from the server instead of the cache (For example, some dynamic content from a CGI Script).

Basically, if a page is associated with a manifest file, then all network traffic for its files is blocked, and files either have to be loaded from the AppCache, or fail to load. The NETWORK: section header gives exceptions to this rule. You can use the NETWORK: section header to declare which files should NOT be cached, so that they are downloaded from the server and never become part of the application cache. The NETWORK: section header respects the browser's normal cache headers. So if a file is supposed to be cached by the browser's normal cache, then it will still be cached by it (just like any other file not specified in the AppCache), even if it's specified under the NETWORK: section header.

CACHE MANIFEST

CACHE:
style.css
script.js
index.htm

NETWORK:
style2.css

In the above example, style2.css will always be downloaded from the server and never be part of the application cache. You may have too many files that need to bypass the cache to list them all under the NETWORK: section header without it becoming cumbersome. In this case you can use the asterisk character (*), so that all URLs are allowed to go to the network when you are online.

Check out my example that uses a manifest file employing a NETWORK: section header. You will notice that when you are offline and re-load the page, the page does reload, but the background styling disappears. This is because the background styling in this example is in the file style2.css, which is under the NETWORK: section header, meaning that it is not cached and will only load when you are online and re-load the page.

Providing fallback content

The FALLBACK: section header is used to define fallbacks to be used in place of files that fail to load (or load incompletely):

CACHE MANIFEST

CACHE:
style.css
script.js
index.htm

NETWORK:
style2.css

FALLBACK:
main_image.jpg backup_image.jpg

The fallback content is supposed to be cached and only used in case the main content does not load. In the above example, backup_image.jpg is cached by AppCache so if main_image.jpg cannot load, backup_image.jpg will load in its place. Check out my manifest backup example — if you go to this page, and disconnect from the internet and then re-load the page, the browser will try to load the image, but since you're not online (and the image is not cached) it will not load, and hence the fallback content will load in its place instead. (The browser will first take a little time trying to load the main content, and only then load the fallback content...so be patient!)

This manifest file is utilized in my example that provides fallback content for a number of images.
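
The three section headers can also be generated rather than written by hand. The following is only a sketch of such a generator: buildManifest and its option names are made up for this illustration, and the file names are the ones used in the examples above.

```javascript
// Builds manifest text from lists of entries.
//   cache:    array of URLs to store in the AppCache
//   network:  array of URLs that must always come from the server (or '*')
//   fallback: array of [original, replacement] pairs
function buildManifest(options) {
  const cache = options.cache || [];
  const network = options.network || [];
  const fallback = options.fallback || [];
  const lines = ['CACHE MANIFEST', ''];

  if (cache.length) {
    lines.push('CACHE:', ...cache, '');
  }
  if (network.length) {
    lines.push('NETWORK:', ...network, '');
  }
  if (fallback.length) {
    lines.push('FALLBACK:');
    fallback.forEach(function (pair) {
      lines.push(pair[0] + ' ' + pair[1]);
    });
    lines.push('');
  }
  return lines.join('\n');
}

const manifest = buildManifest({
  cache: ['style.css', 'script.js', 'index.htm'],
  network: ['style2.css'],
  fallback: [['main_image.jpg', 'backup_image.jpg']]
});
```

Passing network: ['*'] instead produces the wildcard form of the NETWORK: section described earlier.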

Using the application cache API and events to make sure the latest files are used in your cache

One of the great things about application cache is that you, the programmer, now have control over how the cache behaves. You have access to events which tell you about the current state of the application cache, and to functions that update it asynchronously. For example, you can use window.applicationCache to find out if the browser supports application cache or not. Let's take a look at some other ways in which you can gain programmatic control over the application cache.
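
As a small sketch of that support check (the helper name supportsAppCache is hypothetical, and the global object is passed in as a parameter so that the function can also be exercised outside a browser):

```javascript
// Returns true when the given global object exposes the application cache API.
function supportsAppCache(globalObject) {
  return Boolean(globalObject && globalObject.applicationCache);
}

// In a browser, pass the real window; anywhere else this evaluates to false.
const hasAppCache = supportsAppCache(typeof window !== 'undefined' ? window : undefined);
```

The rest of your AppCache code can then be wrapped in a check on hasAppCache, so that browsers without the feature simply skip it.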

Statuses

You can check the current status of the application cache using window.applicationCache.status, which returns a numeric value mapped to the following states:

0 - uncached
The page is not linked to an application cache. The AppCache also has a status of uncached the very first time it is being downloaded, for as long as that download is in progress.
1 - idle
The browser has the latest version of the AppCache, and there are no updated versions to download.
2 - checking
The page is checking for an updated manifest file.
3 - downloading
The page is downloading a new cache, because an updated manifest file has been detected.
4 - updateready
The browser has finished downloading the new cache, which is ready to be used but is not yet in use.
5 - obsolete
The manifest file cannot be found, so the application cache gets deleted. It is important to know that if the manifest file fails to load for any other reason, or if any file mentioned in the manifest file (except those which have a fallback) fails to load, this is counted as an error and the old application cache continues to be used.
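
Since status is only a number, a small lookup can make it easier to read in logs. This is a sketch rather than part of the API: the names STATUS_NAMES and statusName are made up, and the function takes the cache object as a parameter so that it does not depend on window being present.

```javascript
// Status names in the order of the numeric values listed above (0 to 5).
const STATUS_NAMES = ['uncached', 'idle', 'checking', 'downloading', 'updateready', 'obsolete'];

// Returns the readable name for the cache's current numeric status.
function statusName(appCache) {
  const name = STATUS_NAMES[appCache.status];
  return name === undefined ? 'unknown' : name;
}

// Only touch the real cache when running in a browser that has one.
if (typeof window !== 'undefined' && window.applicationCache) {
  console.log('AppCache status: ' + statusName(window.applicationCache));
}
```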

Events

Certain events also get fired, depending on what is going on with the AppCache at the moment.

checking
This gets fired when the browser is attempting to download the manifest for the first time, or is checking if there is an updated version of the manifest file.
noupdate
If there is no updated version of the manifest file on the server, then noupdate is fired.
downloading
If the browser is downloading the cache for the first time, or if it is downloading an updated cache, then this is fired.
progress
This is fired for each and every file which is downloaded as part of the AppCache.
cached
This is fired when all the resources have finished downloading, and the application is cached.
updateready
Once resources have finished re-downloading for an updated cache, updateready is fired. Once this happens, we can use swapCache() (as explained later in the article) to make the browser use this newly updated cache.
obsolete
This is fired if the manifest file cannot be found (404 error or 410 error).
error
This can be fired for a number of reasons. If the manifest file can't be found, then the application cache download process has to be aborted, in which case this event can be fired. It can also be fired in case the manifest file is present, but any of the files mentioned in the manifest file can't be downloaded properly. It can even be fired in case the manifest file changes while the update is being run (in which case the browser will wait a while before trying again), or in any other case where there is a fatal error.

The event handlers for these events are all prefixed by 'on'. For example, onchecking, onupdateready, onerror, etc.
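
One way to see these events in practice is to attach a listener for each of them and log what arrives. The following is a sketch under that assumption; APPCACHE_EVENTS and watchAppCache are hypothetical names, and the cache object is passed in so the helper is not tied to window.

```javascript
// The event names described above.
const APPCACHE_EVENTS = [
  'checking', 'noupdate', 'downloading', 'progress',
  'cached', 'updateready', 'obsolete', 'error'
];

// Attaches one listener per event and passes each event's type to `report`.
function watchAppCache(appCache, report) {
  APPCACHE_EVENTS.forEach(function (type) {
    appCache.addEventListener(type, function (event) {
      report(event.type);
    }, false);
  });
}

// Only touch the real cache when running in a browser that has one.
if (typeof window !== 'undefined' && window.applicationCache) {
  watchAppCache(window.applicationCache, function (type) {
    console.log('AppCache event: ' + type);
  });
}
```

Reloading a page with this script in place should show the order in which the events arrive, for example checking followed by noupdate when nothing has changed.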

The application cache API has a few things worth noting:

  • window.applicationCache.update(): This will trigger the application cache download process, which is nearly the same as reloading the page. It simply checks if the manifest has changed, and if so downloads a fresh version of all the content in the cache (respecting any cache headers). Note that even though a new cache is created with this, the page will continue to use the old cache. To make the page use the new cache you have just downloaded, you must use the swapCache() function.

  • window.applicationCache.swapCache(): This function tells the browser to start using the new cache data if it is available. It is important to note that even if there is a new manifest file, the application will still continue using the old cache (as specified in the old manifest file) until swapCache() is called. Once swapCache() is called, then the cache will be used as specified from the new manifest file.

Normally you won’t need to use the update() function, as the browser should automatically do this when reloading a page. Most commonly the swapCache() function will be used in conjunction with the onupdateready event.

In the following example, if you change the manifest file and reload the page, the browser will download the new files in the cache, and then switch to the new cache (as the swapCache() function is called):


<html manifest="demo.manifest">
<head>
<script type="text/javascript">
// When a newly downloaded cache is ready, switch to it.
window.applicationCache.addEventListener('updateready', function () {
  window.applicationCache.swapCache();
}, false);
</script>
</head>
<body>
...
</body>
</html>

If the page you build is unlikely to be reloaded by the user for a while, then you could periodically call the update() function to check for updates to the manifest file and download a new cache if there is one, and then call the swapCache() function on an updateready event to switch to the new cache:

// Check for an updated manifest file every 60 minutes. If it has changed,
// the browser downloads a new cache as defined by the new manifest file.
setInterval(function () { window.applicationCache.update(); }, 3600000);

// When an updated cache is downloaded and ready to be used, swap to it.
window.applicationCache.addEventListener('updateready', function () {
  window.applicationCache.swapCache();
}, false);

This code will check for an updated version of the manifest file every 60 minutes. If it finds a different version of the manifest file on the server than it previously encountered, it will download a new cache based on this new manifest. Once that happens, an updateready event will be fired, stating that an updated copy of the cache has finished downloading and is ready to be used. We can then explicitly use the swapCache() function to swap the old cache with the new one we just downloaded.

In this way, you can ensure that the user's cache will stay updated.
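
Both update() and swapCache() throw an error when called in the wrong state, for example update() when the page has no cache, or swapCache() when no newer cache is waiting. A slightly more defensive variant of the code above might therefore look like the sketch below. The helper names swapIfReady and safeUpdate are hypothetical, and reloading the page after the swap is one common choice that is assumed here, not something the API requires.

```javascript
const UPDATEREADY = 4; // numeric status value listed in the Statuses section

// Swaps to the new cache only if one is waiting. Returns true when a swap happened.
function swapIfReady(appCache, afterSwap) {
  if (appCache.status !== UPDATEREADY) {
    return false;
  }
  appCache.swapCache();
  if (typeof afterSwap === 'function') {
    afterSwap();
  }
  return true;
}

// Asks the browser to re-check the manifest. Returns false if update() threw,
// which happens when there is no application cache to update.
function safeUpdate(appCache) {
  try {
    appCache.update();
    return true;
  } catch (e) {
    return false;
  }
}

// Only touch the real cache when running in a browser that has one.
if (typeof window !== 'undefined' && window.applicationCache) {
  const appCache = window.applicationCache;
  appCache.addEventListener('updateready', function () {
    // Resources already loaded by the page are not replaced by the swap,
    // so this sketch reloads the page to pick up the new files.
    swapIfReady(appCache, function () { window.location.reload(); });
  }, false);
  setInterval(function () { safeUpdate(appCache); }, 3600000);
}
```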

Summary

The introduction of the W3C HTML5 application cache provides exciting new possibilities to web developers. Web applications can now be cached for offline use, thereby making them even more powerful and useful than before.
