New Structural Elements in HTML5

Introduction

HTML5 brings two new things to the table: new APIs that add essential new features to the open standards web development model, and new structural elements that define specific web page features with much more accurate semantics than were available in HTML 4. You can find articles covering many of the new APIs by looking for Dev.Opera articles marked with the HTML5 tag.

This article, on the other hand, focuses on the latter — we will briefly look at how the new semantic elements were chosen, what the main new features are and how they are used, how headings work in HTML5, and browser support for these new elements, including how you can support them in older browsers.

Introducing HTML5 structural elements

HTML4 already has a lot of semantic elements to allow you to clearly define the different features of a web page, like forms, lists, paragraphs, tables, etc. However, it does have its shortcomings. We still rely heavily on <div> and <span> elements with different id and class attributes to define various other features, such as navigation menus, headers, footers, main content, alert boxes, sidebars, etc. Something like <div id="header"> works in terms of developers and designers knowing what it is for, and being able to use CSS and JavaScript to apply custom styles and behaviour to make it understandable to end users.
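As a rough sketch of the pattern just described (the id values and content here are made up for illustration, not taken from any particular site), an HTML4-era page skeleton might look something like this:

```html
<!-- Generic containers: only the id values hint at what each block is for -->
<div id="header">
	<h1>Site name</h1>
</div>

<div id="nav">
	<ul>
		<li><a href="/">Home</a></li>
		<li><a href="/news/">News</a></li>
	</ul>
</div>

<div id="content">
	<p>Main content goes here.</p>
</div>

<div id="footer">
	<p>Copyright notice goes here.</p>
</div>
```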

But it could be so much better. There are still problems with this kind of set up:

  • Humans can tell the different content apart, but machines can't — the browser doesn't see the different divs as header, footer, etc. It sees them as different divs. Wouldn't it be more useful if browsers and screen readers were able to explicitly identify say, the navigation menu so a visually impaired user could find it more easily, or the different news items on a bunch of blogs so they could be easily syndicated in an RSS feed without any extra programming?
  • Even if you do use extra code to solve some of these problems, you can still only do it reliably for your web sites, as different web developers will use different class and ID names, especially when you consider the international audience — different web developers in different countries will use different languages to write their class and id names.

It therefore makes a lot of sense to define a consistent set of elements for everyone to use for these common structural blocks that appear on so many web sites. The new HTML5 elements we will cover in this article are:

  • <header>: Used to contain the header of a site.

  • <footer>: Contains the footer of a site.

  • <nav>: Contains the navigation functionality for the page.

  • <article>: Contains a standalone piece of content that would make sense if syndicated as an RSS item, for example a news item.

  • <section>: Used to either group different articles into different purposes or subjects, or to define the different sections of a single article.

  • <time>: Used for marking up times and dates.

  • <aside>: Defines a block of content that is related to the main content around it, but not central to the flow of it.

  • <hgroup>: Used to wrap more than one heading if you only want it to count as a single heading in the page's heading structure.

  • <figure> and <figcaption>: Used to encapsulate a figure as a single item, and contain a caption for the figure, respectively.
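To give a feel for how these fit together, here is a minimal sketch of a page built from most of the elements listed above. The site, headings, file name and date are all hypothetical placeholders, and this is just one reasonable arrangement rather than the only correct one:

```html
<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="utf-8" />
	<title>Example news page</title>
</head>
<body>
	<header>
		<h1>Example news site</h1>
	</header>

	<nav>
		<ul>
			<li><a href="#latest">Latest news</a></li>
			<li><a href="#about">About this site</a></li>
		</ul>
	</nav>

	<!-- A section grouping articles on one subject -->
	<section id="latest">
		<h2>Latest news</h2>
		<article>
			<h3>A news item</h3>
			<p>Posted on <time datetime="2010-01-01">1st January 2010</time>.</p>
			<figure>
				<img src="photo.png" alt="Description of the photo" />
				<figcaption>A caption for the photo.</figcaption>
			</figure>
			<aside>
				<p>Background information related to the news item.</p>
			</aside>
		</article>
	</section>

	<footer id="about">
		<p>Copyright and contact details go here.</p>
	</footer>
</body>
</html>
```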

How were the element names decided upon?

During the creation of HTML5, editor Ian Hickson used Google's tools to mine data from over a billion web pages, surveying what ID and class names are most commonly used on the real world web. You can see one of the surveys published at Google Code: Web Authoring Statistics: Classes. To cut a long story short, the element names seen in this article were taken from the 20 most popular IDs and class names found in Hickson's surveys.

Note: Opera did a similar study, of 3.5 million URLs, calling it MAMA. MAMA had a smaller URL set, but looked at a larger and wider variety of web page statistics. For more information, take a look at MAMA common attributes, MAMA's id list, and MAMA's class list. For more options, go to the MAMA home page.

Why isn't there a <content> element?

While this may seem like a glaring omission, it really isn't. The main content will be the top level block of content that isn't the <header>, <nav> or <footer>, and depending on your particular circumstance, it might make more sense to mark the content up using an <article>, a <section>, or even a <div>. Bruce Lawson calls this The Scooby Doo algorithm, but to find out why, you'll have to ask him on Twitter, or at a conference!

Presenting an example HTML5 page

So now we've gone through a bit of background, and seen what the new elements on offer are, let's go through an example, and see exactly how to use them in the context of a real page. Go and have a look at my A history of Pop Will Eat Itself page — a history and discography of one of my favourite English bands from the 80s/90s (if you like alternative music, please, go check 'em out.) I took the original markup from the Pop will Eat Itself Wikipedia page, cleaned it up, and turned it into HTML5. Let's look more closely at what I did.

Keep the sample page open in a separate tab as you read through the article — you'll want to refer back to it.

My example uses the traditional tried and tested wrapper <div> to center the content, but Kroc Camen has published a nice article showing how to create centered designs without <div> wrappers, so I thought I'd share that here also. It also usefully advises to not use HTML5 <section> elements as glorified wrappers in HTML5 pages - that is just plain wrong!

Some meta-differences

The first thing you'll notice is that the doctype is much simpler than in older versions of HTML:

<!DOCTYPE html>

The creators of HTML5 chose the shortest possible doctype string for this purpose. After all, why should you, the developer, be expected to remember a huge great long string containing multiple URLs, when in reality the doctype is only there to put the browser into standards mode (as opposed to quirks mode)?

Next, I want to draw your attention to HTML5's apparent "lax syntax requirements". I have included quotes round all my attribute values, and written all the elements in lower case, but that's because I am used to writing using XHTML rules. But it may come as a surprise to you to discover that in HTML5, you can ignore these rules if you want. In fact, you don't even have to bother including the <head>, <body>, or <html> elements, and it will still validate!

Note: this is not true if you switch to using XHTML (HTML served with the XHTML MIME type, application/xhtml+xml), where the stricter XML syntax rules apply.

This is because such elements are assumed by the browser anyway. If you create a sample HTML5 page without these elements, load it into a browser, and inspect the DOM of the loaded page (for example in the browser's developer tools; a plain view source only shows the markup you wrote), you will see them inserted automatically by the browser. Alternatively, you could use Ian Hickson's Live DOM viewer utility to see the state of the DOM.
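As a small sketch of what this looks like in practice, the following is a complete document with no <html>, <head> or <body> tags written out, unquoted attribute values, and an unclosed paragraph. The title and text are placeholders:

```html
<!DOCTYPE html>
<meta charset=utf-8>
<title>A very short page</title>
<p>No html, head or body tags were written here.
```

When a browser parses this, the resulting DOM should still contain an <html> element with a <head> (holding the <meta> and <title>) and a <body> (holding the paragraph).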

Note: Ok, so in fact you can also get HTML4 documents to validate if you don't include <head>, <body>, or <html>, but it is still worth mentioning here.

Another thing to mention is that the HTML5 spec strictly defines how to handle badly-formed markup (for example wrongly nested elements, or unclosed elements), defining a parsing algorithm for the first time. This means that even if you do get some of your markup wrong, the DOM will be consistent across HTML5-supporting browsers.

So does this mean we don't need to worry about validation and best practices any more? HECK NO! Validation is still a very useful tool for making your pages as good as they can be. Even if your DOM is consistent across browsers, it still might not behave how you wanted it to in the first place, resulting in CSS and JavaScript headaches! And as you'll see as you explore HTML5 further, there are still very good reasons for making sure you declare document features like <html> up front. For example, you might want to declare the document's language on the <html> element for i18n and accessibility benefits, and certain related technologies also require it. A good example is AppCache.

To validate HTML5 documents, you can use the W3C validator, which can validate HTML5, as well as a wide range of other markup language flavours. Or for a dedicated HTML5 (+ WAI-ARIA and MathML) validator, go to validator.nu.

Last of all in this section, I want to draw your attention to this line:

<meta charset="utf-8" />

You need to declare the character set of your document within the first 1024 bytes, to protect against a serious security risk: a browser left to guess the encoding can be tricked into misreading the page. Unless you have a really good reason not to, you should use UTF-8.
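The simplest way to keep the declaration early enough is to make it the first thing inside the <head>. The following is a minimal sketch; the lang value and the <title> text are assumptions for illustration rather than copies of the sample page:

```html
<!DOCTYPE html>
<html lang="en">
<head>
	<!-- Declared before anything else in the head, so it sits right at the start of the file -->
	<meta charset="utf-8" />
	<title>A history of Pop Will Eat Itself</title>
</head>
<body>
	<p>Page content goes here.</p>
</body>
</html>
```

The lang attribute on <html> here also shows the kind of up-front declaration mentioned earlier in this section.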

<header>

The document's header looks like this:

<header>
	<hgroup>
		<h1>A history of Pop Will Eat Itself</h1>
		<h2>Introducing the legendary Grebo Gurus!</h2>
	</hgroup>
</header>

The purpose of the <header> element is to wrap the section of content that forms the header of the page, usually containing a company logo/graphic, main page title, etc.

<hgroup>

You'll notice that in the above code, the only contents of my header are an <hgroup> element, wrapping two headings. What I want to do here is specify the document's top level heading, plus a subtitle/tag line. I only want the top level heading to count in the document heading hierarchy, and that's exactly what <hgroup> does: it causes a group of headings to count as a single heading for the purposes of the document structure. You'll find out more about how heading hierarchies work in HTML5 in the HTML5 outlines, and the HTML5 heading algorithm section, below.

<footer>

If you go to the bottom of the document, you'll see this code:

<footer>
	<h3 id="copyright">Copyright and attribution</h3>

	<!-- rest of the footer content... -->
</footer>

<footer> should be used to contain your site's footer content — if you look at the bottom of a number of your favourite sites, you'll see that footers are used to contain a variety of things, from copyright notices and contact details, to accessibility statements, licensing information and various other secondary links.

Note: You are not restricted to one header and footer per page — you could have a page containing multiple articles, and have a header and footer per article.
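For instance, a page holding a couple of articles might be sketched like this, with one <header> and <footer> for the page and one pair inside each <article>. All of the headings and text are hypothetical placeholders:

```html
<header>
	<h1>Site name</h1>
</header>

<article>
	<!-- This header and footer belong to the article, not to the page -->
	<header>
		<h2>First article title</h2>
	</header>
	<p>Text of the first article.</p>
	<footer>
		<p>Author and licensing details for the first article.</p>
	</footer>
</article>

<article>
	<header>
		<h2>Second article title</h2>
	</header>
	<p>Text of the second article.</p>
	<footer>
		<p>Author and licensing details for the second article.</p>
	</footer>
</article>

<footer>
	<p>Site-wide copyright notice.</p>
</footer>
```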

<nav>

Further up the document again, you'll come across this structure:

<nav>
	<h2>Contents</h2>
	<ul>
		<li><a href="#Intro">Introduction</a></li>
		<li><a href="#History">History</a></li>

		<!-- other navigation links... -->

	</ul>
</nav>

The <nav> element is for marking up the navigation links or other constructs (e.g. a search form) that will take you to different pages of the current site, or different areas of the current page. Other links, such as sponsored links, do not count. You can of course include headings and other structuring elements inside the <nav>, but it's not compulsory.

<aside>

Just underneath the document heading, we have the following:

<aside>
	<table>

		<!-- lots of quick facts inside here -->

	</table>
</aside>

The <aside> element is for marking up pieces of content that are related to the main content, but don't fit directly into the main flow. For example, in this case we have a bunch of quick fire facts and statistics about the band, which wouldn't work so well shoehorned into the main content. Other suitable candidates for <aside> elements include lists of links to external related content, background information, pull quotes, and sidebars.
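A pull quote, one of the other candidates mentioned above, could be sketched like this. The article text is a placeholder, and wrapping the quote in a plain paragraph is just one possible choice of inner markup:

```html
<article>
	<h2>Article title</h2>
	<p>The main text of the article runs here, and at some point says something worth repeating.</p>

	<!-- Related to the surrounding article, but not part of its main flow -->
	<aside>
		<p>"Something worth repeating."</p>
	</aside>

	<p>The main text then carries on as normal.</p>
</article>
```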

<figure> and <figcaption>

The dynamic duo of <figure> and <figcaption> have been created to solve a very specific set of problems. For a start, doesn't it always feel a bit semantically dubious and unclean to mark up an image and its caption as two paragraphs, or a definition list pair, or something else? And second, what do you do when you want a figure to consist of an image, or two images, or two images and some text? <figure> is on hand to wrap around all the content you want to comprise a single figure, whether it is text, images, SVG, videos, or whatever. <figcaption> is then nested inside the <figure> element, and contains the descriptive caption for that figure. The figure I included in my example is a simple one, to get you started:

<figure>
	<img src="pwei.png" alt="Old poppies logo" />
	<figcaption>
		The old poppies logo, circa 1987.<br /> <a href="http://www.flickr.com/photos/bobcatrock/317261648/">Original picture on Flickr</a>, taken by bobcatrock.
	</figcaption>
</figure>
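A figure made up of more than one image follows the same pattern. In this sketch the file names, alt text and caption are hypothetical placeholders:

```html
<figure>
	<!-- Both images together form a single figure -->
	<img src="first-image.png" alt="Description of the first image" />
	<img src="second-image.png" alt="Description of the second image" />
	<figcaption>
		One caption describing both images.
	</figcaption>
</figure>
```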

<time>

The <time> element allows you to define an unambiguous date and time value that is both human and machine readable. For example, I've marked up the release dates of the poppies' singles like so:

<time datetime="1989-03-13">1989</time>

The text in between the opening and closing tags can be anything you want, as appropriate for the people reading your site. If you wanted, you could also put it like this:

<time datetime="1989-03-13">13th March 1989</time>
<time datetime="1989-03-13">March 13 1989</time>
<time datetime="1989-03-13">My nineteenth birthday</time>

The date inside the datetime attribute, by contrast, is a machine readable date in the ISO standard format (see W3C Tip: Use international date format (ISO) for more information), so you get the best of both worlds. You can also add a time onto the end of the date, like so:

<time datetime="1989-03-13T13:00">One o'clock in the afternoon, on the 13th of March 1989</time>

You can also add a timezone adjustment, so for example to make the last example Pacific Standard Time, you would do this:

<time datetime="1989-03-13T13:00-08:00">One o'clock in the afternoon, on the 13th of March 1989</time>
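In the ISO format, a Z suffix stands for UTC, while an offset such as -08:00 is written in its place rather than alongside it. As a small sketch (the visible text is my own wording), the two elements below describe the same instant:

```html
<!-- Local time with an explicit offset of eight hours behind UTC -->
<time datetime="1989-03-13T13:00-08:00">1pm Pacific Standard Time, 13th March 1989</time>

<!-- The same instant expressed in UTC, using the Z suffix -->
<time datetime="1989-03-13T21:00Z">9pm UTC, 13th March 1989</time>
```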

<article> and <section>

Now we turn our attentions to probably the two most misunderstood elements in HTML5 — <article> and <section>. When you first meet them, the difference might appear unclear, but it really isn't so bad.

Basically, the <article> element is for standalone pieces of content that would make sense outside the context of the current page, and could be syndicated nicely. Such pieces of content include blog posts, a video and its transcript, a news story, or a single part of a serial story.

The <section> element, on the other hand, is for breaking the content of a page into different functions or subject areas, or breaking an article or story up into different sections. So for example, in my PWEI history, the structure looks like so:

<article>
	<section id="Intro">
		<h2>Introduction</h2>
	</section>

	<section id="History">
		<h2>History</h2>
	</section>

	<section id="Discography">
		<h2>Discography</h2>
	</section>
</article>

But you could also have a structure like this:

<section id="rock">
	<h2>Rock bands</h2>
	<!-- multiple article elements could go in here -->
</section>

<section id="jazz">
	<h2>Jazz bands</h2>
	<!-- multiple article elements could go in here -->
</section>

<section id="hip-hop">
	<h2>Hip hop bands</h2>
	<!-- multiple article elements could go in here -->
</section>
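Filled in, one of those sections might look something like the following sketch. The band names and text are placeholders:

```html
<section id="rock">
	<h2>Rock bands</h2>

	<!-- Each article is a standalone item that could be syndicated on its own -->
	<article>
		<h3>First band</h3>
		<p>A short history of the first band.</p>
	</article>

	<article>
		<h3>Second band</h3>
		<p>A short history of the second band.</p>
	</article>
</section>
```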

Where does that leave <div>?

So, with all these great new elements to use on our pages, the days of the humble <div> are numbered, surely? NO. In fact, the <div> still has a perfectly valid use. You should use it when there is no other more suitable element available for grouping an area of content, which will often be when you are purely using an element to group content together for styling/visual purposes. The example in my PWEI history is the <div id="wrapper"> I have wrapped around the whole of the content. The only reason it is here is so that I could use CSS to center the content in the browser:

#wrapper {
	background-color: #ffffff;
	width: 800px;
	margin: 0 auto;
}

<mark>

The <mark> element is for highlighting terms of current relevance, or highlighting parts of content that you just want to draw attention to, but not change the semantic meaning of. It's like when you are going through a printed article and highlighting lines important to you with a highlighter pen. So for example, you might want to use this element to mark up lines in a wiki that need to be given editorial attention, or to highlight instances of a search term that the user has just searched for on a page, and then give them appropriate styling in your CSS.
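Here is a minimal sketch of the search term case, assuming the user has just searched for "grebo". The search message and the yellow background are illustrative choices of mine:

```html
<style>
	/* Give highlighted terms a highlighter-pen look */
	mark {
		background-color: #ffff00;
	}
</style>

<p>You searched for "grebo".</p>

<!-- The matching term is marked without changing the meaning of the sentence -->
<p>Introducing the legendary <mark>Grebo</mark> Gurus!</p>
```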

The hidden attribute

The hidden attribute, when applied to any element, hides it completely from any form of presentation/media, and should be used if you are intending to show content later on (for example, using JavaScript to remove the attribute) but don't wish to have it shown now. It shouldn't be used to hide content such as hidden tabs in a tabbed interface, because that is really a different way of presenting content in a smaller space, rather than hiding content altogether.
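A minimal sketch of the "show it later" case follows. The ids, button and message text are hypothetical, and the inline script is just the shortest way of showing the attribute being removed:

```html
<p id="details" hidden>Extra details that are not relevant until the user asks for them.</p>

<button type="button" id="show-details">Show details</button>

<script>
	// Removing the hidden attribute makes the paragraph available again
	document.getElementById('show-details').onclick = function () {
		document.getElementById('details').removeAttribute('hidden');
	};
</script>
```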

HTML5 outlines, and the HTML5 heading algorithm

Before we carry on our journey towards mastery of HTML5, there is another important difference we should discuss between HTML5, and previous versions of the spec. In HTML, we have the concept of the document outline, which is basically a breakdown of the document into its headings, and their hierarchy relative to one another, exactly like when you are writing a document in a word processor and you look at your document in outline view. In effect, I have basically created a document outline for this document by nesting lists to create the table of contents at the start of the article. This article's document outline looks something like this:

- New structural elements in HTML5
	- Introducing HTML5 structural elements
		- How were the element names decided upon?
		- Why isn't there a <content> element?
	- Presenting an example HTML5 page
		- Some meta-differences
		- <header>
		- <hgroup>
		- <footer>
		- <nav>
		- <aside>
		- <figure> and <figcaption>
		- <time>
		- <article> and <section>
			- Where does that leave <div>?
	- HTML5 outlines, and the HTML5 heading algorithm
	- How to get it working in older browsers
	- Summary

So "New structural elements in HTML5" is an <h1>, "Introducing HTML5 structural elements" is an <h2>, and so on. In HTML4 we are used to the fact that there are six possible heading levels, and each heading's level is dictated by the actual element used, which means that it is perfectly possible to end up with a completely screwed up heading hierarchy if you use the wrong heading levels, or even if some of your content is syndicated into a different CMS.

HTML5 solves this problem by generating its heading hierarchy based on the relative nesting of the different document sections. A new document section is created whenever you use so-called sectioning content: the <article>, <section>, <nav>, and <aside> elements. So for example, take the following:

<h1>My title</h1>

<div>
	<h2>My subtitle</h2>
</div>

See the first outlining example running live.

HTML 4 will count this as a first level heading followed by a second level heading. The HTML5 outline happens to come out the same way here, but only because an <h2> ranks below an <h1>: <div> is not a sectioning element, so it does not create a new section in the hierarchy, and two <h1> elements arranged like this would count as two first level headings. To make the nesting itself define the hierarchy, you'd change the <div> to a sectioning element:

<h1>My title</h1>

<section>
	<h2>My subtitle</h2>
</section>

See the second outlining example running live.

No browsers currently implement the HTML5 outlining algorithm, but you can already get a feel for how it works by using the HTML5 Outliner Opera Extension or Geoffrey Sneddon's on-line HTML5 outliner, or the Google HTML5 outliner. Try running the above examples through one of these tools if you don't believe me. And in the future, you won't really need to bother with a hierarchy of h1, h2, h3, etc., as regardless of what actual heading elements you use, the algorithm will still work out the same hierarchy based on the nesting of the document sections. But you should still bother for now, as no browsers (or screen readers) support this yet!

So the big question now is "why bother with all this"? Well, this new way of working out the document outline/heading hierarchy has two major advantages over the old way:

  1. You can have as many heading levels as you like — you are not limited to six.
  2. If your content is transplanted into someone else's CMS, and this results in the h1, h2, h3, etc. levels going wrong, the algorithm will still work out the correct hierarchy regardless.
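A minimal sketch of both points, assuming the outline algorithm as described in this section (the heading text is made up for illustration): every heading below is an <h1>, yet the nesting of the <section> elements alone gives three levels, and deeper nesting could carry on past six in the same way.

```html
<h1>My title</h1>

<section>
	<!-- Second level in the outline, because it sits one section deep -->
	<h1>A second level heading</h1>

	<section>
		<!-- Third level in the outline, because it sits two sections deep -->
		<h1>A third level heading</h1>
	</section>
</section>
```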

Note: The HTML5 heading hierarchy is actually a really old idea, originally envisaged by Tim Berners-Lee in 1991:

I would in fact prefer, instead of <H1>, <H2> etc for headings [those come from the AAP DTD] to have a nestable <SECTION>..</SECTION> element, and a generic <H>..</H> which at any level within the sections would produce the required level of heading.

How to get it working in older browsers

Older browsers: always the bane of our very existence when trying to get to grips with using shiny new toys on the Web! In fact, the problem here is all browsers - no browsers currently recognise and support these new HTML5 structural elements, as such. But never fear, you can still get them working across browsers today with the minimum of effort.

First of all, if you put an unknown element into a web page, by default the browser will just treat it like a <span>, i.e. an anonymous inline element. Most of the HTML5 elements we have looked at in this article are supposed to behave like block elements, therefore the easiest way to make them behave properly in older browsers is by setting them to display:block; in your CSS:

article, section, aside, hgroup, nav, header, footer, figure, figcaption {
	display: block;
}

This solves all your problems for all browsers except one. Have a guess which one? ... Yup, amazing isn't it, that IE should prove to be trickier than the other browsers, and refuse to style elements it doesn't recognise? The fix for IE is illogical, but fortunately pretty simple. For each HTML5 element you are using, you need to insert a line of JavaScript into the head of your document, like so:

<script>
	document.createElement('article');
	document.createElement('section');
	document.createElement('aside');
	document.createElement('hgroup');
	document.createElement('nav');
	document.createElement('header');
	document.createElement('footer');
	document.createElement('figure');
	document.createElement('figcaption');
</script>

IE will now magically apply styles to those elements. It is a pain having to use JavaScript to make your CSS work, but hey, at least we have a way forward? Why does this work exactly? No-one I've talked to actually knows. There is also a problem with these styles STILL not being carried through to the printer when you try to print HTML5 documents from IE.

Note: The IE print problem can be solved using the HTML5 Shiv JavaScript library, which also handles adding the document.createElement lines for you. You should wrap it up in Conditional comments for IE less than IE9, so modern browsers don't execute JS they don't need.
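As a rough sketch, the conditional comment wrapper looks like this. The html5shiv.js path is an assumption (it presumes you have saved a copy of the library next to your page), so adjust it to wherever the file actually lives:

```html
<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="utf-8" />
	<title>Example page</title>

	<!-- Only IE versions below 9 treat the contents of this comment as markup -->
	<!--[if lt IE 9]>
		<script src="html5shiv.js"></script>
	<![endif]-->
</head>
<body>
	<header>
		<h1>Example page</h1>
	</header>
</body>
</html>
```

Every other browser sees the whole block as an ordinary HTML comment and skips it.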

Summary

That rounds off our discussion on the new structural elements in HTML5. If you want more help with HTML5, we have a lot more to offer here on dev.opera.com, and you should also consult the HTML5 doctors.