The myth of automated heading outlines

There's been a revived interest in automatically generated, logically nested heading maps. The HTML5 solution was to use the H1 element in each SECTION, and the browser would then work out the correct level from the nesting of SECTION elements. But browsers have not actually implemented this part of the specification, so this method does not work.
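To make the idea concrete, here is a rough sketch of the kind of calculation involved: every heading is an H1, and its effective level comes from how many SECTION elements enclose it. This is a simplified illustration using Python's standard html.parser, not the outline algorithm from the specification; the names SectionOutline and outline_levels are made up for the example.

```python
from html.parser import HTMLParser


class SectionOutline(HTMLParser):
    """Collect (level, text) pairs, deriving each h1's level from <section> depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of currently open <section> elements
        self.in_h1 = False    # True while we are inside an <h1>
        self.buffer = []      # text pieces of the current <h1>
        self.outline = []     # resulting list of (level, heading text)

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag == "h1":
            self.in_h1 = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth = max(0, self.depth - 1)
        elif tag == "h1" and self.in_h1:
            self.in_h1 = False
            # An h1 outside any section is level 1; each enclosing section adds one.
            text = "".join(self.buffer).strip()
            self.outline.append((self.depth + 1, text))

    def handle_data(self, data):
        if self.in_h1:
            self.buffer.append(data)


def outline_levels(html):
    """Return the computed outline for a string of HTML (hypothetical helper)."""
    parser = SectionOutline()
    parser.feed(html)
    parser.close()
    return parser.outline
```

Fed a page whose H1 is followed by a SECTION containing another H1, this reports the second heading as level 2. The arithmetic is easy; the point is that no browser exposes such a computed level to assistive technology, so the markup pattern alone gains you nothing.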

Heading outlines are particularly important for people using assistive technologies (see Background section below) so it would be a Good Thing if technology could help us improve their quality.

Further reading: The HTML5 Document Outline, by Steve Faulkner

Let's make a new element!

Jonathan Neal has put out a proposal to add a new element, H, to indicate the heading for a SECTION.  The idea is to provide a contextual heading for a given piece of content without having to know exactly what level it sits at in the overall page. Browsers could then generate a logical outline based on the nesting of SECTIONs.

This really leaves us where we are now: there is no value unless browsers actually implement it.

Further reading: The State of <H>, Jonathan Neal

What, actually, is the problem to be solved?

I have some bad news: heading and content hierarchies are based on the author's intent and what they are trying to communicate.  No algorithm can read a person's mind, so if you want your heading levels to be useful and meaningful, you have to do it yourself.  It doesn't really matter whether you do it with numbered heading elements or by correct nesting: only you can know the relationships and hierarchies, and only you can choose the markup that best represents them.  If the author doesn't care about or understand content structure, nested SECTIONs will be abused just as badly as numbered headings.

The current debate seems to be stuck on how to generate an algorithm that can fix missing structure markup without breaking good structure markup.  I for one can imagine why browser vendors aren't interested in tackling this problem: you are guaranteed to make somebody angry with you.

Further reading: Do we need a new heading element? We don't know, by Jake Archibald

But what about content re-use?

The only real-life example of where code could help is content re-use.  In some publishing environments, you may have pieces of content that can be used in different places, and they may need different heading levels in those different places.  This is a legitimate issue, but in reality, it's not a very common one.

Content re-use is very appealing: there's only one place to keep information up-to-date, and you can pop it in everywhere it's useful.  We did a lot of work on enabling content re-use in Mass.Gov for these reasons. But it was almost never used: content often needed to be edited to make sense in the new context, and/or it was easier and better to just link to the existing content. When it was used, it created new content maintenance problems when the original was edited or deleted.  So, great idea, but not practical in most situations.

The wonders of content management systems

Where content re-use does make sense is in an environment with highly controlled information architecture and content curation, which requires a sophisticated content management system.  The obvious place to build automation for correct outlining is within that CMS, as part of the controls.  The page-generating code and templates control where different pieces of content go, so should also be able to control what heading levels are being used in those different places.
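As a minimal sketch of what that control could look like, assume reusable fragments are always authored with H1 as their top heading, and the template knows which level the fragment's top heading should have in a given slot. The function name shift_headings and the authoring convention are assumptions for illustration; a real CMS would more likely do this in its template layer or with a proper HTML parser than with a regular expression.

```python
import re

# Matches the start of an opening or closing h1-h6 tag, e.g. "<h2" or "</h2".
_HEADING_TAG = re.compile(r"<(/?)h([1-6])\b", re.IGNORECASE)


def shift_headings(fragment, base_level):
    """Re-level a fragment so its h1 becomes h<base_level>, h2 becomes the next level, and so on.

    Levels are clamped at 6 because HTML has no h7.
    """
    def replace(match):
        slash = match.group(1)
        level = int(match.group(2))
        new_level = min(6, level + base_level - 1)
        return f"<{slash}h{new_level}"

    return _HEADING_TAG.sub(replace, fragment)


# The same fragment placed in a slot whose headings should start at h3.
fragment = "<h1>Fees</h1><p>Details.</p><h2>Late fees</h2>"
print(shift_headings(fragment, 3))
```

The decision about which level a slot needs still comes from a person designing the templates; the code only applies it consistently.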

And that's pretty much what sites like these are doing already, so a browser-based algorithm is not going to solve any problems for them.

In less rigid environments, where more content creators have greater latitude, you can still configure your CMS to make it easier for them to do the right thing. For instance, you can configure its WYSIWYG editor so that it does not offer heading levels higher than the ones authors should be using.

Browsers are amazing, and are forgiving of so much bad markup, but it's not reasonable to expect them to fix everything.  When it comes to issues that have to do with the actual meaning and purpose of content, it may not even be desirable for browsers to attempt a fix.  Sometimes you just have to fix your own problems.

Background: Why headings are important

We use headings in written content to break up long chunks of text to make it easier to absorb.  A sighted user can easily scan the content to see what it is about and how it is organized.  Screen reader software gives users a list of headings and their levels that they can navigate from, or they can use keyboard commands to move from heading to heading, either sequentially or within a specific heading level.  There are browser extensions that give keyboard-only users similar methods.

If you have used heading levels correctly, that list of headings will be a logical outline of the content.  The most common errors are using only visual effects instead of heading markup, skipping levels, and incorrect nesting.
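Of those three errors, skipped levels are the one a simple script can catch reliably. The sketch below is an assumption-laden illustration, not a full accessibility checker: it pulls numbered heading levels out of a string of HTML with a regular expression and flags any heading that sits more than one level deeper than the heading before it. The function names heading_levels and find_skipped_levels are invented for the example.

```python
import re

# Matches the start of an opening h1-h6 tag, e.g. "<h3" in "<h3 class='x'>".
_HEADING_OPEN = re.compile(r"<h([1-6])\b", re.IGNORECASE)


def heading_levels(html):
    """Return the heading levels in document order, e.g. [1, 2, 2, 3]."""
    return [int(match.group(1)) for match in _HEADING_OPEN.finditer(html)]


def find_skipped_levels(levels):
    """Return (index, previous_level, level) for each heading that skips a level.

    The start of the document counts as level 0, so a page that opens
    with anything deeper than h1 is flagged too.
    """
    problems = []
    previous = 0
    for index, level in enumerate(levels):
        if level > previous + 1:
            problems.append((index, previous, level))
        previous = level
    return problems


page = "<h1>Guide</h1><h2>Setup</h2><h4>Notes</h4><h2>Usage</h2>"
print(find_skipped_levels(heading_levels(page)))
```

Moving back up the outline (from H4 to H2, say) is not flagged, because closing several levels at once is normal. Text that is only styled to look like a heading, or a heading nested under the wrong parent, is invisible to a check like this, since spotting those depends on knowing what the author meant.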