In this post I look at how RDFa and other techniques for adding extra information to documents evolved around the same time as Microformats, as part of the work at the W3C on the next generation of web languages. Whilst the traditional approach to language design at the W3C (now also used in HTML5) seeks to anticipate authors’ every need, the RDFa approach resulted from a focus on clearly defined extension points that give authors control. I compare the different approaches taken to markup languages as a way to explain the problems that @role and RDFa solve, and conclude that although HTML5 is unlikely to shake its monolithic structure (and will therefore not support extension mechanisms like RDFa), RDFa itself will continue to flourish.

Microformats

Microformats was a great development. But although it has allowed people to do some neat things at the nexus of web page, browser and desktop, it has never realised its full potential because of some fundamental flaws. The key problem is that creating a vocabulary requires something akin to a full standards process, and standards-making has proved over the years to be laborious, time-consuming, and often acrimonious. It’s therefore no accident that there remain very few Microformats.

Extensible

Microformats began its life at much the same time that the XHTML 2 Working Group (at the time called the HTML Working Group) began rethinking the way that @class, meta and link should work in XHTML. The key design decision taken at that time was that we needed a markup language that was extensible. Unfortunately, over the years the notion of extensibility has been reduced to the blunt instrument of schema combination; using tools such as XML Schema or RELAX NG, language creators could bolt other languages together by combining and adding to their schemas. But anyone who has tried this will know that it is something of a black art, and it is rare for anyone but a specialist to add extensions to a language.

Content adaptation

However, alongside this notion of extensibility another one has been gathering momentum; that the language should itself contain extension points that authors and organisations can use, without having to go back to standards bodies. A good illustration of the need for extensibility came back in 2004. I was presenting some of the ‘new ideas’ from XHTML 2 at a W3C workshop on content adaptation, and was struck by the fact that in two of the other papers, companies had effectively invented their own markup language by adding well-defined @class values, and a few new elements and attributes. Barry Haynes and Matthew Clough presented the ‘Orange Markup Language’, or OWL, whilst Dan Appelquist from Vodafone talked on VCML; both languages aimed at adding information to documents to:

  • help the reformat process when displaying on different devices;
  • determine which parts could be moved to menus and which parts needed to remain in the main window;
  • add semantics, so that, for example, a football score in column two of a table wouldn’t get separated from the team names in column one;
  • specify themes;
  • and more.

It’s worth looking at the presentations (see W3C Workshop on Metadata for Content Adaptation for links to all the papers), just to realise the problems that organisations like mobile phone operators, news publishers, aggregators and so on are trying to solve. But from a standards point of view, the most interesting point about the two ‘languages’ was that they were completely different. It meant that if Reuters were to supply documents of news stories to both Orange and Vodafone, it would need to provide two separate feeds, each using the correct language.

Extension Points

The ‘extensible’ aspect of the new work at the W3C at that time was about trying to address these requirements, but not by constantly adding new language features; instead we were looking at providing extension points to a base language. There are two kinds of extension points that organisations like Orange and Vodafone were in need of. The first was to be able to describe aspects of the document itself, and the second was to be able to describe the content of the document.

Describing the document itself

When we talk about the document itself we need to be able to identify that some parts are menus, others are footers, and yet another segment is the main content. Armed with this basic information we can do some interesting things. For example, just knowing which div, p or section contains the main content greatly helps those who are using screen-readers; they can skip past menus, headers and adverts and get straight to reading the content. Similarly, knowing which parts of a document are the menus means that mobile operators could move all of that content to the shortcut buttons on the phone, and leave more room on the display to show the main content.

In the past, indicating which elements played which role would have been achieved through the use of the class attribute. The problem was that this attribute had been used to hold all sorts of values, and there was no guarantee that a value of menu in one organisation would have the same meaning in another. So we proposed a new attribute called role, which is specifically for describing the purpose of an element (see XHTML Role Attribute Module, as well as Using the role attribute to extend XHTML and The XHTML role attribute: small and perfectly formed). Not only that, this new attribute could contain identifiers that were unique across the web, which meant that we could really be sure that we had ‘a menu’, or ‘a footer’, because the identifier had been constructed to indicate exactly that.

The role attribute was quickly taken up by the WAI-ARIA work, which defined a set of common identifiers, and the role attribute is now implemented in Firefox (see the Mozilla Accessibility Project) and IE 8 (see What’s New for Accessibility in Internet Explorer 8).
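
As a rough sketch of how this looks in markup, the fragment below assumes the predefined role values from the XHTML vocabulary (navigation, main and contentinfo) for the common cases, and a made-up tv prefix, bound to the placeholder address http://example.org/tv-guide#, for a role that one industry might define for itself. The element choices, the links and the tv:listing name are illustrative only.

```html
<body xmlns:tv="http://example.org/tv-guide#">
  <!-- Unprefixed values are assumed to come from the XHTML vocabulary -->
  <ul role="navigation">
    <li><a href="/">Home</a></li>
    <li><a href="/archive">Archive</a></li>
  </ul>

  <div role="main">
    <p>The article text goes here.</p>
    <!-- Hypothetical role defined by TV guide publishers themselves -->
    <div role="tv:listing">20:00 Evening news</div>
  </div>

  <p role="contentinfo">Copyright and contact details.</p>
</body>
```

A screen-reader could jump straight to the element marked main, and a mobile operator could lift the navigation list into the phone's shortcut buttons, without either having to guess what a class value means.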

Elements and attributes

Some would say that there is no difference between adding a footer using an element:

<footer>
  webBackplane.com (C) 2008 webBackplane, a trading name of Backplane Ltd.
</footer>

and adding one using a role value:

<div role="footer">
  webBackplane.com (C) 2008 webBackplane, a trading name of Backplane Ltd.
</div>

but there are some important differences. The first relates to the standards process that I’ve mentioned; adding a footer element to a language takes time, and requires jumping through all sorts of hoops. In the end it’s those with the time and energy to devote to the standards process itself whose voices are heard, usually from companies who can afford to have someone spend a lot of time on the topic. But adding a footer role value takes no time at all, and you don’t need to ask anyone. Of course, if you want to create some role values that other people will use, perhaps in your industry or field of learning, then you’ll need to co-ordinate with them. But even so, that’s a far cry from having to appeal to the W3C to add your favourite element or value to their language. So whether you are chemists or TV guide publishers, you can determine your own set of values, within your own timeframe, and promote them however you want, without having to wait for the wheels of the standards process to turn.

I would just add that there is another difference between elements and attributes, and that concerns the semantics; with the element technique we are saying we have ‘a footer’, which in many situations is fine. But with the attribute technique we are saying we have ‘a div that is playing the role of a footer’, which is slightly different. It means that the structural aspect of what we have is preserved (it might be a div, but it could also be a span or a p) and we are merely augmenting that.

To some extent this is a matter of taste, so there’s no point in trying to find principles here. A lot will depend on what else is going on in your document. For example, if you have a jQuery query that finds all divs with a @class value of panel, and sets them so that they are hidden until a user requests to see them, then using @role="footer" instead of a footer element plays nicely with that. Or if you have many roles attached to elements in a document (menus, main content, footers, headers, supporting content, sidebars, and so on), then you might find it much easier to read as a collection of sections with role values than as a collection of specifically-named elements.
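
The jQuery case can be sketched as follows. This is only an illustration: it assumes jQuery is loaded from a local jquery.js file, and the show-panels button and its id are invented for the example.

```html
<div class="panel" role="footer">
  webBackplane.com (C) 2008 webBackplane, a trading name of Backplane Ltd.
</div>
<button id="show-panels">Show panels</button>

<!-- Assumed path; point this at wherever your copy of jQuery lives -->
<script src="jquery.js"></script>
<script>
  $(function () {
    // Existing behaviour: every div with class "panel" starts hidden...
    $("div.panel").hide();

    // ...and is revealed when the user asks for it.
    $("#show-panels").on("click", function () {
      $("div.panel").show();
    });
  });
</script>
```

Because the footer is still a div, the div.panel selector keeps matching it. Had it been changed to a footer element, the selector would need to change too.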

Describing the document’s content

The second area of extensibility that many people need concerns marking up the document’s content. Once again the traditional solution was to use the class attribute to hold information such as ‘this is an address’, or ‘this is a location’. Microformats formalised some of these uses. But the problem here is that ‘this is a …’ only gets you so far, and you quickly want to add properties about the item; its longitude and latitude, or its Creative Commons license (see RelLicense), for example. Microformats tries to get round this by using existing HTML attributes to hold these values, but you soon run out of attributes to use (not to mention that the ‘meaning’ of an attribute often has to be bent a long way to make it fit its new use). The extensibility-based solution to this problem was to devise a new set of attributes. However, unlike the role attribute, these new attributes were specifically for describing content.

RDFa

The set of attributes became formalised as RDFa (see RDFa Primer and RDFa in XHTML: Syntax and Processing), allowing authors and organisations to add extra information about their documents’ content. And as with @role there is no need to wait for the terms and values that can go into the attributes to be passed down from on high; instead you can use your own terms, whether that’s in your organisation, across all mobile phone operators, within all hospitals, amongst research chemists, and so on.
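
To give a flavour of what this looks like, here is a small sketch. It assumes the Dublin Core and W3C Basic Geo vocabularies for the title and coordinates, the license link type for the licence, and an invented ex prefix (bound to the placeholder http://example.org/terms#) to stand in for terms an organisation might define for itself. The photo address, the coordinates and the ex:camera property are all made up for illustration.

```html
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
     xmlns:ex="http://example.org/terms#"
     about="http://example.org/photos/town-hall.jpg">
  <h2 property="dc:title">Town Hall at dusk</h2>

  <!-- content holds the machine-readable value; the text is for people -->
  <p>
    Taken at
    <span property="geo:lat" content="51.5">51.5° N</span>,
    <span property="geo:long" content="-0.12">0.12° W</span>.
  </p>

  <!-- A property from a hypothetical in-house vocabulary -->
  <p>Camera: <span property="ex:camera">Pocket compact</span></p>

  <!-- rel="license" says which license applies to the photo -->
  <p>
    Licensed under
    <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>.
  </p>
</div>
```

Each property sits in its own attribute on whatever element already holds the text, so nothing has to be squeezed into class or title, and adding a new term means declaring a prefix rather than waiting for a new format to be agreed.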

Why object to extensibility?

Whilst I’ve explained the goals of some of the design decisions for @role and RDFa, it’s clear that not everyone likes the ‘extension points’ approach. The WHATWG, for example, are pursuing a much more monolithic approach with HTML5; they see no need for extension points, since the language itself will cover everything. But this means that if your company or organisation doesn’t have the resources to allocate someone to be involved in the standards process, your perfectly reasonable requirement will go unsupported. The Microformats approach is also counter to the idea of ‘extension points’ that are open to anyone, since it, too, attempts to centrally control the creation of new formats, stifling the evolution of new vocabularies by specialists within their sectors. Unfortunately, all of this means that HTML5 is unlikely to add support for RDFa, since open extension points run completely counter to the aims and goals of HTML5. But lest anyone get too worried about that, bear in mind the following:

  • RDFa is being parsed from HTML pages by Yahoo!’s SearchMonkey;
  • RDFa is being promoted by Creative Commons as the approved way of adding licensing information;
  • the London Gazette has all of its pages marked up using RDFa (see SemWebbing the London Gazette), making it the first major RDFa project in the world;
  • RDFa is central to a couple of projects I am working on for the UK government’s Central Office of Information;
  • RDFa is being incorporated into Drupal core.

The prospects for being able to extend documents without having recourse to standards organisations are enormous, and consequently interest in RDFa is growing fast. We should therefore concentrate for now on putting it to use for the people who need extension points in their documents and applications.