Correct HTML

Monday, March 27, 2006

Why?

The most relevant standards body for the Web, the W3C, has stated the purpose of the World Wide Web as “a network of information resources. The Web relies on mechanisms to make these resources readily available to the widest possible audience.” The key phrases in that statement are “information resources”, “readily available”, and “widest possible audience”. The core purpose of the Web is to provide information as well as possible to as many users as possible.

HTML, the HyperText Markup Language, is one of the mechanisms mentioned. The official HTML specification defines it as follows:

To publish information for global distribution, one needs a universally understood language, a kind of publishing mother tongue that all computers may potentially understand. The publishing language used by the World Wide Web is HTML (from HyperText Markup Language).

HTML gives authors the means to:

  • Publish online documents with headings, text, tables, lists, photos, etc.
  • Retrieve online information via hypertext links, at the click of a button.
  • Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.
  • Include spread-sheets, video clips, sound clips, and other applications directly in their documents.

The central purpose of HTML authoring, then, is to represent information resources as well as possible to as many users as possible.

What This Means

To fulfill this simple purpose as well as they can, HTML authors must adhere closely to a set of standards. These standards have two important parts, although the second is often neglected:

Valid Markup

All HTML markup produced must be well-formed and valid. This includes, but is not limited to:

  • Proper tag nesting
  • Strict conformance to an HTML (or XHTML) DTD, including use of tags only where they are explicitly allowed
  • Using closing tags on elements which require a closing tag

Validation tools such as the W3C’s Markup Validator make checking for correctness easier. There is also a useful client-side validation tool, the HTML Validator extension for Firefox. Running tools like these should be a standard step in every development workflow.
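The sketch below is a minimal document that should pass the W3C validator against the HTML 4.01 Strict DTD. The title and text are placeholders. The commented-out line shows overlapping tags, which break proper nesting. The line after it shows the correctly nested version.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <!-- TITLE is required by the DTD -->
  <title>Correct HTML</title>
</head>
<body>
  <!-- Malformed: strong is opened inside em but closed outside it
  <p><em>very <strong>important</em></strong></p>
  -->

  <!-- Well-formed: each element is closed before its parent -->
  <p><em>very <strong>important</strong></em></p>
</body>
</html>
```

Under the Strict DTD, text inside body must sit in a block-level element such as p. That is one example of “use of tags only where they are explicitly allowed”.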

Semantic Markup

Correctness is not limited to validity, however; soundness is required as well. Not only must the logical structure be valid, but the meanings of the elements in that structure must also be true, that is, correct. The distinction comes from logic, where a valid argument is sound only if its premises are also true, and it matters just as much in HTML.

This means that the HTML elements an author chooses must represent the author’s meaning and intent as accurately as possible. One of the most glaring errors among modern HTML authors lies in tag choice and in the method they use to make that choice.

The common error comes from asking the question, “How can I make this markup produce the visual result I want?” This is the wrong question because, as we’ve seen, the purpose of HTML is to represent information in a semantic structure; it has nothing to do with visual presentation.

Instead, when choosing tags, one should ask: “What does this element represent?” or “What does this data mean?” The semantic meaning of the data should dictate the choice of HTML element, which ensures that the intended meaning is conveyed accurately. Likewise, class names, ids, and other identifiers should be named for the semantic meaning of the data they represent.
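The two fragments below illustrate the two questions, using hypothetical event data. The first answers “How do I make this look big and bold?” The second answers “What does this data mean?” by marking up a heading followed by a definition list of dates and events. A definition list is one reasonable reading of this data, not the only valid one.

```html
<!-- Presentation-driven: font is deprecated and absent from HTML 4.01 Strict,
     and nothing here says "heading" or "list" -->
<font size="5"><b>Upcoming Events</b></font><br>
<b>March 30</b> Team meeting<br>
<b>April 2</b> Site relaunch<br>

<!-- Meaning-driven: a heading, then dates paired with what happens on them -->
<h2>Upcoming Events</h2>
<dl>
  <dt>March 30</dt>
  <dd>Team meeting</dd>
  <dt>April 2</dt>
  <dd>Site relaunch</dd>
</dl>
```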

For example, a div element should only be used when no more specific element describes the data being marked up (e.g. an address element for the contact information of a document’s author, a cite element for a citation, etc.). Furthermore, when a div is warranted, it should also carry a class and/or id attribute with a data-descriptive value. This means “header” is a good id, but “redsmall” is a terrible class name: it describes presentation, not the meaning of the data. Further required reading: W3C: Use class with semantics in mind.
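The fragment below puts these guidelines into practice. The site name, text, and contact details are placeholders. Each element is used as follows:

- The div carries a descriptive id.
- address holds the page author’s contact information, which is its defined role in HTML 4.01.
- cite marks the title of a cited work.
- The red, small styling lives in a stylesheet rule keyed on a meaningful class, here the hypothetical “warning”, rather than in a class named “redsmall”.

```html
<!-- In the document head: presentation belongs in CSS -->
<style type="text/css">
  .warning { color: red; font-size: small; }
</style>

<!-- In the document body -->
<div id="header">
  <h1>Example Site</h1>
</div>

<p>This definition of HTML comes from the <cite>HTML 4.01 Specification</cite>.</p>

<!-- The class says what the text is, not how it looks -->
<p class="warning">Comments are moderated before they appear.</p>

<address>
  Page maintained by <a href="mailto:webmaster@example.com">Jane Doe</a>
</address>
```

If the design later calls for warnings in bold orange, only the CSS rule changes. The markup and its class name stay accurate.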

Practical Exceptions

Unfortunately, we don’t live in a perfect world, and exceptions to the rule of perfectly semantic markup must sometimes be made. For instance, some web browsers do not render a fieldset element as a proper content box, so in very specific circumstances it may be sacrificed for a more general container such as a div. Even then, semantic id and class attributes should be used to compensate for the meaning the element name no longer conveys.
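Both forms below use a hypothetical search form. The first is the preferred markup. The second shows the fallback described above, where a div with a descriptive class stands in for fieldset and legend. The class names “search-options” and “search-options-title” are made up for illustration.

```html
<!-- Preferred: fieldset and legend express the grouping themselves -->
<form action="/search" method="get">
  <fieldset>
    <legend>Search options</legend>
    <label>Keywords <input type="text" name="q"></label>
  </fieldset>
</form>

<!-- Fallback: a generic div whose class names restore the lost meaning -->
<form action="/search" method="get">
  <div class="search-options">
    <p class="search-options-title">Search options</p>
    <label>Keywords <input type="text" name="q"></label>
  </div>
</form>
```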

That said, exceptions should only be made in the very limited number of cases where technical limitations prevent use of semantic markup. One should always strive to represent semantic meaning as accurately as possible in all markup decisions.

Microformats

In the same vein as semantic markup are microformats — standardized structure and formatting for certain common types of data. Examples include hCard, hCalendar, XFN, XOXO, rel-tag, and others. They can all be found at microformats.org.
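The fragment below marks up contact details with hCard. Its class names (vcard, fn, url, org, email, adr, locality, and region) come from the hCard format. The person, organization, and addresses are placeholders. The final paragraph applies rel-tag, where the last segment of the linked URL names the tag.

```html
<div class="vcard">
  <!-- "url fn": the link is both the person's URL and formatted name -->
  <a class="url fn" href="http://example.com/">Jane Doe</a>
  <div class="org">Example Corporation</div>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
  <div class="adr">
    <span class="locality">Springfield</span>,
    <span class="region">Illinois</span>
  </div>
</div>

<!-- rel-tag: this link tags the page with "html" -->
<p>Tagged: <a href="http://example.com/tag/html" rel="tag">html</a></p>
```

The markup stays ordinary, valid HTML. The agreed-upon class and rel values are what let tools extract the contact card or the tag.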

What We Gain

Maintainability

Because correct markup represents the data accurately, future changes to an HTML document should directly reflect changes in the data itself. When semantic markup is used, those changes follow the same element and structure choices as the original markup, which makes them straightforward and consistent.

Developers therefore spend less time editing HTML documents for purely presentational changes, since presentation is handled in stylesheets rather than baked into the markup.

Accessibility

The central purpose of HTML authoring, again, is to represent information resources as well as possible to as many users as possible. Aiming only for the “eighty percent case” is therefore not acceptable. Representing data to as many users as possible means presenting it in an easily comprehensible format. The user might be an average American adult male with good eyesight using a mobile phone browser, or a blind teenaged Sri Lankan basket weaver at a braille terminal.

Correct HTML (valid and semantic) provides a consistent representation of data that can be interpreted in many ways, including output to a computer screen, print media, a braille terminal, mobile devices, etc. Malformed or semantically meaningless HTML cannot represent the data as accurately or as successfully across these different output media.

Predictable Structure and Layout

Last but certainly not least is the output that browsers generate when they process HTML. A web browser constructs a DOM tree, an object-based representation of the HTML document as a series of nodes in a tree structure. Valid markup leaves browsers nothing to repair, and each browser repairs malformed markup in its own way. As a result, the DOM tree built from valid markup is far more consistent, especially across different web browsers. A consistent DOM makes client-side work in JavaScript and CSS simpler, which makes the entire development process more lightweight.

Closely related is the benefit of predictable default layout. Text-based browsers, screen readers, braille terminals, and other output devices all depend on the semantic meanings of elements to render the document properly and present the information to the user.

Conclusion

The take-home here is to think of HTML as what it is — a markup language — instead of something more contrived like a presentation language (which it is not). This means marking up data using valid structure and correct semantic meaning. For presentation, one must use a style language. In the case of the Web, this is CSS; use it. The world will be better for it.

written by Brad Fults


Archived at: http://h3h.net/2006/03/correct-html/

1 response

  1. Devin Johnston » Blog Archive » Updated Blogging Dippers Blogroll Code

    [...] Q: Why has the blogroll code been changed? A: It has been changed in order to be more standards compliant and in order to allow for easier theming with CSS. The new code should integrate more easily into WordPress and other blogging platforms’ templates. The new code is also more semantically correct. [...]
