Sending XHTML as text/html Considered Harmful to Feelings

Wednesday, December 21, 2005

Ian Hickson wrote a piece a while ago called Sending XHTML as text/html Considered Harmful. He raises several main points against sending XHTML documents with a text/html MIME type, all of which I find wholly unconvincing. I’ll comment on or refute each of his points below.

<script> and <style> elements in XHTML sent as text/html have to be escaped using ridiculously complicated strings.

This is not an issue if one includes scripts and stylesheets as external resources from their own files. Doing so helps caching, organization, and modularization of code, and should be standard practice in all production environments. In the very rare case that scripts or styles need to be included on the XHTML page itself, two extra lines delimiting CDATA blocks are not “harmful” to any degree. Not Harmful
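To make the escaping concrete, here is a minimal Python sketch. It is not taken from Hickson's document, and it uses Python's standard xml.etree.ElementTree as a stand-in for a browser's XML parser. The two inline-script snippets (PLAIN and WRAPPED) and the helper name parses_as_xml are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline script containing "<" and "&&", both of which are
# markup characters as far as an XML parser is concerned.
PLAIN = "<script>if (a < b && c) { run(); }</script>"

# The same script wrapped in a CDATA section. The CDATA delimiters sit behind
# JavaScript line comments so a text/html script engine ignores them.
WRAPPED = "<script>//<![CDATA[\nif (a < b && c) { run(); }\n//]]></script>"


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(PLAIN))    # False
print(parses_as_xml(WRAPPED))  # True
```

The two extra delimiter lines are the entire cost. An external script file avoids the question altogether.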

Documents sent as text/html are handled as tag soup by most UAs…Since most authors only check their documents using one or two UAs, rather than using a validator, this means that authors are not checking for validity, and thus most documents that claim to be XHTML on the web now are invalid.

Which means that they are handled in exactly the same manner as HTML 4.01 Strict and other HTML documents. Is there a harmful issue here? No. Whether or not documents validate is a completely separate issue from the MIME type with which they are sent. It is up to authors alone to worry about validating their documents with a validator. Not Harmful

If you ever switch your documents that claim to be XHTML from text/html to application/xhtml+xml, then you will in all likelihood end up with a considerable number of XML errors, meaning your content won’t be readable by users.

That only applies if the document wasn’t valid in the first place, which, as we’ve seen, is a completely orthogonal issue. Not Harmful
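The two error-handling models at stake can be sketched in a few lines of Python. This is an illustration only: html.parser stands in for a lenient tag-soup parser and xml.etree.ElementTree for a strict XML parser. The SOUP fragment and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical fragment: the first <p> is never closed and <br> is not
# written as an empty element, as often happens in hand-written markup.
SOUP = "<div><p>First paragraph<br><p>Second paragraph</p></div>"


class TagCollector(HTMLParser):
    """Lenient parser that simply records every start tag it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup: str) -> List[str]:
    """Feed markup to the lenient parser and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_error(markup: str) -> Optional[str]:
    """Return the XML parser's error message, or None if well-formed."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as exc:
        return str(exc)


print(lenient_tags(SOUP))  # ['div', 'p', 'br', 'p']
print(strict_error(SOUP))  # prints the strict parser's error message
```

The lenient parser carries on without complaint, while the strict one stops at the first well-formedness error. A well-formed document passes both.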

If a user saves such a text/html document to disk and later reopens it locally, triggering the content type sniffing code since filesystems typically do not include file type information, the document could be reopened as XML, potentially resulting in validation errors, parsing differences, or styling differences. (The same differences as if you start sending the file with an XML MIME type.)

Again, this is author-dependent and has nothing to do with the initial decision to send an XHTML document as text/html. If one writes valid XHTML in the first place, none of these problems will arise. Not Harmful

The only real advantage to using XHTML rather than HTML4 is that it is then possible to use XML tools with it. However, if tools are being used, then the same tools might as well produce HTML4 for you. Alternatively, the tools could take SGML as input instead of XML.

I disagree here and this difference might be telling. I believe the biggest advantages to XHTML are its readability, uniformity, well-formedness as it pertains to authoring, and the consistency of the rendered DOM (which is also a result of any well-formed HTML document). Either way, just because a certain advantage of XHTML can be duplicated via other older methods doesn’t mean that XHTML is harmful in itself. Not Harmful

HTML 4.01 contains everything that XHTML 1.0 contains, so there is little reason to use XHTML in the real world. It appears the main reason is simply “jumping on the bandwagon” of using the latest and (perceived) greatest thing.

I disagree for the reasons stated in the previous point. Some authors may use XHTML due to some trendy force, but that doesn’t make the use of XHTML harmful. Indeed, I’ve refuted the main arguments against XHTML’s harmfulness and provided several reasons for its usefulness. Not Harmful

Then he goes on to talk about how XHTML documents aren’t compatible with HTML documents and lists several points. Needless to say, none of these points about incompatibility with HTML means incompatibility with current browsers, which all handle XHTML sent as text/html as expected.

The “/>” empty tag syntax actually has totally different meaning in HTML4.

That’s all well and good, but all modern browsers deal with the empty tag as a single tag without using HTML’s different meaning. Not Harmful

Script and style elements cannot have their contents hidden from legacy UAs.

Again, this is not a problem when one includes these resources from external files. Not Harmful

The “xmlns” attribute is invalid HTML4. The XHTML DOCTYPEs are not valid HTML4 DOCTYPEs.

These have nothing to do with the harmfulness of XHTML in itself. Not Harmful

He goes on to analyze why XHTML sent as text/html can’t be interpreted as XML, but that again is orthogonal to the main benefits of XHTML in my opinion. In conclusion, he sums up:

There are few advantages to using XHTML if you are sending the content as text/html, and many disadvantages.

There are not nearly as many disadvantages (if any) to sending XHTML as text/html as he claims and the advantages I mentioned above make it well worth using in my humble opinion. There are some subtle footnotes and parentheticals indicating that the harmfulness only applies to authors that don’t know the pitfalls of this practice, but much like the “Do not eat” label on the little packets of silica gel, Ian’s advisory seems to be common sense and not worth mentioning to any author who actually knows what XHTML is and how to write it.

written by Brad Fults

Archived at: http://h3h.net/2005/12/xhtml-harmful-to-feelings/

57 responses

  1. Ian Hickson

    As I say in my document, it is not aimed at authors who know what they’re doing. Sure, if you’re one of the five authors who knows how to write XHTML that is well-formed, who knows how to write CSS that works with HTML and XHTML, who knows how to write JS/DOM that works in HTML and XHTML, and so forth, then _you_ won’t have any problems.

    The issue is that the overwhelming majority of authors don’t know any of these things, so they will trip up on all those problems.

    And the reason they use XHTML, is because they are copying and pasting code from authors such as yourself who _do_ know XHTML. They think “oh he’s using XHMTL [sic], it must be teh bomb, I’ll use that too”, then they screw it up, and all the problems I mentioned will arise and hit them.

  2. complex

    Only five authors? I call dibs on a spot. Who are the other two?

  3. Willow

    ME!

  4. elmer fudd

    We’re coming full circle. First we had HTML 1-3, with all the idiosyncrasies, to get every Joe and his dog onto the bandwagon. Then we started having problems because of unmanageable parsers and portable devices that cannot handle complex, often ambiguous HTML, with the solution being: let’s go to what the publishing industry has been doing for a while. Now we have XHTML, a poor man’s variant on SGML. What’s next? Shall we go back to HTML 1? Or plain text documents? The buck stops at XHTML for documents. Fancy stuff gets done with SVG or Flash. If somebody wants to publish cheap, he can either learn XHTML or do whatever takes his fancy. He cannot pretend to get slick publication style for no effort, though!
    Consider me half an author. But I’m sure I don’t take much space ;-)

  5. Sue D. Nymme

    So… paraphrase Hickson’s article as “If you’re an incompetent boob, use HTML 4 over XHTML. If you know what you’re doing, disregard.”

  6. Solutions Log » Blog Archive » XHTML vs HTML

    [...] Update Oct 25, 2006: I like this! Be social: [...]

  7. Jeffrey Zeldman Presents : Monday breakfast links

    [...] Sending XHTML as text/html Considered Harmful to Feelings [...]

  8. Horst Kinkelberry

    Yeah, looks like Hickson has still a few things to learn.

    is used to escape script so that Netscape 1.0 doesn’t flash script fragment while parsing the document. This problem was fixed in Netscape 2.0. Every other browser works correctly, so

    is more than sufficient. But if you want to go wild with escaping, you can

    /* */

    This is btw the same thing that W3C documentation recommends. http://www.w3.org/TR/xhtml1/#h-4.8

  9. David Dorward

    Well, let’s see. XHTML as text/html is required to conform to Appendix C, isn’t it? C.14 requires an XML processing instruction for each style element in the document. So where is the processing instruction for the style element in this page?

    Serving XHTML as text/html correctly is /hard/, and is rather lacking of benefits on the client side. (And on the server side we can always transform it to HTML with XSLT or other mechanisms).

  10. Daniel Axelrod

    One point that I didn’t see you address is that with XHTML, “HTML content will be able to be mixed-and-matched with content from other well-known namespaces (in particular, MathML)”. If a user-agent cannot tell that it is being given an XML document (and it can’t except for the mime-type), how can it take advantage of this feature?

    The crux of Ian Hickson’s article seems to be that XHTML served as text/html confers almost none of the benefits of XML because user agents can’t handle it as such, and that there are disadvantages in terms of handling of well-formedness problems, and a few compatibility caveats.

    While you assert that making correctly-formed XHTML is the author’s problem, and not a problem with mime-types, creating a website with correct XHTML is *hard* for most people (see Mark Pilgrim’s Thought Experiment http://diveintomark.org/archives/2004/01/14/thought_experiment ). Using text/html just makes it harder for those people to tell when they’ve made a mistake, because of the two different error-handling models the document might be subjected to. This is the main assertion of harm.

    Still, interesting analysis!

  11. Alan

    Sue: That’s almost it, but you missed one key undertone.

    “If you’re an incompetent boob, use HTML 4 over XHTML. If you know what you’re doing, disregard. Since you’re not a software engineer or one of ‘five’ people, you don’t know what you’re doing.”

  12. Neo

    MIME types and document content are intertwined, for the exact reason stated originally; if a document claims to be XHTML it should have the appropriate MIME type. If you are sending MIME of text/html then why are you using XHTML? I don’t use MIME type of application/octet-stream for PNG or TIFF files.

    XML is a very strict format not to be hurtful, but for heavily automated parsing to produce the exact data that was intended by the author, not an interpretation of the data by the processor/UA.

    HTML has evolved so many variations to accommodate sloppy programming *cough* Internet Explorer *cough* that calling HTML a standard is almost laughable. And using HTML 4.01 Strict for what is actually an XHTML file is completely anti-standards (why have a standard if you have to hack around it?) because of all the “quirks” you must employ for the cross-standard conflicts and interpretation in the UA.

    Geeze, what is with all the HTML prima-donnas thinking they are the only ones who are right and if you point out their anti-standards contradictions you are a moron?

  13. Andrew Hedges

    Regardless of whether it *works* in current UAs, isn’t the point that sending XHTML as HTML rather than XML is using the wrong MIME type? I mean, if it’s going to end up being interpreted as HTML, why not prepare and send it as such? Pardon me if I’m missing some point about why XHTML is so great (I’m a former convert to the church of XHTML who has now lost his way). I think it makes more sense to send the correct metadata about a document (that is, a MIME type of HTML with an HTML document, and a MIME type of XML with an XHTML document) than not to. The problem, as I understand it is that certain UAs (particularly one built in Redmond that commands the lion’s share of the market) don’t understand the correct MIME type for XHTML. What am I missing?

  14. Ben

    If what you’re saying is true Sue, then that means the article will be read by precisely zero people.

  15. Tim Roberts

    A take on this that is often overlooked is that it is always best to prepare for the future, as far in advance as possible. Many developers may be under the false impression that their document never needs to be valid XML. How could you ever know that?

    Fudge XHTML, and when you need to make it compliant XML because “you never thought of that” when you built, and the day has come when all major browsers treat XHTML with the respect it deserves, all you need to do is add one saucy little line.

    Now doesn’t that beat trawling through code to re-tool it.

  16. Sam Hill

    Much agreeing with Daniel, and Mark Pilgrim, whom he links to.

    For myself, I think you keyed on too much of the “harmful” bit and in an attempt to refute each of his points, you forgot that there is some middle ground here.

    how exactly is this:

    I believe the biggest advantages to XHTML are its readability, uniformity, well-formedness as it pertains to authoring, and the consistency of the rendered DOM (which is also a result of any well-formed HTML document).

    different from HTML 4.01 strict?

  17. Jared

    David Hammond’s writeup on webdevout takes Ian’s article one step further with real world examples of why using XHTML improperly can be dangerous.

    http://www.webdevout.net/articles/beware_of_xhtml.php

  18. Brian

    Thanks for the point-by-point breakdown of this article–I was skeptical of it when I first read it and I’m glad to read I was not alone!

  19. dusoft

    Sure, if you’re one of the five authors who knows how to write XHTML

    Oh, come on, don’t generalize when you haven’t checked the facts.

  20. Brad

    Sam:

    how exactly is this:

    different from HTML 4.01 strict?

    Very simply, XHTML is more aesthetically and logically pleasing than HTML. It makes more sense logically (close tags, empty tags) and just turns out looking better. Obviously this is a matter of taste, but it is also not the issue at hand. Whether or not it is prettier, the original question was concerning the harm of writing XHTML and then sending it as text/html.

  21. kL

    So it’s interpreted as HTML, it has the features and limitations of HTML, and it can’t be generated with XML tools, because it’s slashed HTML and has to obey HTML rules (try <script />).

    Maybe not harmful, but just so completely pointless…

    Honestly, how often do you say “No! it has to be XHTML! There’s no way to make it work as HTML!”? I did a few times, but mostly as an excuse, and once because the page was a showcase for OMFGHXHTMLOLZ ;)

  22. Watts

    While I agree with Brad’s comment about aesthetic pleasing-ness, I think the “harmful” crowd downplays the advantage of being able to treat HTML documents as XML in the first place–something you can’t do terribly well with HTML 4 documents. You can extract microformats from XHTML documents. You can easily apply XSLT directly to the XHTML for different transformations. (The response of “then you can transform it to HTML 4 for serving to the browser” misses the point that the XHTML could potentially be used by the client, which again can’t be done if you’ve blithely de-XMLed it.)

    The concern about whether browsers will somehow do “the wrong thing” with MIME types strikes me — even after reading Mr. Hickson’s generally well-reasoned argument — as a concern for purity for its own sake. There doesn’t seem to be any damage done to my web site if Internet Explorer parses my XHTML as “tag soup” as long as it’s being displayed with full functionality; thus, even if the advantage of giving consumers a document parsable as XML is largely theoretical, there’s no practical reason not to give them that advantage.

    I would re-state the real concern as “if you’re going to serve XHTML documents, make sure they’re properly validating.” This can be easier said than done even if you know what you’re doing (my home page inserts a weblog fragment from LiveJournal, for instance, which insists on using proprietary attributes that generate warnings), but avoiding genuine errors is just not that difficult. That many people don’t do it is not sufficient reason to tell them “just use HTML 4 instead”: I know I’m going out on a limb here, but people who don’t validate their XHTML will probably not validate their HTML, either.

  23. Jacques Distler

    My favourite incompatibility (“favourite”, because it invariably manifests itself in catastrophic and unexpected ways) between text/html and application/xhtml+xml is the XML normalization of white-spaces in attribute values.

    If the author thinks that

    If you ever switch your documents that claim to be XHTML from text/html to application/xhtml+xml, then you will ….
    That only applies if the document wasn’t valid in the first place, …

    I suggest he try converting this blog to application/xhtml+xml. That should be something of an eye-opener.

    While I’ve nothing against tag-soup XHTML (XHTML as text/html), no one should delude themselves into thinking that it would function as “real” XHTML, without a major recoding effort.
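The attribute-value normalization mentioned in this comment can be reproduced with a short Python sketch. It is only an illustration: html.parser plays the tag-soup parser and xml.etree.ElementTree the XML parser. The hidden form field FIELD is a hypothetical example, not markup from any site discussed here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical hidden form field whose value contains a newline and a tab.
FIELD = '<input type="hidden" name="data" value="line one\nline\ttwo" />'


class ValueGrabber(HTMLParser):
    """Records the "value" attribute exactly as a tag-soup parser reports it."""

    def __init__(self) -> None:
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        self.value = dict(attrs).get("value")


grabber = ValueGrabber()
grabber.feed(FIELD)
grabber.close()

# The XML parser replaces literal tabs and newlines in attribute values
# with spaces before the application ever sees them.
xml_value = ET.fromstring(FIELD).get("value")

print(repr(grabber.value))  # 'line one\nline\ttwo'
print(repr(xml_value))      # 'line one line two'
```

The same bytes yield two different attribute values, so any script or form handler that depends on the exact whitespace behaves differently under the two MIME types.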

  24. Brad

    Jacques: That’s a bit of a silly point because I didn’t write the code for this blog. It’s WordPress.

    If you were to check a site that I did write by hand, you’d see that the content-type switch would be inconsequential for browsers that support it.

  25. Jacques Distler

    If you were to check a site that I did write by hand, you’d see that the content-type switch would be inconsequential for browsers that support it.

    Too funny!

    You didn’t actually try it, did you?

    The first thing I looked at was the javascript on this page. Guess what? It uses document.write. Totally non-functional, when served as application/xhtml+xml. I didn’t look any further, but I’m sure that’s far from the only thing which would break.

    Face it, Hixie’s right. Even the simplest of faux-XHTML sites, which I’m sure you lovingly ran through the W3C Validator, would break when served with the correct MIME-type. Now imagine a complicated, feature-rich site (like this blog).

  26. John Hansen

    Great write-up Brad! I wrote something very similar about a month ago. I didn’t name Hickson by name because I was responding to a growing attitude among many.

    My journal entry is at http://yellow5.us/journal/why_use_html_strict/.

    Thanks for giving my thoughts clearer words, and in some way, validation.

  27. Asbjørn Ulsberg

    Ian is absolutely right. Although more than 5 authors know how to write valid and well-formed XHTML, they (or may I say “we”) are a tiny minority. Most web authors and developers don’t even know that the W3C exists and if they do, don’t know what it does, and if they do, don’t know how to apply its standards and the validator, and if they do, don’t know how to fix the problems in their documents.

    Anyhoo, most web developers are incompetent fools who should be doing something completely different, and these are the ones Ian is addressing in his article. Not you. Nor me.

  28. Asbjørn Ulsberg

    Oh and Brad, it’s absolutely not acceptable to serve XHTML 1.1 as ‘text/html’. XHTML 1.0 “may” be served as ‘text/html’, but XHTML 1.1 “should not”. Also, having document.write in “XHTML documents” is rather … interesting! :-)

  29. Sam Hill

    As well, regarding xkr.us: in your CSS you declare styles for body, but not html, so these styles would not work.

    body { font-family: Verdana, sans-serif; font-size: 13px;
    margin: 2em 15% 2em 15%; cursor: default; background-color: #fff; color: #000;
    padding: 0 3em; border-right: 1px solid #333; border-left: 1px solid #333 }

  30. Sophie Dennis

    I also found Ian’s article unconvincing when I read it a while ago. It raises a lot of valid issues about authoring XHTML for serving as XML, but I don’t think it ultimately offers any knock-out argument for the dogmatic “HTML good, XHTML (as text/html) bad” position some people have adopted. There’s a decent argument that XHTML as text/html has little or no advantage over HTML 4 Strict, but this doesn’t mean there’s anything inherently wrong/evil/stupid in a choice of XHTML instead.

    IMHO the choice of XHTML vs HTML 4 Strict is one of personal preference not dogma - despite what certain people want to make it. It is more important that your document properly validates to the declared doctype, than exactly what that doctype is. Invalid HTML 4 Strict is as “harmful” as invalid XHTML, no matter how they are served. The programmers I work with tell me that XHTML is easier to parse - in terms of manipulating content via PHP or whatever - than HTML due to the reliability of closing tags etc… But no matter which is used, invalid code is the real problem.

    I don’t entirely buy the argument that valid XHTML is harder than valid HTML4, though valid *XML* certainly is (because a doc can be valid XHTML, but not valid XML due to character sets, script, styles etc… as outlined in Hickson’s article).

  31. RB

    Oh and Brad, it’s absolutely not acceptable to serve XHTML 1.1 as ‘text/html’. XHTML 1.0 “may” be served as ‘text/html’, but XHTML 1.1 “should not”.

    Not to mention the fact that an XML declaration is required when serving an XML document with a character encoding that is not UTF-8 or UTF-16. And even then, including an XML declaration is strongly encouraged.

    Yeah, nice try.
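The encoding point can be checked with a small Python sketch. It assumes the parser gets no charset hint from the transport layer, and it uses xml.etree.ElementTree as the XML parser. The sample paragraph and the helper name parse_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph containing "é", encoded as ISO-8859-1 (byte 0xE9).
BODY = "<p>caf\u00e9</p>".encode("iso-8859-1")
DECLARATION = b'<?xml version="1.0" encoding="iso-8859-1"?>'


def parse_text(document: bytes):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# Without a declaration the parser assumes UTF-8, and 0xE9 is not valid there.
print(parse_text(BODY))                # None
# With the declaration the same bytes decode correctly.
print(parse_text(DECLARATION + BODY))  # café
```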

  32. yeah

    I must disagree with Hixie because I’ve been doing xhtml 1 strict for years. I avoid script and inline CSS and serve with HTTP vary and as application/xhtml+xml to supporting clients.

    The article author’s site http://xkr.us/ is a disaster; the issues with using document.write() instead of createTextNode() or whatever() have already been mentioned. I’ll list some more obvious issues:

    It’s served as text/html even though my UA advertises the correct MIME type.

    Lacks XML declaration

    Doctype is XHTML 1.1 which cannot be served as text/html

    Uses meta tag with application/xhtml+xml content type when MIME is already set to text/html. Somewhat ironic that this meta tag would have no effect on an XML document.

    charset=iso-8859-1 is set in a meta tag, ignored by XML which expects utf8 encoding. See also lack of XML declaration.

    Now and only now am I starting to see Hixie’s point.
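The content negotiation this comment describes (application/xhtml+xml for clients that advertise it, text/html for everyone else) might look roughly like the following Python sketch. The function names are hypothetical, and the policy is an assumption rather than anything the commenter specified: wildcards such as */* are deliberately not treated as support for XHTML.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def parse_accept(header: str) -> dict:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    weights = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        weights[media_range] = q
    return weights


def choose_content_type(accept_header: str) -> str:
    """Pick XHTML only when the client lists it explicitly and does not
    rank it below text/html; otherwise fall back to text/html."""
    weights = parse_accept(accept_header or "")
    xhtml_q = weights.get(XHTML, 0.0)
    html_q = weights.get(HTML, 0.0)
    if xhtml_q > 0 and xhtml_q >= html_q:
        return XHTML
    return HTML


# A response chosen this way should also send "Vary: Accept" so that caches
# keep the two variants apart.
print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))  # application/xhtml+xml
print(choose_content_type("text/html, */*"))  # text/html
```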

  33. Mike Cherim

    Nice article. I always felt the word “harmful” was a little strong.

  34. Evan

    I don’t entirely buy the argument that valid XHTML is harder than valid HTML4, though valid *XML* certainly is (because a doc can be valid XHTML, but not valid XML due to character sets, script, styles etc…

    Um, no… XHTML is XML. That is the whole point of the exercise.

  35. En webbplats på svenska om xhtml » Varför jag skriver giltig XHTML

    [...] Webben är fantastisk. Just när jag började komma fram till ett sätt att introducera Sending XHTML as text/html Considered Harmful to Feelings på utan att blåsa onödigt liv i den onödigt inflammerade debatten om HTML kontra XHTML så upptäckte jag att det faktiskt redan var gjort borta på about.com. [...]

  36. Jacques Distler

    Now that we’ve had our fun picking apart http://xkr.us/ (To Yeah’s summary, I’d add the CSS issue noted by Sam Hill, and some instances of forms with hidden form fields, whose values are supposed to be white-space significant. Quite accidentally, the actual example is OK, but the technique itself is unsafe in an XML context.), what can we learn from the wreckage?

    I think the answer is clear. If you want a site to work as real XHTML, you need to develop and test it using the correct MIME type. Faux XHTML, even when written by a conscientious developer, who assiduously runs his pages through the W3C validator, will fail miserably when served as XML.

    That was the point of Hixie’s article, and I think we’ve seen a fabulous illustration.

  37. Max Design - standards based web design, development and training » Some links for light reading (2/11/06)

    [...] Sending XHTML as text/html Considered Harmful to Feelings [...]

  38. Web Standards Group: Here’s a Bit of Light Reading : Jason Ruyle

    [...] Sending XHTML as text/html Considered Harmful to Feelings http://h3h.net/2005/12/xhtml-harmful-to-feelings/ [...]

  39. mike

    I’m afraid that your entire logic is flawed: merely proving that all of the other author’s individual arguments were incorrect does not in itself prove that his overall argument was wrong.
    Even if we assume that you are correct that there is nothing harmful about serving XHTML as text/html you have not made the case that there is any good reason why we _should_ do so.

  40. Pixel Surfers » Blog Archive » Light Reading From WSG

    [...] Sending XHTML as text/html Considered Harmful to Feelings http://h3h.net/2005/12/xhtml-harmful-to-feelings/ [...]

  41. Brad

    Jacques:

    Hah. Completely my fault for being cocky and assuming it would work. Those pages were never done with the intention of sending them as true XML. Oh well. I’m also aware of the XHTML 1.1 problem — some of those pages were made when I was experimenting in the XHTML scene and hadn’t read the spec or heard that advice yet; I have since recanted and use XHTML 1.0 Strict consistently.

    So, Ian’s point is valid for authors who satisfy both of the following criteria: (1) they have no (overt) intentions of sending their document as application/xhtml+xml and (2) they end up sending their document as application/xhtml+xml. For all cases that don’t satisfy these two criteria, though, the point remains that there’s nothing inherently harmful about sending XHTML (1.0) documents as text/html as things stand today.

    Mike:

    I did supply a positive argument against the original point actually: XHTML is pretty and logical. I’m well aware that this argument won’t be enough for many people, but it only needs to be enough for me.

    Thanks for all of the discussion!

  42. Jacques Distler

    As I said, right from the outset, I don’t see anything horrible about “faux XHTML” (XHTML, conforming to “Appendix C”, and served as text/html). Just as long as no one has any illusions that it will work as “real XHTML” (XHTML served as application/xhtml+xml).

    So, Ian’s point is valid for authors who satisfy both of the following criteria: (1) they have no (overt) intentions of sending their document as application/xhtml+xml and (2) they end up sending their document as application/xhtml+xml.

    Ian’s point is valid for people who think that they might someday have reason to send their document as application/xhtml+xml. Even relatively clueful authors, such as yourself, will be horribly disappointed to discover that won’t work.

    The W3C’s rationale for creating “Appendix C XHTML” in the first place was to ease the transition to “real XHTML.” Maybe someday, down the road, you’ll want to include some inline SVG or MathML or whatever. Or maybe you just want documents that can be reliably processed (server-side or client-side) using XML tools. [Treating text/html documents as XML is evil! Sorry, but one has to draw the line somewhere.]

    Whatever the case, faux XHTML has failed rather badly as a stepping stone to real XHTML. It has become, in practice, simply another dialect in which one can compose tag-soup. It’s not clear that the world really needed another such dialect and, from the point of view of browser authors, like Ian Hickson, it would seem only to have made their job a wee bit harder.

    Ian’s article may not do much to stem the tide of faux XHTML. But, at least, it can serve to raise awareness that faux XHTML is not the same as (and is unlikely to be interoperable with) real XHTML.

  43. Ben 'Cerbera' Millard

    Jacques, I’d just like to praise you on the way you’ve handled this article. It’s unfortunate that most people who know how web technologies work are very dismissive of articles which don’t match the realities. But you’ve patiently described the issues, with examples. You’ve educated instead of insulted. :)

    Brad, I’m glad you’re starting to realise that text/html xHTML isn’t compatible with application/xhtml+xml XHTML. Since that’s the case, there is no advantage to authoring Appendix C xHTML in preparation for some sort of future switchover.

    Indeed, the slightly more verbose syntax of xHTML means it will be slightly slower to transfer than the equivalent HTML. Additional attributes such as xmlns, using xml:lang for every lang and so on mean that HTML is demonstrably more efficient than Appendix C compatible xHTML.

    I think everyone who gets “sold” on the “purity” of xHTML is unwilling to admit it might be faulted. Personally, it took me a long time to get over the hype xHTML has and realise that HTML is the better format in text/html environments.

    (Could you indicate somewhere what markup elements, if any, are allowed for people who want to enrich their comments?)

  44. Ephram Zerb » Heuristics for Choosing a DOCTYPE in 2006 (First Stab)

    [...] The recommendation is largely derived from the DOCTYPE used by Roger Johansson, who unknowingly is my accessibility mentor. He certainly has thought about the DOCTYPE in the context of accessibility more so than I have. He has argued for Strict DOCTYPES in the past and it would appear that other accessibility professionals share that view - judged by a quick survey of the websites they produce. However, the divisive Sending XHTML as text/html Considered Harmful, apart from other considerations, still leaves Strict XHTML versus Strict HTML unresolved. [...]

  45. Martin Payne

    I can kind of see the original author’s point about people copying the code from other people’s sites, and ending up with invalid code, but that would happen no matter which version of (X)HTML was in use. It’s just the way many people (myself included) learned HTML.

    It’s not all that hard to write XHTML code which still works as intended with HTML UAs though. My scripts and styles work the same no matter which of the two MIME types my pages are sent with. I think the main problem is that many modern web sites are made by graphic designers, and graphic designers seem to know little about coding (likewise with coders trying to do graphic design).

  46. Max_B

    Could someone outline a clean, safe process to validate XHTML/JS for both MIME types?
    Is it enough to validate the markup at W3C and then serve the file as html, then as xml to an xhtml capable browser?

  47. Ghetto Pixel

    XHTML 1 was merely based off HTML 4.0 Strict, which was very ill-semantic in well-formedness. While XHTML 1.1 was merely meant to merge the technology and semantic capability of XML, it was not meant in any way to be insecure if served as text/html. The fact is, if you are truly serving your document with standard HTML then you can serve it as text/html, no harm done. As far as the complicated strings go, that’s bullshit, because we must remember that all XML data has to be CDATA-escaped to be truly well-formed. The other fact is that XHTML served as an XML application cannot contain some of the obtrusive DOM h4×0ring that people do, because the DOM can’t be tampered with mid-parse in XHTML. The question shouldn’t be “is it insecure?”; it should be “What are they going to do now that Microsoft claims to be adopting this technology into their browsers?”

  48. Nathan Logan

    Jacques and a few others seem to be posturing here. The fact of the matter is that if we end up swapping a particular page/site over to XML, the transition will be much easier from XHTML than from HTML (strict or not). We’re talking about the difference between a few tweaks and an entire rewrite.

    I shudder to think of a custom CMS written to HTML (and not XHTML) spec needing to make the transition to XML. We are talking about some major logic rewrites, not just a tweak here or there.

    Thanks for the great discussion. I’m learning a lot.

  49. cjm

    Nathan - help me out here. What is this “major logic rewrite” you’re so concerned about? Literally the only relevant difference I can think of between strict HTML 4.01 and XHTML is the self-closing tags (assuming you aren’t doing something silly like mixed-case tags in the HTML). Maybe I’m missing something obvious. But given that XHTML 2.0 is not even backwards-compatible with 1.0 or 1.1, I’m not seeing that writing faux XHTML now gives any real advantage over strict HTML 4.01 in terms of some hypothetical future transition to real X(HT)ML.

  50. Nathan Logan

    assuming you aren’t doing something silly like mixed-case tags in the HTML

    Ironically, you partially proved my point right there. Additionally, though, there are closing tags that aren’t self-closing (like p tags) that may have to be re-written. The difficulty could come in complex/clever pieces of code that don’t take closing tags into account. I would prefer to not contrive something, for fear that you would call it a straw man, but the fact of the matter is that writing server-side logic without closing tags taken into account may result in some very different code than when it’s taken into account.

    And just in case you don’t believe me, I’ll give you an example from an application I wrote recently. I wrote a CMS (of sorts) that took normal line-break formatted text and output it properly wrapped in p tags. This function had to be much different than PHP’s nl2br() or a simple find and replace, directly due to the fact that I had to keep track of when a paragraph starts AND when it ends. So it’s that kind of logic that can either be written to begin with or rewritten later at much more expense.
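    The paragraph-wrapping logic described above can be sketched roughly as follows. This is not Nathan's actual code: the function name `wrap_paragraphs` is hypothetical, and the sketch assumes that blank lines separate paragraphs and that single line breaks inside a paragraph become `<br />`. It is written in Python rather than PHP purely for illustration.

```python
import html
import re


def wrap_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks of text in <p>...</p> pairs.

    Assumption: a blank line ends a paragraph, and a single newline
    inside a paragraph becomes an XHTML-style <br />.
    """
    # Normalize Windows and old Mac line endings to "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split on one or more blank (or whitespace-only) lines.
    blocks = re.split(r"\n\s*\n", normalized.strip())

    paragraphs = []
    for block in blocks:
        if not block.strip():
            continue  # skip empty input
        # Escape markup characters so the output stays well-formed.
        lines = [html.escape(line.strip()) for line in block.split("\n")]
        # Every opened <p> is explicitly closed, which is the bookkeeping
        # that a plain newline-to-<br> replacement never has to do.
        paragraphs.append("<p>" + "<br />\n".join(lines) + "</p>")

    return "\n".join(paragraphs)
```

    Because each paragraph is opened and closed in the same place, the same output is usable whether the page ends up being parsed as HTML or as XML.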

  51. I'm holding out for CSS4, myself « {coyote.yaps}

    [...] XHTML, just a reformulation of HTML in an XML-compatible format (and even that isn’t without controversy). Budd’s proposal is, in a nutshell, “backporting” from the in-gestation CSS3 [...]

  52. h3h.net - Technorati Cosmos Links Display for WordPress

    [...] can see the script in action on one of my more controversial posts, like this one. Scroll down just past the article, before the comments and see the “Links to this [...]

  53. xhtml « Rofrol blog

    [...] 2005-12-21 Brad Fults - Sending XHTML as text/html Considered Harmful to Feelings [...]

  54. HTML, XHTML, HTML 5:miti e leggende metropolitane | biroblu

    [...] on the strength of / sad, unproven parser behaviors, then rushing to say “no, you didn’t understand, I was only talking to people who don’t know how to build Web pages” (how sad…), despite the practical experience of anyone who has ever developed [...]

  55. JT

    I agree with Brad. I’ve been using 1.1 for development for a little over a year. I couldn’t make a case for 1.0 strict and the declarations are messy. It forced some good habits on me. Yes, I validate. It’s a tough call, and I can see both sides. Sometimes I wonder myself if I’m on the right side. However, IMHO:

    - If you do 4.01 strict, you pick up many of the disadvantages of XHTML. I don’t know what to say here other than that you can use a consistent content type in IE, but I don’t see any indication that that is an advantage. If I were using 4.01, I’d probably use transitional. I have more pros to offset the cons of not using XHTML.

    - You can’t really make a case that a lot of developers don’t validate or know what the W3C is without also making the case that the HTML 4 spec is pointless too. The only thing you need for a web page is a file extension the web server understands. We validate so we aren’t debugging strange behaviors in different browsers forever.

    - HTML 5 is going where 1.1 is now and beyond. They are a LOT alike.

    - XHTML 2 is not relevant. XHTML 5 will be the next version. XHTML 5 will be backwards compatible with 1.1, not 2.

    - XHTML 5 and HTML 5 are on parallel and coordinated paths and are designed to be compatible with each other.

    - HTML is like having a compiler with poor error checking. So many things are legal that you can write ambiguous code, and THAT’S what keeps people busy trying to make it work in all browsers. IE doesn’t have trouble as long as you are feeding it text/html. FF will choke if the markup isn’t right, which is what I want.

    - I miss my iframes and dynamically code my way around them based on browser, but they are going away anyway, just like most of the other things that HTML 4 has. I hack around a little with target= with some javascript, but it’s in place now, and I don’t think about it anymore. I may not be able to do that forever, but they are going to have to come up with something or HTML and XHTML 5 won’t be used by many outside of those on the committees. XHTML 1.1 stretches me about as far as I want to be stretched at the moment. They are going to need to replace a few things, not just take them away, if either of the version 5s is to be actually used.

    XHTML and the next versions of HTML most certainly force the division of content and presentation more than is the case with HTML 4. It took me a while just to get over the tag.

    - I can see both sides. It’s a tough call. Some of it might have to do with whether you generally code static or dynamic pages. Everything I do is PHP whether it needs to be or not, simply so I can use code anywhere I want to control things. The doc and content type are taken care of automatically based on the browser by using an include at the top of every page. If I did more of a mix, or mostly static pages, I would probably see things differently. There are always times when I wish for HTML 4 transitional, but I don’t want to code the way I used to either.
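    An include that picks the content type "based on the browser" could look something like the sketch below. JT does not say how his include decides, so this assumes it inspects the request's `Accept` header; the function name `choose_content_type` is hypothetical, and Python stands in for PHP only for illustration.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML page from an HTTP Accept header.

    Returns "application/xhtml+xml" only when the client lists that
    type explicitly with a quality value above zero; otherwise falls
    back to "text/html". Wildcards such as */* are deliberately not
    treated as XHTML support.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if media_type != "application/xhtml+xml":
            continue

        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"

        if quality > 0:
            return "application/xhtml+xml"

    return "text/html"
```

    The returned value would go into the response's `Content-Type` header. Ignoring `*/*` is a conservative choice: a client that sends only a wildcard has not actually claimed it can handle an XML parse of the page.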

  56. JT

    PS: I’m not a W3C shill either. As much trouble as IE gives me, I believe they had one thing right that should have been made into a standard, and that is, pixel-based fonts don’t zoom unless the graphics zoom. Now we have no font unit for use around graphics entities such as mastheads. Ems and points can take care of the rest. % is pointless for fonts because, with the nesting issue, the same CSS used in two different places results in different-size characters. Also, I would have never gone to XHTML 2.0 with what they had on the table. It is/was nuts.

  57. SneakyWho_am_i

    The article here reads as if you’re refuting the idea of XHTML ITSELF being harmful, which of course you aren’t and it’s not. Sending XHTML as HTML is definitely harmful in many cases. My own “main” site is a classic example of this. Some browsers fail spectacularly to render the pages properly. Opera Mini is a surprising and notable example. Hey, I can’t even fairly say that they don’t render it properly because the pages are JUST NOT VALID and I have NO IDEA yet what the problem is.

    Saying that XHTML as XML and XHTML as HTML are rendered exactly the same is utter rubbish (who said that, if anyone?). For one thing, XML documents have no incremental load, and therefore no document.write. For another, CSS selectors become case-sensitive. For a third, Firefox/Opera/whatever will chew up and spit out pages with blatant well-formedness errors.
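    The well-formedness point can be illustrated with a small sketch using only Python's standard library. The helper names (`xml_wellformedness_error`, `html_tags_seen`) are hypothetical. `xml.etree.ElementTree` stands in for a browser's strict XML parser and `html.parser` for a forgiving tag-soup parser; neither is a real browser engine, and neither validates against the XHTML DTD (named entities such as `&nbsp;` would also be rejected by the XML check here, since no DTD is loaded).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def xml_wellformedness_error(markup: str):
    """Return the XML parser's error message, or None if well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


class TagCollector(HTMLParser):
    """Records every start tag seen; never raises on sloppy markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags_seen(markup: str):
    """Feed markup to the lenient HTML parser and list its start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Two paragraphs, neither of them closed: fine as tag soup, fatal as XML.
broken = "<html><body><p>one<p>two</body></html>"
fixed = "<html><body><p>one</p><p>two</p></body></html>"

print(xml_wellformedness_error(broken) is not None)  # the XML parse fails
print(html_tags_seen(broken))                        # the lenient parse carries on
print(xml_wellformedness_error(fixed))               # no error for the fixed markup
```

    Running the same page through both kinds of parser before switching MIME types is one cheap way to catch the errors that a text/html parse silently forgives.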

    When you change the mime type, you rightly change all the rules for the page. The danger might be that we can never completely abandon HTML4 because there are so many fake XHTML pages (and there will be many more, you must agree) which are closer to validating as HTML (and vice versa!)

    Seriously, most of the web designers I’ve spoken to recently who’ve had 18 months of experience and taken classes in this sort of thing are still incapable of closing their tags properly in XHTML, and NOT doing it in HTML (let alone writing CSS and Javascript that can work in both kinds of documents)…

    Is it harmful to send XHTML as HTML? In my opinion YES! Absolutely. If I’d bothered to write valid code the first time around, my life would be a lot easier now. If my browser had treated the XHTML as XML the first time, I would have kept it valid.

    I’m not arguing for or against XHTML. I LIKE xhtml. I dislike Internet Explorer, partly for not parsing my xhtml correctly after all these years.
    I am saying, however, that whenever you teach someone to write (x/)html, the very first things you teach them should be “validate your markup lest you find it breaking down the track” and “know well the differences between xhtml and html. You must write one or the other, not both within the same document.”

    For a final, almost unrelated note… Many XML parsing tools can now parse HTML anyway, and can even be made to fetch text/plain and parse it as XHTML. So it all comes back to human, graphical user agents really. The first step is to get everyone using modern browsers. I personally will continue to use XHTML for most things, but I recommend that anyone new to it should learn to recognize XHTML while writing HTML.

    – I’m looking forward to the next version, when we can attach a src attribute to any old thing, and define our own separators.
