XHTML

XHTML has been the successor to HTML since 2000, but it hasn’t seen much uptake because most of its advantage lies in its use of XML as the transport format. Using XML would seem to be an advantage if you know much about it, but in the wacky world wide web, nothing is quite so simple. XHTML allows you to send pages as text/html if you like, but it really prefers application/xhtml+xml, application/xml, or text/xml, in that order. The problem is twofold. First, some web developers balk at the pickiness of XML: if a document is not well-formed (that is, it contains syntax errors like <p<b>), user agents are supposed to refuse to render it. Second, IE fails to recognize the application/xhtml+xml MIME type and offers to download the page instead; if you give it application/xml, it doesn’t understand that it’s looking at XHTML and applies no styling.

Naturally, this situation has dampened developer enthusiasm for XHTML. But there are real advantages here: a cleaned-up HTML with few presentational details, inside a robust, internationalizable markup that plays well with other XML formats like MathML or SVG. XHTML is the way of the 21st century, and it’s time we used it.

The first problem is just web developers being stubborn. If your document is not well-formed, how can you expect user agents to properly understand it? You can’t always rely on what current browsers happen to do when given malformed markup.
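If you want to see that pickiness for yourself before a browser does, any XML parser will do. Here is a minimal sketch in Python, using only the standard library; the function name is_well_formed and the two sample strings are my own inventions for illustration. It only checks well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample fragments, not taken from any real page.
good = '<p xmlns="http://www.w3.org/1999/xhtml">Hello, <b>world</b>.</p>'
bad = '<p<b>Hello</b></p>'  # the kind of syntax error described above

print(is_well_formed(good))
print(is_well_formed(bad))
```

A check like this is cheap enough to run on every page before you publish it.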

The second problem we can actually do something about. We could do some sort of browser sniffing and serve the document as XML to browsers that can handle it and as text/html to those that can’t, but browser sniffing is notoriously unreliable and not all browsers declare their ability to read XML. We would also have to make sure that we never took advantage of XML, because we would need to remain backward-compatible for those reading the HTML version, and the differences are large enough to cause problems when XHTML is sent as text/html. Maintaining two versions of the document is too large a headache to be worth it. Thankfully, the only modern browser holding out is IE, and, surprisingly, we can solve that problem. The W3C itself has discovered a workaround that lets IE display XML as HTML. First, you are going to want to make the top of your XHTML file look like this:

<?xml version="1.0" encoding="utf-8"?>

<!-- To trick Microsoft Internet Explorer -->
<?xml-stylesheet href="copy.xsl" type="text/xsl"?>

<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml">

Most of that is standard fare required by the W3C for XHTML files. However, you’ll notice the XML stylesheet linking to copy.xsl, a file that will look like this:

<stylesheet version="1.0"
     xmlns="http://www.w3.org/1999/XSL/Transform">
    <template match="/">
        <copy-of select="."/>
    </template>
</stylesheet>

This is an XSL transformation that essentially makes a copy of the current page. It changes nothing for intelligent browsers, but for some reason IE thinks you are transforming XML into HTML and displays the result as such.

The last step is to serve your pages as application/xml. IE will still not touch application/xhtml+xml, so we’ll have to settle for the next best thing. You can do this by changing the file extension to .xml (which shouldn’t mess up your links if you’ve been paying attention) or configuring your web server to serve it as application/xml for you.
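If you just want to try this out locally before touching your real server configuration, here is a rough sketch using Python’s built-in http.server module. The class name XhtmlHandler, the serve function, the .xhtml extension, and port 8000 are all assumptions of mine; this is a testing aid, not a production setup.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XhtmlHandler(SimpleHTTPRequestHandler):
    # Extend the stock extension-to-type table so that XHTML pages are
    # sent as application/xml. The ".xhtml" entry is only an assumption,
    # for people who would rather not rename their files to ".xml".
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".xml": "application/xml",
        ".xhtml": "application/xml",
    }


def serve(port=8000):
    """Serve the current directory until interrupted (local testing only)."""
    HTTPServer(("", port), XhtmlHandler).serve_forever()
```

Call serve() from the directory holding your pages, then load them in IE to see whether the trick takes.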

This works on Windows IE 6.0 and should work on Windows IE 5.0/5.5. It does not work on Mac IE 5. I am interested in reports of other versions of IE for which it does or does not work. Note that Google does understand application/xml, so your ranking shouldn't suffer.

And, of course, don’t forget to always validate!

Metadata

The semantic web’s big deal is metadata: inter-linked sources of data and metadata, all joined at the hip. A widely adopted format for metadata is RDF, which we will use to talk about our website and ourselves. You’ll want to create a file at the top level of your website called metadata.rdf or whatever. It will basically look like this:

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
	xml:lang="en-us"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
	xmlns:foaf="http://xmlns.com/foaf/0.1/"
	xmlns:dc="http://purl.org/dc/elements/1.1/">

</rdf:RDF>

Inside of the rdf:RDF tags, you will insert metadata of various kinds. I suggest you read up on common kinds, like FOAF or Dublin Core. Here is some sample metadata for Jane Doe and her page about airplanes:

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
	xml:lang="en-us"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:foaf="http://xmlns.com/foaf/0.1/"
	xmlns:dc="http://purl.org/dc/elements/1.1/">

	<foaf:Person rdf:nodeID="janedoe">
		<foaf:name>Jane Doe</foaf:name>
		<foaf:mbox rdf:resource="mailto:janedoe@example.org"/>
	</foaf:Person>

	<rdf:Description rdf:about="http://www.example.org/airplanes">
		<dc:creator>Jane Doe</dc:creator>
		<dc:title>Commercial Airplanes of the 1960s</dc:title>
		<dc:date>2004-02-19</dc:date>
		<foaf:maker rdf:nodeID="janedoe"/>
	</rdf:Description>

</rdf:RDF>

There are all kinds of RDF vocabularies, covering a wide range of topics. I recommend you explore your options, because there’s a lot more you can do than the simple example above. You can stick as much metadata in the file as you like, about any number of people or pages. You are welcome to take a look at my metadata file as an example.
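To give a feel for what a consumer of this file might do with it, here is a small sketch that pulls the names and page titles out of a trimmed copy of the Jane Doe sample. It treats the file as plain XML using Python’s standard library, which is a shortcut on my part: a real RDF toolkit (rdflib, for instance) understands the many equivalent ways RDF/XML can be written, and this does not. The function name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the sample metadata.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A trimmed copy of the Jane Doe sample (XML declaration and mbox omitted).
SAMPLE = """
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <foaf:Person rdf:nodeID="janedoe">
    <foaf:name>Jane Doe</foaf:name>
  </foaf:Person>
  <rdf:Description rdf:about="http://www.example.org/airplanes">
    <dc:creator>Jane Doe</dc:creator>
    <dc:title>Commercial Airplanes of the 1960s</dc:title>
    <dc:date>2004-02-19</dc:date>
    <foaf:maker rdf:nodeID="janedoe"/>
  </rdf:Description>
</rdf:RDF>
"""


def describe(rdf_xml):
    """Return (list of person names, dict mapping page URL to title)."""
    root = ET.fromstring(rdf_xml)
    people = [
        person.findtext("foaf:name", namespaces=NS)
        for person in root.findall("foaf:Person", NS)
    ]
    pages = {}
    for desc in root.findall("rdf:Description", NS):
        about = desc.get("{%s}about" % NS["rdf"])
        pages[about] = desc.findtext("dc:title", namespaces=NS)
    return people, pages


people, pages = describe(SAMPLE)
print(people)
print(pages)
```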

In order to let people know about your metadata, you need to add a tag to your XHTML pages to point to metadata.rdf. In your head section, add the line:

<link rel="meta" type="application/rdf+xml" href="metadata.rdf"/>

Now user agents reading your website will see a possible source of metadata if they want. Most will not use it, but some user agents can. You can also register your file with some of the FOAF repositories floating around for wider distribution.
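For the curious, here is roughly how a user agent might discover that link. This is a sketch under my own assumptions, not how any particular crawler works: the function name find_metadata_links, the sample page, and the base URL are all made up for illustration. Note that it only works because the page is well-formed XML.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A hypothetical, minimal XHTML page carrying the link element.
PAGE = """
<html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Commercial Airplanes of the 1960s</title>
    <link rel="meta" type="application/rdf+xml" href="metadata.rdf"/>
  </head>
  <body><p>Airplanes go here.</p></body>
</html>
"""


def find_metadata_links(page_xml, base_url):
    """Return absolute URLs of RDF metadata files advertised by the page."""
    root = ET.fromstring(page_xml)
    found = []
    for link in root.iter("{%s}link" % XHTML_NS):
        rels = (link.get("rel") or "").split()
        if "meta" in rels and link.get("type") == "application/rdf+xml":
            # Resolve the href relative to the page's own address.
            found.append(urljoin(base_url, link.get("href", "")))
    return found


links = find_metadata_links(PAGE, "http://www.example.org/airplanes")
print(links)
```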

Bam! You are now a part of the semantic web.