Content negotiation is how an HTTP server decides what version of a resource to return.
When your browser requests something over HTTP, it sends its file-format preferences in the
Accept header, which looks something like this:
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, video/x-mng, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1
That's a list of the MIME types it can handle, with the q= parts as weighting factors to show preferences between formats of the same approximate type. The browser in this example prefers HTML to plaintext wherever possible, and will accept any format of anything in a pinch. (Most other formats would probably make it ask the user to pick a handler program or save the file to disk.) The browser sends similar headers (Accept-Language, Accept-Charset) to list preferred languages, character sets, and arbitrary other features.
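To make the weighting concrete, here is a minimal sketch of how a server might read that header. The function name parse_accept is made up for this illustration, and the sketch assumes a missing q= means a weight of 1.0 and that a malformed weight should count as unacceptable; a real server's parser handles more edge cases.

```python
def parse_accept(header):
    """Return (media_type, q) pairs sorted from most to least preferred."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full weight
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption of this sketch: bad weight = not acceptable
        entries.append((media_type, q))
    # sorted() is stable, so equal weights keep their order from the header
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


accept = ("text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, "
          "text/plain;q=0.8, video/x-mng, image/png, image/jpeg, "
          "image/gif;q=0.2, */*;q=0.1")

for media_type, q in parse_accept(accept):
    print(f"{q:.1f}  {media_type}")
```

The unweighted types come out on top, followed by text/html, text/plain, image/gif, and finally the */* catch-all.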
Why does the browser bother with this exposition if it's already sending the URI of the file it wants? Because the URI is looser than a filename: it specifies a resource (a conceptual entity, like "today's weather"), not a bitstream (like the string "partly cloudy"). To the server, the URI is something like the name of a function to which the Accept headers and the server's own environment are passed as arguments. It is not and never has been guaranteed that any two different user agents will get the same page from a given URI.
For example, one URI of this page is http://everything2.net/index.pl?node=1507017, but the rendered page you're seeing is not a Perl script, as the .pl suggests – it's the output of a Perl script. The Perl was only an intermediate step in producing HTML, and the server sent a Content-type: text/html header to make that clear. Dynamic pages demonstrate that URIs are not just aliases to static data; they are constructors for relatively abstract and conditional objects. All HTTP resources, like functions but unlike regular files, have a chance to adapt to their environment and caller.
The Accept headers are instructions for adapting them. When a browser begins to render an XHTML page containing, say,
... there were <img alt='fluorescent fish' src='http://example.com/~mbogo/fish' /> so I called ...
it sends off a request for the resource (resource, not file!) called fish, plus an Accept header containing, for instance, image/svg+xml;q=0.9, image/png;q=0.8, image/jpeg, image/gif. Apache (or whatever) in turn takes /~mbogo/fish and finds everything matching /~mbogo/fish.* – all the files that start with the name of the resource – and sends the file whose type corresponds with the highest weight in the headers. That's content negotiation.
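The selection step can be sketched roughly like this. It is an illustration of the idea, not Apache's actual algorithm (which weighs more dimensions than the media type). TYPE_BY_SUFFIX, choose_variant, the file list, and the Accept header used below are all hypothetical names and values invented for the example.

```python
# Hypothetical suffix-to-type map; a real server reads this from its configuration.
TYPE_BY_SUFFIX = {
    "svg": "image/svg+xml",
    "png": "image/png",
    "jpg": "image/jpeg",
    "gif": "image/gif",
}


def parse_accept(header):
    """Map each media range in an Accept header to its weight (default 1.0)."""
    weights = {}
    for item in header.split(","):
        media_range, _, params = item.strip().partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        weights[media_range.strip()] = q
    return weights


def weight_for(media_type, weights):
    """Weight of the most specific media range that matches media_type."""
    major = media_type.split("/")[0]
    for candidate in (media_type, major + "/*", "*/*"):
        if candidate in weights:
            return weights[candidate]
    return 0.0  # not listed at all: not acceptable


def choose_variant(resource, filenames, accept_header):
    """Pick the file named resource.<suffix> whose type has the highest weight."""
    weights = parse_accept(accept_header)
    best_name, best_q = None, 0.0
    for name in sorted(filenames):
        stem, dot, suffix = name.rpartition(".")
        if not dot or stem != resource:
            continue  # not a variant of this resource
        media_type = TYPE_BY_SUFFIX.get(suffix)
        if media_type is None:
            continue  # unknown suffix: skipped in this sketch
        q = weight_for(media_type, weights)
        if q > best_q:  # ties go to the first name in sorted order
            best_name, best_q = name, q
    return best_name


files = ["fish.gif", "fish.png", "fish.svg", "fishing.png"]
header = "image/svg+xml;q=0.9, image/png;q=0.8, image/gif;q=0.2"
print(choose_variant("fish", files, header))
```

With that header the SVG variant wins; a client that sends only image/png, */*;q=0.1 gets the PNG instead, at the same URI.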
The disadvantage is that the server has to check for several files, which adds a few milliseconds per request. The main advantage is that people always see the most appropriate page possible: language preferences and so forth are factored into the same algorithm, so people with Danish-localized browsers see Danish pages and people with Thai-localized browsers see Thai pages at the same URI (the Vary response header tells caches which request headers the choice depended on, so they don't hand one reader's variant to another).
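The language case works the same way. Here is a small sketch, assuming the variants are tagged with plain language codes and that a range such as da also covers da-DK; negotiate_language and the variants list are hypothetical, and real servers apply further fallback rules.

```python
def parse_weighted(header):
    """Map each item of an Accept-style header to its weight (default 1.0)."""
    weights = {}
    for item in header.split(","):
        value, _, params = item.strip().partition(";")
        if not value:
            continue
        q = 1.0
        for param in params.split(";"):
            name, _, number = param.partition("=")
            if name.strip().lower() == "q":
                q = float(number)
        weights[value.strip().lower()] = q
    return weights


def language_weight(tag, weights):
    """Weight of the most specific language range matching a variant's tag."""
    tag = tag.lower()
    matches = [r for r in weights
               if r == "*" or tag == r or tag.startswith(r + "-")]
    if not matches:
        return 0.0
    # the longest matching range is the most specific; "*" is the least
    most_specific = max(matches, key=lambda r: (r != "*", len(r)))
    return weights[most_specific]


def negotiate_language(available, accept_language):
    """Return (language, response headers), or (None, {}) if nothing fits."""
    if not available:
        return None, {}
    weights = parse_weighted(accept_language)
    best = max(available, key=lambda tag: language_weight(tag, weights))
    if language_weight(best, weights) == 0.0:
        return None, {}
    # Vary tells caches that the response depended on Accept-Language
    return best, {"Content-Language": best, "Vary": "Accept-Language"}


variants = ["da", "en", "th"]  # hypothetical variants of one page
print(negotiate_language(variants, "da-DK, da;q=0.8, en;q=0.3"))
print(negotiate_language(variants, "th, en;q=0.5"))
```

The first request gets the Danish page and the second the Thai one, each marked with Vary: Accept-Language.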
The principle here is that URIs are abstracted; just as they rarely map exactly to paths on the server (what you see at ~user/ over HTTP is usually something like ~user/html/, for instance, and what you see in a directory is presumably merely its index.suffix file), they can refer to a yet-to-be-resolved variant of anything.
The mechanism for storing variants on the server can be virtualized to any degree. You could use a database, files with associated custom type-map files, header-checking scripts, or whatever. Apache's default happens to be as files named x.y.z, where x is the resource name, y is the MIME-type suffix (according to the default type-map), and z is the language name. This is ugly, but it's only one step from the inured ugliness of file suffixes. Luckily, since the server can return any format it pleases for a given URI, and always specifies it in the Content-type header, file suffixes are completely unnecessary in URIs. http://example.com/zoot is a valid address whether or not zoot is a directory: it happens not to resolve to anything, but nothing structurally necessary is missing. (Wikipedia is one of several well-known sites to eschew file suffixes more or less entirely.)
As Tim Berners-Lee pointed out in "Cool URIs don't change," there's no reason to put any artifacts of implementation in URIs. If your page is at something.php3, you have to put up a redirect (or ditch anyone who linked to you) the moment you upgrade your PHP, which is annoying for everyone and in the long run probably cancels out any speed gain from not using content negotiation. Your version of PHP (and the fact that you use PHP at all instead of Perl or ASP or whatever) is as irrelevant to your users as the architecture of your server's processor. MIME types don't belong in ordinary addresses.
So content negotiation lets any number of people use a single URI to give them the best and most customized version possible of a given thing, and keeps the URI clean. Good work, content negotiation!
You can turn it on in Apache with the MultiViews directory-level option. For example, in httpd.conf:
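A minimal sketch of such a stanza, assuming the pages live under /var/www/html (a placeholder path; substitute your own document root):

```apache
# /var/www/html is a placeholder; use the directory you actually serve from
<Directory "/var/www/html">
    Options +MultiViews
</Directory>
```

MultiViews has to be named explicitly; Options All does not switch it on.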
References and further reading:
- RFC 2296: HTTP Remote Variant Selection Algorithm (RVSA/1.0): http://www.faqs.org/rfcs/rfc2296
- Content Negotiation: http://httpd.apache.org/docs/content-negotiation
- Cool URIs don't change, by Tim Berners-Lee: http://www.w3.org/Provider/Style/URI