Tumbled Logic

A ragtag blog filled with random technical nuggets, rants, raves, occasional pretty pictures, and links to things.

Oct 9

303 Considered Harmful

I recently came across some guidelines for those getting started in the linked data world prescribing the use of 303 HTTP responses — not as a considered last resort, but as a first resort. It’s my view that actively encouraging 303 responses is generally unhelpful.

Obligatory side-note: I’m sure you know plenty of scenarios where 303 responses are perfectly reasonable and valid. I do too. My argument here is that encouraging 303s as the default approach to serving linked data is foolish.

Let’s start with some first principles. I don’t want to get into the particular merits or otherwise of httpRange-14, but you want to provide a way to distinguish between three different things:

  1. A resource
  2. A serialisation of that resource
  3. Something described by that resource

Before you run off on a “but I don’t need to be able to distinguish between those things!” kick, be aware that people consuming your data may well do. If they don’t, and you can be sure of that (you can’t), then this entire discussion is moot anyway.

The differentiation between (3) and the others is accomplished through the use of fragment identifiers — something with a fragment identifier in the context of linked data (e.g., #id) is a thing described by a resource, while something without one is a resource or particular serialisation of that resource.

Now, linked data works on a principle which I’ve talked about in the past, and expressed in presentations as my golden rule:

Give everything a URI, and make the information about that thing accessible at the URI

It also takes advantage of a property of most HTTP user agents, in that they’ll strip off any fragment identifier present in a URI before requesting it from a server, and also the ability of HTTP servers to perform content negotiation to serve a resource in one of a number of formats (serialisations).
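To make those two properties concrete, here is a minimal Python sketch of what a client might do before it sends anything. The example.org URI and the prepare_request helper are made up for illustration; the request is only built, never sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def prepare_request(uri: str, accept: str) -> Request:
    """Build (but do not send) a GET request for the document describing `uri`."""
    # The fragment identifies the thing; the server only ever sees the rest.
    document_url, _fragment = urldefrag(uri)
    # The Accept header is how the client asks for a particular serialisation.
    return Request(document_url, headers={"Accept": accept})


# Hypothetical identifier for a thing described by a document.
request = prepare_request("http://example.org/things/42#id", "application/json")
print(request.full_url)              # the URL that would actually be requested
print(request.get_header("Accept"))  # the serialisation being asked for
```

The fragment never reaches the server, so a single document can describe any number of things, each with its own identifier.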

Let us take the most recent — at the time of writing — episode of the BBC television series QI as an example. The episode itself is assigned the identifier http://www.bbc.co.uk/programmes/b015qqbc#programme. That is the canonical public URI for episode 5 of series 9 of QI. Now, you want to obtain information about it in a machine-readable form? Easy:

  1. You tell your user agent what the episode URI is.
  2. A request is made to GET the resource /programmes/b015qqbc (note the missing fragment identifier from the request-URI) from www.bbc.co.uk, setting the Accept request header according to the kinds of structured data your agent understands — for example, you might accept application/json.
  3. The server responds.

Okay, so that’s ambiguous. The detail of step 3 depends on how things are set up, and this all assumes the server is actually able to serve the resource serialised in the way that you requested. In a 303-based system, step 3 is actually:

  • The server responds with a 303 See Other response, with a Location header containing the URL of the resource serialised according to your request. In the example above, it might (it doesn’t) respond with a Location header set to http://open.bbc.co.uk/data/programmes/json/b015qqbc.json.
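As a rough illustration of that flow, the sketch below runs a throwaway server on localhost that behaves the 303 way, and a client that follows the redirect by hand so the round trips can be counted. The paths are modelled on the example above, the payload is a placeholder, and nothing here talks to the BBC.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical local paths, modelled on the example above.
CANONICAL = "/programmes/b015qqbc"
JSON_COPY = "/data/programmes/json/b015qqbc.json"
BODY = b'{"pid": "b015qqbc"}'  # placeholder payload, not real data


class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        wants_json = "application/json" in self.headers.get("Accept", "")
        if self.path == CANONICAL and wants_json:
            # Step 3, 303-style: point at the serialisation instead of sending it.
            self.send_response(303)
            self.send_header("Location", JSON_COPY)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == JSON_COPY:
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(BODY)))
            self.end_headers()
            self.wfile.write(BODY)
        else:
            self.send_error(404)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch(port, path, accept):
    """Follow 303s by hand, returning (round_trips, final_path, body)."""
    round_trips = 0
    while True:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", path, headers={"Accept": accept})
        response = conn.getresponse()
        body = response.read()
        conn.close()
        round_trips += 1
        if response.status != 303:
            return round_trips, path, body
        path = response.getheader("Location")


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
round_trips, final_path, body = fetch(server.server_port, CANONICAL, "application/json")
server.shutdown()
server.server_close()
print(round_trips, final_path)
```

The client ends up holding a different path from the one it started with, and it took two requests to get there.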

Sounds reasonable enough, doesn’t it? Well, it’s not terrible — if you have no choice but to serve your various different serialisations from completely different locations — but there are downsides:

  • Pretty straightforward, this: your user-agent has to go through an additional request/response cycle before you actually get the data you want.
  • A redirect response typically causes anything which presents URIs in a user interface (such as that most useful of debugging and testing tools, the web browser) to update its display accordingly. This means that the canonical URI for your thing never sticks around. The whole point of linked data is to serve up information for both humans and machines at the same place, and this breaks if you are overtly punted to a specific representation depending upon who you are. Chances are, you’ll put all of your content-negotiation logic in the place serving the canonical URI and won’t bother for the specific representations. This means that if somebody copies and pastes a URI, there’s a pretty good chance it won’t be one which can negotiate a serialisation that a user agent can understand.
  • Never underestimate the little things: in particular, hackable URIs. If you’re debugging linked data systems, it’s very useful to know that you can, when testing, just throw in a “.json” extension to the path in the request-URI and force the return of a JSON version. Does your 303-based system understand that? Probably not.
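The hackable-URI convention in that last point can be sketched in a few lines of Python. The extension table and the choose_serialisation helper are assumptions for illustration, and the Accept handling is deliberately naive: it ignores q-values and wildcards.

```python
import posixpath

# Hypothetical mapping of path extensions to media types.
EXTENSIONS = {
    ".json": "application/json",
    ".html": "text/html",
    ".rdf": "application/rdf+xml",
}


def choose_serialisation(path, accept):
    """Return (resource_path, media_type), or None if nothing suitable is on offer."""
    stem, ext = posixpath.splitext(path)
    if ext in EXTENSIONS:
        # An explicit extension wins over whatever the Accept header says.
        return stem, EXTENSIONS[ext]
    # Naive negotiation: take the first listed type we can serve.
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in EXTENSIONS.values():
            return path, media_type
    return None


# A browser-style Accept header, overridden by the ".json" extension.
print(choose_serialisation("/programmes/b015qqbc.json", "text/html"))
# No extension, so the Accept header decides.
print(choose_serialisation("/programmes/b015qqbc", "application/json"))
```

Both calls resolve to the same resource and the same media type, which is what makes the extension a handy debugging shortcut.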

The good news is that there is an easier way. It replaces the step 3 described above with one which reads:

  • The server responds with the serialisation that you requested, setting the Content-Location response header to the canonical URI of that specific serialisation.

See? Isn’t that just all-round better?

Couple this with a dose of predictable hackability, and in our example the server will respond with the JSON data for the episode and include a Content-Location header of /programmes/b015qqbc.json.

The whole exchange occurs within a single request/response round-trip, you’re not redirected anywhere (so you don’t lose visibility of the canonical URI), and, if you name your serialisations sensibly, you can make life easier for developers and systems which aren’t fond of content negotiation (e.g., some lumps of JavaScript in web pages).
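Here is a minimal sketch of the non-303 variant, again as a throwaway localhost server in Python. The path comes from the example above; the payload and the handler are placeholders, and the handler only knows how to serve JSON.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

CANONICAL = "/programmes/b015qqbc"  # path from the example above, served locally
JSON_URL = CANONICAL + ".json"      # predictable name for the JSON serialisation
BODY = b'{"pid": "b015qqbc"}'       # placeholder payload, not real data


class NegotiatingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        wants_json = "application/json" in self.headers.get("Accept", "")
        if self.path == JSON_URL or (self.path == CANONICAL and wants_json):
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            # Name the specific serialisation without redirecting to it.
            self.send_header("Content-Location", JSON_URL)
            self.send_header("Content-Length", str(len(BODY)))
            self.end_headers()
            self.wfile.write(BODY)
        elif self.path == CANONICAL:
            self.send_error(406)  # this sketch has no other serialisation to offer
        else:
            self.send_error(404)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch(port, path, accept):
    """Make one request, returning (status, Content-Location, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", path, headers={"Accept": accept})
    response = conn.getresponse()
    result = (response.status, response.getheader("Content-Location"), response.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
negotiated = fetch(server.server_port, CANONICAL, "application/json")
direct = fetch(server.server_port, JSON_URL, "*/*")
server.shutdown()
server.server_close()
print(negotiated[:2], direct[:2])
```

Asking for the canonical path with an Accept header and asking for the “.json” path directly give the same answer, each in a single request.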

Side-note: It’s certainly true that you could serve up a 200 response, enclosing the serialisation, but specify a Content-Location somewhere else entirely (as with the 303 example), breaking hackability. But why on earth would you snatch defeat from the jaws of victory like that‽

The fun part is that the non-303 variant is exactly what you get if you go the really simple route and just throw a bunch of sensibly-named files in a directory and enable Apache’s mod_negotiation and Options +MultiViews.

Now, I will cheerfully accept that there are situations where serving up 303 responses is the result of a choice between that and not making linked data available in an easily-discoverable location at all, and in those situations you should absolutely do so — but it being the default choice is crazy. For the sake of my sanity as a developer consuming your data, make it a last resort.

  1. nevali posted this