Tumbled Logic

A ragtag blog filled with random technical nuggets, rants, raves, occasional pretty pictures, and links to things.

May 24

An essay on digital identity and democracy

This is not my first post on this topic, but it is probably the longest. It doesn’t contain any source code nor details of protocol exchanges, and it’s written in something vaguely approaching plain English. If you know me quite well, none of this is new, but it’s possibly the first time I’ve strung it all together into something attempting to be coherent.

First, a gentle introduction to public-key cryptography

I promise this won’t be as scary as it sounds.

Public-key cryptography was one of the more important developments of the twentieth century. It underpins most kinds of secure communications around the world, to the extent that problems with specific implementations get geeks even more nervous than reports of vulnerabilities in Adobe Flash and Oracle Java.

The principle is quite straightforward: instead of having the same key which both encrypts and decrypts data, there are two keys which have a particular mathematical relationship. If you encrypt with one, you can only decrypt with the other, and vice versa. What usually happens is that one key is designated the “public key” and distributed to anybody who might need it, while the other is the “private key” and is kept as secret as is humanly possible.

The mathematical relationship is such that although it’s quite easy to generate the pair of keys, it’s computationally extremely difficult to calculate the private key if all you have is the public key.

(If you’re interested, in the case of the widely-used RSA scheme it’s because determining the prime factors of extraordinarily large numbers is not something even a huge fleet of computers can do quickly, even though your smartphone can do the opposite—that is, find some large prime numbers and multiply them together—without much effort.)
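As a rough illustration of that asymmetry, here is a toy keypair in Python. The primes 61 and 53 and the exponent 17 are assumptions picked purely for readability; real keys use primes hundreds of digits long, along with padding schemes that this sketch ignores entirely.

```python
# Toy RSA-style keypair: illustration only, far too small to be secure.
p, q = 61, 53                 # two secret primes (real ones are enormous)
n = p * q                     # multiplying them is the easy direction
phi = (p - 1) * (q - 1)       # only computable if you know p and q
e = 17                        # public exponent
d = pow(e, -1, phi)           # private exponent (needs Python 3.8+)

public_key, private_key = (e, n), (d, n)

def apply_key(value, key):
    """Raise value to the key's exponent, modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65                                    # a message encoded as a number below n
ciphertext = apply_key(message, public_key)     # anybody can do this
recovered = apply_key(ciphertext, private_key)  # only the key holder can undo it

# ...and vice versa: transform with the private key, undo with the public one.
roundtrip = apply_key(apply_key(message, private_key), public_key)
print(recovered == message, roundtrip == message)
```

Anybody who could factor n back into p and q could recompute d, which is why the primes need to be so large in practice.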

The nature of this public-private “keypair” means that you can generate your own keys at will, and distribute the public key far and wide. Indeed, there are servers on the Internet whose sole purpose is to make it easy to redistribute public keys. Then, anybody who can find your public key can encrypt a message that only you can decrypt (or, if you’re careless, somebody else with your private key).

Although this all represented a revolution in cryptography, it’s not actually the most interesting thing that it made possible. The most interesting thing is digital signatures.

Digital signatures are a way of using public-key cryptography to create a kind of message which can only have been generated by a particular private key, but can be independently verified by anybody with access to the corresponding public key—in principle, verifiable by everybody.

They work by employing the public and private keys the opposite way around to when you want to encrypt something normally. First, you take the message you want to “sign” and generate a cryptographic hash of it: that is, you (or rather, your computer) performs a specialised mathematical operation which generates a fixed-length code based upon the contents of the message.

The key properties (no pun intended) of the hash function are that changing the message will change the hash value, and that it’s practically impossible to figure out how to change a particular part of a message in a way which would result in a predictable hash value.

In other words, if you have a copy of both the message and what the hash value should be, and you independently calculate the hash value yourself and find it doesn’t match what you received, it means the message has been tampered with.
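To make that concrete, here is a minimal sketch using SHA-256 from Python's standard library. The choice of SHA-256 and the example messages are mine, not something any particular system mandates.

```python
import hashlib

def digest(message: str) -> str:
    """Return the fixed-length SHA-256 hash of a message, as hex."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def is_untampered(received: str, expected_hash: str) -> bool:
    """Recalculate the hash ourselves and compare it with the one we were given."""
    return digest(received) == expected_hash

original = "Pay Rachel Jones £10"
tampered = "Pay Rachel Jones £1000"
expected = digest(original)

print(len(expected))                       # always 64 hex characters, whatever the input
print(is_untampered(original, expected))
print(is_untampered(tampered, expected))
```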

Once you have the hash value, you encrypt it with the private key to generate a digital signature. This means that everybody with your public key is able to decrypt it and check that it matches the hash value they calculate for the same message. If it can’t be decrypted with your public key, it means you didn’t use your private key to create it, and if the hash values don’t match, then the message has been tampered with in transit. If all goes well, we can say that the signature was verified.
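The sign-then-verify steps can be sketched as below. This is "textbook" signing with a toy key, with the hash simply reduced to fit the key; the primes, the exponent and the helper names are all assumptions for illustration. Production systems add padding schemes and use far larger keys.

```python
import hashlib

# Toy key built from two Mersenne primes; illustrative only, not secure.
P, Q = 2**31 - 1, 2**61 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)

def hash_to_int(message: bytes, modulus: int) -> int:
    """Hash the message and squeeze the result into the key's range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus

def sign(message: bytes) -> int:
    """'Encrypt' the hash with the private key."""
    return pow(hash_to_int(message, N), D, N)

def verify(message: bytes, signature: int) -> bool:
    """'Decrypt' with the public key and compare against our own hash."""
    return pow(signature, E, N) == hash_to_int(message, N)

signature = sign(b"I agree to the terms")
print(verify(b"I agree to the terms", signature))        # untouched message
print(verify(b"I agree to different terms", signature))  # tampered message
```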

In case you’re wondering, this isn’t some kind of bleeding-edge Tomorrow’s World kind of thing: cryptographic hashes and digital signatures are used extensively by the military and security services, by your phone when it talks to the network, by the chip in your bank card, whenever you press the “Connect to Facebook” button in an app, by your web browser whenever you visit a secure website, and much more besides.

If you didn’t quite follow all of that, here is a nice video that explains it (with a bit more maths):—

Assuming you don’t do anything crazy and keep your private keys safe, properly-implemented digital signatures are “strong” enough for use in legal contexts, and are much less prone to attack than a hand-written signature on a piece of paper which can be copied and faxed (yes, faxed—which is what people often have to do with hand-written signatures…)

Cryptography and you

You can use a digital signature to identify yourself to certain services, similar to a user-name and password. This is also a technology which is in common use, although the implementation in web browsers is very ugly to the point that normal humans tend to need a step-by-step guide with screenshots to be able to make use of it properly.

The exchange goes a little like this under the hood:

  • Server: Hello, who are you?
  • You: Hi! Here is my public key
  • Server: Thanks! The randomly-generated number for this exchange is 472648. Now, is that really your public key?
  • You: Yes, it really is! My signature for the message ‘472648’, generated with my private key, is ‘…’
  • Server: Perfect. I was able to decrypt that signature with your public key and confirm that the hash value matches what I thought it should be. Welcome!

If somebody else comes along, then they won’t be able to complete this exchange: the randomly-generated number (which in reality would usually be a bit longer) means that if somebody’s snooping on the exchange, they can’t go back to the server and pretend to hold the private key by sending the same message that you did—this is called a replay attack.

If they’re not trying to perform a replay attack, but simply don’t have your private key, they’ll fail to be able to generate a signature which can be decrypted with your public key, and the signature won’t be verifiable.
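The exchange above can be sketched roughly as follows. The Server class, its method names and the toy key are hypothetical, and a real protocol would have rather more to it, but it shows why a recorded signature is useless against a fresh random number.

```python
import hashlib
import secrets

# Toy keypair (two Mersenne primes); illustrative only, not secure.
P, Q = 2**31 - 1, 2**61 - 1
PUBLIC_KEY = (65537, P * Q)
PRIVATE_KEY = (pow(65537, -1, (P - 1) * (Q - 1)), P * Q)

def hash_to_int(message, modulus):
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus

def sign(message, private_key):
    d, n = private_key
    return pow(hash_to_int(message, n), d, n)

class Server:
    def __init__(self):
        self.pending = {}   # public key -> the random number we issued

    def challenge(self, public_key):
        nonce = secrets.token_bytes(16)
        self.pending[public_key] = nonce
        return nonce

    def login(self, public_key, signature):
        nonce = self.pending.pop(public_key, None)   # each challenge works once
        if nonce is None:
            return False
        e, n = public_key
        return pow(signature, e, n) == hash_to_int(nonce, n)

server = Server()
nonce = server.challenge(PUBLIC_KEY)
signature = sign(nonce, PRIVATE_KEY)
first_attempt = server.login(PUBLIC_KEY, signature)

server.challenge(PUBLIC_KEY)                     # a snooper asks for a new challenge...
replayed = server.login(PUBLIC_KEY, signature)   # ...and replays the old signature
print(first_attempt, replayed)
```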

You can think of the public key itself as being the username in a traditional username/password setup, with “your ability to demonstrate possession of the corresponding private key” being the password.

A brief discourse on identification versus assurance

Like a username and password, public-key cryptography and digital signatures allow you to verify that the person who created an account was the same person trying to log in this time. What it won’t tell you—on its own—is what the person’s name is, or whether they have a bank account with the TSB, or where they live.

In other words, it allows a person to identify themselves, but it doesn’t provide any assurance about any claims that they make about who they actually are. The holder of a key can say all kinds of things, but there are only some circumstances where it makes sense to simply take their word for it.

In a traditional paper-and-ink world, identity and assurance were kept at a distance from one another. This worked quite well, and is in line with the “data minimisation” principles of data protection: only store what you actually need to.

For example, if you open a bank account, you provide a signature sample as your identification, along with the “one from column A, one from column B” forms of assurance, which are pieces of information about you that are provided by a third party your bank trusts to not lie about it (although slightly confusingly they are often termed “forms of ID”).

In the digital world, identification and assurance have tended to be mixed up together, resulting in it being actually very difficult to perform any kind of meaningful assurance processes: for each organisation which you might wish to issue you some piece of assurance data, you have a separate username and password, and nothing tangible which you can present to somebody else if they need it.

It gets really tricky in the many circumstances where you might need to provide multiple sets of assurance data, and at the moment the only practical way to achieve it is to outsource the whole process to one of a small group of “assurance brokers” who are able to establish relationships with everyone who might need to provide assurance data. Or, they can perform “proxy assurance”, which involves you periodically going through a traditional paper-based process with them, and them then essentially asserting “we saw evidence on the 28th May 2014 that Rachel Jones has a bank account at the TSB”.

The model used to be one where the individual physically controls access to the flow of information, is able to provide it only where it’s needed, and only if they believe the entity they’re handing it over to is trustworthy and will provide something worthwhile in exchange. As more services move online, we’ve shifted to a model where the individual has essentially no control over the flow of information about them and is told who they must trust.

It doesn’t matter especially whether you get instinctively twitchy at the loss of control, because this shift has plenty of other downsides that widespread Internet access was supposed to obviate. Creating a market for “identity assurance providers” (i.e., the middle-men), many of whom are profit-making companies, creates an imperative to turn what used to be a straightforward exchange of information for a service into an opaque black-box with strings and costs attached—and that applies whether you’re the service-provider at either end of the chain, or the individual in the middle.

For example, GOV.UK Verify is such an “assurance broker” scheme. Those participating in the scheme bid on quite large contracts put out to tender in order to provide their brokerage services to government departments and executive agencies who need assurance data. They had to be “certified” (that is, audited to help ensure they wouldn’t accidentally leave your assurance data on a USB stick on a train), and those contracts only cover the exchange of assurance information for government services. If you’re a service provider of some other kind and need some assurance data, you’ll need to contract one or more of them yourselves—and even then, it’s pretty likely that as an end-user you’d have to proffer up that data multiple times for multiple kinds of service.

In other words, in the process of transitioning services online the taxpayer has had to pay a small number of companies to perform a job that an ordinary human being could do (and, in fact, still has to do) on paper and made the world a more confusing and murky place into the bargain.

This isn’t to pick on GDS: they’ve done what they needed to in order to get the job done, but it’s very definitely a retrograde step in the grand scheme of things.

Chains of trust

Meanwhile, on your computers, phones and tablets are these things called “certification authorities”. These are the public keys of organisations around the world who for a (variable) fee perform a kind of assurance service.

The way that it works is this: you need to run a secure website, and so you need to get a digital certificate, which is a standard way of representing a piece of assurance information online. It contains your public key (to identify you, the subject of the assurance statement), along with a signature and associated public key from the certification authority who issues it.

Anybody receiving the certificate can unwrap the signature and confirm that it really was issued by the certification authority, and that it really does contain your key as the subject. The same signature-verification identification process described earlier plays out in reverse when you connect to a secure website: your device asks it to verify that it holds the private key, and cross-checks it with the certificate to determine whether it’s trustworthy.

In practice you actually end up with chains of these certificates, with the website you’re visiting at the bottom, and what’s known as a root certification authority at the top. Your browser verifies each one in turn, going up the chain until it reaches the root, or encounters something which couldn’t be verified—resulting in a horrid and purposefully scary warning message.

Unlike the others below it in the chain, the root authority’s certificate isn’t issued by anybody, which is why a copy of it is stored on your device. Any certificates in the “root CA list” on your device are automatically deemed to be valid and verified, and as a consequence so are the active certificates issued by them.
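Walking up a chain like this can be sketched in a few lines. Real certificates follow the X.509 format and carry much more (validity dates, permitted uses and so on); the dictionary layout, the names and the toy keys below are all assumptions made for illustration.

```python
import hashlib

def make_keypair(p, q, e=65537):
    """Return (public, private) toy keys from two primes; not secure."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def hash_to_int(message, modulus):
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus

def cert_body(subject, public_key, issuer):
    return f"{subject}|{public_key}|{issuer}".encode("utf-8")

def issue(subject, subject_public, issuer, issuer_private):
    """The issuer signs a statement binding the subject's name to its key."""
    d, n = issuer_private
    body = cert_body(subject, subject_public, issuer)
    return {"subject": subject, "public_key": subject_public,
            "issuer": issuer, "signature": pow(hash_to_int(body, n), d, n)}

def verify_chain(chain, trusted_roots):
    """chain runs from the website upwards; trusted_roots maps name -> public key."""
    if not chain:
        return False
    for position, cert in enumerate(chain):
        if position + 1 < len(chain):
            parent = chain[position + 1]
            if parent["subject"] != cert["issuer"]:
                return False
            issuer_key = parent["public_key"]
        else:
            issuer_key = trusted_roots.get(cert["issuer"])
            if issuer_key is None:
                return False          # unknown root: cue the scary warning
        e, n = issuer_key
        body = cert_body(cert["subject"], cert["public_key"], cert["issuer"])
        if pow(cert["signature"], e, n) != hash_to_int(body, n):
            return False
    return True

root_public, root_private = make_keypair(2**107 - 1, 2**127 - 1)
mid_public, mid_private = make_keypair(2**61 - 1, 2**89 - 1)
site_public, _ = make_keypair(2**19 - 1, 2**31 - 1)

chain = [issue("example.org", site_public, "Intermediate CA", mid_private),
         issue("Intermediate CA", mid_public, "Example Root CA", root_private)]
device_roots = {"Example Root CA": root_public}   # the list shipped on your device
print(verify_chain(chain, device_roots))
print(verify_chain(chain, {}))                     # same chain, root not trusted
```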

This system was born out of a pre-Web era where the national telecoms operator was the king and it seemed like a good idea for them not only to be the gatekeeper for your telephone and data lines, but also for the services that you accessed. This wasn’t quite as crazy as it sounds: prior to widespread de-regulation, there was often only one national operator, and because they installed your lines and had a billing relationship with you, they at least knew who you were and where you lived or worked. They were, in effect, the first digital assurance brokers.

Prior to the emergence of the Web, the thinking was very much of a federated model whereby services like Prestel constituted the online world, and so anybody wishing to provide online services also had to go through the same telecoms operators—meaning the assurance brokerage actually worked in both directions.

As the Web exploded, these same technologies were employed to ensure that secure websites could be verified, except that the “national telecoms operator” approach didn’t really work anymore.

Instead, organisations set themselves up as certification authorities—some of them independent, some part of larger corporations, and some of them government—and did deals with browser makers to ensure their certificates were included in the bundled “root CA list” (if they’re not in that list, it’s not possible for your browser to verify the certificates issued to websites, because the root certificate at the top of the chain would be unknown).

This system persists more-or-less today, but is horribly broken (that link is not by any means to the only example, but is possibly the most well-known). For example, your computer probably trusts the China Internet Network Information Centre, GoDaddy, Swisscom, Visa, Wells Fargo Bank, the US government, the Taiwanese government and a whole heap of people you will never have heard of.

All of these entities, as well as the many more in the middle of these chains of trust, are trusted by your computer to perform assurance of web sites and services on your behalf.

In days gone by, this didn’t really matter too much: the stuff you did online wasn’t ever going to be that interesting to anybody (except perhaps your immediate friends and family), and you could always cancel your credit cards if the worst happened.

Nowadays, “the stuff you do online” encompasses so much more, and the list is growing all the time: banking, health, interactions with government. We actually do important things online now, and people haven’t yet stopped talking about trying to do really important things like voting in elections online. The ramifications in the event of a screw-up are growing in significance every day from the mere “minor inconvenience” that they used to be.

So, voting…

Voting in a modern free and fair election has a number of necessary constraints placed upon it which mean it’s not remotely as straightforward to shift online as, say, voting in The X Factor or Strictly.

Every stage of the process must be verifiable—by both the candidates (and their representatives), and by observers who keep an eye on things on behalf of the electorate and anybody else impacted by the election. In other words, just about everyone.

The votes themselves must only be cast by those who are actually eligible to vote in the first place, and must be done so in secret and anonymously. This is so that duress cannot be applied either before or after the election has taken place.

Once cast, the votes must remain sealed (and disassociated with any information about who cast them), until the ballots close, to prevent undue influence being exerted over those who are yet to vote based upon information on votes which have already been cast.

Finally, there must not be any impediment to somebody who is entitled to vote actually doing so in practice.

All of this is quite tricky to replicate online without compromising some aspect of it.

The practical effect is that the process can’t readily rely upon voters’ digital identities being managed by third-party corporations such as Google, Facebook or Experian. All of the software involved needs to be open source and open to inspection by anybody, as do the network protocols and data flows, as well as any hardware that’s been installed, and the only “terms of use” which can be applied must be electoral law.

You can’t get into situations where somebody can’t vote because their account has been suspended, or an election has to be declared void because a system was compromised by a rogue employee, or because a dubious implementation meant that information about the votes themselves leaked.

On the plus side, we can already do online voter registration without huge problems, which means that the guts of the “eligibility” part of the puzzle has already been solved.

Putting it all together

Let’s start with the end result, because it’s useful to design things from the top down, defining requirements as we go:—

  • Ballots are unsealed and anonymous votes are counted.

To do this, votes must be encrypted when they are cast in a way which means they can only be decrypted for counting when the polls close—that is, the encryption and decryption keys must be different—a fairly clear-cut use-case for public-key cryptography.

The knotty part of this is actually keeping the decryption key secret until the right time, which isn’t how public-key cryptography is usually deployed. While somebody could make and sell a black box which generates a keypair and hands over the public key immediately, but holds back the private key until a certain time and date, it would lack flexibility and could well struggle to be sufficiently verifiable. Let’s put that in the “nice idea in theory” bucket.

Instead, we can rely upon three things: the fact that in a given constituency, the candidates are all competing with one another, that the returning officer is a person who exists, and that the law doesn’t cease to apply or be implemented once you do some things electronically.

Encryption keys are bits of data. If you plug them into some software that does encryption or decryption then they become useful, but until then they’re just lumps of opaque binary goo like any other. You can copy them around, split them into chunks, get them printed on t-shirts, or turn them into abstract digital art.

The solution to the key hold-back problem is to generate and issue the public and private keys at the same time, but for the private key to be broken into portions and itself encrypted with the public keys corresponding to each of the candidates and the returning officer.

That means that you can only get at the private key for decrypting the votes when all of the candidates and the returning officer get together and combine their decrypted portions of the private key into a single “constituency private key” which can unlock the votes—and they won’t do that until after the polls close, or they’ll find themselves getting arrested.
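One way to sketch the "everybody must turn up" split is shown below. Note that this swaps in XOR-based splitting rather than literally cutting the key into chunks: a plain chunk would hand its holder a slice of the real key, whereas here each portion on its own is just random noise. The function names are hypothetical, and the step of encrypting each portion with its holder's public key is left out.

```python
import secrets

def xor_bytes(left, right):
    return bytes(a ^ b for a, b in zip(left, right))

def split_key(key, holders):
    """Split key so that every single one of the holders' portions is needed."""
    portions = [secrets.token_bytes(len(key)) for _ in range(holders - 1)]
    final = key
    for portion in portions:
        final = xor_bytes(final, portion)
    return portions + [final]

def combine(portions):
    """XOR all the portions back together."""
    key = bytes(len(portions[0]))
    for portion in portions:
        key = xor_bytes(key, portion)
    return key

constituency_private_key = secrets.token_bytes(32)   # stand-in for the real key
# Four candidates plus the returning officer.
portions = split_key(constituency_private_key, holders=5)

print(combine(portions) == constituency_private_key)       # everyone present
print(combine(portions[:4]) == constituency_private_key)   # one person missing
```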

Of course, unfortunate events can occur, and so you could perform that process several times over, cutting the key different ways, so that only a certain proportion of the group is required for it to be quorate (but you need to take care that you don’t end up in a situation where a small subset can get together and derive the private key from their collected chunks).
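A well-known way of getting that "only a certain proportion is required" property, without cutting the key several different ways by hand, is Shamir's secret sharing. It isn't what the paragraph above literally describes, so treat the following as a sketch of an alternative technique; the prime and the 3-of-5 threshold are arbitrary assumptions.

```python
import secrets

PRIME = 2**127 - 1   # field size; the secret must be smaller than this

def make_shares(secret, threshold, count):
    """Hide the secret as the constant term of a random polynomial."""
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        total = 0
        for coefficient in reversed(coefficients):
            total = (total * x + coefficient) % PRIME
        return total

    return [(x, evaluate(x)) for x in range(1, count + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 gives back the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                numerator = numerator * (-xj) % PRIME
                denominator = denominator * (xi - xj) % PRIME
        secret = (secret + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret

key_as_number = 123456789   # stand-in for the constituency private key
shares = make_shares(key_as_number, threshold=3, count=5)

print(recover(shares[:3]) == key_as_number)   # any three holders suffice
print(recover(shares[:2]) == key_as_number)   # two holders learn nothing useful
```

With this scheme any group smaller than the threshold gets no information about the key at all, which addresses the "small subset" worry directly.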

This does require that each candidate and returning officer has their own keypair, but we’ll come to that.

So, we now have a set of encrypted individual anonymous votes, encrypted with the “constituency public key”, and a way to decrypt them for counting when the time comes.

The actual vote-casting itself can work very similarly to postal voting today: an inner envelope containing the anonymous vote cast, inside another envelope which confirms your identity.

We have each voter generate their actual vote, which is simply a piece of information formatted in a particular way so that it can be counted—the digital equivalent of “place a cross inside exactly one box”. If it’s incorrectly-formatted or contains something else, the ballot can be considered spoiled. The piece of software used by the constituent and responsible for actually generating this data can make sure that it happens consistently, so spoiled votes only occur intentionally.

Once generated—whether a valid vote or a spoil—the voter can encrypt it with the constituency public key. This means that unless they choose to share their vote with somebody else, it’s only readable once the polls close and the constituency private key needed to decrypt it has been reconstituted (but once that happens, the vote itself can be read by anybody).

Having created an encrypted (sealed) vote, they can then sign the vote using their own private key. This is the direct equivalent of putting an envelope containing a ballot paper inside another envelope containing information that identifies the voter: except for the fact that it’s now really impossible to open the inner envelope until the polls close and the constituency private key is released.
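The two envelopes might look something like this in code. Everything here is a toy: the keys, the idea of encoding a choice as a small number, and the random "blinding" added so that two identical votes don't produce identical ciphertexts are all my assumptions. A real system would use a vetted encryption scheme rather than raw exponentiation.

```python
import hashlib
import secrets

def make_keypair(p, q, e=65537):
    """Return (public, private) toy keys from two primes; not secure."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def hash_to_int(message, modulus):
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus

CONSTITUENCY_PUBLIC, CONSTITUENCY_PRIVATE = make_keypair(2**107 - 1, 2**127 - 1)
VOTER_PUBLIC, VOTER_PRIVATE = make_keypair(2**61 - 1, 2**89 - 1)

def seal_vote(choice, constituency_public):
    """Inner envelope: encrypt the choice (0-255) plus random blinding."""
    e, n = constituency_public
    blinding = secrets.randbits(128)
    return pow((blinding << 8) | choice, e, n)

def sign_sealed_vote(sealed_vote, voter_private):
    """Outer envelope: the voter signs the already-sealed vote."""
    d, n = voter_private
    return pow(hash_to_int(str(sealed_vote).encode(), n), d, n)

def open_vote(sealed_vote, constituency_private):
    """Only possible once the constituency private key has been reconstituted."""
    d, n = constituency_private
    return pow(sealed_vote, d, n) & 0xFF   # discard the blinding

sealed = seal_vote(2, CONSTITUENCY_PUBLIC)
envelope = {"voter_public_key": VOTER_PUBLIC,
            "sealed_vote": sealed,
            "signature": sign_sealed_vote(sealed, VOTER_PRIVATE)}

print(open_vote(envelope["sealed_vote"], CONSTITUENCY_PRIVATE))
```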

For their vote to be valid, their public key must appear on the Electoral Roll, having been placed there when they performed voter registration.

This of course requires two things: that everyone has their own keypair that they control themselves, and that voter registration be extended to use the sort of key-verification process described earlier in order to place public keys on the Electoral Roll.

It does make voter-eligibility verification quite straightforward, but it’s also the area which is likely to require the greatest degree of scrutiny; this is in contrast to the paper-based system where counting is where nearly all of the auditing happens.

When voter registration closes prior to an election, a list of all of the registered public keys for a constituency can be generated. In fact, it only needs to contain the public keys themselves—there’s no need for other details about the electorate. The list can be digitally signed by the voter registration system generating the list so that it can itself be verified by each constituency’s voting system upon receipt. These lists can be published openly, allowing people to check that their own keys are actually on the list, and also that the numbers aren’t out-of-step with the actual population, and also distributed to each of the constituencies themselves.

At this point, we have a set of sealed and signed votes ready to submit, and a set of valid public keys for the electorate in that constituency. Conceivably, the votes could just be sent via e-mail to a specific address, but a somewhat more robust system that can provide instant feedback would be more sensible. For the sake of argument, let’s say that it’s web-based, but it needn’t necessarily be.

Upon receipt of a vote, the system can verify its signature against the key from the Electoral Roll. If successfully verified, it can strip it out, and store the encrypted (and now anonymous) ballot somewhere secure. Because the ballot is encrypted, it doesn’t need to be secret, just secured against data loss. In fact, it would be sensible for a copy of every vote cast to be forwarded to every voting system in every constituency—that way, the local counts in a General Election (for example) could be cross-checked 650 times once the constituency’s private key is released, and so also serves as a sanity check against the counting system.
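The receiving end can be sketched like so. The BallotBox class and the envelope layout are hypothetical, the sealed vote is treated as an opaque number, and the sketch makes no attempt at handling somebody submitting twice, which is one of the details a real design would need to settle.

```python
import hashlib

def make_keypair(p, q, e=65537):
    """Return (public, private) toy keys from two primes; not secure."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def hash_to_int(message, modulus):
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus

def sign(message, private_key):
    d, n = private_key
    return pow(hash_to_int(message, n), d, n)

def verify(message, signature, public_key):
    e, n = public_key
    return pow(signature, e, n) == hash_to_int(message, n)

class BallotBox:
    def __init__(self, electoral_roll):
        self.electoral_roll = set(electoral_roll)   # public keys only
        self.sealed_votes = []                      # anonymous once stored

    def receive(self, envelope):
        voter_key = envelope["voter_public_key"]
        if voter_key not in self.electoral_roll:
            return False                            # not eligible here
        message = str(envelope["sealed_vote"]).encode()
        if not verify(message, envelope["signature"], voter_key):
            return False                            # signature doesn't check out
        self.sealed_votes.append(envelope["sealed_vote"])   # identity stripped
        return True

def make_envelope(sealed_vote, public_key, private_key):
    return {"voter_public_key": public_key, "sealed_vote": sealed_vote,
            "signature": sign(str(sealed_vote).encode(), private_key)}

registered_public, registered_private = make_keypair(2**61 - 1, 2**89 - 1)
outsider_public, outsider_private = make_keypair(2**19 - 1, 2**31 - 1)

box = BallotBox(electoral_roll=[registered_public])
sealed = 123456789   # stand-in for an encrypted vote
accepted = box.receive(make_envelope(sealed, registered_public, registered_private))
rejected = box.receive(make_envelope(sealed, outsider_public, outsider_private))
print(accepted, rejected, box.sealed_votes)
```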

So, end-to-end:—

  • a person in possession of a private key that only they control registers to vote (but otherwise much as they do today);
  • the corresponding public key is stored on the Electoral Roll;
  • it’s then distributed to the constituency’s voting system along with those of everyone eligible to vote in that constituency;
  • when voting opens, the public and private keys for the constituency are generated; the public key is immediately distributed far and wide (but principally to the constituents), while the private key is split into chunks and distributed between the candidates and the returning officer;
  • the person generates an encrypted vote, signs it with their private key, and submits it to the voting system;
  • the voting system checks that the public key is on the list of those eligible in this constituency, verifies the digital signature, and then both stores and forwards the vote;
  • when the ballots close, the candidates and the returning officer get together and decrypt the constituency private key which is then provided to the counting system and distributed widely;
  • the counting system decrypts each vote and tallies those against each candidate (and those spoiled)—others around the country do the same for both their own votes, and those from other constituencies for cross-checking purposes;
  • about an hour after polls close, the results can be declared nationally.
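The counting step at the end of that list could be as simple as the following sketch. The candidate names, the one-byte vote encoding and the toy key are assumptions; a real ballot format would need to be much stricter about what counts as valid.

```python
import secrets
from collections import Counter

CANDIDATES = ["Candidate A", "Candidate B", "Candidate C"]   # hypothetical

# Toy constituency keypair; illustrative only, not secure.
P, Q = 2**107 - 1, 2**127 - 1
PUBLIC = (65537, P * Q)
PRIVATE = (pow(65537, -1, (P - 1) * (Q - 1)), P * Q)

def seal_vote(choice):
    """Encrypt a choice (0-255) plus random blinding with the public key."""
    e, n = PUBLIC
    return pow((secrets.randbits(128) << 8) | choice, e, n)

def tally(sealed_votes, private_key, candidates):
    """Decrypt each anonymous vote and count it, or mark it as spoiled."""
    d, n = private_key
    counts = Counter()
    for sealed in sealed_votes:
        choice = pow(sealed, d, n) & 0xFF
        if choice < len(candidates):
            counts[candidates[choice]] += 1
        else:
            counts["spoiled"] += 1
    return counts

ballots = [seal_vote(choice) for choice in (0, 1, 1, 2, 1, 200)]
result = tally(ballots, PRIVATE, CANDIDATES)
print(result.most_common())
```

Because the sealed votes and the released private key are both public at this point, anybody else can run the same count and compare the totals.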

There’s a “but”…

This is not magic, and I have not described the precise details to the extent that you could confidently say “if you did it like this, then it would all go swimmingly”: there are attack vectors and things left undecided to a sufficient degree that you still need to think carefully about implementation (and consult widely with real experts) before pushing ahead with it.

To name a few, you need to think about how to prevent denial-of-service attacks; what the protocols actually look like (taking care to prevent replay attacks and the like); how to ensure that private keys are actually kept secure; and define what “quorate” actually means for a group of candidates and returning officers (with delegates, presumably).

We also need to ensure that all candidates and constituents are able to generate and manage their own keypairs which only they (or their legal proxy, where needed) actually control.

However, doing all of these things, and implementing it as a well-documented, open-source system which is open to inspection throughout its development and operation—and, in particular, going to the trouble of inviting people to do so—is merely in the realms of “challenges” rather than really difficult system design: all of the technologies required to actually do it have been invented, tried and tested already.

By way of example, the devices that issue the constituency keys could be purchased as commodity PCs from a range of suppliers, all set up together in a controlled environment and running a particular audit-able stack, and then physically secured to prevent tampering ahead of distribution to each constituency. It’s not a great leap to think about security of these devices in similar terms as we’re used to in thinking about the security of tens of thousands of ballot boxes on election night.

What you really can’t do is compromise on the basic principles. What difference might it make, for example, to swap the signature-based voter-eligibility checks with one of a group of outsourced identity providers?

Well, as much as their CEOs might be well-intentioned, you’ve just created a system where a group of private corporations literally control access to democracy, and introduced a whole suite of potential failure points which don’t exist in a decentralised world (private corporations tend to be resistant to forensic audit by members of the public and their staff now control en masse something which wouldn’t otherwise be collected together, and so become targets for criminals).

The approach set out above seeks to minimise the volume and scale of new things which could go wrong compared to paper voting, while also taking advantage of the efficiency gains that technologies have brought.

However, there is one big challenge which remains, and needs serious effort: that of user experience. Public-key cryptography is extremely widespread, but individuals using it to identify themselves online tend to be confined to security experts and corporate users. For any of this to become a reality, operating systems and browsers need to get to a place where ordinary human beings can willingly use the technologies and understand what’s happening on their behalf. It needs to be both as easy and transparent as ticking a box and putting it into an envelope—and that’s no mean feat.

May cause side-effects

If we manage to get all of this in place, there are other things that we can do.

With everyone having their own keypairs, people can sign up for and into services without having to remember usernames and passwords, and without having to delegate the function to third parties who might suspend your account for spurious reasons.

We can break the link between the process of identification and the function of assurance, meaning individuals can control all of their identity, not just the bit of it they use to identify themselves as returning service-users. Making use of public-key cryptography and digital signatures means that assurance data becomes tangible again—something which can be kept safe by the individual until it’s needed, and passed on when required. Moreover, because it’s based on digital signatures, it can’t be forged or tampered with.

We can apply that to servers and services, too. Instead of relying on a shaky model of a near-endless list of certification authorities to tell us both that our connection to a website is secure and that they’re trustworthy for entering credit card details into, we can split that up and make it more granular–and more useful. With relatively little in the way of protocol changes, we could replace a blanket approach to trust with a model where we trust the authorities which are actually meaningful for particular categories of transaction.

For example, if I’m visiting a site and all I’m doing is providing some personal details so that they can send me stuff, I don’t need to know whether their handling of money is up to scratch—but I do want to know that their registration with the Office of the Information Commissioner is in good standing, so I want a piece of assurance data sent by their web server, issued by the ICO, containing a copy of their Data Protection Registration certificate. If I do get as far as giving them some money, I want to know that their bank hasn’t frozen their accounts, that their annual return to Companies House is up to scratch, and that HM Revenue and Customs wasn’t at last check in the process of taking them to court over unpaid VAT.

Of course, people in different countries will want to trust different bodies to make assurances about these kinds of information, and the beauty of making the whole thing more granular is that it’s possible to do in any kind of a sensible way (currently, everybody everywhere would have to trust every authority equivalently, which would be nonsensical).

It also means that a certificate doesn’t have to be issued for sites which don’t collect any data—in other words, the barriers to the whole of the web being accessed securely (which also means free from tampering by intermediaries such as mobile phone and hotel WiFi operators) are vastly reduced because it now costs nothing more than a little configuration to make traffic to and from your web server secure.

Finally, we can simultaneously enhance privacy and impact upon reputation for actions. Because keys are generated and controlled by the individuals to which they pertain, there’s nothing stopping you generating a new key and using that for certain services: but with the caveat that the new identity would have no particular history associated with it, and would only have meaningful assurance data if you made it the key you use for accessing those assured services.

If you only ever used that key for writing abusive comments underneath YouTube videos, it would be trivial for it to be summarily and automatically ignored by everybody who came into contact with it, because its entire associated history would consist of abusive comments underneath YouTube videos.

In contrast, if you do the same thing with a key that you use elsewhere, you run the risk of the link between your unpleasant activities and those conducted more normally being made quite easily.
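As a rough illustration of how a key's history could drive that kind of automatic filtering, here is a small Python sketch. It is not a description of any existing system: the `History` class, the `min_events` threshold and the idea of a simple abusive/not-abusive flag are all assumptions, and a real implementation would need to deal with who gets to do the flagging.

```python
import hashlib
from collections import defaultdict


def fingerprint(public_key: bytes) -> str:
    """Identify a key by the SHA-256 digest of its public half."""
    return hashlib.sha256(public_key).hexdigest()


class History:
    """Hypothetical record of what each key has been seen doing."""

    def __init__(self):
        self._events = defaultdict(list)

    def record(self, public_key: bytes, service: str, abusive: bool) -> None:
        self._events[fingerprint(public_key)].append((service, abusive))

    def should_ignore(self, public_key: bytes, min_events: int = 3) -> bool:
        events = self._events.get(fingerprint(public_key), [])
        # Only ignore a key once it has some history, and all of it is abusive.
        return len(events) >= min_events and all(abusive for _, abusive in events)


history = History()
throwaway = b"throwaway public key bytes"
everyday = b"everyday public key bytes"

for _ in range(3):
    history.record(throwaway, "video-comments", abusive=True)

history.record(everyday, "video-comments", abusive=True)
history.record(everyday, "bank", abusive=False)
history.record(everyday, "forum", abusive=False)
```

With this data the throwaway key is ignored outright, while the everyday key is not: its abusive comment simply sits in the same history as everything else that key has been used for.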

Us geeks figured out a long time ago that public-key cryptography was pretty monumental, and it changed the way that we approached system design and security—and along with it, how a great many things that you use every day work behind the scenes. What we haven’t been able to do so far is put the technologies to good use in fixing some of the horrid technical hacks that have prevailed over the last couple of decades and in putting it in the hands of ordinary people so that we can all do important stuff better.

It’s time to change that. It’s time that technology stopped being an excuse to give people less control instead of more over their own lives, and it’s time that it lived up to its potential for making democratic expression more accessible and efficient instead of merely enabling a chaotic echo-chamber.

Mar 31

To all whom these presents shall come, greeting!

Reminder: this is my personal blog, conveying my personal opinions

When John Reith lobbied the government of the day for the creation of the British Broadcasting Corporation (having served as the British Broadcasting Company’s general manager for a year), it was in recognition of the fact that spectrum was a limited resource, and so broadcasting was too important to be left solely in the hands of those who would provide anything less than universal service. The result was a corporation which was publicly-funded, but independent of both the government’s editorial agenda and the finer detail of who paid the bills; the BBC was conceived as being truly universal.

In 2015, this structure remains at the heart of the constitutional make-up of the corporation, and despite the protestation of some commercially-interested critics, is far from an anachronism. Indeed, the BBC is a uniquely good force in providing services for nearly everyone, while being simultaneously funded by nearly everyone (in terms of coverage, the household-level Licence Fee has a greater level of universality than Income Tax). Nowhere is this made clearer than in the definition of “licence fee payer” in the current Charter:

In this Charter, a reference to a “licence fee payer” is not to be taken literally but includes […] any other person in the UK who watches, listens to or uses any BBC service, or may do so or wish to do so in the future.

Side-note: If you ignore the “Elizabeth the Second by Grace of God of the United Kingdom…” stuff, the Charter is intended to be read by the likes of you and me, rather than just politicians and civil servants. Don’t forget the agreement, too, which goes into a bit more of the nitty-gritty about it all.

Given the impending Charter renewal, and the increasing ubiquity of the Internet, it is crucial that the BBC continues to reflect its public service credentials upon new media as they emerge. The BBC’s Public Purposes are intended to codify this, albeit in not the most accessible of language—each one of the six embodies an ideal of building upon independence and universality to do something only the BBC can, across citizenship, education, culture, inclusivity, the global community, and crucially in this case, the benefits of communications technologies.

The mistake that everyone makes with the idea of “digital public space” that’s been kicked around for some time now is to assume the interventions need to happen at the upper layers of the Internet. “The Web’s not at all broken beyond repair!”, people shout. They are right. In fact, I count myself amongst them.

Most of the principles which underpin digital public space can be implemented, using existing commodity technologies, on the Internet today: HTTP gives us the ability to serve rich descriptive machine-to-machine metadata alongside the pages we browse, to assign persistent resolvable identifiers for things, and be explicit about how and where things can be reused, remixed and redistributed; asymmetric cryptography gives us the ability to put people at the heart of digital identity, to stop building gigantic federated systems (where somebody other than you or I is ultimately in control of our own identity), and to stop treating “assurance” as a simplistic monolithic creature, inexorably linked to identification online, and bearing no relation to the processes and constraints of the real world.

We can do all of those things on the Internet now, in principle. The problems aren’t technical in nature, but the vision is in conflict with some of the commercial imperatives of the more influential corporations inhabiting this world. Large companies want to be in control of everyone’s identity, because it puts them at the centre of the value exchange and allows them to collect more data that they can use for commercial gain later on—even if you don’t actually know this, you can deduce it: logically, why would they bother with all of the significant effort of building, securing and maintaining federated login systems that any site can buy into, with no money changing hands, unless there was something in it for them?

And so many of those institutions who have been working together to shape this vision have done so in partial recognition that publicly-funded bodies are in a unique position: while they do need to spend money responsibly, there’s little in it for them to behave quite as a Silicon Valley tech start-up would; they exist to serve the public, en masse, rather than the advertisers whose collectively deep pockets make shareholders’ eyes light up.

By my reading, DOT EVERYONE seeks to encapsulate these parts of the vision: putting control back into the hands of the public, being altruistic in nature, and a recognition that the public sector has different imperatives than the private sector. Is it a good thing? On balance, quite probably. Does it need a new quango? Quite possibly not. Let’s try to make it a good thing, though, rather than assuming that it will or won’t be.

There is one area where the digital public space vision diverges from the Internet as we know it, and it’s an area of potentially great importance to the BBC (but not by any means only the BBC), and was the thrust of Tony’s Royal Holloway speech. Of all the fantastic, amazing things the Internet is and has made possible, a true provision of universal service which is comparable to broadcasting is not amongst them.

There’s a good reason for this: ring-fencing a bunch of stuff on the Internet and saying that anybody will be able to access it, even if they can’t get at anything else, they’ve reached their data cap, or let their account lapse has the effect of violating net neutrality to quite a significant and disastrous extent. The Internet is in a way a perfect free-market—often referred to as “the great leveller”—and that’s what makes it work, so well that it’s changed the world (and to a far greater extent than I came close to realising when I got excited by it in the early 1990s).

So, we can’t do that. I don’t want to do that. Given a choice between “delivering Freeview-style universal service in the UK using new technologies” and “the Internet”, I’d pick the Internet every time, despite believing that the former is going to be pretty crucial to the long-term survival of my employer and an institution I believe it’s incredibly important to maintain.

On the other hand, just as the licence fee funds universal access to television and radio broadcasts, I don’t believe the solution to the problems which come about as a result of people shifting away from them and onto the Internet for media consumption is to rope one broadcaster’s VoD service specifically into the licence fee: that strikes me (although I could be wrong) as being perilously close to the path to a subscription model.

Brief aside: the problem with a subscription model is that it rides roughshod over that universal service provision. It’s quite difficult now not to be a slave to audience figures (across all platforms); it could only be more difficult if the funding model is absolutely tied to them. We can’t deliver a universal service if we need to get as big an audience as possible at every turn to ensure that the bills get paid.

Instead, we need to come up with a model which allows for delivery of public services over IP—which can conceivably extend beyond public service broadcasting, but is not necessarily over the Internet. We can’t not provide universal access to public services as they’re increasingly delivered using Internet technologies, but by the same token we can’t destroy the Internet in the process.

I don’t know what that model looks like, but I’m pretty sure we need one. At the very least, we need to be absolutely sure that the answer to “how do we ensure everyone gets public services (of all kinds, including public service broadcasting) to their devices in years to come?” really has been thought through properly.

Sep 17


Although I’m not technically prohibited from expressing my IndyRef preference prior to September 18th, I still don’t intend to — it’s a matter between me and the postal ballot I sent back a while ago (I didn’t know if I’d actually be in Scotland on polling day).

However, I am finding the mechanics and politics of it all rather fascinating. Obviously, there’s the usual stuff: it’s a historic event; being in the thick of it is eerily similar to being in The Thick of It; all of this week, Huw Edwards has been tapping away at his laptop just a few metres behind me when he’s not on air; and on Friday things will start to get really interesting, regardless of the outcome.

Both sides have claimed to have witnessed some degree of unpleasantness from the other (and I don’t doubt the claims, to be clear). I do wonder how much of it is a consequence of having a political event which has got people so engaged that even the, usually apathetic, thugs have joined in the fray—with a predicted record turn-out (my personal reckoning is that it’ll be north of 80%) and the outcome too close to call, ordinary folk are being forced to take more than the usual amount of rough with the smooth.

Meanwhile, there’s a lot which could be said about the “marketing” of it all. I struggle to get away from an awareness that the No Thanks campaign really screwed up from the outset: as was noted to me recently, negative messages peak much more quickly than positive ones, and it’s hard to say “No thanks” without sounding negative, and a little bit defensive.

And that’s without the huge apparent mis-step of taking the “devo max” option off the table in the Edinburgh agreement, in the belief that in its absence, the majority of people in Scotland would rather stick with the status quo than take a leap of faith. Ironically, this was exactly the sort of thing which many dissatisfied with the current state of affairs interpreted as arrogance on the part of Westminster, and it almost certainly helped sway some undecided voters towards the Yes camp.

The approach taken by those voting Yes seems, to my eye, to be a little different to that of those voting No, too.

Rather than evaluating it as a choice between “remain as we are” or “take a significant risk in separation”, as it is on paper, it seems like a lot of people are subconsciously evaluating the choice as being one at year zero: i.e., one between joining the UK (because they already feel relatively disenfranchised, and a No vote would put paid to any further independence attempts for at least a generation) and the risk of going it alone.

My hunch is, though, that regardless of views of independence, very few people in Scotland are genuinely happy with things just as they are today—so in many respects it’s not an unreasonable way of looking at things.

The reality is that this is a vote between one set of to-be-negotiated changes and another, lesser, set of the same, and neither is what anybody could call an ideal outcome. Do you want known unknowns or unknown unknowns?

Anybody who thinks that a vote either way is clear-cut and absolute almost certainly hasn’t been paying close enough attention: all we really know is that waking up on Friday morning, we’ll find ourselves at the start of either negotiations for self-governance, or negotiations to adjust the dividing line of authority between Westminster and Holyrood, and the political landscape in Scotland will likely change significantly in either case.

And, irrespective of the final outcome, if it is nearly as close as predicted, it’s a huge vote of no-confidence in the coalition Government—what message does “half of the population of Scotland voted in favour of leaving the UK altogether” send?

Mar 21


Okay. I’ve sat and thought about it, and here is my conclusion:

The only people that decriminalising the licence fee actually helps are politicians who are voted for by people who will never actually not pay the licence fee.

A civil case is, as most people actually know underneath it all, a grossly inefficient way of dealing with it.

The criminal cases, on the other hand, are heard in batches, usually settled before they ever get in front of a magistrate (and for those which do, the majority don’t show up anyway), and there’s a certain amount of latitude available in terms of penalty: the maximum fine or imprisonment is generally a last resort applied to those who are defiantly persistent about not paying. If it’s the ‘criminal record’ part of it which is problematic (and I can understand that argument to an extent, though I don’t see anybody complaining that driving an uninsured vehicle being criminal is terribly unfair), then adjust the way that people are deemed to have been rehabilitated from the offence and associated disclosure rules. Good luck with that.

Constitutionally, putting in place a system where an organisation exists through royal charter and collects a levy defined in law, but issues civil proceedings for non-payment, is even weirder than a system where an organisation exists through royal charter and collects a levy defined in law.

But no, this isn’t anything about the fairness of anything. There are already people seizing upon the idea as a stepping stone to simply making the licence fee a subscription. This misses (perhaps wilfully) a really big part of the point of the BBC: the fact that the definition of licence fee payer is deliberately distinct (go and read the Charter if you haven’t already) from “people who pay the licence fee” is the other side of the coin whereby the BBC isn’t a state broadcaster. To put it another way, it’s a very purposeful middle-ground where the BBC is neither directly beholden to the state nor to the people with the money. The BBC provides services, across a variety of platforms, which are for everyone. Those services are funded by the subset of the households who receive broadcast television.

(And yes, I know that the BBC chases ratings, underserves some audiences, and unquestioningly toes the line of the government of the day from time to time. We are not perfect. But the solution cannot possibly be to remove the very mechanism which makes it possible for the BBC to do any better: do that, and the BBC of today is the best-constructed that it could possibly be, and I don’t buy that for a second).

While, hypothetically, you could make things fairer by rolling it into general taxation, you’d have to both have a parliament who was extremely friendly to the idea, and so many strings attached to prevent future abuse (for example, requiring an absolute majority in both houses to undo it), that it’s so unlikely to happen as to be discountable.

And, of course, we’re far from the exception in the “licence fee nation” stakes. Plenty of other countries have had them. Some of them even struggled for years under voluntary collection schemes as have been proposed by some quarters before switching to the British model.

And certainly, IP distribution presents some challenges to the funding model, but as I’ve noted previously: the TV Licence didn’t begin life as a TV Licence, it began as a Radio Licence. It’s evolved over time and can do so again—we just need to be careful to not tie too closely the media to the output in doing so.

Dec 14

Casualty of the storm

The pink, lifeless frame lay disfigured almost beyond recognition upon the path next to the river; another victim of the mighty, but dispassionate elements.

People glanced nervously downward as they passed, not being able to help themselves but to look. For a brief, fleeting moment, they desperately hoped that the same fate wouldn’t befall them, before self-confidence returned and they hurried away to go about their daily lives.

I did the same, of course, and as I hunched my shoulders and quickened pace, I hoped that my umbrella was made from much sterner stuff.

Nov 17

Modelling activity streams with quads

I’ve been thinking for a while about how one might use a quad-store as a database of activity streams, and consequently how to model them.

We have four key pieces of information, plus a set of ancillary properties. These are:

  • A unique, generated identifier for the action, so that we can refer to it later
  • The person (or agent) performing the action
  • The action being performed (verb)
  • The thing the action is being performed upon

On top of these, we might have various other pieces of information: the action’s timestamp, the system it was performed upon, a policy associated with it (such as whether it can be shared anonymously), and so on.

The thought occurs that the verb could be easily represented as a predicate (previous approaches that I’ve seen represent the verb as an instance of a class instead, but my gut instinct is that it makes dealing with the data harder than it needs to be).

Now, it’s entirely up to you which way around you put the agent or the action: but I’ve opted to use the agent as the subject (although that does affect the way I name my terms — I’ve opted for watched instead of watchedBy, for example).

Finally, the identifier for the action is used as the graph name, but is also annotated with additional properties in a separate graph controlled by the activity store itself. This is, I’ll admit, not the usual way to approach named graphs, but it does mean that when you reduce down to triples, you’re left with a stream of the three most important pieces of information: agent, verb, thing.

Enough waffling. Here’s an example:

@prefix act: <http://example.com/activity/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix event: <http://purl.org/NET/c4dm/event.owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix tl: <http://purl.org/NET/c4dm/timeline.owl#> .

# Assumed subject: the document which describes the activity.
</2013/11/17/19/13/38078878b361481a9a05210d6611b89a>
    foaf:primaryTopic </2013/11/17/19/13/38078878b361481a9a05210d6611b89a#id> .

# The activity itself, in a graph named by the activity's identifier.
</2013/11/17/19/13/38078878b361481a9a05210d6611b89a#id> {
    <http://neva.li/#me> act:watched <http://www.bbc.co.uk/programmes/b03hy7hm#programme> .
}

# Annotations about the activity, in a graph controlled by the activity store.
</#id> {
    </2013/11/17/19/13/38078878b361481a9a05210d6611b89a#id>
        a event:Event ;
        event:time [
            a tl:Interval ;
            tl:start "2013-11-17T18:25:33Z"^^xsd:dateTime ;
            tl:duration "PT48M14S"^^xsd:duration
        ] ;
        dct:created "2013-11-17T19:13:47Z"^^xsd:dateTime ;
        dct:source <http://www.bbc.co.uk/iplayer/#id> .
}
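To show what “reducing down to triples” means in practice, here is a minimal Python sketch which holds the same data as plain tuples rather than in a real quad-store. The `Quad` type and the two helper functions are my own illustrative names, not part of any library, and only one of the annotation properties is included to keep it short.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str


ACTIVITY_ID = "/2013/11/17/19/13/38078878b361481a9a05210d6611b89a#id"
STORE_GRAPH = "/#id"  # the graph controlled by the activity store

quads = [
    # The activity: agent, verb, thing, in a graph named by the activity's ID.
    Quad(
        "http://neva.li/#me",
        "http://example.com/activity/watched",
        "http://www.bbc.co.uk/programmes/b03hy7hm#programme",
        ACTIVITY_ID,
    ),
    # An annotation about the activity, held in the store's own graph.
    Quad(
        ACTIVITY_ID,
        "http://purl.org/dc/terms/created",
        "2013-11-17T19:13:47Z",
        STORE_GRAPH,
    ),
]


def activity_stream(quads, store_graph=STORE_GRAPH):
    """Drop the annotation graph and the graph names: agent, verb, thing."""
    return [(q.subject, q.predicate, q.obj) for q in quads if q.graph != store_graph]


def annotations(quads, activity_id, store_graph=STORE_GRAPH):
    """Look up the store's annotations for one activity by its identifier."""
    return {
        q.predicate: q.obj
        for q in quads
        if q.graph == store_graph and q.subject == activity_id
    }


stream = activity_stream(quads)
details = annotations(quads, ACTIVITY_ID)
```

Because the graph name doubles as the subject of the annotations, going from an entry in the stream to its timestamp and source is a single lookup.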

You’ll note that my activity identifiers contain a date and time (down to the minute level, at least) and a UUID: this is to allow for logical navigation patterns to retrieve aggregations (e.g., “return all activity from November 2013”), while not being reliant upon a single naming authority. There are, of course, a whole host of different ways of doing this.
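One way of minting identifiers in that shape, sketched in Python using only the standard library. The function names are hypothetical, and the use of UTC is an assumption on my part rather than something the scheme requires.

```python
import uuid
from datetime import datetime, timezone


def activity_id(when: datetime, unique: uuid.UUID) -> str:
    """Build /YYYY/MM/DD/HH/MM/<uuid-as-hex>#id for a given time and UUID."""
    return f"/{when:%Y/%m/%d/%H/%M}/{unique.hex}#id"


def new_activity_id() -> str:
    """Mint a fresh identifier; no central naming authority is consulted."""
    return activity_id(datetime.now(timezone.utc), uuid.uuid4())


def month_prefix(year: int, month: int) -> str:
    """Path prefix shared by every activity recorded in the given month."""
    return f"/{year:04d}/{month:02d}/"


example = activity_id(
    datetime(2013, 11, 17, 19, 13, tzinfo=timezone.utc),
    uuid.UUID("38078878b361481a9a05210d6611b89a"),
)
in_november_2013 = example.startswith(month_prefix(2013, 11))
```

Retrieving “all activity from November 2013” then becomes a prefix match on the identifier, and the same works at the day, hour or minute level.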


Update: Ryan Adams points me at Tin Can, which I must have seen when it was announced, but haven’t actually looked at the spec for until after posting this. Interestingly, it looks like it would be fairly straightforward to express Tin Can statements in the form above (and even easier if the experience verbs had an RDF vocab representation).

Sep 11

Ofcom has today published the final findings of a year-long study into how and why internet users access music, films, TV programmes, software, books and video games online – both legally and illegally.

The fourth wave of the Online Copyright Infringement Tracker shows results for the period March – May 2013.

The High Volume Infringers Analysis Report considered the behaviour of the most active copyright infringers over the period May 2012 – May 2013. This report’s findings include:

  • Over the year, 17% of internet users infringed copyright online;
  • 2% of internet users were responsible for 74% of all online copyright infringement;
  • However, infringers also accounted for 32% by volume, and 40% by spend, of legally consumed digital content, spending more on average than non-infringers on both digital and physical content.

All the research was funded by the Intellectual Property Office (IPO) and carried out by Kantar Media on behalf of Ofcom. The reports contain details about the methodology used.

Source: Ofcom

Aug 12

A short bullet-pointed rant about both Spotify and Skype on the Mac

I use both Spotify and Skype on the Mac. In fact, I pay for both. Here is a short list of the things which irritate me today about both, in no particular order. Given the age of both, I wouldn’t expect any of these to still be occurring by now.

  • Both: either use Sparkle or get distributed through the Mac App Store. Your own updaters are pitiful. I run as a non-admin user, but have admin credentials which I can provide when needed. In the case of Skype this means I can only ever update manually (it actually goes to the trouble of downloading the update, then fails with a cryptic error when it can’t overwrite Skype.app). With Spotify, it’s almost worse: it updates entirely automatically, but when it discovers the install location isn’t writeable, it dumps the updated copy in ~/Applications, cheerfully creating it if it doesn’t exist. Great. Now I have two different copies of Spotify. Both Sparkle and the Mac App Store handle this completely gracefully (i.e., they prompt for elevated credentials when needed). Update: Skype fixes this as of version 6.6.
  • Spotify: Don’t breathlessly tell me a new album is available, and then tell me it’s not available when I hit the “Play” button. Good grief.
  • Spotify: What on earth is with the randomly de-authorising machines for offline use? Urgh. Don’t try to make it automatic and byzantine; either give me a list of devices, or make it work properly.
  • Spotify: Stop spamming syslog with Spotify[21722]: setShowsApplicationBadge: is not yet implemented for the NSApp dockTile.
  • Skype: I don’t know why you think Skype[21446]: WebFlashData cannot find Macrodmedia flash player SharedObjects directory, error Error Domain=NSCocoaErrorDomain Code=260 "The folder “#SharedObjects” doesn’t exist." UserInfo=0x1a2a2c0 {NSUserStringVariant=( Folder ), NSUnderlyingError=0x1a111a0 "The operation couldn’t be completed. (OSStatus error -43.)", NSFilePath=/Users/mo/Library/Preferences/Macromedia/Flash Player/#SharedObjects} appearing in my logs is fine: whether it’s the misspelling of “Macromedia”, the miscapitalisation of “Flash”, or the fact that you’re dicking around in there in the first place. Stop it.
  • Skype: Turning off emoticons (particularly animated emoticons) should not have a measurable effect upon CPU usage in this day and age.
  • Skype: To load more than the last few lines of conversation history should not take multiple seconds. Update: Skype fixes this as of version 6.4.
  • Skype: Sync read status between devices, or don’t let me log into multiple devices simultaneously, for goodness’ sake. I’m bored of missing messages because I forgot to quit Skype at home before heading to work.
  • Skype: Given that Lync for Mac is utterly horrible, when will I be able to log into my corporate Lync server at the same time as logging into my external account? I suppose group conversations or group video/voice calls between corporate Lync and public Skype participants will be out of the question?

There, that wasn’t so hard, was it?

In the interests of fairness I should note that Microsoft may have fixed some of the Skype issues since I last updated. That was a while ago, because updating it is a ball-ache and it doesn’t even tell me that there’s a new version available to manually download and install.

May 31


There is some utter twaddle being bandied around today about images of child abuse, and much more besides, on the Web.

First, there is clear blue water between perfectly legal (whether you find it savoury or not) pornography and completely illegal images of child abuse (themselves depicting illegal activities). It’s difficult to see the conflation of the two as anything other than a disgusting attempt to further an agenda under the guise of solving a problem.

Second, despite some somewhat ill-informed statements to the contrary, images of child abuse are not a technical problem; they are a three-part problem of entirely human making: (a) abusing children, (b) capturing that abuse on camera, (c) distributing that abuse. In that order. There are people such as Mark Bridger who will seek out this material, and history has shown that they will almost certainly be successful if it is there.

Blocking it from search engines may help in a small way prevent the cases of “accidentally encountering it online”, but it doesn’t solve the underlying problems of its very existence, nor does it solve the cases which don’t involve “click on a link on Google and are immediately confronted by child abuse” (and I have a very strong suspicion they are very much in the minority). It doesn’t prevent people finding it if they’re sent direct links, find it through some intermediary service created specially to do this, or if the material isn’t itself on the Web in the first place (pro tip: the Web and the Internet are not the same thing).

Politicians, campaign groups and newspapers are very keen to apply pressure. Reportedly only 1% of sites hosting this material are within the UK; is there an international treaty governing the procedures for the removal of child abuse images (and following that, the tracing of the originators and bringing them to trial)? If not, why the fuck not?

People who know absolutely nothing about the way the Internet or the Web work (or, it’d seem, have never thought for more than a fraction of a second about human behaviour), would do well to stop trying to fundamentally redefine it because bad people use it to do bad things.

Instead concentrate on actually reaching consensus on dealing with the things we can all agree on: chiefly, that these images and the actions which precipitate them have no business existing in the first place.

May 29
Henry Wood on Flickr.

