I’m a Computer Scientist. Here’s Why You Should Never Trust a Computer.
Especially when it comes to voting.
It’s 2018. We live in the future. We can order a pizza, watch it get made, and watch it get delivered to our house. So why can’t we vote online?
Let’s start with some background on what programming languages are, and why we need them. Soon, you’ll see why you should never want to vote online (and why you never want a computer anywhere near you when you’re voting).
You probably know computers run on binary, 1s and 0s. And writing in binary is hard—so hard, in fact, that basically nobody wants to do it. Even if you succeed in doing it, what you’re producing is just a bunch of numbers, and it’ll be very hard for anyone—including you in a few weeks, once you forget what you wrote—to figure out what your code actually does.
So instead, we computer scientists invented “assembly languages.” These are abstractions that turn raw binary into something at least a little closer to the languages humans speak. They’re still basic, but they’re a step in the right direction. Assembly languages are based on, and tied to, the hardware of whatever machine they’re designed for. So while you can’t say something easy like “add 10 and 20 together and print that result to the screen,” you can say “place the value 10 in register one, place the value 20 in register two, feed both these registers into the adder, put the output in register three, and print the contents of register three to the screen.” That assembly code is then translated (by a program called an “assembler”) into the binary 1s and 0s required to actually run on your computer.
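If you’re curious what that step-by-step style feels like, here’s a rough sketch in Python that just simulates it; the register names and the “adder” are invented for illustration, not any real machine’s instruction set:

# A toy simulation of the explicit, step-by-step instructions described above.
# The "registers" and the "adder" are made up for illustration, not a real architecture.
registers = {"one": 0, "two": 0, "three": 0}

registers["one"] = 10                                      # place the value 10 in register one
registers["two"] = 20                                      # place the value 20 in register two
registers["three"] = registers["one"] + registers["two"]   # feed both registers into the adder, result goes to register three
print(registers["three"])                                  # print the contents of register three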
There are obvious downsides here: You need to be familiar with your computer’s hardware to write in an assembly language, and every computer’s architecture is slightly different. Plus, you have to explicitly specify every step of the process. That’s a pain. But the upside is that when you’re looking at a program written in assembly down the road, what’s happening is much clearer, especially compared to looking at an endless stream of 1s and 0s in binary.
The next step up is to abstract away the hardware, so you don’t actually need to know the location of things like “adders” and “registers.” If you build a smart enough compiler, you can design machine-independent programming languages, with more abstract instructions that can easily handle things like “add 10 and 20 together, and print that result to the screen.” You’d then rely on the compiler to translate that into assembly and then into binary.
While all of these programming languages take different approaches to solving this problem, they share the same goal: to make computer code easier for humans to read, which makes it easier to understand and easier to maintain. Programming languages today make printing the result of 10+20 as simple as writing this:
print 10+20
Did you spot the reason you can’t trust any computer?
I’ll give you a hint: It’s there in the compiler.
No matter what you write, you’re trusting the compiler to accurately turn what you wrote into binary code. If I wanted to mess with your results, all I’d need to do is mess with your compiler.
For example, if I changed the “print” command so it always added 1 to the numbers you gave it, your program wouldn’t run properly—even though you programmed it correctly. You’d never find the glitch just by looking at your source code because that’s not where the glitch is. It’s hidden in the compiler.
That example is basic and you’d detect it pretty quickly because your program would obviously be broken. But what if I did something more subtle? What if instead of messing with the “print” command, I changed the compiler so that whenever it detected code involving passwords, it made it so the password “ryaniscool” also worked?
If I did that, I’d have what’s called a “back door” into every computer program you build with my compiler. In other words, you can lock your front door all you want, but it doesn’t matter because I have a secret door around back nobody knows about. No matter what you write, no matter how secure your password code is, my password of “ryaniscool” is also going to work—and you won’t even know it.
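To make that concrete, here’s a toy sketch in Python. None of this is real compiler code; the “compiler” here just rewrites text, so the trick stays visible:

# A toy "compiler" with a back door: a hedged sketch, not any real compiler's code.
# It "compiles" a program (here, just a snippet of Python text) and quietly rewrites
# any password check so that "ryaniscool" is accepted too.
def backdoored_compile(source):
    if 'password ==' in source:                      # spot code that checks a password
        source = source.replace('password ==',
                                'password == "ryaniscool" or password ==')
    return source                                    # hand back the tampered result

# Your perfectly correct login code:
login_source = 'def check(password): return password == "hunter2"'

exec(backdoored_compile(login_source))               # "run" the compiled program
print(check("hunter2"))       # True, just like you wrote it
print(check("ryaniscool"))    # also True, and nothing in your source code explains why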
Obviously, this is a problem. And you might think, “But compilers are computer programs like any other. I could look at the source code of my compiler to make sure there’s no malicious code there. All I’d need to do is find the part that talks about adding ‘ryaniscool’ as a password, take it out, and I’d be fine. Right?”
And you could. Except, as you said, compilers are computer programs like any other. And that means they themselves are compiled.
Here’s all I’d need to do to exploit that:
Step 1
As before, I’d add code to the compiler so that “ryaniscool” becomes a valid password in anything it compiles. At this point, I’m adding a back door to anything the compiler compiles, but I’ll get caught if anyone looks at the source of my compiler. So I go on to Step 2.
Step 2
I write code for the compiler that detects when it’s compiling itself, and when that happens, inserts the code for both steps (the Step 1 back door, plus these self-copying instructions) into the compiler it’s building. Now, when I compile the compiler, it produces a new version of itself that knows how to insert the “ryaniscool” back door, and how to pass that knowledge along whenever the compiler is rebuilt. And to cover my tracks, all I’d need to do is remove the malicious instructions from the compiler source, and I’m done.
Whenever the compiler is rebuilt, it’ll build itself such that it’ll contain instructions to add my back door. Whenever that compiler builds something else, it’ll follow those instructions and build my back door right in. And there won’t be a single line of malicious code left in any source code that reveals it.
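Here’s a hedged Python sketch of the whole scheme. It’s a toy in which “compiling” just means passing text through, so the shape of the trick stays visible; in a real attack, all of this lives only inside the compiler’s binary:

# A toy sketch of Steps 1 and 2 together. Everything here is invented for illustration;
# a real attack hides these instructions in the compiled compiler, not in any source file.
SELF_COPYING_PAYLOAD = """
# Step 1: if the program being compiled checks passwords, also accept "ryaniscool".
# Step 2: if the program being compiled is the compiler, copy this payload into it.
"""

def infected_compile(source):
    output = source                                  # the honest part: compile faithfully
    if 'password ==' in source:                      # Step 1: is this code checking a password?
        output = output.replace('password ==',
                                'password == "ryaniscool" or password ==')
    if 'def compile' in source:                      # Step 2: does this look like the compiler's own source?
        output = output + SELF_COPYING_PAYLOAD       # smuggle the malicious steps into the new build
    return output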
The only way to detect this kind of tampering is to go over the binary code yourself, a task that starts out hard and becomes effectively impossible as programs grow more complex. The complete works of William Shakespeare come in at under 6 megabytes. The Firefox browser alone requires 200 megabytes just to install, and that’s only one program on your computer. There is not a human alive who has read all 200 megabytes of that code. It’s not even written in a language designed for humans to read.
None of this is new. In 1984, Ken Thompson—the man who designed and implemented Unix, the progenitor of the operating systems most computers and phones run on—presented a paper called “Reflections on Trusting Trust” and reached this conclusion:
The moral is obvious. You can’t trust code that you did not totally create yourself… No amount of source-level verification or scrutiny will protect you from using untrusted code.
By “totally create yourself,” Ken doesn’t just mean a program that you wrote, but one you wrote the entire stack for: everything down to the compiler. Very few people have the time, skills, and money to build a computer from the ground up, including all the software on it. This would seem to be a bullet in the head for trusting computers with anything.
And yet, we trust computers with all sorts of things. So, what gives? Why are we using these nightmare machines?
Well, for one thing, computers are really fun and convenient. And they’re practical in a lot of ways. Besides, a compiler hack can be tricky to pull off in practice: You’d need time and motivation to target someone. The truth is, there are many cases where you don’t need absolute trust in your computer: After all, it’s not the end of the world if someone hacks in and sees my pizza being delivered. Nobody cares enough to try to break it.
But voting is not one of those cases.
Voting is a case where the outcome of a hack can have huge effects. Voting is also relatively easy to target (you know when and where it’s going to happen), and there’s a very strong motivation to alter the outcome. As easily as I could add that “ryaniscool” password, I could change the “add” command so that, when it was tallying votes, it added some extra for the party of my choice.
How much should I add? Honestly, at this point, it’s entirely up to me. Hence this conclusion: Online voting will never be safe. Computer voting will never be safe.
The only safe-ish way to vote with a computer is one in which a paper ballot is printed in sight of the voter, approved, and then stored in a ballot box. That way, if someone thinks the computer systems were compromised—if there’s any reason at all to suspect someone added the votes improperly—then there’s a paper trail. In other words, the computer adding up the votes is a convenience, nothing more. The real vote, the real power, still lies in the paper ballot.
Without that paper trail, you’re left trusting the computer.
And nobody should ever trust a computer.
UPDATES:
There have been some recurring themes in the discussion around this essay, so I thought I’d address them here in Q+A format! The essay above has not been altered, but what follows might be useful if you’d like to do some more reading on this subject!
Q: What do you mean by computer voting?
A: I’m talking about a system in which you exclusively vote on a computer: no paper trail is generated. In that case, the computer is the sole authority on how you voted; there’s no other record you can check it against.
An ethical and safer way to use computers in voting is to use them not as an authority, but as a convenience. If you vote on a paper ballot, and a computer scans that to add up a result, you can feel safer, because if anything goes wrong there’s still a physical paper trail. If you vote on a computer but then it prints out a paper ballot, which you have to confirm as being accurate before your vote is recorded, then you can feel safer too, because in both these situations the computer is a convenience. It’s when the computer becomes an authority that the problems arise.
Q: Couldn’t this be fixed by giving each person who votes a secret code, or some sort of key, or maybe we could biometrically scan their eyes or fingerprints or something? Or what if we backed up the votes somewhere on the internet the second they were made?
A: Nope. Codes and keys can be intercepted or duplicated, and any biometric scanner is itself a computer, vulnerable to the exact same issues discussed here. And any networked system, in which the computer shares its vote to “back it up” somewhere else, still depends on that backup not being tampered with. Sorry.
Q: Okay, but maybe we could test our programs and see if our compiled code acts differently than what we expect?
A: This doesn’t work for a couple of reasons. Sure, in my “change the value of what 10+20 adds up to” example, that’d be easy to test and catch any changes. But even if you thought to test that in the first place — and why would you? — that still doesn’t solve the problem. My malicious code could detect when it’s being tested and do nothing bad, only becoming active when you’re not looking.
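As a hedged sketch of what that dodge can look like (the date check and the party name here are invented for illustration):

import datetime

# A minimal sketch of test-aware malice. "is_being_tested" stands in for whatever signal
# an attacker relies on: a certification schedule, a test ballot pattern, a debugger being attached.
FAVOURED_PARTY = "Party X"

def is_being_tested():
    return datetime.date.today().month != 11               # e.g. certification tests happen before election day

def record_vote(choice, tallies):
    tallies[choice] = tallies.get(choice, 0) + 1            # behave correctly...
    if not is_being_tested():                               # ...except when nobody is checking
        tallies[FAVOURED_PARTY] = tallies.get(FAVOURED_PARTY, 0) + 1   # quietly add an extra vote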
Sounds like sci-fi, right?
Well, it’s already been done: in the Volkswagen emissions scandal that broke in 2015, the cars’ onboard computers detected when their emissions were being tested and ran in a clean, low-emissions mode, then switched back to a dirtier, higher-performance mode once the test was over. The cars were on their best behaviour exactly when someone was checking, and only then. This scandal cost Volkswagen $18.32 billion to fix, by the way, not including the $2.8 billion criminal fine they paid.
The only reason Volkswagen would do this in the first place is because it’d be profitable to them and they thought they wouldn’t get caught. The same incentives apply to an election.
Q: If I’m forced to vote on a computer, does that mean I shouldn’t vote?
A: No, you should absolutely vote anyway. The purpose of meddling in an election is to disenfranchise you. If you don’t vote, you’re disenfranchising yourself already with 100% efficiency. Go vote, and afterwards, do what you need to do to ensure you never have to use computer voting ever again.
Q: Does this mean we shouldn’t trust computers for anything?
A: In an absolute sense: yes. You should not have 100% faith in any computer system. But that’s obviously not practical, and in most cases, you don’t need to have 100% faith in a computer. One of the few cases in which you would is in voting. The next question goes into this in more detail.
Q: Come on. We do banking online, billions of dollars move digitally every day, and you even wrote this on a computer. Surely you’re being alarmist and/or hypocritical?
A: This is where the idea of absolute trust comes in. I don’t absolutely trust computers, but I do bank online. That’s because, if something goes wrong, the bank can fix it afterwards. You use a credit card knowing there’s a chance your information could be stolen, but if that happens, you trust the credit card company will fix it. And they will, because the profit they make from you using their card every day makes up for the expense of covering fraud and broken software.
But there’s no way to correct a broken election after the fact.
It’s all a matter of compromise: publishing this online was convenient, and I did it knowing my words could be altered. That’s a risk, but not the end of the world. Low stakes, and the benefits outweigh the downsides. Similarly, I bank online because it’s convenient, and I’m willing to make that compromise: there’s a chance my data or money could be stolen, but I feel relatively assured the bank will cover it. And yes, I use computers to send friends $10 to pay them back for dinner, but I do it because the stakes are so low. It’s just $10.
The stakes are not low when it comes to voting.
And an electoral system — a democracy — is not the sort of thing you want to be compromising on.
Q: What about blockchain? You should’ve mentioned blockchain. That’s a new technology that didn’t exist in 1984 and that could definitely solve this problem.
A: Nope, nope, nope. Sorry. I wish it worked too, but a blockchain still has to be written to and read by software running on computers, and those are exactly the things we’ve established we can’t trust.
Q: You don’t know what you’re talking about, and who made you an authority? Why should I trust you?
A: Like I say in the essay, these aren’t my brilliant original ideas. I’m basically rephrasing what Ken Thompson argued in 1984 in his Reflections on Trusting Trust paper. Ken’s argument is actually stronger: his example is the login program, an analogue of which is used in just about every computer. I’m just talking about voting. Ken’s paper has stood as a seminal paper in computer science for over 30 years, but it’s not well-known outside computer science circles. That’s why I wanted to write this essay.
(Incidentally, there is a way to guard against the issue Ken raised and I paraphrased: you compile the new compiler’s source code twice, once with the suspect compiler and once with a separate, trusted compiler, and then use each of those two results to compile the same source again. If the two final outputs match bit for bit, the suspect compiler does exactly what its source code says it does, so any tampering would have to be visible in the source. This, of course, raises the question of where you’d get that trusted compiler from; there’s a whole PhD thesis on that subject, David A. Wheeler’s work on “diverse double-compiling.”)
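If you’re curious what that procedure looks like, here’s a rough sketch; the compiler names, file names, and flags below are placeholders for illustration, not real commands:

import hashlib
import subprocess

# A rough sketch of the "compile it twice" check (diverse double-compiling).
# Every path and flag here is a placeholder; the point is the shape of the procedure.
def compile_with(compiler, source, output):
    subprocess.run([compiler, source, "-o", output], check=True)
    return output

def fingerprint(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Stage 1: build the new compiler's source with the suspect compiler, and again with a trusted one.
suspect_build = compile_with("./suspect-compiler", "new-compiler-source.c", "./stage1-suspect")
trusted_build = compile_with("./trusted-compiler", "new-compiler-source.c", "./stage1-trusted")

# Stage 2: use each stage-1 result to build the very same source one more time.
a = compile_with(suspect_build, "new-compiler-source.c", "./stage2-a")
b = compile_with(trusted_build, "new-compiler-source.c", "./stage2-b")

# If the suspect compiler is honest, the two stage-2 binaries should match bit for bit.
print("compiler looks clean" if fingerprint(a) == fingerprint(b) else "something was smuggled in")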
Q: Okay, sure this is pretty dour, but this is all hypothetical. We use computers to control nuclear reactors, for crying out loud. If things weren’t safe, we’d know about it.
A: An attack very similar to the one discussed here, in which the evidence of the attack was hidden from the people monitoring the system, was carried out in real life just a few years ago: the Stuxnet worm, discovered in 2010. And it attacked, you guessed it, nuclear centrifuges.
These attacks are already happening. Whoops.
Q: Computer voting may be bad, but paper ballots can be altered too, you know. They’re not perfect either.
A: Absolutely. But paper ballots have a few huge advantages: their downsides are well understood (nobody is writing big essays on why paper can’t be trusted), and their vulnerabilities are limited to physical access.
If I want to mess with a paper-ballot election, I need to steal ballots, alter ballots, or stuff ballot boxes, and in every case I need physical access to the ballot box. That limits the amount of damage one bad actor can do. A bored teen half a world away can’t affect a paper-ballot election from his basement. The same can’t be said for computer voting.
And, on top of all of this, there’s a simple fact: programmers aren’t perfect. Even if the attack in this essay isn’t used, that doesn’t mean your computer voting system is secure. Heck, Google — who I think we can all agree hire some very smart people — have a bounty system in which they pay you cash money if you help them find bugs in their own software, because they can’t guarantee they haven’t made mistakes.
Software programming is hard. Computers are hard. Even a brilliant, well-intentioned software developer can make a single mistake that opens up an entire software stack to intrusion. The Heartbleed bug was introduced by accident in 2011, in open-source software that in theory anyone on the planet could have looked at, examined, and caught, but it was not found until 2014, by which point around 17% of the internet’s secure web servers were vulnerable.
That was done by accident. Imagine what someone could do if they were trying.
Look, I know it sucks to hear that computer voting is bad. It sucks to line up outside, in physical space, when it’s so easy to imagine just voting on an app on your phone over your lunch break and being done with it. But more important than an election being convenient is it being accurate, and while computer voting sure would be convenient, I hope I’ve convinced you not to trust it to be accurate.
Q: Is there a relevant xkcd?
A: There is always a relevant xkcd.
Written by Ryan North, writer and cartoonist | http://www.ryannorth.ca