If your AI is so great at coding, why is your software so buggy?

After feeding the hype about the terrifying power of artificial intelligence, Sam Altman, the CEO of OpenAI, now confronts the downside of unrealistic expectations.

Last week, he disclosed a bug in OpenAI’s software that leaked sensitive user information. The natural response is the question posed in the title. If his AI systems are so powerful that they can read and write code, why didn’t his team use AI to spot and fix the problem in its own software?

When people signed up for paid access to OpenAI’s much-touted ChatGPT, the information they supplied was managed by registration software. Sometimes it got confused. If John Doe signed up for the premium chat product, this software sometimes sent his confirmation email to Melanie Smith. If Sara Burrows checked the webpage with her payment details, the software might have shown her the payment details page for Albert Jones. As a result of such mistakes, this software leaked information about people in the position of John and Albert that included their names, email addresses, and payment details.
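
To see what kind of mistake produces that sort of mix-up, here is a toy Python sketch. It is not OpenAI’s code and not the internals of the library Altman blamed; it only illustrates, with made-up names, how a shared connection whose replies fall out of step can hand one person’s data to another.

```python
# Illustrative sketch only: not OpenAI's code, not the real library's internals.
from collections import deque


class SharedConnection:
    """One connection multiplexed across many users' requests (toy model)."""

    def __init__(self, backend):
        self.backend = backend          # user_id -> that user's private data
        self.pending_replies = deque()  # replies waiting to be read, in arrival order

    def send_request(self, user_id):
        # The backend always answers; replies must be read back in FIFO order.
        self.pending_replies.append(self.backend[user_id])

    def read_reply(self):
        return self.pending_replies.popleft()


backend = {
    "john": "John Doe / card ending 4242",
    "melanie": "Melanie Smith / card ending 9876",
}
conn = SharedConnection(backend)

# John's request goes out, but his session gives up (a timeout or cancellation)
# before reading the reply. The reply is left sitting on the shared connection.
conn.send_request("john")

# Melanie's session reuses the same connection: it sends her request and reads
# the next reply in the queue, which is John's leftover one.
conn.send_request("melanie")
print("Melanie is shown:", conn.read_reply())  # -> John's details
```

The real bug was no doubt subtler than this, but it belongs to the same ordinary family of bookkeeping errors that conventional testing and code review are meant to catch.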

Many companies have built on-line registration and payment systems that do not get people mixed up. You have to wonder why a company that is supposed to be on the cutting edge of human and artificial intelligence couldn’t manage the same task by following conventional software development strategies. It is an interesting question, but not the one that poses the real threat.

For people who benefit from the hype, the dangerous question is why OpenAI didn’t use its AI systems to find and fix the underlying bug before putting its customers at risk. If these systems really can read and write code, couldn’t developers at OpenAI have used them to scan for bugs in the open-source software library that Altman blamed for the leaks? Better still, if OpenAI has developed code that can write code, why didn’t its developers use that AI to write a correct, in-house replacement for the open-source library?

Based on the hype in the news reports, these may sound like reasonable questions. In fact, anyone who has used AI knows that such questions are premised on a vast overestimate of what actual AI systems can do.

The best way to discover the AI’s limitations is to try it. GitHub offers developers a product called “Copilot” that relies on an AI system that Microsoft (GitHub’s owner) licensed from OpenAI. It is not a party trick. It is supposed to do real work. If you dig into the documentation, you’ll discover that GitHub understands that even its relatively sophisticated software developers have unrealistic expectations about what a product like Copilot can do.

“However, GitHub Copilot does not write perfect code. It is designed to generate the best code possible given the context it has access to, but it doesn’t test the code it suggests so the code may not always work, or even make sense.”

I’ll leave it to you to decide if this is consistent with what you expected on the basis of the news accounts you’ve read.

If you do try Copilot, you’ll quickly discover that it behaves less like a software engineer and more like a well-trained retriever.

If you give Copilot a prompt (“Go fetch some code that says something about saving a variable to a file”), it runs off, brings back a bunch of code fragments, and begs for a belly rub. It’s up to you to figure out what to do with the mess of code that it drops in your lap.
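
To make the picture concrete, here is roughly what that interaction looks like. The snippet is one I constructed for illustration; it is not a captured Copilot suggestion, and what you actually get back will vary with the surrounding code.

```python
results = {"accuracy": 0.92, "loss": 0.31}   # some variable in your program

# The "prompt" is nothing more than a comment you type in your editor:
# save the results variable to a file

# A completion along these lines may come back. It looks plausible, but nothing
# has tested it: str() is not a format you can reliably read back in, there is
# no error handling, and no one has asked whether a text file is what you wanted.
with open("results.txt", "w") as f:
    f.write(str(results))
```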

Now honestly, I’d be impressed if someone trained a dog to fetch pieces of paper with variations on the words “file” and “variable” from an enormous print-out of millions of lines of code. But even so, I would not expect the dog to understand what “file” and “variable” mean. Nor would I hope that a dog could assemble a bunch of code fragments into a coherent program. I wouldn’t even think of asking a dog to look at a long program, infer what it does, and warn me if what it does (sometimes send John’s confirmation email to Melanie) is different from what it is supposed to do (always send John’s confirmation email to John).
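
For contrast, here is a minimal sketch of the kind of spec-versus-behavior check that last sentence describes, written the way a human developer would write it. The function and test names are hypothetical, chosen only to match the example.

```python
def send_confirmation(signup):
    """Correct behavior: the confirmation goes to the address on the signup itself."""
    return {"to": signup["email"], "body": f"Welcome, {signup['name']}!"}


def test_confirmation_goes_to_the_person_who_signed_up():
    john = {"name": "John Doe", "email": "john@example.com"}
    message = send_confirmation(john)
    # The specification, written down: John's email goes to John, never to Melanie.
    assert message["to"] == "john@example.com"


test_confirmation_goes_to_the_person_who_signed_up()
```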

At this point, Altman might be in a position to take the money and run. Satya Nadella, Microsoft’s CEO, may end up being the person who suffers from the unrealistic expectations that Altman has spawned.

If the AI systems that Microsoft licensed from OpenAI (apparently for more than a billion dollars) are so great at writing code, shouldn’t the public expect Microsoft’s own developers to use them to write better code? And if they did, shouldn’t everyone expect software sold by Microsoft to be free of the many security flaws that threaten its users? Shouldn’t it be possible to leave behind the endless stream of patches required to fix all the problems that human developers failed to spot?

No.

OpenAI’s bug tells us to expect no departure from the status quo, with all the bugs, vulnerabilities, and after-the-fact patches to which we have become accustomed. In fact, this bug is a leading indicator of how the AI revolution is likely to play out, just as the Internet and cyber-currency revolutions did.
