What You Need to Know About ChatGPT

How Large Language Models work, and when they don’t...

Feb 29, 2024

ChatGPT and Google’s Gemini – formerly known as Bard – can plan parties, write stories, hold conversations and pass exams. These incredible Large Language Model (LLM) systems might look like the all-knowing AI promised by science fiction. But they are not.

As powerful, impressive and revolutionary as they are, LLMs are essentially elaborate bluffing systems which use statistics to piece sentences together.

It is only by understanding how these systems work that we will get the best out of them, and avoid getting caught out by their limitations.

Watch my short video to find out more

Communicating with computers

When you stop and think about it, it’s quite incredible that we can now operate our computers just by talking to them.

Rewind 70 years and people had to manually feed huge computers with machine code – strings of 1s and 0s – using pieces of card punched with holes. The position of the holes told a computer which switches to turn on or off to do its processing. Usually the only way to correct an error was to scrap the entire card and punch a new one.

Today’s computers are vastly more powerful and sophisticated (not to mention smaller), but they still operate in the same language. Microsoft Windows, a 20GB program, is essentially a sequence of roughly 170 billion 1s and 0s. That’s a lot of punched cards…
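For anyone who wants to check that figure, here is a rough back-of-the-envelope sketch. It assumes 8 bits per byte and shows the result for both common readings of "20GB". The punched-card estimate rests on a further assumption of mine, purely for illustration: an 80-column, 12-row card with every hole position carrying one bit, which is a simplification of how cards were really encoded.

```python
# Back-of-the-envelope check: how many 1s and 0s are in a 20GB program?
BITS_PER_BYTE = 8
SIZE_GB = 20

bits_decimal = SIZE_GB * 10**9 * BITS_PER_BYTE  # reading 1 GB as 10^9 bytes
bits_binary = SIZE_GB * 2**30 * BITS_PER_BYTE   # reading 1 GB as 2^30 bytes

# Assumption for illustration only: an 80-column x 12-row card,
# with every hole position carrying exactly one bit.
BITS_PER_CARD = 80 * 12
cards_needed = -(-bits_binary // BITS_PER_CARD)  # ceiling division

print(f"{bits_decimal:,} bits (decimal gigabytes)")
print(f"{bits_binary:,} bits (binary gigabytes)")
print(f"{cards_needed:,} cards at {BITS_PER_CARD} bits per card")
```

Depending on which definition of a gigabyte you use, the answer lands between 160 billion and about 172 billion bits. Under the simplified card assumption, that works out to well over a hundred million cards.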

Fortunately we don’t have to contemplate how difficult or time-consuming it would be to operate a modern computer using this laborious binary system. The person we have to thank for this is Grace Hopper.

The Queen of Code

Born in New York City in 1906, Hopper was a fascinating character. She was the kind of child who dismantled her parents’ alarm clocks to see how they worked, and the kind of adult who kept a backwards clock on her office wall as a symbol of her intent to do things differently.

After studying mathematics, Hopper went into teaching before signing up to the US Navy Reserve during the Second World War. It was here that she was assigned to work on the Mark I computer – an unwieldy contraption built by Harvard professor Howard Aiken – which the Navy hoped would help speed up its calculations.

Despite Aiken being resistant to the idea of having a woman join the team, Hopper was soon tasked with writing an operating manual for the Harvard Mark I. It was arduous work trying to make sense of a temperamental computer which was prone to frequent errors. Hopper is said to have popularised the term ‘debugging’ – at one point literally removing a moth that had caused the machine to malfunction.

After the war, Hopper continued working in the newly emerging field of computing. Spurred on by the monotony of feeding endless lines of code into the Harvard machine, she had an idea. What if these computers could be programmed using words instead of symbols?

Tim Harford’s book Fifty Inventions That Shaped the Modern Economy tells the story:

Hopper and her colleagues started filling notebooks with bits of tried-and-tested, reusable code. By 1951, computers had advanced enough to store these chunks — called "subroutines" — in their own memory systems. Hopper was then working for a company called Remington Rand. She tried to sell her employer on letting programmers call up these subroutines in familiar words — to say things like "subtract income tax from pay" instead of, as Hopper put it, "trying to write that in octal code or using all kinds of symbols."
Hopper later claimed that “no one thought of that earlier because they weren't as lazy as I was.” That's tongue-in-cheek self-deprecation – Grace was famed for hard work. But it does have a kernel of truth: the idea Hopper called a "compiler" involved a trade-off. It made programming quicker, but the resulting programs ran more slowly.

Hopper’s idea would revolutionise the world of computing, but not immediately. Remington Rand dismissed it – they weren’t interested in anything that would slow their systems down.

Not one to take ‘no’ for an answer, Hopper set about writing the first compiler in her spare time. This evolved into the programming language COBOL. “More fundamentally”, as Harford puts it, “it paved the way for the now familiar distinction between hardware and software.” People could now program and operate computers without needing to worry about switches and wires.
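To make the idea concrete, here is a toy sketch of what a compiler does: it takes a statement written in familiar words and turns it into a sequence of simple, machine-like steps. This is not how Hopper's compiler worked. The function names and the LOAD, SUB and STORE instructions are invented for the example, and it understands exactly one kind of sentence.

```python
# Toy "compiler": turns one English-like statement into low-level steps.
# The instruction names (LOAD, SUB, STORE) are invented for this sketch.

def compile_statement(statement):
    """Translate 'subtract X from Y' into a list of simple instructions."""
    words = statement.lower().split()
    if not words or words[0] != "subtract" or "from" not in words:
        raise ValueError(f"cannot compile: {statement!r}")
    split = words.index("from")
    amount = "_".join(words[1:split])     # e.g. "income_tax"
    target = "_".join(words[split + 1:])  # e.g. "pay"
    if not amount or not target:
        raise ValueError(f"cannot compile: {statement!r}")
    return [("LOAD", target), ("SUB", amount), ("STORE", target)]

def run(program, memory):
    """Execute the instructions against a dictionary standing in for memory."""
    accumulator = 0
    for op, name in program:
        if op == "LOAD":
            accumulator = memory[name]
        elif op == "SUB":
            accumulator -= memory[name]
        elif op == "STORE":
            memory[name] = accumulator
    return memory

program = compile_statement("subtract income tax from pay")
memory = run(program, {"pay": 1000, "income_tax": 200})
print(program)
print(memory["pay"])
```

The trade-off Harford describes shows up even here: the person writing "subtract income tax from pay" saves effort, but the computer has to do extra work translating the sentence before it can do the sum.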

Hopper’s innovation kickstarted the journey towards more user-friendly computing: programming languages, graphical user interfaces, touchscreens, web editors, voice assistants – all helping to make advanced technology accessible without the need for advanced technical knowledge.

Today, this concept has reached a whole new level with the arrival of generative AI programs like ChatGPT and Gemini.

How Large Language Models work

These LLM systems are so impressive that we could almost be lulled into believing that we’re no longer dealing with wires and switches but something on a par with human intelligence. Perhaps this isn’t surprising considering that, in the space of a human lifetime, we’ve progressed from manually feeding machine code into a Harvard Mark I to making conversation with ChatGPT about what to cook for dinner tonight using what’s in our fridge.

Neural nets have been key to this breakthrough. They are a method of machine learning that teaches computers to process data in a way that is inspired by the human brain. Here’s an iluli explainer on the topic from 2021:

As I said at the time: “The danger of neural nets is that we are so accustomed to thinking of computers and machines as dumb mechanisms that just do as they’re told that, as soon as they do something beyond that expectation – something that appears the tiniest bit human – we are so amazed that we overestimate what they are capable of.”

So how do Large Language Models like ChatGPT work? And what are they capable of?

The short answer is… statistics. These AIs are trained on vast quantities of human-created text (from the web, books, news articles etc.) and then model the statistical relationship between these billions of words. This information is used to calculate what the next word in any sentence should be. Essentially, they are like autocomplete and predictive text on steroids.
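The "autocomplete on steroids" idea can be sketched in a few lines. The example below is a drastic simplification: it only counts which word follows which in a tiny made-up corpus, whereas real LLMs use neural networks trained on vastly more text and look at far more than the previous word. The corpus and function names are my own, for illustration only.

```python
from collections import Counter, defaultdict

# Tiny stand-in for the "vast quantities of human-created text".
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."

def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def next_word_probabilities(follows, word):
    """Turn the raw counts for one word into probabilities."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

model = train_bigrams(corpus)
probs = next_word_probabilities(model, "the")
best_guess = max(probs, key=probs.get)
print(probs)
print(best_guess)
```

In this corpus "the" is followed by "cat" more often than by anything else, so "cat" is the model's best guess. It has no idea what a cat is; it has only counted.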

In a brilliant and insightful blog on the workings of ChatGPT (which is well worth reading in full), Stephen Wolfram writes:

The first thing to explain is that what ChatGPT is always fundamentally trying to do is to produce a ‘reasonable continuation’ of whatever text it’s got so far, where by ‘reasonable’ we mean ‘what one might expect someone to write after seeing what people have written on billions of webpages, etc.’

He continues:

The basic concept of ChatGPT is at some level rather simple. Start from a huge sample of human-created text from the web, books, etc. Then train a neural net to generate text that’s ‘like this’. And in particular, make it able to start from a ‘prompt’ and then continue with text that’s ‘like what it’s been trained with’.
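Wolfram's "start from a prompt and continue" loop can also be sketched in miniature. Again, this is a toy under heavy assumptions: a lookup table of word pairs stands in for the neural net, and the training text, function names and seed are all invented for the example.

```python
import random
from collections import defaultdict

# A few made-up sentences standing in for the training text.
training_text = (
    "the party was a great success . "
    "the party was planned in a week . "
    "the exam was a great challenge . "
)

def build_table(text):
    """Map each word to the list of words seen directly after it."""
    words = text.split()
    table = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        table[current].append(nxt)  # repeats preserve how common each pair is
    return table

def continue_prompt(table, prompt, max_words=8, seed=0):
    """Extend the prompt one word at a time by sampling a likely next word."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    output = prompt.split()
    if not output:
        return ""
    for _ in range(max_words):
        candidates = table.get(output[-1])
        if not candidates:
            break  # nothing ever followed this word in training
        output.append(rng.choice(candidates))
    return " ".join(output)

table = build_table(training_text)
result = continue_prompt(table, "the exam")
print(result)
```

Every step is locally plausible, but nothing checks the whole. Depending on the random choices, a loop like this can produce a sentence such as "the exam was planned in a week", which sounds fine yet appears nowhere in the training text.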

There’s no two ways about it: these systems are incredible. But it would be a mistake to see them as ‘intelligent’. They can churn out reams of human-sounding text (as well as images and videos) that give the illusion that they ‘understand’ our prompts and know what they are talking about. But, like a highly trained parrot, they repeat familiar phrases without ‘knowing’ what these words and sounds mean to a human.

This partly explains why these LLMs will often ‘hallucinate’ by making up their own facts and information.

When news outlet CNET employed AI to start writing articles and features, it ended up having to issue dozens of corrections because the pieces included false information and provided misleading financial advice. Lawyers in New York used ChatGPT to write their legal briefs – only to find that the AI chatbot was inventing its own fictitious cases. There have even been cases of ChatGPT making up salacious accusations against real people by citing fictitious news articles.

Like a highly accomplished blagger, LLMs have a hugely impressive ability to write convincingly and with authority on any subject. Just don’t count on them to be factually accurate.

Will AI be coming for our jobs?

Predictions around where this technology will lead have been pretty stark. However, it’s important to remember that LLMs are no match for human intelligence in some crucial ways.

As Oliver Brown, an associate professor at the University of New South Wales, puts it:

LLMs are not models of brains, but of language itself, its patterns, structures, and probabilities. At heart their job description is incredibly simple: Given some text, they tell us what text comes next. It’s worth keeping front and center, however, that there is not always one right response.

AI is already changing the world of work, but that doesn’t mean it’s going to replace the need for people.

Michael Littman, Professor of Computer Science at Brown University, believes that it will ultimately lead to new types of jobs for humans and increase the demand for roles like engineers and data scientists at all sorts of companies.

Law and accountancy firms are now using AI-based technologies to scour through reams of paperwork and data to identify patterns in lease contracts or suspicious transactions. Rather than replacing human work, this is said to be facilitating new insights and services.

Other firms have been using AI to do things like creating company policies. HR experts have described the results as impressive but concluded that “this technology works best as a support to the real people operating it, rather than a replacement.”

Back to the Future

I asked OpenAI’s DALL·E to create an image incorporating early computer programming, AI and a time-traveling DeLorean

LLMs like ChatGPT might mark the pinnacle of Grace Hopper’s ambition to break down the barriers of communication between humans and computers.

Back when Hopper was trying to get her idea off the ground, she believed that the concern over processing speed wasn’t the only factor driving resistance from her colleagues. The ‘high priests’, as she called them, were resistant to change because they wanted to guard their prestigious status as the exclusive few who were able to operate and program these complex new machines.

Of course, in the end Hopper’s initiative didn’t reduce the need for expertise: it created new specialisms which paved the way for the revolution in computing that followed.

The potential for LLMs may be even more exciting and transformative. Just as long as we don’t lose sight of how this technology works. And when it doesn’t.

iluli by Mike Lamb
