The Unintended Consequences of Measuring Things
Goodhart's Law, ridiculous outcomes and the meaning of life…
Why do we measure things? That might seem like a silly question until you consider just how much we do it. From population data and GDP to steps and screentime, we measure almost every aspect of our lives.
And the reason we do this is always the same – to make things better.
But what if our obsession with measuring things is actually making things worse?
Watch my short explainer to find out more:
The metric minefield
A metric is simply a way of measuring something and expressing it as a number. The simplicity that comes from this process of quantifying can be incredibly useful in a world full of complexity and ambiguity.
Take schools. We’d all agree that we want every child to have a great education. Sounds simple, right?
But you don’t have to scratch too far below the surface before things start to get a bit more complicated. For a start, what exactly is “a great education”? Is it about filling children’s heads with facts and figures? Or instilling them with curiosity and critical thinking skills?
Then there are the practical issues. Should we prioritise the technical expertise that will help the next generation thrive in an increasingly technological world? Or do we favour a more rounded curriculum that gives children a solid grounding in the arts, literature, and philosophy?
Once you’ve resolved those thorny dilemmas, you face another one: how do you judge whether children are actually getting this “great education”?
This is where exams and assessments come in. Exam grades provide a tangible and simple metric for measuring how well a child, a class, a school, or a whole country is performing.
But this simplification has come at a cost.
Exam results are linked to funding, teachers’ careers and even the continued existence of educational institutions. This means that we’ve created a system where many schools feel pressured to “teach to the test”, with exam preparation prioritised over creativity and critical thinking. In trying to measure the quality of a child’s education, we’ve risked losing sight of what a great education should actually be.
British economist Charles Goodhart coined a neat explanation for this phenomenon, which is now popularly expressed as: “Once a measure becomes a target, it ceases to be a good measure.”1
You can see examples of what is now called Goodhart’s Law all around us. Our focus on crime statistics has led to crimes going unrecorded. A fixation on reducing health waiting lists has made it harder for people to get on the waiting list in the first place.
Snakes, rats and ridiculous outcomes
When is a hospital bed not a hospital bed? How could a pest reduction scheme actually increase the number of pests? Why might a government want to encourage more drug dealers and fewer unpaid carers? The answers to all of these riddles can be found in Goodhart’s Law.
In the video, I talked about the possibly apocryphal tale of how a scheme to cull poisonous snakes in 19th-century India backfired. This gave rise to the term “Cobra Effect”, which describes how attempts to solve problems can sometimes make them worse.2
But perhaps the “Rat Effect” would be more fitting. Unlike the Indian snake tale, we know that the Great Hanoi Rat Massacre of 1902 actually happened.
Amid a pandemic, French authorities in Vietnam feared that rats were spreading disease. They hired teams of professional rat catchers who, by the summer of 1902, were killing more than 10,000 rats per day. But this wasn’t enough – the city was still overrun with potentially infectious rodents.
So, to step these efforts up, they encouraged the public to join the rat resistance too: citizens were offered a 1¢ reward for each rat they killed. The authorities measured their progress (and issued rewards) based on the number of rat tails people handed in.
The scheme was hugely successful – at least according to the metrics. Rewards were being paid out left, right and centre as locals presented the evidence of their rat-catching prowess.
But officials soon spotted something unusual – a preponderance of tailless rats roaming the city. Enterprising locals had been maiming rather than killing the rodents. What’s more, they were releasing the live rats into sewers to reproduce and keep their profitable pest supply chain going.
The result? Hanoi’s Great Rat Massacre didn’t just fail to curb the city’s rat infestation – the rodent population went up!
It’s an example of how an oversimplified metric can lead us astray, sometimes with farcical results.
During the COVID pandemic, the UK government set a target to source 1 billion pieces of personal protective equipment (PPE) for health workers. Updates on the number of items procured featured prominently in news headlines – it was one of the key statistics used for tracking the nation’s progress. But it later emerged that, in order to hit this impressive target, the government had counted each glove in a pair as a separate piece of equipment.
This wasn’t the first time that top-down targets and metrics led to some creative counting in the health system. In the early 2000s, it was even claimed that hospital staff were removing the wheels from trolleys so that they could be reclassified as beds.
System failure
Given the rats in Hanoi, the snakes in India and the creative classification of medical equipment in the UK, you might be inclined to think that the issue here isn’t the metrics – it’s people gaming the system.
After all, the Vietnamese rat maimers knew that the real goal should have been to curb the number of infectious rodents, not accumulate the world’s most extensive rat tail collection.
But this pattern of unintended consequences happens too often for it to be dismissed so easily. The problem is the system itself.
Jerry Z Muller, history professor and author of The Tyranny of Metrics, argues that when we fixate on simple numbers to define success, the bigger picture gets lost:
When reward is tied to measured performance, metric fixation invites just this sort of gaming. But metric fixation also leads to a variety of more subtle unintended negative consequences. These include goal displacement, which comes in many varieties: when performance is judged by a few measures, and the stakes are high (keeping one’s job, getting a pay rise or raising the stock price at the time that stock options are vested), people focus on satisfying those measures – often at the expense of other, more important organisational goals that are not measured.
In education, this leads to “teaching to the test”. In other fields, attempts to measure productivity have been shown to stifle initiative and make people more risk-averse, ultimately leading to worse outcomes. As Muller writes:
The source of the trouble is that when people are judged by performance metrics they are incentivised to do what the metrics measure, and what the metrics measure will be some established goal. But that impedes innovation, which means doing something not yet established, indeed that hasn’t even been tried out. Innovation involves experimentation. And experimentation includes the possibility, perhaps probability, of failure.
These attempts to measure success are often doomed to fail. So what can we do about it?
A measured approach
One reason the metric minefield is such an issue is that, simply put, we measure stuff a lot more than we used to. In this information age, we use metrics to inform many of our decisions.
So, should we stop? Well, no: clearly metrics are useful. If we stopped measuring hospital waiting times then we wouldn’t know the scale of the backlog or whether initiatives to get patients treated more quickly were working.
The real question is, how do we ensure we measure what matters and avoid developing tunnel vision?
One metric that has a lot of currency (excuse the pun…) in the world today is Gross Domestic Product (GDP): the total financial value of all the goods and services produced within a country.
It’s hard to overstate how important this metric is. It dictates the balance of global power. Movements in GDP lead to the rise and fall of governments. Exchange rates, wages, inflation, investments and jobs are all impacted by it.
Such is the pressure on countries to keep this metric up that, in 2014, the UK government decided to broaden how it counted GDP — it started including the proceeds of illegal drug dealing and other criminal enterprises. This added a whopping £10 billion to the UK economy.
What would happen if you took this kind of thinking to its logical conclusion? A government solely focused on GDP would, in theory at least, do well to steer people away from important unpaid work like caring for elderly relatives and encourage them to take up drug dealing instead.
Thankfully, no one has adopted this policy yet. There are many good arguments against focusing too much on GDP. But even putting those aside, it’s generally understood that this isn’t the only measure we need to look at to get a sense of how well a country is faring.
To measure what matters, we need to avoid looking at metrics in isolation.
On a practical, day-to-day level, a lot of this can be solved with one word: while. We want to reduce costs while maintaining quality; to increase productivity while keeping reasonable working hours.
Now, that might sound like we’re just adding more metrics, but the key is choosing metrics that are in tension with each other.
It's easy to reduce costs if you're happy to sacrifice quality, or to increase productivity if you don't mind sleeping three hours a night. But by using multiple metrics and understanding the relationship between them, you can reduce your blindspots and maintain a more balanced view.
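As a toy sketch of this idea (all of the numbers, names and thresholds below are hypothetical, not drawn from any real system), pairing metrics in tension can be as simple as refusing to call a change an improvement unless its counter-metric holds steady:

```python
# Toy illustration of paired "tension" metrics. A change only counts as
# an improvement if the primary metric improves AND the counter-metric
# does not slip by more than a small tolerance.

def is_improvement(before, after, primary, counter, tolerance=0.02):
    """Primary metric must fall (lower is better here); the counter
    metric may not drop by more than `tolerance` as a fraction of its
    previous value."""
    primary_better = after[primary] < before[primary]
    counter_held = after[counter] >= before[counter] * (1 - tolerance)
    return primary_better and counter_held

before = {"cost_per_unit": 10.0, "quality_score": 0.95}

# Cutting costs by sacrificing quality: fails the paired check.
gamed = {"cost_per_unit": 8.0, "quality_score": 0.80}
print(is_improvement(before, gamed, "cost_per_unit", "quality_score"))     # False

# Cutting costs while holding quality steady: passes.
balanced = {"cost_per_unit": 9.2, "quality_score": 0.95}
print(is_improvement(before, balanced, "cost_per_unit", "quality_score"))  # True
```

The design point is that neither metric is meaningful alone: the counter-metric exists precisely to catch the "reduce costs by gutting quality" strategy that a single target would reward.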
More than metrics
“O Deep Thought computer,” he said, “the task we have designed you to perform is this. We want you to tell us…” he paused, “The Answer.”
“The Answer?” said Deep Thought. “The Answer to what?”
“Life!” urged Fook.
“The Universe!” said Lunkwill.
“Everything!” they said in chorus.
— Douglas Adams, The Hitchhiker’s Guide to the Galaxy
After seven-and-a-half million years mulling it over, Deep Thought – the supercomputer in Douglas Adams’ cult classic novel – finally delivers its underwhelming verdict: “The Answer to the Great Question… of Life, the Universe and Everything… is… Forty-two.”
It’s arguably the best and definitely the most famous joke about the limitation of metrics. Some things defy measurement, however eager we may be to simplify and quantify.
As the sociologist William Bruce Cameron wrote (in a line so good it is often misattributed to Albert Einstein): “Not everything that can be counted counts, and not everything that counts can be counted.”
Metrics paint in broad strokes, and the picture they produce is incomplete. Acknowledging this is the key to using them well. Given the central role that metrics play in our lives today, that understanding is essential. Perhaps immeasurably so.
Recommended links and further reading
What data can’t do (New Yorker - subscription required)
Against metrics: how measuring performance by numbers backfires (Aeon)
VW is not alone: how metrics gaming is commonplace in companies (The Conversation)
Time to discard the metric that decides how science is rated (The Conversation)
1. The actual phrase used by Goodhart in his 1975 critique of UK monetary policy was: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” The adage we now know as Goodhart’s Law evolved over a couple of decades from other writers generalising this brilliant insight into something broader (and much more snappy).
2. The Cobra Effect is the name of a book about perverse incentives by German economist Horst Siebert. It cites the story of the colonial government in India offering a reward for the killing of poisonous snakes. Enterprising locals, it is said, responded to the scheme by breeding more cobras. The evidence that this actually happened is, at best, sketchy.