Data-Driven Hallucinations
The British advertising strategist Rory Sutherland makes an observation worth sitting with: most people, when trying to be rigorous, copy second-rate mathematicians. Second-rate mathematicians think mathematics is about numbers. First-rate mathematicians are barely interested in numbers at all.
Look at what actually happens at the higher levels of the discipline. Abstract algebra, topology, category theory: these fields are almost entirely free of numbers. They deal in structures, relationships, and the properties that survive transformation. A number, when it appears, is almost an implementation detail. The real work is in asking the right question about the right structure.
If your model of rigour is "use more numbers", you have copied the wrong mathematicians.
The Cargo Cult
"Data-driven" entered the corporate lexicon as shorthand for "not making decisions on gut feel". That was a legitimate reaction. Gut-feel decision-making at the executive level, where a single instinct can shape a quarter's priorities, is a real and costly dysfunction. The desire to ground decisions in evidence is sound.
But somewhere along the way the phrase curdled. In practice, "data-driven" has come to mean something closer to:
- We have a dashboard, therefore we are rigorous
- We ran an A/B test, therefore the decision is correct
- The metric went up, therefore we succeeded
None of these are wrong exactly. But they share a hidden assumption: that the right questions are already being asked. They take the question as given and reach straight for the instrument.
A/B testing is a good example. It is a genuinely useful tool; controlled experiments are one of the few ways to make a credible causal claim from product data. But an A/B test can only optimise within your current idea space. It finds the local maximum of the thing you thought to test. You will never A/B test your way to a fundamentally different product, because that would require asking a different question first.
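To make the "local maximum" point concrete, here is a minimal sketch of the comparison an A/B test actually performs, using a standard two-proportion z-test with a normal approximation. All numbers are illustrative, and `ab_test_z` is a hypothetical helper, not a reference to any particular tool. Note what the test can and cannot do: it can rank variant B against variant A, but both variants had to exist in your idea space before the test began.

```python
import math

def ab_test_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-proportion z-test: is B's conversion rate credibly
    different from A's? Normal approximation; illustrative only."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 1,000 users per arm, 110 vs 130 conversions.
z, p = ab_test_z(110, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The machinery is rigorous within its frame. The frame itself, which two variants were worth comparing, was chosen before any statistics happened.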
This points at something deeper. Statistics surfaces correlation. What you need, almost always, is causality: not just that a metric went up after a model retrain, but why. Was the model leveraging a previously untapped feature? Was it ignoring dirty data that had previously confused results? Depending on the answer, you would build very different things next, and the metric alone cannot tell you which. Causality requires a hypothesis about mechanism: you have to think about why something would work before you can design a meaningful test of it. The thinking precedes the measurement. Data cannot supply the hypothesis; it can only evaluate one.
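A toy illustration of why the headline metric cannot distinguish mechanisms. The segment names and numbers below are entirely hypothetical: two retrained models show an identical aggregate uplift, but the per-segment breakdown suggests completely different reasons, and therefore completely different next steps.

```python
# Hypothetical per-segment accuracy for a baseline and two retrains.
baseline  = {"clean_data": 0.90, "dirty_data": 0.60}
retrain_a = {"clean_data": 0.96, "dirty_data": 0.60}  # a new feature helps clean rows
retrain_b = {"clean_data": 0.90, "dirty_data": 0.66}  # better handling of dirty rows

weights = {"clean_data": 0.5, "dirty_data": 0.5}  # segment mix, also hypothetical

def aggregate(per_segment: dict) -> float:
    """The headline metric: a weighted average over segments."""
    return sum(per_segment[s] * weights[s] for s in per_segment)

# Identical headline uplift (0.75 -> 0.78), different mechanisms.
print(aggregate(baseline), aggregate(retrain_a), aggregate(retrain_b))
```

If retrain A is the true story, the next investment is feature engineering; if retrain B is, it is data cleaning. The aggregate number is the same either way. Only a hypothesis about mechanism, checked against the breakdown, tells you what to build.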
Metrics Define What Exists
The deeper problem is not that the wrong questions are being asked. It is that the measurement system does not just track reality: it constitutes what is real within the organisation. Things that are not measured do not surface as concerns. They accumulate invisibly.
Goodhart's Law gets cited here often: "when a measure becomes a target, it ceases to be a good measure". That explains what happens to a metric once it is targeted. But the prior failure is choosing the metric in the first place. That choice defines the problem space. It determines which things can appear as issues and which cannot.
A single-metric organisation is not just optimising the wrong thing. It is structurally incapable of seeing the things it is not measuring.
ARR and the Invisible Debt
I spent several years working on a highly complex data processing system. The dominant metric was growth in ARR (Annual Recurring Revenue), which in daily decision-making meant one question: "can we sell it?"
Technical compromises were made freely, because there was no mechanism to surface their cost against a metric that only asked whether the product could close deals. The system ended up supporting many customers in a genuinely poor technical state. The cleanup ran to the low millions of dollars a year in ongoing infrastructure and maintenance, and many millions more a year in opportunity cost, as the difficulty of improving the product lost the business clients; potentially more than the short-term ARR uplift that justified the original compromises.
The number went up. The decisions looked correct. The damage was real but off the ledger.
The problem was not laziness or incompetence. The team was capable and working hard. The problem was that ARR was the only question being asked, so everything else was structurally invisible. Technical health had no representation in the measurement system and could not surface as a concern; the metric had no language for it.
The system's complexity mattered here too. Technical debt compounds in a complex data processing system in ways it does not in a simpler one. A CRUD application can survive a lot of shortcuts; a data pipeline cannot. The metric did not know that, and nothing in the measurement system asked it to.
The problem space had been artificially collapsed.
Think Before You Instrument
The tempting response is to measure more things. Add technical health to the dashboard. Track debt alongside ARR. This is better than a single metric, but it is still the same mistake: it treats the question as secondary to the instrument.
The first job is to think. To play with the idea before you instrument it. To ask: what are we actually trying to understand here? What would change our minds?
Sutherland calls this alchemy: pursuing interventions that work for reasons you cannot yet fully articulate. That is not unscientific. It is how you find the right question. Intuition, qualitative judgment, customer conversations, code review friction, a founder's sense that something is structurally wrong: these are legitimate inputs. They belong before the dashboard, not after it.
Form the hypothesis before you build the metric. Know what you are testing before you write the instrumentation. The question is the expensive part; data is relatively cheap once you know what you are looking for.
Data-Informed
First-rate mathematicians are not anti-numbers. They use numbers when numbers are the right tool. The discipline is knowing when that is.
"Data-informed" rather than "data-driven" is not a semantic quibble. It is about sequence. The question comes first. The metric serves the question. When the question is answered, the metric has done its job.
This means treating qualitative signals as data: support tickets, user interviews, engineering team friction, the things people stop doing without telling anyone. It means holding a measurement with appropriate scepticism when it disagrees with a strong qualitative signal; sometimes the metric is wrong, or is measuring the wrong thing, or is failing to capture a dynamic that is real but hard to quantify.
And it means, at the organisational level, returning to the question itself on a regular cadence. The organisational retrospective is a good mechanism for this: a structured opportunity to ask not just "did the metric move?" but "are we still asking the right question?".
The measurement system answers the questions you give it. Choosing the questions is the work that cannot be automated.