Programming as Theory Building — Peter Naur (1985)

Peter Naur's essay argues that the primary product of programming is not source code — it is the theory that programmers build in their minds: a deep understanding of how the problem maps to the solution, why design decisions were made, and how the system relates to the real world it models.

Core claims

The theory cannot be fully written down. Documentation, comments, and specs capture fragments, but the living understanding in the programmer's head is richer than any artifact.
When the team leaves, the theory dies. The code remains but becomes increasingly opaque. Modifications made without the theory tend to be wrong in subtle ways — they fix the symptom but violate the underlying logic.
Reconstruction is not revival. A new team reading the old code builds a new, different theory. This is why rewrites often diverge from the original intent in unexpected ways.

The practical implication: programming is fundamentally a human activity of knowledge-building, not text-production. Treating it as text-production — measuring output in lines of code, automating edits — misses what actually makes software work over time.

Tests as theory preservation

Tests are one of the few mechanisms that partially externalise the theory:

A test encodes a specific claim about how the system should behave — a fragment of the theory made executable and verifiable.
A passing test suite tells a new programmer "this is what the system is supposed to do", not just "this is what it currently does."
When a change breaks a test, it surfaces a theory violation — the modification contradicts a known requirement — before it reaches production.
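
As a minimal sketch of the first point, the example below turns one theory fragment into an executable claim. The refund rule, the function name refundable_amount, and the figures are all hypothetical, invented here for illustration; they do not come from Naur's essay.

```python
from decimal import Decimal


def refundable_amount(paid: Decimal, already_refunded: Decimal) -> Decimal:
    """Return how much of a payment can still be refunded (hypothetical rule)."""
    remaining = paid - already_refunded
    # Theory fragment: a customer can never get back more than they paid,
    # so the refundable amount is clamped at zero.
    return max(remaining, Decimal("0"))


def test_refund_never_exceeds_payment() -> None:
    # The claim, made executable: total refunds are capped at the amount paid.
    assert refundable_amount(Decimal("100.00"), Decimal("30.00")) == Decimal("70.00")
    assert refundable_amount(Decimal("100.00"), Decimal("100.00")) == Decimal("0")
    # Even if the ledger already shows too much refunded, the cap still holds.
    assert refundable_amount(Decimal("100.00"), Decimal("120.00")) == Decimal("0")
```

The test records what the system must do. The comment inside the function is the only place the reason is hinted at, and nothing checks that comment.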

Tests do not capture the full theory (they say nothing about why a requirement exists, or how components relate at a higher level), but they are the closest thing to a machine-readable specification of intent.

Full theory (in programmer's head)
    │
    ├── partially captured by → documentation, comments, ADRs
    └── partially captured by → tests  ← executable, automatically verified

This may also help explain why Python is the only domain where LLMs perform reliably in the DELEGATE-52 study: it is a domain where correctness has a mechanical definition and outputs can be verified against a spec. Tests offer LLMs, and humans, a similar foothold in other domains.

The failure mode without tests

Without tests, the theory degrades silently:

1. Original team holds the theory: the system works and evolves coherently.
2. Team changes: the theory is partially lost.
3. New team modifies code based on their reconstructed (incomplete) theory.
4. Subtle violations accumulate; the system drifts from its original intent.
5. Nobody knows what "correct" means anymore, only "it doesn't crash."

Tests interrupt this cycle at step 3 by making some of the original theory's claims non-negotiable.
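
The interruption can be sketched as follows, assuming a hypothetical banking rule that a balance must never go negative. The names withdraw, withdraw_patched, and holds_no_overdraft are illustrative only. The patched version stands for a symptom fix made without the theory: the error disappears, and so does the rule behind it.

```python
from typing import Callable


def withdraw(balance: int, amount: int) -> int:
    """Original behaviour: refuse withdrawals that would overdraw the account."""
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


def withdraw_patched(balance: int, amount: int) -> int:
    """A symptom fix: the error is gone, but the balance can now go negative."""
    return balance - amount


def holds_no_overdraft(withdraw_fn: Callable[[int, int], int]) -> bool:
    """Executable claim from the original theory: a balance never goes negative."""
    try:
        new_balance = withdraw_fn(100, 150)
    except ValueError:
        return True  # refusing the withdrawal preserves the rule
    return new_balance >= 0


def test_no_overdraft() -> None:
    # Passes for the original; would fail if pointed at withdraw_patched.
    assert holds_no_overdraft(withdraw)
```

Running the same check against withdraw_patched returns False, so the violation shows up when the change is made, not after the system has drifted.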

Sources

Peter Naur — Programming as Theory Building (1985), republished in Computing: A Human Activity (1992)
arXiv 2604.15597 — LLMs Corrupt Your Documents When You Delegate