Programming as Theory Building — Peter Naur (1985)
Peter Naur's essay argues that the primary product of programming is not source code — it is the theory that programmers build in their minds: a deep understanding of how the problem maps to the solution, why design decisions were made, and how the system relates to the real world it models.
Core claims
- The theory cannot be fully written down. Documentation, comments, and specs capture fragments, but the living understanding in the programmer's head is richer than any artifact.
- When the team leaves, the theory dies. The code remains but becomes increasingly opaque. Modifications made without the theory tend to be wrong in subtle ways — they fix the symptom but violate the underlying logic.
- Reconstruction is not revival. A new team reading the old code builds a new, different theory. This is why rewrites often diverge from the original intent in unexpected ways.
The practical implication: programming is fundamentally a human activity of knowledge-building, not text-production. Treating it as text-production — measuring output in lines of code, automating edits — misses what actually makes software work over time.
Tests as theory preservation
Tests are one of the few mechanisms that partially externalise the theory:
- A test encodes a specific claim about how the system should behave — a fragment of the theory made executable and verifiable.
- A passing test suite tells a new programmer "this is what the system is supposed to do", not just "this is what it currently does."
- When a change breaks a test, it surfaces a theory violation — the modification contradicts a known requirement — before it reaches production.
Tests do not capture the full theory (they say nothing about why a requirement exists, or how components relate at a higher level), but they are the closest thing to a machine-readable specification of intent.
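As a minimal sketch of a test acting as an executable fragment of the theory, consider a hypothetical bill-splitting function. The function `split_bill`, its cent-based representation, and the two rules being tested are illustrative assumptions, not taken from Naur's essay.

```python
def split_bill(total_cents: int, people: int) -> list:
    """Split a bill (hypothetical example) so the shares always sum to the total."""
    base, remainder = divmod(total_cents, people)
    # The first `remainder` people pay one extra cent, so no cent is lost.
    return [base + 1 if i < remainder else base for i in range(people)]


def test_shares_sum_to_total():
    # Theory fragment: splitting a bill never creates or destroys money.
    for total in (0, 1, 100, 1001):
        for people in (1, 2, 3, 7):
            assert sum(split_bill(total, people)) == total


def test_shares_differ_by_at_most_one_cent():
    # Theory fragment: the split is as even as whole cents allow.
    shares = split_bill(1000, 3)
    assert max(shares) - min(shares) <= 1


if __name__ == "__main__":
    test_shares_sum_to_total()
    test_shares_differ_by_at_most_one_cent()
    print("all theory fragments hold")
```

The tests state what must stay true (totals are conserved, shares are near-equal), but the reason the business cares about either rule still lives only in the comments and in someone's head.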
Full theory (in programmer's head)
│
├── partially captured by → documentation, comments, ADRs
└── partially captured by → tests ← executable, automatically verified
This is also why Python is the only domain where LLMs perform reliably in the DELEGATE-52 study: it is the domain where correctness has a mechanical definition and outputs can be verified against a spec. Tests give LLMs — and humans — the same foothold in any domain.
The failure mode without tests
Without tests, the theory degrades silently:
1. The original team holds the theory: the system works and evolves coherently.
2. The team changes: the theory is partially lost.
3. The new team modifies code based on their reconstructed (incomplete) theory.
4. Subtle violations accumulate; the system drifts from its original intent.
5. Nobody knows what "correct" means anymore, only that "it doesn't crash."
Tests interrupt this cycle at step 3 by making some of the original theory's claims non-negotiable.
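The degradation described here can be illustrated with a small, hypothetical sketch: a later change that looks reasonable and does not crash, yet breaks a rule the original authors relied on. Both implementations and the helper `violates_conservation` are invented for illustration.

```python
def split_bill_original(total_cents, people):
    """Original (hypothetical) implementation: shares always sum to the total."""
    base, remainder = divmod(total_cents, people)
    return [base + 1 if i < remainder else base for i in range(people)]


def split_bill_modified(total_cents, people):
    """Hypothetical later change: 'everyone should pay the same amount'.

    It runs without errors, but silently drops the remainder.
    """
    return [total_cents // people] * people


def violates_conservation(split, total_cents=1000, people=3):
    """Return True if the shares do not add up to the total."""
    return sum(split(total_cents, people)) != total_cents


if __name__ == "__main__":
    print(violates_conservation(split_bill_original))  # False
    print(violates_conservation(split_bill_modified))  # True
```

Without a check like `violates_conservation` in the test suite, the modified version would pass the "it doesn't crash" bar and the missing cent would only surface later, if at all.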
Sources
- Peter Naur — Programming as Theory Building (1985), republished in Computing: A Human Activity (1992)
- arXiv 2604.15597 — LLMs Corrupt Your Documents When You Delegate