
Programming as Theory Building — Peter Naur (1985)

Peter Naur's essay argues that the primary product of programming is not source code — it is the theory that programmers build in their minds: a deep understanding of how the problem maps to the solution, why design decisions were made, and how the system relates to the real world it models.

Core claims

  • The theory cannot be fully written down. Documentation, comments, and specs capture fragments, but the living understanding in the programmer's head is richer than any artifact.
  • When the team leaves, the theory dies. The code remains but becomes increasingly opaque. Modifications made without the theory tend to be wrong in subtle ways — they fix the symptom but violate the underlying logic.
  • Reconstruction is not revival. A new team reading the old code builds a new, different theory. This is why rewrites often diverge from the original intent in unexpected ways.

The practical implication: programming is fundamentally a human activity of knowledge-building, not text-production. Treating it as text-production — measuring output in lines of code, automating edits — misses what actually makes software work over time.

Tests as theory preservation

Tests are one of the few mechanisms that partially externalise the theory:

  • A test encodes a specific claim about how the system should behave — a fragment of the theory made executable and verifiable.
  • A passing test suite tells a new programmer "this is what the system is supposed to do", not just "this is what it currently does".
  • When a change breaks a test, it surfaces a theory violation — the modification contradicts a known requirement — before it reaches production.
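As an illustration of a test encoding one claim of the theory, here is a minimal sketch in Python using pytest-style test functions that also run without pytest. The `apply_discount` function, its cents-based signature, and the round-down requirement are hypothetical, invented for this example. Each assert pins down what the system is supposed to do. The docstring carries a fragment of the why, which the asserts alone cannot express.

```python
def apply_discount(total_cents: int, percent: int) -> int:
    """Return the discounted total in whole cents.

    Hypothetical requirement: fractions of a cent are rounded down, so the
    customer is never charged more than the exact discounted price.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer floor division keeps the arithmetic exact (no float rounding).
    return total_cents * (100 - percent) // 100


# Each test states one claim about intended behaviour, not just current output.
def test_fractional_cents_round_down() -> None:
    # 999 cents at 15% off is 849.15 cents; the claim says 849, never 850.
    assert apply_discount(999, 15) == 849


def test_half_cent_rounds_down() -> None:
    # 999 cents at 50% off is 499.5 cents; Python's round() would give 500.
    assert apply_discount(999, 50) == 499


def test_out_of_range_percent_is_rejected() -> None:
    try:
        apply_discount(999, 120)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")


if __name__ == "__main__":
    # pytest would discover the test_* functions; this loop runs them without it.
    for test in (
        test_fractional_cents_round_down,
        test_half_cent_rounds_down,
        test_out_of_range_percent_is_rejected,
    ):
        test()
    print("all claims hold")
```

The half-cent test is the kind of claim a newcomer could not infer from the code alone: a "cleanup" to `round(total_cents * (100 - percent) / 100)` would look harmless but fail it.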

Tests do not capture the full theory (they say nothing about why a requirement exists, or how components relate at a higher level), but they are the closest thing to a machine-readable specification of intent.

Full theory (in programmer's head)
    │
    ├── partially captured by → documentation, comments, ADRs
    └── partially captured by → tests  ← executable, automatically verified

This is also why Python is the only domain where LLMs perform reliably in the DELEGATE-52 study: it is the domain where correctness has a mechanical definition and outputs can be verified against a spec. Tests give LLMs, and humans, the same foothold in any domain: a mechanical check of whether a change preserved the intended behaviour.

The failure mode without tests

Without tests, the theory degrades silently:

  1. Original team holds the theory — system works and evolves coherently.
  2. Team changes — theory partially lost.
  3. New team modifies code based on their reconstructed (incomplete) theory.
  4. Subtle violations accumulate; the system drifts from its original intent.
  5. Nobody knows what "correct" means anymore — only "it doesn't crash."

Tests interrupt this cycle at step 3 by making some of the original theory's claims non-negotiable.
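The step-3 interruption can be sketched as running the same recorded claims against the original code and against a new team's reconstruction. Everything here is hypothetical and invented for illustration: `original_dedupe`, `rewritten_dedupe`, `CLAIMS`, `violated_claims`, and the requirement that duplicates be removed while keeping first-seen order.

```python
from typing import Callable

Dedupe = Callable[[list[str]], list[str]]


def original_dedupe(emails: list[str]) -> list[str]:
    # dict preserves insertion order (guaranteed since Python 3.7),
    # so each address keeps the position of its first occurrence.
    return list(dict.fromkeys(emails))


def rewritten_dedupe(emails: list[str]) -> list[str]:
    # A plausible "simplification" by someone without the original theory:
    # duplicates are still removed, but the result comes back sorted.
    return sorted(set(emails))


SAMPLE = ["zoe@example.com", "adam@example.com", "zoe@example.com"]

# Each claim pairs a description with a predicate over an implementation.
CLAIMS: list[tuple[str, Callable[[Dedupe], bool]]] = [
    ("removes duplicates",
     lambda f: f(SAMPLE).count("zoe@example.com") == 1),
    ("keeps first-seen order",
     lambda f: f(SAMPLE) == ["zoe@example.com", "adam@example.com"]),
]


def violated_claims(impl: Dedupe) -> list[str]:
    """Return the descriptions of every claim the implementation breaks."""
    return [desc for desc, holds in CLAIMS if not holds(impl)]


if __name__ == "__main__":
    print("original:", violated_claims(original_dedupe))
    print("rewrite: ", violated_claims(rewritten_dedupe))
```

Run as a script, this reports no violations for the original and flags only "keeps first-seen order" for the rewrite. The rewrite still removes duplicates and never crashes, which is exactly the symptom-level correctness that hides a theory violation.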

Sources