Harness Engineering | The Chris Frequency

The previous article ended with a question. If the discipline in the age of AI shifts to getting the model right early and deferring implementation, what does that actually look like? It is not just "think harder up front". It is a specific thing you build.

The harness.

The harness is the part of the codebase that does not move while generation does its work. Everything else is interchangeable. The harness is not.

What the Harness Is

The harness is not a testing strategy. It is not a framework. It is a structural discipline: the set of things you hold deliberately stable so that AI generation can proceed safely around them.

It has three components, in order of load-bearing weight:

The model. Your types and immutable domain entities. The vocabulary of the system. If these are wrong, everything generated against them encodes the wrongness.

The interfaces. Provider contracts, the context object, the seams between components. Fakeable by design. The harness is only as strong as the interfaces it defines.

The contract tests. The guarantee that the interfaces mean something. The fake and the real implementation must both pass the same suite. Without this, the harness is load-bearing but unverified.

Each of these does a different job. Together they give AI generation somewhere correct to land.

The Model: Types First

Types are the vocabulary everything else is written in. Get them wrong and the whole system speaks the wrong language. Generation will produce fluent, well-structured, confidently wrong code.

What "types first" means in practice: before you generate implementation, name the things. What are the core entities? What fields do they carry? What are the valid values? These answers need to exist as code (enums, dataclasses, typed fields) before a line of logic is written.

A weakly typed system is not just a code quality concern in the age of AI. It is a gap in the harness. Stringly-typed fields, implicit states, bare dictionaries passed between functions: these give generation nowhere solid to stand. The AI infers intent from context and produces something plausible. Plausible is not the same as correct, and at generation speed, plausible-but-wrong accretes fast.

Consider the difference:

from dataclasses import dataclass
from datetime import datetime

# Weak: generation guesses, infers, assumes
def process_application(data: dict) -> dict:
    ...

# Strong: generation has a contract to work against
@dataclass(frozen=True)
class Application:
    id: ApplicationId
    customer_id: CustomerId
    status: ApplicationStatus
    submitted_at: datetime

def process_application(application: Application) -> Application:
    ...

The second version tells the generator what an application is, what status means, and what the function promises. The first version leaves all of that implicit. The generator will fill the gap with something, but you have no way to verify it filled it correctly until the logic runs.

Strong types are how you make the model legible to generation. They are the first layer of the harness.
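As a minimal sketch of what that first layer might look like, the block below fills in the names the `Application` dataclass refers to. The definitions of `ApplicationId`, `CustomerId` and `ApplicationStatus` are assumptions made for illustration, as are the `APPROVED` and `REJECTED` members; only `SUBMITTED` and `PROCESSING` appear elsewhere in this article.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import datetime, timezone
from enum import Enum
from typing import NewType

# Hypothetical identifier types: distinct to a type checker, plain str at runtime.
ApplicationId = NewType("ApplicationId", str)
CustomerId = NewType("CustomerId", str)


class ApplicationStatus(Enum):
    # SUBMITTED and PROCESSING are used in this article; the rest are assumed.
    SUBMITTED = "submitted"
    PROCESSING = "processing"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Application:
    id: ApplicationId
    customer_id: CustomerId
    status: ApplicationStatus
    submitted_at: datetime


application = Application(
    id=ApplicationId("app-1"),
    customer_id=CustomerId("cust-1"),
    status=ApplicationStatus.SUBMITTED,
    submitted_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)

# Frozen entities cannot be mutated in place...
try:
    application.status = ApplicationStatus.PROCESSING
    mutated = True
except FrozenInstanceError:
    mutated = False

# ...so a state change produces a new value and leaves the original untouched.
processing = replace(application, status=ApplicationStatus.PROCESSING)

# A misspelled status fails loudly instead of flowing through as a string.
try:
    ApplicationStatus("aproved")
    typo_accepted = True
except ValueError:
    typo_accepted = False
```

The enum and the frozen dataclass do two jobs here: a static type checker can reject a wrong field or status before anything runs, and at runtime an invalid value or an in-place mutation raises rather than passing silently.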

The Interfaces: Fakeable by Design

Every external concern belongs behind an interface the application cannot see through. Storage, queues, third-party APIs, notifications, file stores. If a dependency cannot be faked, the seam is wrong. Fix the coupling, not the test.

This principle has appeared throughout this series, most fully in Fake It While You Make It. The framing here is different. The question is not "how do I make this testable?" It is "what am I committing to hold stable?"

The interface is the harness boundary. What sits behind it is what you are generating freely.

Inside the harness, you are deliberate. You define the interface; you decide what it promises; you maintain it with intention. Outside the harness, behind the interface, you generate freely. You swap implementations, try alternatives, iterate without ceremony, because nothing in the application depends on what is behind the interface: only on what the interface says.

                    Harness boundary
                          |
  Application code        |      External implementations
                          |
  [Domain logic    ] <----|----> [SQLRepository       ]
  [Service layer   ] <----|----> [S3ObjectStore       ]
  [Signal handlers ] <----|----> [StripePaymentClient ]
                          |
                    AI generates freely
                    on both sides, but
                    the boundary itself
                    is held deliberately.
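One way to write the boundary down is as a `typing.Protocol`, sketched below. The method names `insert`, `get` and `save` follow the repository used in this article's contract tests; the simplified `Application`, the `KeyError` and `ValueError` behaviour, and the `start_processing` function are hypothetical stand-ins, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Application:
    # Simplified stand-in for the article's Application entity.
    id: str
    status: str


class ApplicationRepository(Protocol):
    """The boundary: application code depends on these three methods only."""

    def insert(self, application: Application) -> None: ...
    def get(self, application_id: str) -> Application: ...
    def save(self, application: Application) -> None: ...


class FakeApplicationRepository:
    """In-memory implementation; a SQL-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._rows: dict[str, Application] = {}

    def insert(self, application: Application) -> None:
        if application.id in self._rows:
            raise ValueError(f"duplicate application id: {application.id}")
        self._rows[application.id] = application

    def get(self, application_id: str) -> Application:
        # Raises KeyError for an unknown id (assumed behaviour).
        return self._rows[application_id]

    def save(self, application: Application) -> None:
        if application.id not in self._rows:
            raise KeyError(application.id)
        self._rows[application.id] = application


def start_processing(repository: ApplicationRepository, application_id: str) -> Application:
    """Domain logic sees the interface, never the implementation behind it."""
    current = repository.get(application_id)
    updated = Application(id=current.id, status="processing")
    repository.save(updated)
    return updated


repository = FakeApplicationRepository()
repository.insert(Application(id="app-1", status="submitted"))
result = start_processing(repository, "app-1")
```

Nothing in `start_processing` names a concrete class, so whatever sits behind `ApplicationRepository` can be regenerated or swapped without touching it.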

The practical test: can your entire application boot and run with fakes in place of every external dependency? If not, something has leaked through a boundary that should not have. Find it and fix it. The ability to run the full system in-process, against fakes, with a single constructor call, is the diagnostic that tells you the interfaces are doing their job.
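A rough sketch of that diagnostic, assuming a context object that carries every provider. `AppContext`, `with_fakes`, `ApplicationStore`, `Notifier` and `submit_application` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ApplicationStore(Protocol):
    def put(self, application_id: str, status: str) -> None: ...
    def status_of(self, application_id: str) -> str: ...


class Notifier(Protocol):
    def notify(self, customer_id: str, message: str) -> None: ...


class FakeApplicationStore:
    def __init__(self) -> None:
        self._statuses: dict[str, str] = {}

    def put(self, application_id: str, status: str) -> None:
        self._statuses[application_id] = status

    def status_of(self, application_id: str) -> str:
        return self._statuses[application_id]


@dataclass
class FakeNotifier:
    # Records messages instead of sending them, so tests can inspect them.
    sent: list[tuple[str, str]] = field(default_factory=list)

    def notify(self, customer_id: str, message: str) -> None:
        self.sent.append((customer_id, message))


@dataclass(frozen=True)
class AppContext:
    """Every external dependency the application is allowed to reach."""

    store: ApplicationStore
    notifier: Notifier

    @classmethod
    def with_fakes(cls) -> "AppContext":
        # The single constructor call: no database, no network, no containers.
        return cls(store=FakeApplicationStore(), notifier=FakeNotifier())


def submit_application(context: AppContext, application_id: str, customer_id: str) -> None:
    context.store.put(application_id, "submitted")
    context.notifier.notify(customer_id, f"Application {application_id} received")


context = AppContext.with_fakes()
submit_application(context, "app-1", "cust-1")
```

If a feature cannot be exercised through a context built this way, that is the leak: some dependency is being reached directly rather than through the boundary.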

Contract Tests: Keeping the Harness Honest

A harness that cannot verify itself is not a harness. It is a convention, and conventions drift.

The contract test is the mechanism that keeps the harness honest. It runs the same test suite against both the fake and the real implementation. If they diverge, something is wrong: either the fake is lying, or the interface does not promise what you thought it did. Either way, you want to find out in a test, not in production.

from dataclasses import replace

import pytest

# Every test that requests this fixture runs twice: once per implementation.
@pytest.fixture(params=["fake", "real"])
def application_repository(request):
    if request.param == "fake":
        return FakeApplicationRepository()
    else:
        return SQLApplicationRepository(test_database_connection())

def test_get_returns_inserted_application(application_repository):
    application = make_application(status=ApplicationStatus.SUBMITTED)
    application_repository.insert(application)
    result = application_repository.get(application.id)
    assert result == application

def test_save_updates_status(application_repository):
    application = make_application(status=ApplicationStatus.SUBMITTED)
    application_repository.insert(application)
    # Application is a frozen dataclass, so build an updated copy with replace().
    updated = replace(application, status=ApplicationStatus.PROCESSING)
    application_repository.save(updated)
    result = application_repository.get(application.id)
    assert result.status == ApplicationStatus.PROCESSING

Every test runs against both fixture variants. Any divergence surfaces as a failure. Over time, the contract test becomes the authoritative definition of what a provider is supposed to do: more authoritative than the interface declaration, because it is executable.

In an AI-assisted codebase, contract tests have an additional role. AI will modify fakes. It will add methods to satisfy a test, adjust behaviour to make something pass, extend an interface because a new feature needed it. Without contract tests, the fake silently drifts from reality. With them, drift surfaces as a failing test: in the fast local suite when the fake breaks a contract it used to satisfy, and in CI when the fake and the real implementation disagree. Either way it is caught before merge and long before it reaches production.
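To make the drift concrete, here is a small sketch with no test runner involved. `HonestFakeRepository`, `DriftedFakeRepository` and `passes_contract` are hypothetical; the contract mirrors `test_save_updates_status`, and the no-op `save` stands in for the kind of plausible edit a generator might make to get some other test passing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Application:
    # Simplified stand-in for the article's Application entity.
    id: str
    status: str


class HonestFakeRepository:
    def __init__(self) -> None:
        self._rows: dict[str, Application] = {}

    def insert(self, application: Application) -> None:
        self._rows[application.id] = application

    def get(self, application_id: str) -> Application:
        return self._rows[application_id]

    def save(self, application: Application) -> None:
        self._rows[application.id] = application


class DriftedFakeRepository(HonestFakeRepository):
    """The same fake after a plausible but wrong edit."""

    def save(self, application: Application) -> None:
        pass  # hypothetical drift: save() no longer persists anything


def contract_save_updates_status(repository) -> None:
    """One contract: a saved status change must be visible on the next get()."""
    application = Application(id="app-1", status="submitted")
    repository.insert(application)
    repository.save(replace(application, status="processing"))
    assert repository.get("app-1").status == "processing"


def passes_contract(repository) -> bool:
    try:
        contract_save_updates_status(repository)
    except AssertionError:
        return False
    return True


honest_ok = passes_contract(HonestFakeRepository())
drifted_ok = passes_contract(DriftedFakeRepository())
```

The drifted fake still has the right methods and the right signatures, so nothing structural gives it away. Only running the shared contract does.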

The real implementation tests require a real database, so they are slower. Run them in CI. Run the fake-only suite locally during development. The split is deliberate: fast feedback during iteration, full contract verification before merge.
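One simple way to implement the split, sketched with the standard library only. The environment variable `RUN_REAL_CONTRACT_TESTS` and the helper `repository_variants` are assumptions for this example, not an existing convention.

```python
import os
from typing import Mapping

# Hypothetical switch: CI would export RUN_REAL_CONTRACT_TESTS=1.
REAL_FLAG = "RUN_REAL_CONTRACT_TESTS"


def repository_variants(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the fixture params to run: fakes always, the real provider on request."""
    variants = ["fake"]
    if environ.get(REAL_FLAG) == "1":
        variants.append("real")
    return variants


# Passing the mapping explicitly keeps the helper itself easy to test.
local_run = repository_variants({})
ci_run = repository_variants({REAL_FLAG: "1"})
```

The result could be passed as `params=repository_variants()` to the `application_repository` fixture. Marking the real variant with a pytest marker and selecting with `-m` is another common way to get the same split.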

The Agentic Flywheel

With all three layers in place, the generation loop changes character.

The agent generates against the model: it speaks the right vocabulary from the start, because the types give it a contract to work against. It generates behind the interfaces: it never touches real infrastructure, because every external concern is behind a boundary it cannot see through. It runs the full suite in milliseconds against fakes: it knows immediately whether it broke something, because the full application executes in-process with no containers and no network.

  Strong types + Clean interfaces + Contract tests
                      |
                      v
           AI generates against the harness
                      |
                      v
         Full suite runs in milliseconds
                      |
                      v
     Drift caught by contract tests in CI
                      |
                      v
              Ship with confidence

This is not a faster version of the old loop. It is a different loop. The feedback cycle that used to run at CI speed (push, wait, read logs, fix) now runs at test speed, locally, on every iteration. An agent can attempt a change, run the suite, see the result, and try again in seconds. The harness is what makes that safe: the agent is not guessing at infrastructure state or inferring behaviour from production logs. It is working against a deterministic, in-process model of the system.

Fast because fakes replace infrastructure. Complete because the full application runs, not unit-tested fragments in isolation. Safe because contract tests keep the fakes honest, and CI catches what the fast loop misses.

The teams that get the most from AI-assisted development will not be the ones generating the most code. They will be the ones who built the harness first, held it deliberately, and let generation do its work inside it.