Fake It While You Make It
Most teams treat testing as something you do after building. You write the code, wire up the infrastructure, and then figure out how to test it. The result is a test suite that requires a running database, a message broker, a cache, and maybe a stable internet connection just to verify behaviour. Tests that take minutes. Tests that fail intermittently because a container was slow to start, or because they hit a staging database that everyone else uses too. Tests that developers stop running locally and leave to CI.
The discipline described here is different. It is not a testing strategy. It is an architecture strategy: choose your technology boundaries so that every external concern is behind a fakeable interface. The testing speed is a consequence. The architecture quality is the point.
If you cannot fake it, the coupling is the problem. Fix the coupling.
What "Faking" Actually Means
A mock verifies how your code was called. The test arranges expectations up front: did it call verification.verify() with the right arguments, the right number of times? If the internals change, the mock breaks, even if the behaviour did not.
# Mock-based: testing implementation, not behaviour
from unittest.mock import Mock

def test_kyc_approval_sends_welcome_email_mock():
    mock_notifications = Mock()
    mock_notifications.send_welcome.return_value = None
    context = make_test_context(notifications=mock_notifications)
    context.repositories.kyc_checks.insert(make_processing_check(CUSTOMER_ID))
    approve(CHECK_ID, context)
    mock_notifications.send_welcome.assert_called_once_with(CHECK_ID)

# Passes. But if approve() is refactored to call send_welcome_email()
# instead of send_welcome(), this test breaks with no behaviour change.
A fake is something different: a real, working implementation of the same interface with a different backend. It accepts the same types, returns the same types, and behaves consistently. It does not care how it was called; it cares what the outcome was.
# Fake-based: testing behaviour, not implementation
def test_kyc_approval_sends_welcome_email_fake():
    context = make_test_context(
        notifications=FakeNotificationService(),
    )
    context.repositories.kyc_checks.insert(make_processing_check(CUSTOMER_ID))
    approve(CHECK_ID, context)
    assert CUSTOMER_ID in context.notifications.welcome_emails_sent

# Passes regardless of how approve() calls the notification service
# internally. Refactor freely; only a behaviour change breaks this test.
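A fake like the one above can be a handful of lines. A minimal sketch, assuming a notification interface with a single send_welcome(customer_id) method (the exact interface in this series' codebase may differ):

```python
class FakeNotificationService:
    """In-memory notification service: same interface, no email provider."""

    def __init__(self):
        self.welcome_emails_sent = []

    def send_welcome(self, customer_id):
        # Record the outcome so tests can assert on what happened,
        # not on how it was called.
        self.welcome_emails_sent.append(customer_id)
```

Because the fake records outcomes rather than call patterns, any internal refactor that still produces the same outcome leaves the test green.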
The distinction matters in practice. Mocks are brittle. They are coupled to the implementation rather than the behaviour. When you refactor the internals of a transition, mock-based tests break even if the behaviour is unchanged. Fake-based tests only break if the behaviour actually changes. A mock is a bet on how the code is written. A fake is a bet on what the code does.
Fakes have appeared throughout this series without being named as a strategy. The FakeBlogPostProvider in How to Build a Data Access Layer. The FakeVerificationService and FakeRiskScoringService in Your System is a State Machine. The make_test_context() helper in Strategic Monolith + Satellites. Each article introduced a fake as a side effect of good interface design. This article names the strategy those fakes represent.
Design for Fakability
The question to ask of every external dependency is: can I put a clean interface in front of this?
"External" here means anything that is not your domain logic: databases, queues, third-party APIs, ML services, email providers, file stores. Each one is a candidate for fakability. The test is whether your entire application can boot and run with fakes in place of every one of them.
What this rules out:
- ORMs leaking into application code (the Active Record pattern; a data provider that is also a database object)
- Direct SDK calls scattered through business logic
- Global state or ambient configuration that cannot be overridden per-test
- Side effects that happen at import time rather than at call time
What it enables: the same code path runs in tests and in production. The domain logic, the state machine, the orchestration, the signal handlers: all of it executes. Only what sits behind the interface changes. In tests, a FakeKYCCheckRepository stores records in memory. In production, a SQLKYCCheckRepository stores them in Postgres. The application code never knows the difference.
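The swap can be sketched with an interface and a dict-backed fake. The Protocol, record type, and method names here are illustrative assumptions, not the series' exact code:

```python
from collections import namedtuple
from typing import Protocol

# Illustrative record type standing in for the real KYC check model.
KYCCheck = namedtuple("KYCCheck", ["id", "status"])

class KYCCheckRepository(Protocol):
    def insert(self, check: KYCCheck) -> None: ...
    def get(self, check_id: str) -> KYCCheck: ...

class FakeKYCCheckRepository:
    """Stores checks in a dict. A SQLKYCCheckRepository would satisfy
    the same Protocol but write to Postgres; callers cannot tell."""

    def __init__(self):
        self._checks = {}

    def insert(self, check):
        self._checks[check.id] = check

    def get(self, check_id):
        return self._checks[check_id]
```

The application code depends only on the Protocol; which implementation is constructed is a wiring decision made once, at startup or in the test context.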
This is not a novel idea. It is hexagonal architecture, ports and adapters, the provider pattern: all different names for the same underlying discipline. What is worth stating plainly is the consequence: if your application cannot boot against fakes, your architecture has a coupling problem. The test suite is the diagnostic.
Choosing Technologies for Fakability
Not every technology choice is equally fakeable, and the time to think about this is before you commit, not after.
Storage is the most common concern and the most well-understood. The provider pattern wraps every storage touchpoint behind an interface. A FakeKYCCheckRepository backed by an in-memory dict is a few dozen lines and covers the entire test suite. The real SQL implementation is swapped in for CI and production. This is covered in depth elsewhere in this series; the point here is that choosing an ORM that leaks into application code forecloses this option entirely.
File stores follow the same pattern. An ObjectStore interface with upload(key, data) and download(key) methods covers most use cases. The fake holds files in a dict. The real implementation writes to S3, GCS, or a local filesystem. The application never knows which.
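A dict-backed fake for that interface is about ten lines. A sketch, following the upload/download shape described above (a real interface might add metadata or streaming):

```python
class FakeObjectStore:
    """In-memory object store with the same upload/download interface
    as an S3-, GCS-, or filesystem-backed implementation."""

    def __init__(self):
        self._objects = {}

    def upload(self, key, data):
        self._objects[key] = data

    def download(self, key):
        return self._objects[key]
```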
Third-party APIs should be wrapped by default. A VerificationService interface hides whether you are calling Onfido, Jumio, or a stub that returns predetermined results. This is also where the fake pays operational dividends: running a full workflow in CI without burning API credits or triggering rate limits.
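A fake verification service can be seeded with predetermined results per customer. A sketch, assuming a verify(customer_id) method and a simple result type (the real interface may carry more fields):

```python
from dataclasses import dataclass

@dataclass
class VerificationResult:
    approved: bool

class FakeVerificationService:
    def __init__(self, results):
        # Map of customer_id -> predetermined VerificationResult.
        self._results = results

    def verify(self, customer_id):
        # No HTTP call, no API credits, no rate limits.
        return self._results[customer_id]
```

Seeding results in the constructor is what lets a test declare its scenario up front: this customer passes, that one fails, and the workflow is exercised against both.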
Queues are a natural fit for the same provider pattern as storage. An EventQueue interface with enqueue(item) and dequeue() methods can be backed by a fake that holds items in a list. The application enqueues work and the worker dequeues it; in tests, both sides run in the same process with no polling delay, no broker, and no message serialisation to reason about. Whether the real implementation is a database-backed queue or a dedicated broker is irrelevant to the test.
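A list-backed fake for that interface is similarly small. The empty-queue behaviour here (returning None, like a non-blocking poll) is an assumption about the interface:

```python
from collections import deque

class FakeEventQueue:
    """In-process queue: enqueue and dequeue with no broker,
    no polling delay, no message serialisation."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        # None signals an empty queue, mirroring a non-blocking poll.
        return self._items.popleft() if self._items else None
```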
Large-scale data processing frameworks require deliberate thought. Apache Spark can spin up a local in-process cluster for testing, for example, which means it is not untestable; but the startup overhead is real, and if Spark is woven through the core of your application rather than isolated behind a satellite boundary, that overhead lands on every test run. If you genuinely need a framework like this, think carefully about how you will keep the feedback loop fast before committing to it. It is solvable, but it is not free, and it is the kind of thing that is much easier to design for upfront than to retrofit.
The front end is worth thinking about separately, because it is the layer most teams abandon to browser automation. Playwright and Selenium are powerful but slow, fragile against DOM changes, and expensive to maintain. The alternative is to choose a rendering approach that keeps as much HTML generation as possible inside the application process, where a test client can reach it directly.
Low-JavaScript server-rendered approaches make this possible. HTMX is the well-known option: HTML attributes that trigger server requests, with the server returning HTML fragments. DataStar goes further: a small (~14kb) declarative library that replaces the component model of a traditional SPA with server-sent HTML and reactive signals; it requires no build tooling, is easy to integrate, and is very fast. Both approaches mean most user interactions are a server request and a rendered HTML fragment; in tests, that is a test client call and a string assertion. No browser, no Playwright, no flaky timeout waiting for a React re-render.
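In that world, a front-end test reduces to rendering a fragment and asserting on a string. A deliberately simplified sketch; the render function is hypothetical, standing in for whatever view layer produces the fragment an HTMX or DataStar endpoint would return:

```python
def render_status_fragment(status):
    # Hypothetical server-side fragment renderer: the HTML a partial
    # update endpoint would send back to the browser.
    return f'<span class="kyc-status kyc-status-{status.lower()}">{status}</span>'

def test_status_fragment_shows_approved():
    html = render_status_fragment("APPROVED")
    assert "kyc-status-approved" in html
    assert "APPROVED" in html
```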
The principle behind all of this is simple: if in doubt, wrap it. Every external concern is a candidate for an interface. If a technology cannot be wrapped at all, consider whether a different technology choice would serve the same need and be testable. The test suite you can trust is worth more than the SDK that saved two days of integration work.
End-to-End Tests at Unit Test Speed
Once every external concern sits behind a fakeable interface, something changes about what "end-to-end" means.
An end-to-end test in the traditional sense exercises the full stack: a real HTTP request hits a real server, which talks to a real database, which might call a real external API. It is slow, fragile, and expensive to maintain. Most teams run a small number of them and accept the unreliability as a cost of coverage.
With a fully fake context, an "end-to-end" test is a test client call against a real in-process application instance:
Consider a KYC onboarding submission: an analyst logs in, uploads a customer's documents, and the system runs identity verification, scores risk, writes an audit trail, and sends a welcome notification if approved. In a conventional integration test this requires a running web server, a database, stub servers for the verification and risk scoring APIs, and an email sink. In practice most teams skip it and test the pieces in isolation, accepting the coverage gap.
With a fake context, the entire flow is a test client call:
def test_kyc_submission_triggers_full_onboarding_workflow():
    context = make_test_context(
        verification=FakeVerificationService({
            CUSTOMER_ID: VerificationResult(approved=True),
        }),
        risk_scoring=FakeRiskScoringService(default_score=82),
        notifications=FakeNotificationService(),
    )
    app = create_app(context)
    client = app.test_client()

    client.post("/login", data={"username": "analyst@firm.com", "password": "test"})
    response = client.post("/kyc/submit", data={
        "customer_id": CUSTOMER_ID,
        "document_type": "passport",
        "document_ref": "P4712930",
    })

    assert response.status_code == 200
    assert b"Application approved" in response.data

    # Full downstream workflow executed against fakes
    check = context.repositories.kyc_checks.get_for_customer(CUSTOMER_ID)
    assert check.status == KYCStatus.APPROVED
    assert context.repositories.kyc_checks.get_risk_score(check.id) == 82

    # Audit trail written
    audit = context.repositories.audit_log.get_for_entity(check.id)
    assert [e.action for e in audit] == ["submitted", "processing_started", "approved"]

    # Welcome notification dispatched
    assert CUSTOMER_ID in context.notifications.welcome_emails_sent
This test exercises the HTTP layer, the authentication check, the KYC workflow, calls to three external dependencies, the audit log writes, and the notification dispatch. Every signal handler fires. Every repository write happens. The HTML response is verified. No containers. No network. No shared staging database. Execution time: milliseconds.
The fake context is the spec. If the workflow produces the correct outcome against fakes, the domain logic and the abstractions are correct. Whether the real implementations behave consistently with the fakes is a separate question, answered by contract tests.
The Contract Test: Keeping Fakes Honest
A fake that diverges from the real implementation is worse than no fake. It gives you false confidence: your tests pass, your fakes agree, and then production behaves differently because the real provider has a subtly different behaviour you did not capture.
The solution is a parameterised contract test: run the same test suite against both the fake and the real provider.
import pytest

@pytest.fixture(params=["fake", "real"])
def kyc_check_repository(request):
    if request.param == "fake":
        return FakeKYCCheckRepository()
    else:
        return SQLKYCCheckRepository(test_database_connection())

def test_get_returns_inserted_check(kyc_check_repository):
    check = make_kyc_check(status=KYCStatus.SUBMITTED)
    kyc_check_repository.insert(check)
    result = kyc_check_repository.get(check.id)
    assert result == check

def test_save_updates_status(kyc_check_repository):
    check = make_kyc_check(status=KYCStatus.SUBMITTED)
    kyc_check_repository.insert(check)
    updated = check._replace(status=KYCStatus.PROCESSING)
    kyc_check_repository.save(updated)
    result = kyc_check_repository.get(check.id)
    assert result.status == KYCStatus.PROCESSING
Both fixture variants run against every test in the suite. If the fake and the real provider diverge on any test, you have found either a bug in the fake or a misunderstanding of the real thing. Either way, you want to know.
The contract test has a secondary benefit: it forces precision about what the interface actually promises. Tests that pass against the fake but fail against the real implementation reveal implicit assumptions you had not made explicit. Fixing them tightens the contract and makes the fake more accurate. Over time, the contract test becomes the authoritative definition of what a provider is supposed to do.
The real provider tests do require a real database, which means they are slower. Run them in CI against a test instance. Run the fake-only suite locally during development. The split is explicit and deliberate: fast feedback during iteration, full contract verification before merge.
The upshot is that the distinction between "unit test" and "end-to-end test" collapses. Both are function calls. Both run in milliseconds. Both give you a stack trace when they fail, pointing at the line that broke, not at a container log you have to go find. A suite of hundreds of these tests runs in seconds. Developers run it on every save. The feedback loop that used to be "push and wait for CI" becomes immediate.
The Path to Continuous Deployment
Continuous deployment is one of those goals that most teams agree with in principle and struggle to reach in practice. The bottleneck is almost always confidence: the test suite does not cover enough, or it takes too long, or it is unreliable enough that a red build no longer means something is broken. Any of these conditions forces a human gate before production. The gate accumulates. Deployments become events rather than routine.
Fakability removes each of these blockers directly.
Coverage. When the full application stack can be exercised in a test with a function call, coverage stops being a trade-off against speed. The KYC submission test above covers the HTTP layer, authentication, the workflow, three dependencies, the audit trail, and the notification in milliseconds. You can afford to write tests like this for every significant path because they cost almost nothing to run.
Speed. The fake suite runs in seconds. Contract tests, which do touch real infrastructure, run in minutes but are scoped: they test the provider contracts in isolation, not the application workflows. A CI pipeline with both tiers completes in the time most teams spend waiting for their first container to start.
Reliability. A test against fakes cannot fail because a container was slow, a staging database had dirty data from another developer's run, or an external API returned a 429. The only infrastructure-related reason a fake-based test fails is that the application logic is wrong. When a red build reliably means broken code, you trust it. When you trust it, you can act on it automatically.
The resulting pipeline is straightforward: the fake suite runs on every push, typically in under a minute. Contract tests run alongside it in CI, verifying that real implementations still honour their contracts. If both pass, the build is a candidate for deployment. No manual testing phase. No release manager coordinating a deployment window. No "we'll do a proper test pass on Friday."
This is not a novel idea. It is how the teams that deploy dozens of times per day operate. The architecture that enables it is not exotic either: clean interfaces, injectable dependencies, fakes maintained as first-class implementations. The same discipline, applied consistently, produces a pipeline where deployment is the automatic consequence of a green build rather than a separate, anxiety-inducing activity.
The AI Agent Angle
AI agents need a bounded, deterministic context to operate reliably.
This is not theoretical. OpenAI's engineering team documented it directly in Harness Engineering: strict architectural boundaries with clear layering are "an early prerequisite" for agent-driven development. An agent that calls real infrastructure encounters flaky tests, rate limits, side effects from previous runs, and state that does not reset cleanly between iterations. The unpredictability compounds. The agent hallucinates about infrastructure state. The iteration loop slows to the speed of production.
The fake context is the harness. An agent working against make_test_context() calls real application code, exercises real domain logic, and sees real domain behaviour. It never touches real infrastructure. Side effects are contained. State resets between runs. Determinism is guaranteed.
The same discipline that makes developers fast makes agents reliable. A codebase where every external dependency is behind a clean interface, where the full system can be exercised with a constructor call, where fakes are maintained as first-class implementations: that is a codebase an agent can work with. Complexity wastes context. Coupling wastes time. Fakability is what removes both.
This axis is new. Two years ago, "AI legibility" was not a consideration in architecture decisions. It will matter more every year.
Summary
Fakability is an architecture discipline, not a testing trick. Design your boundaries so that every external concern sits behind an interface your application cannot see through. If the application cannot boot against fakes, the coupling is the problem.
The payoff is layered:
- Fast feedback during development. End-to-end coverage at unit test speed, no infrastructure required.
- Reliable abstractions. Contract tests keep fakes honest and make the interface contract explicit.
- Continuous deployment. A fast, reliable, comprehensive test suite removes the human gate before production. Deployment becomes the automatic consequence of a green build.
- Agent-ready codebases. Deterministic, bounded execution contexts that let AI agents iterate without hitting real infrastructure.
The fakes in this series have been present from the start. Name the strategy, enforce the discipline, and the test suite becomes something you trust rather than something you tolerate.