Fake It While You Make It
Most teams treat testing as something you do after building. You write the code, wire up the infrastructure, and then figure out how to test it. The result is a test suite that requires a running database, a message broker, a cache, and maybe a stable internet connection just to verify behaviour. Tests that take minutes. Tests that fail intermittently because a container was slow to start, or because they hit a staging database that everyone else uses too. Tests that developers stop running locally and leave to CI.
The discipline described here is different. It is not a testing strategy. It is an architecture strategy: choose your technology boundaries so that every external concern is behind a fakeable interface. The testing speed is a consequence. The architecture quality is the point.
If you cannot fake it, the coupling is the problem. Fix the coupling.
What "Faking" Actually Means
A mock verifies how your code was called. The test arranges expectations up front: did it call verification.verify() with the right arguments, the right number of times? If the internals change, the mock breaks, even if the behaviour did not.
# Mock-based: testing implementation, not behaviour
from unittest.mock import Mock

def test_kyc_approval_sends_welcome_email_mock():
    mock_notifications = Mock()
    mock_notifications.send_welcome.return_value = None
    context = make_test_context(notifications=mock_notifications)
    context.repositories.kyc_checks.insert(make_processing_check(CUSTOMER_ID))
    approve(CHECK_ID, context)
    mock_notifications.send_welcome.assert_called_once_with(CHECK_ID)

# Passes. But if approve() is refactored to call send_welcome_email()
# instead of send_welcome(), this test breaks with no behaviour change.
A fake is something different: a real, working implementation of the same interface with a different backend. It accepts the same types, returns the same types, and behaves consistently. It does not care how it was called; it cares what the outcome was.
# Fake-based: testing behaviour, not implementation
def test_kyc_approval_sends_welcome_email_fake():
    context = make_test_context(
        notifications=FakeNotificationService(),
    )
    context.repositories.kyc_checks.insert(make_processing_check(CUSTOMER_ID))
    approve(CHECK_ID, context)
    assert CUSTOMER_ID in context.notifications.welcome_emails_sent

# Passes regardless of how approve() calls the notification service
# internally. Refactor freely; only a behaviour change breaks this test.
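A fake like the one above can be a handful of lines. A minimal sketch, assuming a notification interface with a single send_welcome(customer_id) method (the exact interface in this series' codebase may differ):

```python
class FakeNotificationService:
    """In-memory notification service: same interface, no email provider."""

    def __init__(self):
        self.welcome_emails_sent = []

    def send_welcome(self, customer_id):
        # Record the outcome so tests can assert on what happened,
        # not on how it was called.
        self.welcome_emails_sent.append(customer_id)
```

Because the fake records outcomes rather than call patterns, any internal refactor that still produces the same outcome leaves the test green.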
The distinction matters in practice. Mocks are brittle. They are coupled to the implementation rather than the behaviour. When you refactor the internals of a transition, mock-based tests break even if the behaviour is unchanged. Fake-based tests only break if the behaviour actually changes. A mock is a bet on how the code is written. A fake is a bet on what the code does.
Fakes have appeared throughout this series without being named as a strategy. The FakeBlogPostProvider in How to Build a Data Access Layer. The FakeVerificationService and FakeRiskScoringService in Your System is a State Machine. The make_test_context() helper in Strategic Monolith + Satellites. Each article introduced a fake as a side effect of good interface design. This article names the strategy those fakes represent.
Design for Fakability
The question to ask of every external dependency is: can I put a clean interface in front of this?
"External" here means anything that is not your domain logic: databases, queues, third-party APIs, ML services, email providers, file stores. Each one is a candidate for fakability. The test is whether your entire application can boot and run with fakes in place of every one of them.
What this rules out:
- ORMs leaking into application code (the Active Record pattern; a data provider that is also a database object)
- Direct SDK calls scattered through business logic
- Global state or ambient configuration that cannot be overridden per-test
- Side effects that happen at import time rather than at call time
What it enables: the same code path runs in tests and in production. The domain logic, the state machine, the orchestration, the signal handlers: all of it executes. Only what sits behind the interface changes. In tests, a FakeKYCCheckRepository stores records in memory. In production, a SQLKYCCheckRepository stores them in Postgres. The application code never knows the difference.
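The swap can be sketched with an interface and a dict-backed fake. The Protocol, record type, and method names here are illustrative assumptions, not the series' exact code:

```python
from collections import namedtuple
from typing import Protocol

# Illustrative record type standing in for the real KYC check model.
KYCCheck = namedtuple("KYCCheck", ["id", "status"])

class KYCCheckRepository(Protocol):
    def insert(self, check: KYCCheck) -> None: ...
    def get(self, check_id: str) -> KYCCheck: ...

class FakeKYCCheckRepository:
    """Stores checks in a dict. A SQLKYCCheckRepository would satisfy
    the same Protocol but write to Postgres; callers cannot tell."""

    def __init__(self):
        self._checks = {}

    def insert(self, check):
        self._checks[check.id] = check

    def get(self, check_id):
        return self._checks[check_id]
```

The application code depends only on the Protocol; which implementation is constructed is a wiring decision made once, at startup or in the test context.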
This is not a novel idea. It is hexagonal architecture, ports and adapters, the provider pattern: all different names for the same underlying discipline. What is worth stating plainly is the consequence: if your application cannot boot against fakes, your architecture has a coupling problem. The test suite is the diagnostic.
Choosing Technologies for Fakability
Not every technology choice is equally fakeable, and the time to think about this is before you commit, not after.
Storage is the most common concern and the most well-understood. The provider pattern wraps every storage touchpoint behind an interface. A FakeKYCCheckRepository backed by an in-memory dict is a few dozen lines and covers the entire test suite. The real SQL implementation is swapped in for CI and production. This is covered in depth elsewhere in this series; the point here is that choosing an ORM that leaks into application code forecloses this option entirely.
File stores follow the same pattern. An ObjectStore interface with upload(key, data) and download(key) methods covers most use cases. The fake holds files in a dict. The real implementation writes to S3, GCS, or a local filesystem. The application never knows which.
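A dict-backed fake for that interface is about ten lines. A sketch, following the upload/download shape described above (a real interface might add metadata or streaming):

```python
class FakeObjectStore:
    """In-memory object store with the same upload/download interface
    as an S3-, GCS-, or filesystem-backed implementation."""

    def __init__(self):
        self._objects = {}

    def upload(self, key, data):
        self._objects[key] = data

    def download(self, key):
        return self._objects[key]
```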
Third-party APIs should be wrapped by default. A VerificationService interface hides whether you are calling Onfido, Jumio, or a stub that returns predetermined results. This is also where the fake pays operational dividends: running a full workflow in CI without burning API credits or triggering rate limits.
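A fake verification service can be seeded with predetermined results per customer. A sketch, assuming a verify(customer_id) method and a simple result type (the real interface may carry more fields):

```python
from dataclasses import dataclass

@dataclass
class VerificationResult:
    approved: bool

class FakeVerificationService:
    def __init__(self, results):
        # Map of customer_id -> predetermined VerificationResult.
        self._results = results

    def verify(self, customer_id):
        # No HTTP call, no API credits, no rate limits.
        return self._results[customer_id]
```

Seeding results in the constructor is what lets a test declare its scenario up front: this customer passes, that one fails, and the workflow is exercised against both.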
Queues are a natural fit for the same provider pattern as storage. An EventQueue interface with enqueue(item) and dequeue() methods can be backed by a fake that holds items in a list. The application enqueues work and the worker dequeues it; in tests, both sides run in the same process with no polling delay, no broker, and no message serialisation to reason about. Whether the real implementation is a database-backed queue or a dedicated broker is irrelevant to the test.
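A list-backed fake for that interface is similarly small. The empty-queue behaviour here (returning None, like a non-blocking poll) is an assumption about the interface:

```python
from collections import deque

class FakeEventQueue:
    """In-process queue: enqueue and dequeue with no broker,
    no polling delay, no message serialisation."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        # None signals an empty queue, mirroring a non-blocking poll.
        return self._items.popleft() if self._items else None
```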
Large-scale data processing frameworks require deliberate thought. Apache Spark can spin up a local in-process cluster for testing, for example, which means it is not untestable; but the startup overhead is real, and if Spark is woven through the core of your application rather than isolated behind a satellite boundary, that overhead lands on every test run. If you genuinely need a framework like this, think carefully about how you will keep the feedback loop fast before committing to it. It is solvable, but it is not free, and it is the kind of thing that is much easier to design for upfront than to retrofit.
The front end is worth thinking about separately, because it is the layer most teams abandon to browser automation. Playwright and Selenium are powerful but slow, fragile against DOM changes, and expensive to maintain. The alternative is to choose a rendering approach that keeps as much HTML generation as possible inside the application process, where a test client can reach it directly.
Low-JavaScript server-rendered approaches make this possible. HTMX is the well-known option: HTML attributes that trigger server requests, with the server returning HTML fragments. DataStar goes further: a small (~14kb) declarative library that replaces the component model of a traditional SPA with server-sent HTML and reactive signals; it requires no build tooling, is easy to integrate, and is very fast. Both approaches mean most user interactions are a server request and a rendered HTML fragment; in tests, that is a test client call and a string assertion. No browser, no Playwright, no flaky timeout waiting for a React re-render.
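In that world, a front-end test reduces to rendering a fragment and asserting on a string. A deliberately simplified sketch; the render function is hypothetical, standing in for whatever view layer produces the fragment an HTMX or DataStar endpoint would return:

```python
def render_status_fragment(status):
    # Hypothetical server-side fragment renderer: the HTML a partial
    # update endpoint would send back to the browser.
    return f'<span class="kyc-status kyc-status-{status.lower()}">{status}</span>'

def test_status_fragment_shows_approved():
    html = render_status_fragment("APPROVED")
    assert "kyc-status-approved" in html
    assert "APPROVED" in html
```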
The principle behind all of this is simple: if in doubt, wrap it. Every external concern is a candidate for an interface. If a technology cannot be wrapped at all, consider whether a different technology choice would serve the same need and be testable. The test suite you can trust is worth more than the SDK that saved two days of integration work.
End-to-End Tests at Unit Test Speed
Once every external concern sits behind a fakeable interface, something changes about what "end-to-end" means.
An end-to-end test in the traditional sense exercises the full stack: a real HTTP request hits a real server, which talks to a real database, which might call a real external API. It is slow, fragile, and expensive to maintain. Most teams run a small number of them and accept the unreliability as a cost of coverage.
With a fully fake context, an "end-to-end" test is a test client call against a real in-process application instance:
Consider a KYC onboarding submission: an analyst logs in, uploads a customer's documents, and the system runs identity verification, scores risk, writes an audit trail, and sends a welcome notification if approved. In a conventional integration test this requires a running web server, a database, stub servers for the verification and risk scoring APIs, and an email sink. In practice most teams skip it and test the pieces in isolation, accepting the coverage gap.
With a fake context, the entire flow is a test client call:
def test_kyc_submission_triggers_full_onboarding_workflow():
    context = make_test_context(
        verification=FakeVerificationService({
            CUSTOMER_ID: VerificationResult(approved=True),
        }),
        risk_scoring=FakeRiskScoringService(default_score=82),
        notifications=FakeNotificationService(),
    )
    app = create_app(context)
    client = app.test_client()

    client.post("/login", data={"username": "analyst@firm.com", "password": "test"})
    response = client.post("/kyc/submit", data={
        "customer_id": CUSTOMER_ID,
        "document_type": "passport",
        "document_ref": "P4712930",
    })

    assert response.status_code == 200
    assert b"Application approved" in response.data

    # Full downstream workflow executed against fakes
    check = context.repositories.kyc_checks.get_for_customer(CUSTOMER_ID)
    assert check.status == KYCStatus.APPROVED
    assert context.repositories.kyc_checks.get_risk_score(check.id) == 82

    # Audit trail written
    audit = context.repositories.audit_log.get_for_entity(check.id)
    assert [e.action for e in audit] == ["submitted", "processing_started", "approved"]

    # Welcome notification dispatched
    assert CUSTOMER_ID in context.notifications.welcome_emails_sent
This test exercises the HTTP layer, the authentication check, the KYC workflow, calls to three external dependencies, the audit log writes, and the notification dispatch. Every signal handler fires. Every repository write happens. The HTML response is verified. No containers. No network. No shared staging database. Execution time: milliseconds.
The fake context is the spec. If the workflow produces the correct outcome against fakes, the domain logic and the abstractions are correct. Whether the real implementations behave consistently with the fakes is a separate question, answered by contract tests.
The Contract Test: Keeping Fakes Honest
A fake that diverges from the real implementation is worse than no fake. It gives you false confidence: your tests pass, your fakes agree, and then production behaves differently because the real provider has a subtly different behaviour you did not capture.
The solution is a parameterised contract test: run the same test suite against both the fake and the real provider.
import pytest

@pytest.fixture(params=["fake", "real"])
def kyc_check_repository(request):
    if request.param == "fake":
        return FakeKYCCheckRepository()
    else:
        return SQLKYCCheckRepository(test_database_connection())

def test_get_returns_inserted_check(kyc_check_repository):
    check = make_kyc_check(status=KYCStatus.SUBMITTED)
    kyc_check_repository.insert(check)
    result = kyc_check_repository.get(check.id)
    assert result == check

def test_save_updates_status(kyc_check_repository):
    check = make_kyc_check(status=KYCStatus.SUBMITTED)
    kyc_check_repository.insert(check)
    updated = check._replace(status=KYCStatus.PROCESSING)
    kyc_check_repository.save(updated)
    result = kyc_check_repository.get(check.id)
    assert result.status == KYCStatus.PROCESSING
Both fixture variants run against every test in the suite. If the fake and the real provider diverge on any test, you have found either a bug in the fake or a misunderstanding of the real thing. Either way, you want to know.
The contract test has a secondary benefit: it forces precision about what the interface actually promises. Tests that pass against the fake but fail against the real implementation reveal implicit assumptions you had not made explicit. Fixing them tightens the contract and makes the fake more accurate. Over time, the contract test becomes the authoritative definition of what a provider is supposed to do.
The real provider tests do require a real database, which means they are slower. Run them in CI against a test instance. Run the fake-only suite locally during development. The split is explicit and deliberate: fast feedback during iteration, full contract verification before merge.
The upshot is that the distinction between "unit test" and "end-to-end test" collapses. Both are function calls. Both run in milliseconds. Both give you a stack trace when they fail, pointing at the line that broke, not at a container log you have to go find. A suite of hundreds of these tests runs in seconds. Developers run it on every save. The feedback loop that used to be "push and wait for CI" becomes immediate.
The Path to Continuous Deployment
Continuous deployment is one of those goals that most teams agree with in principle and struggle to reach in practice. The bottleneck is almost always confidence: the test suite does not cover enough, or it takes too long, or it is unreliable enough that a red build no longer means something is broken. Any of these conditions forces a human gate before production. The gate accumulates. Deployments become events rather than routine.
Fakability removes each of these blockers directly.
Coverage. When the full application stack can be exercised in a test with a function call, coverage stops being a trade-off against speed. The KYC submission test above covers the HTTP layer, authentication, the workflow, three dependencies, the audit trail, and the notification in milliseconds. You can afford to write tests like this for every significant path because they cost almost nothing to run.
Speed. The fake suite runs in seconds. Contract tests, which do touch real infrastructure, run in minutes but are scoped: they test the provider contracts in isolation, not the application workflows. A CI pipeline with both tiers completes in the time most teams spend waiting for their first container to start.
Reliability. A test against fakes cannot fail because a container was slow, a staging database had dirty data from another developer's run, or an external API returned a 429. The only infrastructure-related reason a fake-based test fails is that the application logic is wrong. When a red build reliably means broken code, you trust it. When you trust it, you can act on it automatically.
The resulting pipeline is straightforward: the fake suite runs on every push, typically in under a minute. Contract tests run alongside it in CI, verifying that real implementations still honour their contracts. If both pass, the build is a candidate for deployment. No manual testing phase. No release manager coordinating a deployment window. No "we'll do a proper test pass on Friday."
This is not a novel idea. It is how the teams that deploy dozens of times per day operate. The architecture that enables it is not exotic either: clean interfaces, injectable dependencies, fakes maintained as first-class implementations. The same discipline, applied consistently, produces a pipeline where deployment is the automatic consequence of a green build rather than a separate, anxiety-inducing activity.
The AI Agent Angle
AI agents need a bounded, deterministic context to operate reliably.
This is not theoretical. OpenAI's engineering team documented it directly in Harness Engineering: strict architectural boundaries with clear layering are "an early prerequisite" for agent-driven development. An agent that calls real infrastructure encounters flaky tests, rate limits, side effects from previous runs, and state that does not reset cleanly between iterations. The unpredictability compounds. The agent hallucinates about infrastructure state. The iteration loop slows to the speed of production.
The fake context is the harness. An agent working against make_test_context() calls real application code, exercises real domain logic, and sees real domain behaviour. It never touches real infrastructure. Side effects are contained. State resets between runs. Determinism is guaranteed.
The same discipline that makes developers fast makes agents reliable. A codebase where every external dependency is behind a clean interface, where the full system can be exercised with a constructor call, where fakes are maintained as first-class implementations: that is a codebase an agent can work with. Complexity wastes context. Coupling wastes time. Fakability is what removes both.
This axis is new. Two years ago, "AI legibility" was not a consideration in architecture decisions. It will matter more every year.
Summary
Fakability is an architecture discipline, not a testing trick. Design your boundaries so that every external concern sits behind an interface your application cannot see through. If the application cannot boot against fakes, the coupling is the problem.
The payoff is layered:
- Fast feedback during development. End-to-end coverage at unit test speed, no infrastructure required.
- Reliable abstractions. Contract tests keep fakes honest and make the interface contract explicit.
- Continuous deployment. A fast, reliable, comprehensive test suite removes the human gate before production. Deployment becomes the automatic consequence of a green build.
- Agent-ready codebases. Deterministic, bounded execution contexts that let AI agents iterate without hitting real infrastructure.
The fakes in this series have been present from the start. Name the strategy, enforce the discipline, and the test suite becomes something you trust rather than something you tolerate.