Idempotency Is Not Optional
A payment processed twice is not a bug report. It is a regulatory incident, a customer complaint, and depending on the jurisdiction, a compliance violation. Most teams treat idempotency as a hardening pass they will get to later. In practice, "later" means "after the first batch of duplicate charges".
This is not academic. Financial systems, webhook receivers, queue consumers: anywhere "at least once" delivery exists, idempotency is a correctness requirement, not a nice-to-have. Your system will receive duplicate inputs. The only question is whether it handles them gracefully or pretends they cannot happen.
What Idempotency Actually Means
An operation is idempotent if applying it multiple times produces the same result as applying it once. A database write that sets a status to CONFIRMED is idempotent; running it twice leaves the record in the same state. A write that increments a counter by one is not; running it twice produces a different result than running it once.
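The distinction is visible in a few lines of code. This is a toy sketch, with a plain dict standing in for a database row:

```python
# Idempotent write: sets state. Applying it twice leaves the
# record in the same state as applying it once.
def confirm(record):
    record["status"] = "CONFIRMED"

# Non-idempotent write: increments state. Each application
# produces a different result.
def bump_retries(record):
    record["retries"] += 1

record = {"status": "PENDING", "retries": 0}
confirm(record)
confirm(record)        # second call is absorbed

bump_retries(record)
bump_retries(record)   # second call is not: retries is 2, not 1
```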
This is distinct from safety in the "non-modifying" sense. A safe operation is read-only; it does not change state at all. An idempotent operation does change state, but only once. The second application is absorbed.
It is also distinct from exactly-once delivery, which is a property of infrastructure that almost never holds in practice. Networks drop acknowledgements. Workers crash between processing and committing. Webhook providers retry on timeout. The realistic assumption is "at least once", and idempotency is what makes "at least once" behave like "exactly once" from the perspective of your data.
Idempotency is not the same as deduplication. Deduplication discards duplicates before processing: it requires knowing something is a duplicate before you touch it. Idempotency means processing a duplicate is harmless because the system recognises the work is already done. The distinction matters in practice because deduplication fails when duplicates arrive through different paths or with different metadata. Idempotency handles the case regardless of how the duplicate arrived.
The prize is a system you can replay with confidence.
The Idempotency Key
Every write operation has an idempotency key, whether you name it or not.
This is the central insight, and it is worth sitting with for a moment. When you insert a row, something about that row identifies the intended effect. When you update a record, something about the update identifies which change you meant to apply. That "something" is the idempotency key. It answers one question: "has this specific intended effect already been applied?" If you can answer that reliably, you can handle duplicates. If you cannot, you have a correctness gap.
The key is not always a single field. It might be an event ID from an external system. It might be a composite of entity_id + action + timestamp. It might be a caller-provided UUID in a request header. The important thing is that two requests with the same key represent the same intended operation, and two requests with different keys represent different operations, even if the payload is otherwise identical.
Most systems already have idempotency keys buried in their data. The work is recognising them, naming them, and enforcing them.
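The question the key answers can be sketched in a few lines. This is a deliberately naive in-memory illustration (the names are hypothetical); under concurrency you push the check into a unique constraint, as the SQL example in the next section does, rather than checking before writing:

```python
applied = set()   # in production: a unique index, not an in-memory set

def apply_once(key, effect):
    """Apply `effect` unless the idempotency key says it already ran."""
    if key in applied:
        return "duplicate"
    applied.add(key)
    effect()
    return "applied"

charges = []
first = apply_once("evt_8f3a", lambda: charges.append(100))
second = apply_once("evt_8f3a", lambda: charges.append(100))
# first == "applied", second == "duplicate", and only one charge exists
```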
Worked Example: Idempotent Audit Log
An audit log is a good place to see this in practice because the requirements are strict and the pattern is clean. The log records every state transition in the system. It is append-only: rows are written, never updated or deleted.
The natural idempotency key is the combination of fields that uniquely identify "this specific thing happened": the entity, the transition, and when it occurred.
CREATE TABLE audit_log (
    id            VARCHAR(100) NOT NULL,
    entity_type   VARCHAR(50)  NOT NULL,
    entity_id     VARCHAR(100) NOT NULL,
    action        VARCHAR(50)  NOT NULL,
    old_status    VARCHAR(50),
    new_status    VARCHAR(50)  NOT NULL,
    performed_by  VARCHAR(100) NOT NULL,
    performed_at  TIMESTAMP WITH TIME ZONE NOT NULL,
    detail        JSONB,
    PRIMARY KEY (id),
    UNIQUE (entity_type, entity_id, action, new_status, performed_at)
);
Notice the two constraints. The id is the row's identity: used for foreign keys, joins, and application code. The unique constraint on (entity_type, entity_id, action, new_status, performed_at) is the idempotency key: it is what catches duplicates. These are not the same thing. The primary key identifies the row. The idempotency key identifies the intent. Sometimes they overlap; a Stripe event ID can serve as both. Often they do not, and designing for both from the start avoids a painful retrofit later.
The write is a single INSERT ... ON CONFLICT:
INSERT INTO audit_log (
    id,
    entity_type,
    entity_id,
    action,
    old_status,
    new_status,
    performed_by,
    performed_at,
    detail
)
VALUES (
    'log-0001',
    'order',
    '4821',
    'confirmed',
    'pending',
    'confirmed',
    'system',
    '2026-03-15T10:30:00Z',
    '{"source": "webhook"}'
)
ON CONFLICT (entity_type, entity_id, action, new_status, performed_at)
DO UPDATE SET id = EXCLUDED.id;
The write is unconditional. No check-before-write. No conditional logic. If the row already exists, the unique constraint catches the conflict and DO UPDATE SET id = EXCLUDED.id overwrites the surrogate id. That is not a no-op; the id may genuinely change if a replay generates a new one. But it is harmless: the idempotency key protects the business-meaningful data, and the id is just a handle. One row per transition, always.
Replay an entire day of transitions and the log ends up identical to what it would have been with a single pass. That is idempotency in its purest form.
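The replay claim is easy to verify. The sketch below mirrors the schema and upsert above against an in-memory SQLite database (JSONB becomes TEXT, the timestamp is stored as text): the same transition is written twice with different surrogate ids, and the idempotency key collapses them into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        id TEXT PRIMARY KEY,
        entity_type TEXT NOT NULL,
        entity_id TEXT NOT NULL,
        action TEXT NOT NULL,
        old_status TEXT,
        new_status TEXT NOT NULL,
        performed_by TEXT NOT NULL,
        performed_at TEXT NOT NULL,
        detail TEXT,
        UNIQUE (entity_type, entity_id, action, new_status, performed_at)
    )
""")

def record_transition(row_id):
    # Each replay may generate a fresh surrogate id; the unique
    # constraint on the idempotency key still collapses the rows.
    conn.execute("""
        INSERT INTO audit_log
            (id, entity_type, entity_id, action, old_status,
             new_status, performed_by, performed_at, detail)
        VALUES (?, 'order', '4821', 'confirmed', 'pending',
                'confirmed', 'system', '2026-03-15T10:30:00Z',
                '{"source": "webhook"}')
        ON CONFLICT (entity_type, entity_id, action, new_status, performed_at)
        DO UPDATE SET id = excluded.id
    """, (row_id,))

record_transition("a-001")   # first pass
record_transition("a-002")   # replay, new surrogate id

rows = conn.execute("SELECT COUNT(*), MAX(id) FROM audit_log").fetchone()
# one row, carrying the replay's id: (1, "a-002")
```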
Audit logs are also the canary for idempotency across the rest of your system. If your audit log contains duplicate entries, something upstream processed a transition twice and did not catch it. A clean, deduplicated audit log is evidence that idempotency is working. This connection is explored further in Audit Trails Done Properly.
Testing idempotency is mechanical: call it twice, assert the same outcome.
def test_process_payment_is_idempotent():
    context = make_test_context()
    payment = make_payment(amount=100, idempotency_key="pay-001")

    first_result = process_payment(payment, context)
    second_result = process_payment(payment, context)

    assert first_result == second_result
    assert context.repositories.payments.count() == 1
The cost of the test is one extra function call and one extra assertion. The cost of the bug it catches is a regulatory incident.
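The test above assumes a few helpers. All of these names are hypothetical; a minimal in-memory sketch that makes the test pass might look like this, with the payment store keyed on the idempotency key so a replay overwrites the same slot instead of creating a second charge:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Payment:
    amount: int
    idempotency_key: str

@dataclass
class PaymentRepo:
    rows: dict = field(default_factory=dict)

    def count(self):
        return len(self.rows)

@dataclass
class Repositories:
    payments: PaymentRepo = field(default_factory=PaymentRepo)

@dataclass
class Context:
    repositories: Repositories = field(default_factory=Repositories)

def make_test_context():
    return Context()

def make_payment(amount, idempotency_key):
    return Payment(amount=amount, idempotency_key=idempotency_key)

def process_payment(payment, context):
    repo = context.repositories.payments
    # Keyed on the idempotency key: processing a duplicate writes
    # to the same slot and returns the same result.
    repo.rows[payment.idempotency_key] = payment.amount
    return {"key": payment.idempotency_key, "amount": payment.amount}
```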
How to Pick Your Key
The idempotency key must uniquely identify the intended effect, not the request, not the message, not the row. Getting this wrong produces either false positives (treating distinct operations as duplicates) or false negatives (failing to catch real duplicates). This is the hardest part of building an idempotent system, and it is worth thinking about carefully.
Natural Keys
The best idempotency keys already exist in your domain. They come from the business problem, not from infrastructure.
- A payment callback from Stripe carries an event ID. That ID is the key. If you receive the same event ID twice, the second is a duplicate.
- An order status change is identified by order_id + new_status. An order can only be confirmed once; the combination is the natural key for the confirmation operation.
- A webhook delivery from a partner system carries a transaction reference. That reference is the key.
- A batch processing run is identified by batch_id + item_id. If the batch re-runs, each item is individually idempotent.
Natural keys win because they require no coordination between caller and server. They are already present in the data. They are meaningful in the domain, so they appear in logs, dashboards, and debugging sessions. They are the first thing to look for.
Caller-Provided Keys
When the caller controls the retry logic, the server may have no way to distinguish "new request" from "retry of a previous request" based on the payload alone. In this case, the caller generates a unique key (typically a UUID) and sends it with the request, usually in a header such as Idempotency-Key.
The server stores the key alongside the result. On duplicate receipt, it returns the stored result without re-executing. This is the standard pattern for APIs. Stripe uses it. Most payment processors use it. It shifts the responsibility for key generation to the caller, which is appropriate when the server cannot infer intent from the payload.
The risk is straightforward: if the caller generates a new key for each retry (a bug, or a naive integration), the server sees each attempt as a distinct operation. The pattern only works if the caller understands the contract. Document it clearly. For critical endpoints, consider rejecting requests that omit the idempotency key entirely.
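The server side of the pattern fits in a few lines. This is a sketch, not production code (no persistence, no concurrency control, and the names are invented): the key maps to a stored result, duplicates return it without re-executing, and missing keys are rejected outright.

```python
results = {}   # in production: persisted alongside the write itself

def handle_request(idempotency_key, payload, execute):
    if idempotency_key is None:
        # For critical endpoints, reject rather than guess.
        raise ValueError("Idempotency-Key header is required")
    if idempotency_key in results:
        # Duplicate: return the stored result without re-executing.
        return results[idempotency_key]
    result = execute(payload)
    results[idempotency_key] = result
    return result

calls = []
def charge(payload):
    calls.append(payload)
    return {"status": "charged", "amount": payload["amount"]}

first = handle_request("key-123", {"amount": 100}, charge)
retry = handle_request("key-123", {"amount": 100}, charge)
# first == retry, and charge() ran exactly once
```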
Composite Keys
When no single field identifies the intent, construct a key from a combination of fields. customer_id + action_type + date might identify "this customer's daily settlement". source_system + source_id + event_type might identify "this specific event from this specific upstream system".
The discipline is that the composite must be stable across retries. If any component changes between retries (a timestamp that advances, a sequence number that increments), the key changes and the duplicate is not caught. Use fields that describe the intent, not the attempt.
If you genuinely cannot construct a stable key, ask whether the operation needs idempotency at all. Often it does not.
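A composite key can be sketched as a hash over the fields that describe the intent, and nothing else. The function name and field choices here are illustrative:

```python
import hashlib

def settlement_key(customer_id, action_type, business_date):
    # Built only from fields that describe the intent. Nothing here
    # changes between retries: no wall-clock timestamp, no attempt
    # counter, no request ID.
    raw = f"{customer_id}:{action_type}:{business_date}"
    return hashlib.sha256(raw.encode()).hexdigest()

# Two retries of the same intended settlement produce the same key...
a = settlement_key("cust-42", "daily_settlement", "2026-03-15")
b = settlement_key("cust-42", "daily_settlement", "2026-03-15")
# ...while a different business date is a genuinely different operation.
c = settlement_key("cust-42", "daily_settlement", "2026-03-16")
```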
What To Do On Duplicate
The key tells you whether something is a duplicate. The right response depends on context.
Skip: acknowledge and return the previous result. This is appropriate when the caller is waiting for a response: API endpoints, webhook receivers. The server caches the result alongside the idempotency key; on duplicate receipt, it returns the cached result without re-executing. In SQL, INSERT ... ON CONFLICT DO NOTHING is the simplest form.
Pass-through (upsert): write it again; the write is designed so that reapplying it is harmless. INSERT ... ON CONFLICT DO UPDATE SET value = EXCLUDED.value. This is appropriate for internal writes where nobody is waiting for a cached response: queue consumers, batch processing, data pipelines. Replaying an entire batch is a no-op for items already processed and a catch-up for items that were missed. No conditional logic. No error handling for "already done". Just write.
The caveat: upserts work for "set state to X". They do not work for "increment by N" or "append to list". Those operations are inherently non-idempotent unless guarded by a separate check. Know which kind of write you are making.
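Guarding an increment looks like this in a toy in-memory form (names hypothetical; in production the guard is a unique index on the adjustment key, written in the same transaction as the balance change):

```python
balances = {"acct-1": 100}
applied = set()   # in production: a unique index on the adjustment key

def adjust_balance(account_id, amount, adjustment_key):
    # "Add N" is not idempotent on its own; the key makes each
    # specific adjustment apply at most once.
    if adjustment_key in applied:
        return balances[account_id]
    applied.add(adjustment_key)
    balances[account_id] += amount
    return balances[account_id]

adjust_balance("acct-1", 10, "adj-001")
adjust_balance("acct-1", 10, "adj-001")   # retry: absorbed, balance stays 110
```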
The Golden System: Replay-Safe Architecture
When every write in a system is idempotent, something fundamental changes in how that system is operated. Replay stops being a risk and becomes a recovery strategy.
- A queue consumer crashes mid-batch. Restart it. It reprocesses items it already handled; the writes are idempotent, so the duplicates are harmless. No forensic investigation into "which items succeeded and which didn't".
- An upstream system sends a batch of webhooks, some of which you already processed. Accept them all. The duplicates are no-ops.
- An incident corrupts some records. Reprocess the source events from the last known-good point. Idempotent writes restore the correct state without double-counting.
- A deployment goes wrong and a worker processes the same input data twice. No impact.
The system that handles duplicates gracefully is simpler to run than the system that must never receive one. The latter requires perfect infrastructure: exactly-once delivery, no crashes at inopportune moments, no duplicate webhook deliveries. That infrastructure does not exist. The former requires only that each write knows its own identity.
Teams that do not build for idempotency end up building compensating complexity instead: deduplication layers, message ID caches, reconciliation jobs that run nightly to detect and fix double-processing. The idempotent system does not need any of this. The upfront cost is thinking carefully about keys. The ongoing cost is near zero.
When Idempotency Gets Hard
Not everything is clean. Two cases are worth naming honestly.
Partial completion. A write succeeds but a downstream side effect (an API call, an email, a webhook delivery) fails. On replay, the write is absorbed (idempotent), but the side effect never happened. The state says "done" but the work is incomplete. The solution is to treat the side effect as its own operation with its own key and its own retry mechanism. The parent write is complete; the child retries independently. This is the same principle as sub-state-machines: give the side effect its own lifecycle rather than entangling it with the parent.
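Giving the side effect its own key can be sketched like this (all names hypothetical; real delivery and retry machinery omitted). The parent write and the email each carry their own identity, so a full replay absorbs both independently:

```python
orders = {}
emails_sent = set()   # in production: a persisted record of delivered keys

def confirm_order(order_id):
    orders[order_id] = "confirmed"        # idempotent parent write
    send_confirmation_email(order_id)     # child: own key, own lifecycle

def send_confirmation_email(order_id):
    key = f"email:order-confirmed:{order_id}"   # the side effect's own key
    if key in emails_sent:
        return
    # ... actual delivery would happen here, with its own retries ...
    emails_sent.add(key)

confirm_order("4821")
confirm_order("4821")   # full replay: write absorbed, email not resent
```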
Increments and appends. Operations like "add £10 to balance" or "append item to list" are inherently non-idempotent. Running them twice produces a different result than running them once. They must be reframed: "set balance to £110" is idempotent; "add £10" is not. Alternatively, guard the increment with a key that prevents it from applying twice. This reframing is the single most common design change when retrofitting idempotency into an existing system, and it is worth thinking about from the start.
Summary
Every write has a key. Pick it from the domain when you can, accept it from the caller when you must, construct it from stable fields as a fallback.
Two responses to a duplicate: skip it or write through it. Both absorb the duplicate; the choice depends on whether the caller needs a cached response or not.
The prize is a replay-safe system: operationally simpler, trivially recoverable, and free from the compensating complexity that non-idempotent systems accumulate over time.
If every write is idempotent, the audit log records each intended effect exactly once. Clean input, clean history. The implementation of that audit log is the subject of the next article in the series: Audit Trails Done Properly.
This is part of a series on pragmatic architecture for startups and scaleups.