Idempotency is the only thing standing between you and a duplicate refund

Payment systems get retried. Networks fail mid-call. Phones rotate. The difference between a clean ledger and a customer-support nightmare is whether your endpoints are honestly idempotent.

The first time we saw a duplicate refund issued in production was almost ten years ago. A mobile client sent a refund request. The server processed it. The 200 response never made it back to the phone — the user had walked into a tunnel.

The phone retried, as phones do. The server processed it again. The customer got two refunds. The merchant noticed three months later. The conversation was unpleasant.

The fix took ten minutes. The prevention takes thought every single time a write endpoint is designed. Most teams build a real idempotency layer the first time they get burned. Some get burned twice.

What idempotent actually means

An endpoint is idempotent when calling it twice with the same input produces the same effect as calling it once. Not the same response — the same effect on the world. This distinction matters.

A GET /balance endpoint is idempotent by accident, because reading does not change anything. A POST /refund endpoint is idempotent only if you make it so. There is no language feature that does this for you.
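A toy illustration of that difference, using a hypothetical in-memory ledger (`balances`, `get_balance`, and `naive_refund` are made up for this sketch). Repeating the read changes nothing. Repeating the unguarded write credits the customer twice, which is exactly the tunnel scenario above.

```python
# Hypothetical in-memory ledger, for illustration only.
balances = {"acct-1": 100}
refunds_issued = []


def get_balance(account_id: str) -> int:
    # A read: calling it any number of times leaves the ledger unchanged.
    return balances[account_id]


def naive_refund(account_id: str, amount: int) -> None:
    # An unguarded write: every call credits the account again.
    balances[account_id] += amount
    refunds_issued.append((account_id, amount))


get_balance("acct-1")
get_balance("acct-1")       # same effect as a single read: none
naive_refund("acct-1", 25)
naive_refund("acct-1", 25)  # the retry after the lost 200 response
print(balances["acct-1"], len(refunds_issued))  # 150 2
```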

The standard pattern is the idempotency key: the client generates a unique identifier per logical operation and sends it as a header. The server records the key the first time it sees it, stores the result, and on any retry with the same key, returns the stored result instead of re-executing.
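A minimal sketch of that contract, assuming a single process and an in-memory dict (`handle_with_idempotency` and `issue_refund` are hypothetical names). It deliberately ignores crashes, concurrency, and key expiry, which is where the implementation gets subtle.

```python
import uuid
from typing import Any, Callable, Dict

# Hypothetical in-memory store mapping idempotency key -> stored result.
# Single process, no crash safety, no expiry: a sketch of the contract only.
_results: Dict[str, Any] = {}


def handle_with_idempotency(key: str, operation: Callable[[], Any]) -> Any:
    """Run `operation` once per key; replay the stored result on retries."""
    if key in _results:
        return _results[key]  # retry: return what the first call produced
    result = operation()
    _results[key] = result
    return result


# Usage: sending the same key twice executes the refund only once.
executions = []


def issue_refund() -> dict:
    executions.append("refund")
    return {"status": "refunded", "refund_number": len(executions)}


key = str(uuid.uuid4())  # one key per logical operation
first = handle_with_idempotency(key, issue_refund)
retry = handle_with_idempotency(key, issue_refund)
print(first == retry, len(executions))  # True 1
```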

Where the implementation gets subtle

The pattern sounds simple. Here are three places where implementations usually get it wrong.

Storing the key after the work, not before. The naive implementation performs the operation, then stores the idempotency key. If the server crashes between those two writes, the next retry re-executes the operation. Store the key first, in the same transaction as the operation, or otherwise use a database that guarantees the two writes are atomic.
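One way to follow that rule, sketched with Python's built-in `sqlite3` module and an assumed two-table schema (`idempotency_keys`, `refunds`, `refund`, and the `simulate_crash` flag are all hypothetical). The key is claimed first and the refund is written in the same transaction, so an exception before commit rolls back both rows, much as a real crash would leave the uncommitted transaction discarded. The primary key on `idem_key` also means a concurrent duplicate cannot commit a second claim on the same key.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE idempotency_keys (
            idem_key TEXT PRIMARY KEY,  -- a second claim on the same key cannot commit
            response TEXT               -- set before the transaction commits
        );
        CREATE TABLE refunds (
            id           INTEGER PRIMARY KEY,
            payment_id   TEXT NOT NULL,
            amount_cents INTEGER NOT NULL
        );
        """
    )
    return conn


def refund(conn: sqlite3.Connection, key: str, payment_id: str,
           amount_cents: int, simulate_crash: bool = False) -> dict:
    row = conn.execute(
        "SELECT response FROM idempotency_keys WHERE idem_key = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # already committed: replay the stored response

    # `with conn` commits on success and rolls back on any exception, so the
    # key and the refund are persisted together or not at all.
    with conn:
        conn.execute("INSERT INTO idempotency_keys (idem_key) VALUES (?)", (key,))
        cur = conn.execute(
            "INSERT INTO refunds (payment_id, amount_cents) VALUES (?, ?)",
            (payment_id, amount_cents),
        )
        if simulate_crash:  # hypothetical hook standing in for a crash mid-request
            raise RuntimeError("process died before commit")
        response = {"refund_id": cur.lastrowid, "status": "refunded"}
        conn.execute(
            "UPDATE idempotency_keys SET response = ? WHERE idem_key = ?",
            (json.dumps(response), key),
        )
    return response


conn = make_db()
first = refund(conn, "key-1", "pay_1", 500)
retry = refund(conn, "key-1", "pay_1", 500)
try:
    refund(conn, "key-2", "pay_2", 700, simulate_crash=True)
except RuntimeError:
    pass
refund_count = conn.execute("SELECT COUNT(*) FROM refunds").fetchone()[0]
key2_rows = conn.execute(
    "SELECT COUNT(*) FROM idempotency_keys WHERE idem_key = 'key-2'"
).fetchone()[0]
print(first == retry, refund_count, key2_rows)  # True 1 0
```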

Storing the response but not the operation state. A retry has to return the same response, but the server also has to know whether the original operation actually completed or is still in flight. Storing only the response loses the in-flight state, so a slow original request and a fast retry can race and both execute. The idempotency record needs three states: in-progress, complete (with the stored response), and failed.
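A sketch of the three-state record, kept in memory behind a lock (`IdempotencyStore`, `InProgressError`, and the key names are hypothetical). The key is checked and claimed atomically, and the operation itself runs outside the lock, so a retry arriving mid-flight sees `IN_PROGRESS` and is rejected instead of executing a second time. Two assumptions worth stating: a real store would persist these records and time out `IN_PROGRESS` entries orphaned by a crash, and treating `FAILED` as safe to re-run assumes a failure means the operation had no effect (some APIs replay the stored error instead).

```python
import threading
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Optional


class State(Enum):
    IN_PROGRESS = "in_progress"
    COMPLETE = "complete"
    FAILED = "failed"


@dataclass
class Record:
    state: State
    response: Optional[Any] = None


class InProgressError(Exception):
    """A retry arrived while the original is still running (e.g. map to HTTP 409)."""


class IdempotencyStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._records: Dict[str, Record] = {}

    def run(self, key: str, operation: Callable[[], Any]) -> Any:
        # Check and claim the key atomically; the operation itself runs unlocked.
        with self._lock:
            record = self._records.get(key)
            if record is not None and record.state is State.COMPLETE:
                return record.response
            if record is not None and record.state is State.IN_PROGRESS:
                raise InProgressError(key)
            # Unseen key, or a FAILED one that this sketch treats as safe to re-run.
            self._records[key] = Record(State.IN_PROGRESS)
        try:
            response = operation()
        except Exception:
            with self._lock:
                self._records[key] = Record(State.FAILED)
            raise
        with self._lock:
            self._records[key] = Record(State.COMPLETE, response)
        return response


# Usage: a fast retry races a slow original request.
store = IdempotencyStore()
started, release = threading.Event(), threading.Event()
original_results = []


def slow_refund() -> dict:
    started.set()
    release.wait(timeout=5)
    return {"status": "refunded"}


worker = threading.Thread(
    target=lambda: original_results.append(store.run("k1", slow_refund))
)
worker.start()
started.wait(timeout=5)
try:
    store.run("k1", lambda: {"status": "refunded twice"})
    retry_outcome = "executed"
except InProgressError:
    retry_outcome = "rejected while in progress"
release.set()
worker.join(timeout=5)
replayed = store.run("k1", lambda: {"status": "refunded twice"})
print(retry_outcome, replayed)  # rejected while in progress {'status': 'refunded'}
```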

Not bounding the key's lifetime. In the naive design, idempotency keys live forever and the table grows without bound. Bound them (a 24-hour window is generous for most payment use cases) and document the bound, so clients know a key protects retries only within that window. After it expires the server no longer recognizes the key, so any later attempt is a new operation and should carry a fresh key.
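A sketch of a bounded key store, assuming an in-memory map and an injectable clock so expiry can be tested (`ExpiringKeyStore`, `KEY_TTL_SECONDS`, and `purge_expired` are hypothetical names). With a database the same idea is typically a creation-timestamp column plus a periodic delete of rows older than the window.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

KEY_TTL_SECONDS = 24 * 60 * 60  # assumed 24-hour window; document it in the API


class ExpiringKeyStore:
    """In-memory key -> response store that forgets keys after a fixed TTL.

    Stored responses are assumed never to be None, so None means "unseen".
    """

    def __init__(self, ttl_seconds: float = KEY_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def _expired(self, created_at: float, now: float) -> bool:
        return now - created_at >= self._ttl

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        created_at, response = entry
        if self._expired(created_at, self._clock()):
            del self._entries[key]  # outside the window: treat as never seen
            return None
        return response

    def put(self, key: str, response: Any) -> None:
        self._entries[key] = (self._clock(), response)

    def purge_expired(self) -> int:
        """Drop every expired key; run periodically so storage stays bounded."""
        now = self._clock()
        expired = [k for k, (created_at, _) in self._entries.items()
                   if self._expired(created_at, now)]
        for k in expired:
            del self._entries[k]
        return len(expired)


# Usage with a fake clock and a 10-second TTL.
fake_now = [0.0]
store = ExpiringKeyStore(ttl_seconds=10, clock=lambda: fake_now[0])
store.put("a", {"status": "refunded"})
fake_now[0] = 5.0
store.put("b", {"status": "refunded"})
inside_window = store.get("a")  # 5s old: still replayed
fake_now[0] = 12.0
after_window = store.get("a")   # 12s old: forgotten
purged = store.purge_expired()  # "b" is only 7s old, so nothing to purge
print(inside_window, after_window, purged)  # {'status': 'refunded'} None 0
```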

What clients are responsible for

The server can do everything right and still get duplicate writes if the client retries with a new key on each attempt. The contract has to extend to the client side.

The client generates a UUID per logical operation, not per HTTP attempt. Same logical refund = same key, no matter how many network retries it takes to deliver. Persist that key locally until the operation either completes or hits the bound.
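A client-side sketch of "one key per logical operation, persisted locally", using a JSON file as a stand-in for whatever storage the app actually has (`PendingOperations`, `key_for`, `complete`, and the `"refund:order-123"` identifiers are hypothetical). It is not safe for concurrent writers, and expiry at the bound is left out to keep it short.

```python
import json
import os
import tempfile
import uuid


class PendingOperations:
    """Client-side map of logical operation -> idempotency key, kept in a JSON
    file so the key survives an app restart."""

    def __init__(self, path: str) -> None:
        self._path = path

    def _load(self) -> dict:
        if not os.path.exists(self._path):
            return {}
        with open(self._path, "r", encoding="utf-8") as f:
            return json.load(f)

    def _save(self, data: dict) -> None:
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_path, self._path)  # atomic rename on POSIX filesystems

    def key_for(self, operation_id: str) -> str:
        """Return the existing key for this logical operation, minting one only once."""
        data = self._load()
        if operation_id not in data:
            data[operation_id] = str(uuid.uuid4())
            self._save(data)
        return data[operation_id]

    def complete(self, operation_id: str) -> None:
        """Forget the key once the server has confirmed the outcome."""
        data = self._load()
        if data.pop(operation_id, None) is not None:
            self._save(data)


# Usage: a fresh instance per step simulates the app restarting between attempts.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pending_ops.json")
    key_before_restart = PendingOperations(path).key_for("refund:order-123")
    key_after_restart = PendingOperations(path).key_for("refund:order-123")
    other_key = PendingOperations(path).key_for("refund:order-456")
    PendingOperations(path).complete("refund:order-123")
    with open(path, encoding="utf-8") as f:
        remaining = json.load(f)

print(key_before_restart == key_after_restart, list(remaining))  # True ['refund:order-456']
```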

This is the part that gets missed in mobile SDKs surprisingly often, because the network layer is treated as something to retry transparently. Idempotency keys leak out of the network layer and into the application layer. They are not a transport concern; they are a domain concern.
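A sketch of that split, with a fake transport standing in for the network (`request_refund`, `send_with_retries`, `TransientNetworkError`, and `flaky_send` are hypothetical). The domain function mints the key once per logical refund; the transport retries transparently but only ever forwards the key it was given. The header name follows the common `Idempotency-Key` convention (Stripe uses it, for example); treat it as an assumption about your API. A production retry loop would also add backoff.

```python
import uuid
from typing import Callable, Dict

Headers = Dict[str, str]


class TransientNetworkError(Exception):
    """Stands in for a timeout or a connection dropped before the response arrived."""


def send_with_retries(send: Callable[[Headers], dict], headers: Headers,
                      max_attempts: int = 3) -> dict:
    """Transport layer: retries transparently but never creates or changes the key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send(headers)
        except TransientNetworkError:
            if attempt == max_attempts:
                raise  # give up; the caller still holds the same key for later
    raise AssertionError("unreachable")


def request_refund(send: Callable[[Headers], dict]) -> dict:
    """Domain layer: one logical refund gets exactly one key, minted here."""
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    return send_with_retries(send, headers)


# Usage: a fake server whose first two responses are lost in transit.
seen_keys = []


def flaky_send(headers: Headers) -> dict:
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) < 3:
        raise TransientNetworkError("response lost in a tunnel")
    return {"status": "refunded"}


result = request_refund(flaky_send)
print(result, len(seen_keys), len(set(seen_keys)))  # {'status': 'refunded'} 3 1
```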

When to start

Now. Before the first duplicate refund. We have seen teams say "we will add it before launch," then defer it under launch pressure. The idempotency work slips, the first incident teaches them why it mattered, and the cleanup is messy because production now holds records that were created without keys.

It is one of the cheaper habits to start with, and one of the more expensive ones to retrofit.
