Idempotency Is Easy Until the Second Request Is Different

A post made the Hacker News front page this month with a title that I have wanted to write on a whiteboard for years: "Idempotency Is Easy Until the Second Request Is Different" (HN discussion, 160 points, 79 comments). The author's thesis is short and correct. Putting an Idempotency-Key on a request, storing the response, and replaying it on retry is the easy part. It survives the demo. The hard part starts with the second request, because the second request is not always a clean replay of the first one.

I have shipped this layer for a payment integration on a .NET stack, and I have also debugged the version somebody built in an afternoon and forgot about. So let me take the article's framing and pin it to concrete, runnable C#. The mistakes are the same in every language; the fixes have a particular shape in ASP.NET Core and EF Core.

The replay cache that everyone ships first

Here is the implementation that passes every single-threaded test and fails in production:

public async Task<IResult> CreatePayment(PaymentRequest req, string idempotencyKey)
{
    var existing = await _db.IdempotencyRecords
        .FirstOrDefaultAsync(r => r.Key == idempotencyKey);

    if (existing is not null)
        return Results.Json(existing.ResponseBody, statusCode: existing.ResponseStatus);

    var payment = await _payments.CreateAsync(req);
    var record = new IdempotencyRecord(idempotencyKey, 201, payment);
    _db.IdempotencyRecords.Add(record);
    await _db.SaveChangesAsync();

    return Results.Json(payment, statusCode: 201);
}

Three bugs hide in plain sight. There is a read-then-write race: two requests with the same key can both see null and both create a payment. There is no memory of what the first command meant, so a second request with the same key but a different amount silently replays the wrong response. And there is no concept of "in progress", so a retry that lands while the first call is still talking to the payment provider has no defined behaviour.

The article lists the cases a replay cache does not explain, and they are worth memorising: completed replay, concurrent retry, partial local success, downstream unknown state, same key with a different command, duplicate without a key, retry after expiry, and retry after a deploy or region failover. If your design only handles "same command, completed", you built a cache, not an idempotency layer.

Step one: an atomic insert decides who owns execution

The single most important fix is to stop reading before writing. Let the database arbitrate ownership with a unique constraint, and insert first. In EF Core that means modelling a scoped composite key — never a globally unique idempotency key, because a broken client generating abc-123 should only ever collide with itself, not with another tenant.

public class IdempotencyRecord
{
    public required string TenantId { get; init; }
    public required string Operation { get; init; }   // "create_payment"
    public required string Key { get; init; }
    public required string RequestHash { get; init; }
    public IdempotencyStatus Status { get; set; }
    public int? ResponseStatus { get; set; }
    public string? ResponseBody { get; set; }
    public string? ResourceId { get; set; }
    public DateTimeOffset CreatedAt { get; init; }
    public DateTimeOffset ExpiresAt { get; init; }
    public DateTimeOffset? LockedUntil { get; set; }
}

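// OnModelCreating: scoped composite key, and Status stored as text so that
// raw SQL can write 'InProgress' (EF Core maps enums to integers by default).
modelBuilder.Entity<IdempotencyRecord>()
    .HasKey(r => new { r.TenantId, r.Operation, r.Key });
modelBuilder.Entity<IdempotencyRecord>()
    .Property(r => r.Status).HasConversion<string>();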

public enum IdempotencyStatus
{
    InProgress,
    Completed,
    FailedReplayable,
    FailedRetryable,
    UnknownRequiresRecovery
}

The acquisition step is an insert that swallows the conflict. On PostgreSQL with Npgsql this is ON CONFLICT DO NOTHING; the row count tells you whether you won the race:

const string sql = """
    INSERT INTO idempotency_records
        (tenant_id, operation, key, request_hash, status,
         created_at, expires_at, locked_until)
    VALUES
        ({0}, 'create_payment', {1}, {2}, 'InProgress',
         {3}, {4}, {5})
    ON CONFLICT (tenant_id, operation, key) DO NOTHING;
    """;

var inserted = await _db.Database.ExecuteSqlRawAsync(sql,
    tenantId, key, requestHash,
    now, now.AddHours(24), now.AddSeconds(30));

if (inserted == 1)
{
    // We own execution. Proceed.
}
else
{
    // Someone else owns it. Inspect the existing record and decide.
}

If you are on SQL Server, the equivalent is letting a unique index throw and catching the DbUpdateException whose inner SqlException carries error number 2627 (unique constraint violation) or 2601 (duplicate key in a unique index). It is uglier, but the idea is the same: one writer wins, atomically. Do not emulate this with SELECT then INSERT. Do not emulate it with a lock statement in C#; your process is not the only one serving traffic.

A Redis SET key value NX EX 30 is frequently proposed as the whole solution. It is not. At best it is an execution guard that reduces concurrent duplicates. If the lock expires while the provider call is still running, another request enters. If the process dies after the provider succeeded but before you stored the outcome, the lock tells the retry nothing. Redis can help; it is not durable memory of what happened.

Step two: hash the command, not the bytes

The "same key, different body" case is the one that separates a real implementation from a toy. The article's position — which I share for anything that moves money — is that a scoped key reused with a different canonical command should be a hard error, regardless of whether the first operation completed, failed, or is still running. Silently returning the first response when the client asked for something different is not idempotency, it is reinterpretation. A client that thinks it safely retried a €10 payment should never discover later that the server quietly kept a €100 one.

But you cannot compare raw bytes. {"amount":"10.00","currency":"EUR"} and {"currency":"EUR","amount":"10.00"} are the same command; field order and whitespace must not matter. The rule is: hash the validated command, not the HTTP body. Parse into a DTO, normalise the values your API treats as equivalent, drop transport-only metadata, then hash canonically.

public static string CanonicalHash(CreatePaymentCommand cmd)
{
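    // Project to a stable, ordered shape. Normalise money and enums.
    // Ordinal key ordering and invariant-culture parsing keep the hash
    // independent of the server's locale ("10.00" must never parse as 1000).
    var canonical = new SortedDictionary<string, string?>(StringComparer.Ordinal)
    {
        ["operation"]          = "create_payment",
        ["accountId"]          = cmd.AccountId,
        ["amount"]             = decimal.Parse(cmd.Amount, CultureInfo.InvariantCulture)
                                     .ToString("0.00", CultureInfo.InvariantCulture),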
        ["currency"]           = cmd.Currency.ToUpperInvariant(),
        ["merchantReference"]  = cmd.MerchantReference,
        ["apiVersion"]         = cmd.ApiVersion
    };

    var json = JsonSerializer.Serialize(canonical);
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(json));
    return Convert.ToHexString(bytes);
}

Note what is excluded: the Authorization header, the idempotency key itself, and anything that only shapes the response. Note what needs a decision: server-side default fields (if channel defaults to "web", are a request with and without it the same command?) and unknown fields you currently ignore but might make meaningful after a deploy. The hash is a contract. If you change how it is computed, yesterday's legitimate retries start looking like conflicts.
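One way to settle the server-side default question is to fold the default into the canonical shape before hashing. The sketch below is a minimal illustration under stated assumptions, not the article's implementation: the `CreatePaymentCommand` shape, the `Channel` field, and the policy "an omitted channel means web" are all hypothetical, and the hash function is a cut-down variant of `CanonicalHash` above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical command shape: the source does not show CreatePaymentCommand,
// and Channel is only the example field from the paragraph above.
public sealed record CreatePaymentCommand(
    string AccountId, string Amount, string Currency,
    string MerchantReference, string? Channel = null);

public static class CommandHashing
{
    private static readonly JsonSerializerOptions WebJson = new(JsonSerializerDefaults.Web);

    // Bind the body to the command first, so field order and whitespace disappear.
    public static CreatePaymentCommand Parse(string body) =>
        JsonSerializer.Deserialize<CreatePaymentCommand>(body, WebJson)
        ?? throw new JsonException("Body was JSON null.");

    public static string Hash(CreatePaymentCommand cmd)
    {
        var canonical = new SortedDictionary<string, string?>(StringComparer.Ordinal)
        {
            ["accountId"] = cmd.AccountId,
            ["amount"] = decimal.Parse(cmd.Amount, CultureInfo.InvariantCulture)
                .ToString("0.00", CultureInfo.InvariantCulture),
            // Assumed policy: an omitted channel means "web", so both forms hash alike.
            ["channel"] = cmd.Channel ?? "web",
            ["currency"] = cmd.Currency.ToUpperInvariant(),
            ["merchantReference"] = cmd.MerchantReference,
        };
        var json = JsonSerializer.Serialize(canonical);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json)));
    }
}
```

With this policy, a body that omits `channel` and a body that sends `"channel":"web"` produce the same hash, while `"channel":"mobile"` produces a different one. If the default itself ever changes in a deploy, that is a hash-contract change and needs the same care as any other change to the canonical shape.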

When the insert loses the race, you load the existing row and branch on it:

var rec = await _db.IdempotencyRecords.FindAsync(tenantId, "create_payment", key);

if (rec!.RequestHash != requestHash)
    return Results.Json(
        new { errorCode = "IDEMPOTENCY_KEY_REUSED_WITH_DIFFERENT_REQUEST" },
        statusCode: 409);

return rec.Status switch
{
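    // Assuming ResponseBody holds already-serialised JSON, return it verbatim;
    // Results.Json would encode the string a second time.
    IdempotencyStatus.Completed
        => Results.Content(rec.ResponseBody, "application/json", statusCode: rec.ResponseStatus!.Value),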

    IdempotencyStatus.InProgress when rec.LockedUntil > now
        => Results.Json(new { status = "processing" }, statusCode: 409),  // + Retry-After

    IdempotencyStatus.InProgress  // stale lock
        => await RecoverOwnership(rec),

    IdempotencyStatus.UnknownRequiresRecovery
        => await TriggerReconciliation(rec),

    _ => Results.StatusCode(500)
};

409 Conflict is a defensible default for the mismatch, because the request conflicts with the server's remembered meaning of that scoped key. Some teams prefer 422. What matters is a stable, machine-readable error code and no silent replay for a different command.

Step three: the provider timeout is where your guarantee ends

This is the failure mode that the cheap implementations never consider, and the one that actually costs money. The sequence is mundane: you insert InProgress, create local payment pay_789, call the downstream provider, the provider accepts the charge — and then your process times out, crashes, or loses the response. The client retries with the same key. Your database cannot infer whether money moved.

The fix is twofold. First, do not hold a database transaction open across the provider call; commit your local InProgress state and a stable downstream identity before you make the call. Second, give the downstream call its own idempotency key derived from your stable resource id, not from the client's key:

// Local, in ONE transaction:
await using var tx = await _db.Database.BeginTransactionAsync();
var payment = new Payment { Id = "pay_789", Status = "PENDING" };
_db.Payments.Add(payment);
_db.OutboxEvents.Add(OutboxEvent.PaymentCreated(payment.Id));
rec.ResourceId = payment.Id;
await _db.SaveChangesAsync();
await tx.CommitAsync();

// Outside the transaction, with a derived, stable provider key:
var providerKey = $"provider_payment_{payment.Id}";
var result = await _provider.ChargeAsync(payment, providerKey);

Because the provider key is provider_payment_pay_789 and not abc-123, a recovery worker can later query the provider by that key to find out whether the charge went through, instead of blindly charging again. The retry logic becomes: if the record is Completed, replay; if it is a fresh InProgress, return 202 or 409 with Retry-After; if it is a stale InProgress, acquire recovery ownership atomically, query the provider, and move the record to Completed or UnknownRequiresRecovery. If the provider has neither an idempotency key nor a query API, you have an operational gap — you can choose to accept it, but be honest that your local table is not protecting the external effect.
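The retry rules in the paragraph above can be written down as two small pure functions, which makes them easy to unit-test separately from the database and the provider client. This is a sketch under assumptions: `RetryAction`, `ProviderLookup`, and `RetryPolicy` are hypothetical names, and treating a charge the provider cannot find as still unknown is a deliberately conservative choice of mine, not something the article prescribes.

```csharp
using System;

public enum IdempotencyStatus
{
    InProgress, Completed, FailedReplayable, FailedRetryable, UnknownRequiresRecovery
}

// Hypothetical outcome of asking the provider about our derived key.
public enum ProviderLookup { Charged, NotFound, Unavailable }

// Hypothetical decision for a retry whose request hash already matched.
public enum RetryAction { Replay, Wait, Recover, Unhandled }

public static class RetryPolicy
{
    public static RetryAction Decide(
        IdempotencyStatus status, DateTimeOffset? lockedUntil, DateTimeOffset now) =>
        status switch
        {
            IdempotencyStatus.Completed => RetryAction.Replay,
            // Fresh lock: the first call may still be talking to the provider.
            // A null lockedUntil compares as false here, so it counts as stale.
            IdempotencyStatus.InProgress when lockedUntil > now => RetryAction.Wait,
            // Stale lock or known-unknown outcome: ask the provider before doing anything.
            IdempotencyStatus.InProgress => RetryAction.Recover,
            IdempotencyStatus.UnknownRequiresRecovery => RetryAction.Recover,
            // The failed states need their own policy; left open in this sketch.
            _ => RetryAction.Unhandled,
        };

    // Map the provider's answer to the next record state after recovery.
    public static IdempotencyStatus Resolve(ProviderLookup lookup) => lookup switch
    {
        ProviderLookup.Charged => IdempotencyStatus.Completed,
        // Assumption: "not found" may mean "still in flight", so never re-charge on it.
        _ => IdempotencyStatus.UnknownRequiresRecovery,
    };
}
```

`Decide` only chooses the branch. Winning recovery ownership still has to be an atomic write in the database, for example a conditional update that succeeds only while the row is InProgress with an expired lock, so that two recovering requests cannot both proceed.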

Your queue consumer has the exact same bug

HTTP gets the attention because the header is visible, but most duplicate side effects I have chased lived in consumers: outbox publishers, notification workers, ledger writers. The broker promising "exactly-once delivery" does not give you exactly-once business effect. That comes from durable operation ids and unique constraints, the same as on the write path.

// Consumer side: dedupe on a business key, write behind a unique constraint.
modelBuilder.Entity<LedgerEntry>()
    .HasIndex(e => new { e.EntryType, e.SourcePaymentId })
    .IsUnique();

public async Task Handle(PaymentCreated evt)
{
    var entry = new LedgerEntry
    {
        EntryType = "payment_received",
        SourcePaymentId = evt.PaymentId,
        Amount = evt.Amount
    };
    _db.LedgerEntries.Add(entry);
    try { await _db.SaveChangesAsync(); }
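    catch (DbUpdateException ex) when (IsUniqueViolation(ex))
    {
        // Already processed. This is success, not failure.
        // Detach the rejected entity so a later SaveChanges on this context does not retry it.
        _db.Entry(entry).State = EntityState.Detached;
    }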
}

And mind the ordering trap: if you mark a message processed before sending the receipt email and then crash, the retry skips the email forever. If you send the email first and then crash, the retry sends it twice. The reliable shape is to make the side effect durable before triggering it — insert an email row with a unique key, then let a separate sender process it.
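The "durable first, send later" shape can be modelled in a few lines. The sketch below is an in-memory stand-in only: the dictionary plays the role of an email table with a unique index on the dedup key, and `EmailOutbox`, `TryEnqueue`, and `Drain` are hypothetical names. A real implementation would write the row in the same database transaction as the "processed" marker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory model of an email outbox table with a unique index on the dedup key.
public sealed class EmailOutbox
{
    private readonly Dictionary<string, (string To, bool Sent)> _rows =
        new(StringComparer.Ordinal);

    // Returns false when the key already exists, like a swallowed unique violation.
    public bool TryEnqueue(string dedupKey, string to) =>
        _rows.TryAdd(dedupKey, (to, false));

    // Separate sender pass: deliver every unsent row, then mark it as sent.
    public int Drain(Action<string> send)
    {
        // Snapshot the pending keys so rows can be updated while looping.
        var pending = _rows.Where(r => !r.Value.Sent).Select(r => r.Key).ToList();
        foreach (var key in pending)
        {
            send(_rows[key].To);
            _rows[key] = (_rows[key].To, true);
        }
        return pending.Count;
    }
}
```

A redelivered message hits the unique key and enqueues nothing, and a second sender pass finds nothing left to send. A crash between `send` and the mark would still resend on the next pass, so where the mail provider accepts a deduplication key, passing the same key through closes that remaining gap.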

When not to build all this

The cost is not the header; it is the durable memory and recovery behaviour behind it. Do not build a payment-grade layer for an admin action whose duplicate is harmless and visible. For many operations a business key beats a random idempotency key entirely:

modelBuilder.Entity<Payment>()
    .HasIndex(p => new { p.AccountId, p.MerchantReference })
    .IsUnique();

If the rule is genuinely "one payment per merchant reference per account", that constraint catches duplicates even when a buggy client retries with a fresh random key. And sometimes the best fix is to reshape the operation into a naturally idempotent PUT /accounts/{id}/settings/default-currency, where repeating the request just leaves the setting where it was.

The takeaway

The easy version of idempotency remembers that a key was seen. The useful version remembers what the key meant: the scoped operation, the canonical command, the execution state, the resulting resource, the expiry window, and enough failure state to avoid turning uncertainty into a duplicate charge. The second request might be a retry, a different operation wearing the same key, a race against the first, or an arrival after the provider succeeded but your process did not. The server's job is to prove which case it is — to replay, reject the mismatch, or recover, rather than guess. In .NET that proof is a unique constraint, an insert-first ownership step, a canonical command hash, and a downstream key you control. Everything else is the cache that survives the demo and nothing more.

Source: "Idempotency Is Easy Until the Second Request Is Different" by Dochia — Hacker News discussion. Code examples are my own.
