That Friday Afternoon When Our Stripe Webhooks Stopped
An honest story about a webhook integration failing at the worst possible moment, and what I learned about debugging webhooks the hard way.
Jean-Pierre Broeders
Freelance .NET & DevOps Engineer
4:32 PM, Friday afternoon
My phone rang. It was Marco, the CTO of an e-commerce client where I'd been working on their payment infrastructure for a few months. Not the call you want on a Friday afternoon.
"Jean-Pierre, something's wrong. Customers are paying, but their orders stay on 'pending'. We've got six of them now."
Six orders. Real people who transferred real money, getting nothing back. No confirmation email, no status update, nothing. The webshop dashboard showed the orders as if payment never happened.
My first move: check Stripe. If payments are landing at Stripe but the webhooks aren't reaching our application, we know where the problem is.
I opened the Stripe dashboard, and sure enough, all payments showed succeeded. The money was there. But the webhook events told a different story: a wall of red HTTP 400 responses. Every webhook Stripe sent us was being rejected.
Down the rabbit hole
My first instinct was to check the logs. I SSH'd into the production server and started scrolling through the application logs. Nothing. Not a single webhook request to be found. That was strange. A 400 response means our application did receive the request, but rejected it. So why weren't we logging that?
I checked the webhook controller:
using Microsoft.AspNetCore.Mvc;
using Stripe;

[ApiController]
[Route("api/webhooks/stripe")]
public class StripeWebhookController : ControllerBase
{
    private readonly string _webhookSecret;

    public StripeWebhookController(IConfiguration configuration)
    {
        _webhookSecret = configuration["Stripe:WebhookSecret"]!;
    }

    [HttpPost]
    public async Task<IActionResult> HandleWebhook()
    {
        // Read the raw body: the signature is computed over these exact bytes
        var json = await new StreamReader(HttpContext.Request.Body).ReadToEndAsync();
        try
        {
            var stripeEvent = EventUtility.ConstructEvent(
                json,
                Request.Headers["Stripe-Signature"],
                _webhookSecret
            );
            // Handle the event...
            await ProcessEvent(stripeEvent);
            return Ok();
        }
        catch (StripeException)
        {
            return BadRequest();
        }
    }
}
See it? That bare catch (StripeException) that just returns a BadRequest() without logging anything. That was my code. From three months ago. I wrote it thinking "this is never going to fail". Yeah, right.
But fine, now we knew the signature validation was failing. The question was: why?
The signature puzzle
Stripe webhooks use HMAC-SHA256 signatures to verify that a request genuinely comes from Stripe. You get a whsec_ secret when you configure a webhook endpoint, and every request includes a Stripe-Signature header that you validate against it.
In hindsight the cause was obvious: earlier that day, a colleague had run a deployment. Not of the application itself, but of the infrastructure-as-code. During that process, Terraform recreated the Stripe webhook endpoint, and because Stripe issues a fresh signing secret for every newly created endpoint, the old secret stopped matching.
The new secret was neatly stored in the Terraform state. But nobody had updated the application configuration. Our .NET application was still using the old whsec_ value from the environment variables.
Every webhook that came in was properly signed by Stripe with the new secret, and our application tried to validate it with the old one. Mismatch. 400. Silent death.
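To make the mismatch concrete, here is a minimal sketch of the check that happens under the hood. It follows Stripe's documented scheme (an HMAC-SHA256 over "{timestamp}.{payload}", sent as the v1 value in the Stripe-Signature header), but the helper names ComputeSignature and IsSignatureValid and the example secrets are made up for illustration. It is not the Stripe.net implementation, and it deliberately skips the timestamp tolerance check that EventUtility.ConstructEvent also performs.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: builds the hex value that follows "v1=" in the header.
// HMAC-SHA256 over "{timestamp}.{payload}", keyed with the signing secret.
static string ComputeSignature(string secret, long timestamp, string payload)
{
    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
    byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes($"{timestamp}.{payload}"));
    return Convert.ToHexString(hash).ToLowerInvariant();
}

// Hypothetical helper: parses "t=<unix seconds>,v1=<hex>" and compares
// the expected signature against every v1 entry in constant time.
static bool IsSignatureValid(string header, string payload, string secret)
{
    var parts = header.Split(',')
        .Select(p => p.Split('=', 2))
        .Where(kv => kv.Length == 2)
        .ToLookup(kv => kv[0].Trim(), kv => kv[1].Trim());

    if (!long.TryParse(parts["t"].FirstOrDefault(), out long timestamp))
        return false;

    byte[] expected = Encoding.UTF8.GetBytes(ComputeSignature(secret, timestamp, payload));
    return parts["v1"].Any(sig =>
        CryptographicOperations.FixedTimeEquals(expected, Encoding.UTF8.GetBytes(sig)));
}

// Simulate Stripe signing with the NEW secret while the app still holds the OLD one.
string capturedBody = "{\"id\":\"evt_example\",\"type\":\"payment_intent.succeeded\"}";
long sentAt = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
string signature = ComputeSignature("whsec_new_example", sentAt, capturedBody);
string signatureHeader = $"t={sentAt},v1={signature}";

Console.WriteLine(IsSignatureValid(signatureHeader, capturedBody, "whsec_new_example")); // True
Console.WriteLine(IsSignatureValid(signatureHeader, capturedBody, "whsec_old_example")); // False
```

Nothing about the request looks wrong in the second case. The body is intact and the header is well formed, which is exactly why a silent 400 is so hard to diagnose.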
The quick fix
The fix was straightforward. Update the environment variable with the new secret and restart the app:
# Update the webhook secret in Azure App Service
az webapp config appsettings set `
    --resource-group rg-ecommerce-prod `
    --name app-ecommerce-api `
    --settings Stripe__WebhookSecret="whsec_newsecretvalue"

# Restart the application
az webapp restart `
    --resource-group rg-ecommerce-prod `
    --name app-ecommerce-api
Within two minutes, webhooks were flowing again. Stripe automatically retries failed webhooks, so most of the "lost" events were delivered after all. For the six orders that had been waiting too long, I wrote a quick PowerShell script to manually reconcile the payments:
# Fetch all recent succeeded payments that weren't processed
$apiKey = $env:STRIPE_SECRET_KEY
$headers = @{ "Authorization" = "Bearer $apiKey" }
$payments = Invoke-RestMethod `
    -Uri "https://api.stripe.com/v1/payment_intents?limit=20" `
    -Headers $headers

foreach ($payment in $payments.data) {
    if ($payment.status -eq "succeeded") {
        $orderId = $payment.metadata.order_id
        # Skip payments that carry no order reference in their metadata
        if (-not $orderId) { continue }

        # Check if the order was already processed in our system
        $order = Invoke-RestMethod `
            -Uri "https://api.ecommerce-client.nl/api/orders/$orderId" `
            -Headers @{ "X-Api-Key" = $env:INTERNAL_API_KEY }

        if ($order.status -eq "pending") {
            # ${orderId} stops PowerShell from parsing "$orderId:" as a scoped variable
            Write-Host "Order ${orderId}: paid but not processed. Fixing..."
            Invoke-RestMethod `
                -Method Put `
                -Uri "https://api.ecommerce-client.nl/api/orders/$orderId/confirm" `
                -Headers @{ "X-Api-Key" = $env:INTERNAL_API_KEY } `
                -Body (@{ paymentIntentId = $payment.id } | ConvertTo-Json) `
                -ContentType "application/json"
            Write-Host "Order ${orderId}: fixed!" -ForegroundColor Green
        }
    }
}
By 5:45 PM everything was resolved. The six customers got their confirmation emails, and Marco could start his weekend. I needed a beer.
What actually went wrong
Looking back, multiple things were off:
1. No logging on signature failures. This is the most obvious one. If we had logged a warning on every StripeException, we would have found the problem in minutes instead of after an hour of digging. The improved version:
catch (StripeException ex)
{
    // Pass the exception itself so the stack trace ends up in the log as well
    _logger.LogWarning(
        ex,
        "Stripe webhook signature validation failed: {Message}. " +
        "Check if the webhook signing secret is up to date.",
        ex.Message
    );
    return BadRequest();
}
2. No monitoring on webhook failures. We should have had alerts on HTTP 400 responses at the webhook endpoint. If we had, we would have been notified before customers noticed anything.
3. Infrastructure and application config weren't coupled. The Terraform deployment should have automatically pushed the new secret to the App Service. Manual steps in a deployment pipeline are ticking time bombs.
4. No way to inspect webhooks. This is what frustrated me the most. I couldn't see what Stripe was sending us. I couldn't look at the body, inspect the headers, or manually validate the signature. I was flying blind.
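For point 2, the idea behind such an alert fits in a few lines. The sketch below is an assumption-laden illustration, not what we ran: RecordFailure is a hypothetical helper, and the five-minute window and threshold of three are arbitrary numbers. In a real setup you would more likely configure this in whatever monitoring stack you already have instead of writing it by hand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: records one rejected webhook and reports whether
// the number of failures inside the sliding window has reached the threshold.
static bool RecordFailure(
    Queue<DateTimeOffset> failures, DateTimeOffset now, TimeSpan window, int threshold)
{
    failures.Enqueue(now);

    // Drop failures that have fallen out of the window
    while (failures.Count > 0 && now - failures.Peek() > window)
        failures.Dequeue();

    return failures.Count >= threshold;
}

// Usage: call this wherever the webhook endpoint returns a 400
var recentFailures = new Queue<DateTimeOffset>();
var alertWindow = TimeSpan.FromMinutes(5);   // assumed value
const int alertThreshold = 3;                // assumed value

for (int i = 0; i < 3; i++)
{
    bool shouldAlert = RecordFailure(
        recentFailures, DateTimeOffset.UtcNow, alertWindow, alertThreshold);
    Console.WriteLine($"Failure {i + 1}: alert = {shouldAlert}");
}
```

With numbers like these, the third rejected webhook inside five minutes would have raised an alert well before the sixth customer paid.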
The ngrok dance
Before pushing the fix to production, I wanted to test it locally. Sounds reasonable, but with webhooks that's easier said than done.
My standard workflow was: start ngrok, copy the temporary URL, paste it into the Stripe dashboard as a test endpoint, make a test payment, see if the webhook comes through, debug, then revert everything. Every single time.
It works. In theory. In practice, you forget to update the URL after an ngrok restart. Or ngrok gives you a different port. Or your free-tier session expires halfway through debugging. Or your colleague also starts ngrok and now you're both pointing at different local instances. It's a workflow that's just good enough to use, and just bad enough to slowly drive you insane.
That Friday afternoon, I didn't feel like doing the ngrok dance. I had the fix in my code, I was fairly certain it would work, but "fairly certain" isn't the answer a CTO wants to hear when it's about payments. So I did what every developer does when they're pressed for time: pushed the fix to a staging environment, manually triggered a webhook via the Stripe CLI, and hoped for the best.
It worked. But it didn't feel like a professional way to test software.
The tooling I wished I had
After this incident, I started thinking about what I actually needed. Logging is great, but it's reactive. I wanted something I could work with proactively:
- A persistent endpoint I can always check, without spinning up ngrok every time
- Complete request history, including headers, body and query parameters
- The ability to replay webhooks to my local machine, without waiting for Stripe to retry
- Editable request bodies before replay, so I can test edge cases without triggering new events
This is ultimately why I built WebhookVault. Not because I wanted to launch a product, but because I kept running into this exact problem. At every client, with every webhook integration.
With WebhookVault, I could have simply opened the dashboard during this incident, looked at the incoming requests, and immediately seen that signature validation was failing. I could have replayed a request to my local development environment to test the fix, without waiting for Stripe retries. The difference between an hour of debugging and five minutes.
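Replaying a captured webhook is conceptually simple: send the stored body, byte for byte, with the original Stripe-Signature header to your local endpoint. The sketch below shows that idea with plain HttpClient types. It is not WebhookVault's API; BuildReplay, the localhost URL and the sample values are assumptions for illustration only.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper: rebuilds a captured Stripe webhook as a request
// aimed at a local endpoint. The body must be the exact captured text,
// otherwise the signature no longer matches.
static HttpRequestMessage BuildReplay(Uri target, string body, string stripeSignature)
{
    var request = new HttpRequestMessage(HttpMethod.Post, target)
    {
        Content = new StringContent(body, Encoding.UTF8, "application/json")
    };
    // Skip header validation so the signature value is passed through untouched
    request.Headers.TryAddWithoutValidation("Stripe-Signature", stripeSignature);
    return request;
}

// Usage with made-up captured values; the local URL is an assumption
var replay = BuildReplay(
    new Uri("http://localhost:5000/api/webhooks/stripe"),
    "{\"id\":\"evt_example\",\"type\":\"payment_intent.succeeded\"}",
    "t=1700000000,v1=0123abcd");

Console.WriteLine($"{replay.Method} {replay.RequestUri}");

// To actually send it, with the application running locally:
// using var client = new HttpClient();
// var response = await client.SendAsync(replay);
```

One caveat: Stripe.net's EventUtility.ConstructEvent also enforces a timestamp tolerance (300 seconds by default, as far as I'm aware). An older capture replayed with its original header will therefore still be rejected unless you widen the tolerance in your development setup or re-sign the payload with a test secret.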
Lessons for your own webhook integrations
If you work with webhooks (and if you're building a modern application, you probably do), here's what I'd tell you:
Log everything, including failures. A bare catch block that returns a 400 is a recipe for late-night phone calls. Log it, with context.
Monitor your webhook endpoints. Set up alerts on response codes. A sudden spike in 400s or 500s is almost always a problem you want to know about before your customers do.
Test with real payloads. Not with hand-crafted JSON that "kind of looks right". Use tooling that lets you capture and replay real webhook requests. It will save you hours of debugging.
Couple your infrastructure to your application config. If a deployment rotates a secret, that secret needs to reach your application automatically. No manual steps.
Have a reconciliation strategy. Webhooks can fail. That's a fact. Make sure you have a way to detect and process missed events. A PowerShell script you run manually is better than nothing at all.
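The detection half of reconciliation is the same comparison my PowerShell script did by hand: payments that succeeded at Stripe versus orders still pending on our side. Here is a rough sketch of that comparison as a pure function you could run on a schedule. FindMissedPayments and the tuple shapes are hypothetical; fetching the payments and orders is left out on purpose.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns (order id, payment intent id) pairs for payments
// that succeeded at the provider while the matching order is still pending.
static List<(string OrderId, string PaymentIntentId)> FindMissedPayments(
    IEnumerable<(string Id, string Status, string OrderId)> payments,
    IReadOnlyDictionary<string, string> orderStatuses)
{
    var missed = new List<(string OrderId, string PaymentIntentId)>();
    foreach (var payment in payments)
    {
        // Ignore unfinished payments and payments without an order reference
        if (payment.Status != "succeeded" || string.IsNullOrEmpty(payment.OrderId))
            continue;

        if (orderStatuses.TryGetValue(payment.OrderId, out var status) && status == "pending")
            missed.Add((payment.OrderId, payment.Id));
    }
    return missed;
}

// Usage with made-up data
var recentPayments = new[]
{
    ("pi_1", "succeeded", "1001"),
    ("pi_2", "succeeded", "1002"),
    ("pi_3", "requires_payment_method", "1003"),
};
var knownOrders = new Dictionary<string, string>
{
    ["1001"] = "pending",
    ["1002"] = "confirmed",
    ["1003"] = "pending",
};

foreach (var (orderId, paymentIntentId) in FindMissedPayments(recentPayments, knownOrders))
    Console.WriteLine($"Order {orderId}: paid via {paymentIntentId} but still pending");
```

Keeping the comparison free of HTTP calls makes it easy to unit test, and the confirm step can then reuse whatever code path the webhook handler already calls.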
Wrapping up
Webhook integrations look simple. You set up an endpoint, parse some JSON, do something with the data. But in production, with real customers and real money, every assumption you've made gets tested. And usually on a Friday afternoon.
Next time you write a webhook endpoint, think about this story for a second. And test your signature validation. For real.
Ever had a webhook nightmare of your own? I'd love to hear about it. Get in touch. And if you want to debug webhook integrations without losing your mind, check out WebhookVault.
