Debugging Azure Functions with Application Insights: From Black Box to Crystal Clear
Serverless feels like a black box until Application Insights is properly configured. Practical KQL queries, alerting, and diagnostics for Azure Functions.
Jean-Pierre Broeders
Freelance DevOps Engineer
Serverless is great. Until something breaks and nobody can figure out where the problem is. No server logs to dig through, no IIS to quickly inspect. Azure Functions run somewhere in the cloud, and when they fail, finding the cause is like searching for a needle in a haystack — unless Application Insights is set up properly.
The basics: more than just flipping a switch
Most tutorials stop at "enable Application Insights in the portal." That's like buying a smoke detector and leaving it in the drawer. Default telemetry catches some things, but the real value comes from custom configuration.
The logging section in host.json controls how much data actually flows to Application Insights:
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "maxTelemetryItemsPerSecond": 20,
        "excludedTypes": "Request"
      }
    },
    "logLevel": {
      "default": "Information",
      "Host.Results": "Error",
      "Function": "Information",
      "Host.Aggregator": "Trace"
    }
  }
}
That `excludedTypes: "Request"` setting matters. Without it, requests get sampled and some invocations disappear from telemetry entirely. When debugging sporadic failures, those are exactly the data points needed.
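One way to verify how aggressively sampling is thinning the data is the itemCount column: every retained telemetry item records how many original items it stands in for. A sketch (the column and tables are standard Application Insights schema):

```kusto
// Approximate retained percentage per telemetry type, per hour.
// itemCount > 1 means the item represents sampled-out siblings.
union requests, dependencies, traces
| where timestamp > ago(24h)
| summarize RetainedPercentage = 100.0 / avg(itemCount) by bin(timestamp, 1h), itemType
| order by timestamp asc
```

If requests show anything below 100%, they are being sampled despite the intent of the configuration above.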
KQL queries that actually help
The Log Analytics workspace behind Application Insights uses Kusto Query Language. Not SQL, but the learning curve is gentle. A few queries that come up regularly:
All failed function executions in the last 24 hours:
requests
| where timestamp > ago(24h)
| where success == false
| summarize count() by cloud_RoleName, resultCode
| order by count_ desc
Spotting slow executions (above 5 seconds):
requests
| where timestamp > ago(7d)
| where duration > 5000
| project timestamp, name, duration, resultCode
| order by duration desc
| take 50
Dependency failures — when a downstream service gives up:
dependencies
| where timestamp > ago(1h)
| where success == false
| summarize failCount=count() by target, type, resultCode
| order by failCount desc
That last one is gold. An Azure Function calling an API or database might run fine on its own, but when the dependency goes down, it's the dependency logs that tell the story. Not the function itself.
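To tie a failed invocation directly to the dependency call that sank it, the two tables can be joined on operation_Id. A sketch — note that after the join, columns from the right side get a `1` suffix, so `resultCode1` is the dependency's result code:

```kusto
// Failed requests joined to the failed dependency calls inside them.
requests
| where timestamp > ago(1h) and success == false
| join kind=inner (
    dependencies
    | where timestamp > ago(1h) and success == false
  ) on operation_Id
| project timestamp, functionName = name, target, dependencyResult = resultCode1
| order by timestamp desc
```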
Adding custom telemetry
Default metrics cover about 80% of cases. For the remaining 20%, TelemetryClient fills the gap:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.Azure.Functions.Worker;

public class OrderProcessor
{
    private readonly TelemetryClient _telemetry;

    public OrderProcessor(TelemetryClient telemetry)
    {
        _telemetry = telemetry;
    }

    [Function("ProcessOrder")]
    public async Task Run(
        [QueueTrigger("orders")] OrderMessage order)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await ProcessAsync(order);

            _telemetry.TrackMetric("OrderProcessingMs",
                stopwatch.ElapsedMilliseconds);
            _telemetry.TrackEvent("OrderProcessed", new Dictionary<string, string>
            {
                ["OrderId"] = order.Id,
                ["ProductCount"] = order.Items.Count.ToString()
            });
        }
        catch (Exception ex)
        {
            _telemetry.TrackException(ex, new Dictionary<string, string>
            {
                ["OrderId"] = order.Id,
                ["Stage"] = "Processing"
            });
            throw; // rethrow so the runtime marks the invocation as failed
        }
    }
}
Those custom properties on TrackException make the difference between "something went wrong" and "order 12345 failed during the processing stage." In a production environment handling hundreds of invocations per minute, that saves hours of guesswork.
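For that constructor injection to work in the isolated worker model, TelemetryClient has to be registered at startup. A minimal Program.cs sketch, assuming the Microsoft.Azure.Functions.Worker.ApplicationInsights package is referenced:

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults()
    .ConfigureServices(services =>
    {
        // Registers TelemetryClient and the worker-service telemetry pipeline.
        services.AddApplicationInsightsTelemetryWorkerService();
        services.ConfigureFunctionsApplicationInsights();
    })
    .Build();

host.Run();
```

Without this registration, the constructor above fails at resolution time and every invocation errors before the function body runs.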
Alerting: not everything is equally urgent
A common mistake is setting up an alert for every exception. Within a week, the inbox is flooded and all alerts get ignored. More effective: layered alerts.
| Level | Condition | Action |
|---|---|---|
| Critical | Function failure rate > 25% in 5 min | SMS + PagerDuty |
| Warning | P95 latency > 10s in 15 min | Teams/Slack notification |
| Info | Daily failure summary | Email digest |
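Whether a 10-second P95 threshold is realistic can be checked against historical data before wiring up the alert. A sketch, using the standard `duration` column (milliseconds):

```kusto
// P95 latency per function app in 15-minute buckets over the last week;
// only buckets that would have tripped the Warning threshold.
requests
| where timestamp > ago(7d)
| summarize p95 = percentile(duration, 95) by cloud_RoleName, bin(timestamp, 15m)
| where p95 > 10000
| order by p95 desc
```

If this returns rows every day under normal load, the threshold is too tight and the alert will train people to ignore it.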
In the Azure Portal this goes through Alerts → New Alert Rule, but the ARM template approach is reusable and version-controllable:
{
  "type": "Microsoft.Insights/metricAlerts",
  "apiVersion": "2018-03-01",
  "properties": {
    "severity": 1,
    "evaluationFrequency": "PT5M",
    "windowSize": "PT5M",
    "criteria": {
      "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
      "allOf": [
        {
          "name": "HighFailureRate",
          "metricName": "Http5xx",
          "operator": "GreaterThan",
          "threshold": 10,
          "timeAggregation": "Total"
        }
      ]
    }
  }
}
Live Metrics Stream for real-time debugging
Some problems only show up during peak hours. Live Metrics Stream displays what's happening in real time: incoming requests, failures, dependency calls, everything with sub-second delay.
It doesn't replace KQL queries for post-mortem analysis, but for monitoring a deployment in real time or tracking down an active issue, it's indispensable. One tip: don't leave it open all day. Live Metrics adds extra resource consumption to the function app.
Distributed tracing across multiple functions
A queue-triggered function that makes an HTTP call to another function — without tracing, that becomes chaos. Application Insights automatically assigns an operation_id to related telemetry, but only when the Activity context propagates correctly.
union requests, dependencies
| where operation_Id == "abc123"
| project timestamp, itemType, name, duration, success
| order by timestamp asc
This query shows the entire chain: from the first trigger to the last dependency call. Useful for pinpointing exactly where latency hides.
Keeping costs under control
Application Insights charges per GB of ingested data. With high-throughput functions, costs add up fast. Sampling reduces costs but also reduces visibility. Finding a balance is necessary.
Rule of thumb: sample everything except failures. Error scenarios should always be captured completely. Successful requests can be sampled — the patterns remain visible in aggregated data anyway.
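That rule of thumb translates directly to samplingSettings: the excludedTypes field takes a semicolon-separated list of telemetry types, so both requests and exceptions can be kept in full while everything else is sampled harder. A sketch (the rate of 5 items/second is an illustrative value, not a recommendation):

```json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "maxTelemetryItemsPerSecond": 5,
        "excludedTypes": "Request;Exception"
      }
    }
  }
}
```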
The difference between a serverless setup that works and one that inspires confidence? Monitoring. Not as an afterthought when things go wrong, but from day one as part of the architecture.
