Log Aggregation with Grafana Loki: A Lightweight ELK Alternative

How Grafana Loki makes log aggregation simple and scalable without the overhead of Elasticsearch.

Jean-Pierre Broeders

Freelance DevOps Engineer

March 4, 2026 · 6 min. read

Log Aggregation with Grafana Loki

Most teams start with kubectl logs or docker logs and think: this is enough. Until something breaks on a Friday afternoon and nobody can figure out which service triggered the cascade of errors. Scrolling through logs across twenty containers via SSH isn't a strategy — it's panic mode.

Why not just use ELK?

Elasticsearch, Logstash, and Kibana. The classic stack. Works fine, but the operational overhead is significant. Elasticsearch is memory-hungry. Clusters need managing. Index lifecycle management becomes a job in itself. For small to mid-sized teams, that's overkill.

Grafana Loki takes a fundamentally different approach. Where Elasticsearch indexes every line of text, Loki only indexes metadata — labels like app=payment-service or env=production. The actual log lines get stored as compressed chunks in cheap object storage. Massive savings in resources.

Architecture overview

A typical Loki setup has three components:

  • Promtail — agent that collects and ships logs to Loki
  • Loki — the log aggregation engine
  • Grafana — visualization and querying

The nice part: if Grafana is already running for metrics (Prometheus), no additional UI needs to be deployed. Logs and metrics live in the same interface.
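If Grafana is provisioned from files, the Loki datasource can be registered automatically instead of clicking through the UI. A sketch of a provisioning file; the path under `provisioning/datasources/` is Grafana's standard layout, and the `loki` hostname assumes the Docker Compose setup used in this post:

```yaml
# grafana/provisioning/datasources/loki.yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
```

Mount this directory into the Grafana container at `/etc/grafana/provisioning/datasources` and the datasource appears on startup.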

Docker Compose setup

version: "3.8"

services:
  loki:
    image: grafana/loki:2.9.4
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.4
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./promtail-config.yaml:/etc/promtail/config.yaml
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.3.1
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  loki-data:
  grafana-data:
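Once the stack is up, a quick smoke test is to push a line straight to Loki's HTTP API (`/loki/api/v1/push`, the same endpoint Promtail uses). A minimal Python sketch of the payload format; note that Loki expects timestamps as nanosecond strings:

```python
import json
import time
import urllib.request

def build_push_payload(labels: dict, lines: list) -> dict:
    """Build a Loki push-API payload: one stream, one entry per line."""
    ts = str(time.time_ns())  # Loki wants nanosecond timestamps as strings
    return {
        "streams": [
            {"stream": labels, "values": [[ts, line] for line in lines]}
        ]
    }

def push(url: str, payload: dict) -> int:
    """POST the payload; Loki answers 204 No Content on success."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `push("http://localhost:3100/loki/api/v1/push", build_push_payload({"job": "smoke-test"}, ["hello loki"]))` should return 204, after which the line is queryable in Grafana as `{job="smoke-test"}`.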

Promtail configuration

Promtail needs to know where logs live and which labels to attach. A configuration for Docker containers:

server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - docker: {}
      - match:
          selector: '{job="docker"}'
          stages:
            - json:
                expressions:
                  output: log
                  stream: stream
            - output:
                source: output

Those pipeline_stages matter. Docker writes logs as JSON with metadata. The pipeline strips that away and extracts the actual log line.
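To make that concrete, here is what a raw entry from Docker's json-file driver looks like and what the pipeline keeps. A small Python sketch of the extraction, purely illustrative, not Promtail's actual implementation:

```python
import json

# A raw entry as Docker's json-file log driver writes it to disk
raw = '{"log": "payment failed: timeout\\n", "stream": "stderr", "time": "2026-03-04T08:23:11.0Z"}'

def extract(line: str) -> str:
    """Mimic the docker/json pipeline stages: parse Docker's wrapper,
    keep the actual application log line, drop the metadata."""
    entry = json.loads(line)
    return entry["log"].rstrip("\n")

print(extract(raw))  # -> payment failed: timeout
```

What ends up in Loki is only the application's own line; `stream` and `time` become labels and the timestamp rather than noise in every log line.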

Structured logging makes the difference

Loki becomes truly powerful when combined with structured logging from the application. Say a .NET service logs this:

{
  "timestamp": "2026-03-04T08:23:11Z",
  "level": "error",
  "message": "Payment processing failed",
  "orderId": "ORD-88412",
  "provider": "stripe",
  "duration_ms": 3201,
  "error": "timeout"
}

With a Promtail pipeline that extracts JSON fields, filtering becomes trivial:

{app="payment-service"} | json | provider="stripe" | level="error"

Compare that to grepping through unstructured text logs. Night and day.
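The same pattern works in any language. A sketch of a structured JSON logger in Python using only the standard library; the field names are chosen to line up with the .NET example above:

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, so Promtail's
    json stage can pick the fields apart."""

    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        entry.update(getattr(record, "fields", {}))  # extra structured fields
        return json.dumps(entry)

logger = logging.getLogger("payment-service")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Payment processing failed",
             extra={"fields": {"orderId": "ORD-88412", "provider": "stripe"}})
```

Each call emits exactly one JSON line on stdout, which is what the Docker log driver and Promtail pipeline above expect.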

LogQL: the query language

LogQL resembles PromQL (for those familiar with Prometheus) but for logs. Some practical queries:

# All production errors, rendering just the message field
{env="production"} |= "error" | json | line_format "{{.message}}"

# Error rate per service
sum(rate({env="production"} |= "error" [5m])) by (app)

# Requests slower than one second
{app="api-gateway"} | json | duration_ms > 1000

That rate() function is incredibly useful. It creates log-based metrics without separate instrumentation. An alert on "more than 50 errors per minute" is a single query.
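That "more than 50 errors per minute" alert could look like this (label names assumed to match the examples above); `rate()` returns per-second values, so multiply by 60:

```logql
# errors per minute, summed per service, fires when above 50
sum(rate({env="production"} |= "error" [1m])) by (app) * 60 > 50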

Alerting on logs

Alerting works directly on LogQL queries: Loki's ruler component evaluates Prometheus-style rule files, and the results show up in Grafana:

groups:
  - name: log-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({env="production"} |= "error" [5m])) by (app) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in {{ $labels.app }}"

More than 0.5 errors per second for two minutes straight? Alert fires. Simple, effective, no separate tooling required.

Cost and scaling

This is where the real win shows. A concrete comparison for a setup handling ~50GB of logs per day:

| | ELK | Loki |
|---|---|---|
| RAM | 32-64 GB | 4-8 GB |
| Storage | SSD required | S3/MinIO (cheap) |
| Operational | Cluster management | Almost nothing |
| Cost (cloud) | €400-800/month | €50-100/month |

These aren't marketing numbers. In production environments with dozens of microservices, this is the difference between "we need to cut back on logging" and "we just log everything."

Pitfalls

Not everything is perfect. A few things to watch out for:

Avoid high-cardinality labels. Setting user_id or request_id as a Loki label is a recipe for trouble: every unique value spawns a separate stream in the index. Those values belong in the log line itself, not in the label set. Loki won't always warn about this, but performance will collapse.
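The rule of thumb: labels for things with a small, bounded set of values; query-time filters for everything else. Roughly:

```logql
# Bad: one stream per user, the index explodes
{app="api", user_id="u-88412"}

# Good: small label set, filter on the field at query time
{app="api"} | json | user_id="u-88412"
```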

Set retention policies. By default, Loki keeps everything. For production, 30-90 days is usually sufficient. On recent Loki versions the table_manager is deprecated; retention runs through the compactor instead:

compactor:
  retention_enabled: true

limits_config:
  retention_period: 720h  # 30 days

Promtail vs Alloy. Grafana is pushing towards Alloy (their new collector). Promtail still works fine, but for new setups Alloy is worth considering — it combines metrics and logs collection in a single agent.

Wrapping up

Log aggregation doesn't have to be a massive project. Loki with Promtail and Grafana runs within an hour, costs a fraction of ELK, and scales effortlessly. Combined with structured logging and LogQL, debugging distributed systems becomes drastically faster. No more excuses for not having centralized logs.

Want to stay updated?

Subscribe to my newsletter or get in touch for freelance projects.
