Log Aggregation with Grafana Loki: A Lightweight ELK Alternative

How Grafana Loki makes log aggregation simple and scalable without the overhead of Elasticsearch.

Jean-Pierre Broeders

Freelance DevOps Engineer

March 4, 2026 · 6 min. read

Log Aggregation with Grafana Loki

Most teams start with kubectl logs or docker logs and think: this is enough. Until something breaks on a Friday afternoon and nobody can figure out which service triggered the cascade of errors. Scrolling through logs across twenty containers via SSH isn't a strategy — it's panic mode.

Why not just use ELK?

Elasticsearch, Logstash, and Kibana. The classic stack. Works fine, but the operational overhead is significant. Elasticsearch is memory-hungry. Clusters need managing. Index lifecycle management becomes a job in itself. For small to mid-sized teams, that's overkill.

Grafana Loki takes a fundamentally different approach. Where Elasticsearch indexes every line of text, Loki only indexes metadata — labels like app=payment-service or env=production. The actual log lines get stored as compressed chunks in cheap object storage. Massive savings in resources.

Architecture overview

A typical Loki setup has three components:

  • Promtail — agent that collects and ships logs to Loki
  • Loki — the log aggregation engine
  • Grafana — visualization and querying

The nice part: if Grafana is already running for metrics (Prometheus), no additional UI needs to be deployed. Logs and metrics live in the same interface.
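If Grafana is provisioned from files, the Loki datasource can be registered automatically instead of clicking through the UI. A sketch of a provisioning file; the path under `provisioning/datasources/` is Grafana's standard layout, and the `loki` hostname assumes the Docker Compose setup used in this post:

```yaml
# grafana/provisioning/datasources/loki.yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
```

Mount this directory into the Grafana container at `/etc/grafana/provisioning/datasources` and the datasource appears on startup.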

Docker Compose setup

version: "3.8"

services:
  loki:
    image: grafana/loki:2.9.4
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.4
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./promtail-config.yaml:/etc/promtail/config.yaml
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.3.1
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  loki-data:
  grafana-data:
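Once the stack is up, a quick smoke test is to push a line straight to Loki's HTTP API (`/loki/api/v1/push`, the same endpoint Promtail uses). A minimal Python sketch of the payload format; note that Loki expects timestamps as nanosecond strings:

```python
import json
import time
import urllib.request

def build_push_payload(labels: dict, lines: list) -> dict:
    """Build a Loki push-API payload: one stream, one entry per line."""
    ts = str(time.time_ns())  # Loki wants nanosecond timestamps as strings
    return {
        "streams": [
            {"stream": labels, "values": [[ts, line] for line in lines]}
        ]
    }

def push(url: str, payload: dict) -> int:
    """POST the payload; Loki answers 204 No Content on success."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `push("http://localhost:3100/loki/api/v1/push", build_push_payload({"job": "smoke-test"}, ["hello loki"]))` should return 204, after which the line is queryable in Grafana as `{job="smoke-test"}`.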

Promtail configuration

Promtail needs to know where logs live and which labels to attach. A configuration for Docker containers:

server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - docker: {}
      - match:
          selector: '{job="docker"}'
          stages:
            - json:
                expressions:
                  output: log
                  stream: stream
            - output:
                source: output

Those pipeline_stages matter. Docker writes logs as JSON with metadata. The pipeline strips that away and extracts the actual log line.
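To make that concrete, here is what a raw entry from Docker's json-file driver looks like and what the pipeline keeps. A small Python sketch of the extraction, purely illustrative, not Promtail's actual implementation:

```python
import json

# A raw entry as Docker's json-file log driver writes it to disk
raw = '{"log": "payment failed: timeout\\n", "stream": "stderr", "time": "2026-03-04T08:23:11.0Z"}'

def extract(line: str) -> str:
    """Mimic the docker/json pipeline stages: parse Docker's wrapper,
    keep the actual application log line, drop the metadata."""
    entry = json.loads(line)
    return entry["log"].rstrip("\n")

print(extract(raw))  # -> payment failed: timeout
```

What ends up in Loki is only the application's own line; `stream` and `time` become labels and the timestamp rather than noise in every log line.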

Structured logging makes the difference

Loki becomes truly powerful when combined with structured logging from the application. Say a .NET service logs this:

{
  "timestamp": "2026-03-04T08:23:11Z",
  "level": "error",
  "message": "Payment processing failed",
  "orderId": "ORD-88412",
  "provider": "stripe",
  "duration_ms": 3201,
  "error": "timeout"
}

With a Promtail pipeline that extracts JSON fields, filtering becomes trivial:

{app="payment-service"} | json | provider="stripe" | level="error"

Compare that to grepping through unstructured text logs. Night and day.
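The same pattern works in any language. A sketch of a structured JSON logger in Python using only the standard library; the field names are chosen to line up with the .NET example above:

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, so Promtail's
    json stage can pick the fields apart."""

    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        entry.update(getattr(record, "fields", {}))  # extra structured fields
        return json.dumps(entry)

logger = logging.getLogger("payment-service")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Payment processing failed",
             extra={"fields": {"orderId": "ORD-88412", "provider": "stripe"}})
```

Each call emits exactly one JSON line on stdout, which is what the Docker log driver and Promtail pipeline above expect.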

LogQL: the query language

LogQL resembles PromQL (for those familiar with Prometheus) but for logs. Some practical queries:

# All production errors, rendering just the message field
{env="production"} |= "error" | json | line_format "{{.message}}"

# Error rate per service
sum(rate({env="production"} |= "error" [5m])) by (app)

# Requests slower than one second
{app="api-gateway"} | json | duration_ms > 1000

That rate() function is incredibly useful. It creates log-based metrics without separate instrumentation. An alert on "more than 50 errors per minute" is a single query.
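That "more than 50 errors per minute" alert could look like this (label names assumed to match the examples above); `rate()` returns per-second values, so multiply by 60:

```logql
# errors per minute, summed per service, fires when above 50
sum(rate({env="production"} |= "error" [1m])) by (app) * 60 > 50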

Alerting on logs

Alerting works directly on LogQL queries: Loki's ruler component evaluates Prometheus-style rule files, and the results show up in Grafana:

groups:
  - name: log-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({env="production"} |= "error" [5m])) by (app) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in {{ $labels.app }}"

More than 0.5 errors per second for two minutes straight? Alert fires. Simple, effective, no separate tooling required.

Cost and scaling

This is where the real win shows. A concrete comparison for a setup handling ~50GB of logs per day:

| | ELK | Loki |
|---|---|---|
| RAM | 32-64 GB | 4-8 GB |
| Storage | SSD required | S3/MinIO (cheap) |
| Operational | Cluster management | Almost nothing |
| Cost (cloud) | €400-800/month | €50-100/month |

These aren't marketing numbers. In production environments with dozens of microservices, this is the difference between "we need to cut back on logging" and "we just log everything."

Pitfalls

Not everything is perfect. A few things to watch out for:

Avoid high-cardinality labels. Setting user_id or request_id as a Loki label is a recipe for trouble: every unique value spawns a separate stream in the index. Those values belong in the log line itself, not in the label set. Loki won't always warn about this, but performance will collapse.
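The rule of thumb: labels for things with a small, bounded set of values; query-time filters for everything else. Roughly:

```logql
# Bad: one stream per user, the index explodes
{app="api", user_id="u-88412"}

# Good: small label set, filter on the field at query time
{app="api"} | json | user_id="u-88412"
```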

Set retention policies. By default, Loki keeps everything. For production, 30-90 days is usually sufficient. On recent Loki versions the table_manager is deprecated; retention runs through the compactor instead:

compactor:
  retention_enabled: true

limits_config:
  retention_period: 720h  # 30 days

Promtail vs Alloy. Grafana is pushing towards Alloy (their new collector). Promtail still works fine, but for new setups Alloy is worth considering — it combines metrics and logs collection in a single agent.

Wrapping up

Log aggregation doesn't have to be a massive project. Loki with Promtail and Grafana runs within an hour, costs a fraction of ELK, and scales effortlessly. Combined with structured logging and LogQL, debugging distributed systems becomes drastically faster. No more excuses for not having centralized logs.

Want to stay updated?

Subscribe to my newsletter or get in touch for freelance projects.
