Log Aggregation with Grafana Loki: A Lightweight ELK Alternative
How Grafana Loki makes log aggregation simple and scalable without the overhead of Elasticsearch.
Jean-Pierre Broeders
Freelance DevOps Engineer
Most teams start with kubectl logs or docker logs and think: this is enough. Until something breaks on a Friday afternoon and nobody can figure out which service triggered the cascade of errors. Scrolling through logs across twenty containers via SSH isn't a strategy — it's panic mode.
Why not just use ELK?
Elasticsearch, Logstash, and Kibana. The classic stack. Works fine, but the operational overhead is significant. Elasticsearch is memory-hungry. Clusters need managing. Index lifecycle management becomes a job in itself. For small to mid-sized teams, that's overkill.
Grafana Loki takes a fundamentally different approach. Where Elasticsearch indexes every line of text, Loki only indexes metadata — labels like app=payment-service or env=production. The actual log lines get stored as compressed chunks in cheap object storage. Massive savings in resources.
Architecture overview
A typical Loki setup has three components:
- Promtail — agent that collects and ships logs to Loki
- Loki — the log aggregation engine
- Grafana — visualization and querying
The nice part: if Grafana is already running for metrics (Prometheus), no additional UI needs to be deployed. Logs and metrics live in the same interface.
Docker Compose setup
```yaml
version: "3.8"

services:
  loki:
    image: grafana/loki:2.9.4
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.4
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - ./promtail-config.yaml:/etc/promtail/config.yaml
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.3.1
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  loki-data:
  grafana-data:
```
Promtail configuration
Promtail needs to know where logs live and which labels to attach. A configuration for Docker containers:
```yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - docker: {}
      - match:
          selector: '{job="docker"}'
          stages:
            - json:
                expressions:
                  output: log
                  stream: stream
            - output:
                source: output
```
Those pipeline_stages matter. Docker writes logs as JSON with metadata. The pipeline strips that away and extracts the actual log line.
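For context, a raw line as Docker's json-file driver writes it looks roughly like this (timestamp and content illustrative); the pipeline above reduces it to just the inner message:

```json
{"log":"Payment processing failed\n","stream":"stderr","time":"2026-03-04T08:23:11.412Z"}
```

Without the pipeline, every log line in Grafana would be wrapped in this JSON envelope, which makes filtering and reading noticeably harder.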
Structured logging makes the difference
Loki becomes truly powerful when combined with structured logging from the application. Say a .NET service logs this:
```json
{
  "timestamp": "2026-03-04T08:23:11Z",
  "level": "error",
  "message": "Payment processing failed",
  "orderId": "ORD-88412",
  "provider": "stripe",
  "duration_ms": 3201,
  "error": "timeout"
}
```
With a Promtail pipeline that extracts JSON fields, filtering becomes trivial:
```logql
{app="payment-service"} | json | provider="stripe" | level="error"
```
Compare that to grepping through unstructured text logs. Night and day.
LogQL: the query language
LogQL resembles PromQL (for those familiar with Prometheus) but for logs. Some practical queries:
```logql
# All errors, with just the message field displayed
{env="production"} |= "error" | json | line_format "{{.message}}"

# Error rate per service
sum(rate({env="production"} |= "error" [5m])) by (app)

# Requests slower than one second
# (sorting and limiting happen in Grafana, or via topk() on a metric query)
{app="api-gateway"} | json | duration_ms > 1000
```
That rate() function is incredibly useful. It creates log-based metrics without separate instrumentation. An alert on "more than 50 errors per minute" is a single query.
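Building on the structured payment logs from earlier, the same idea extends to latency percentiles via `unwrap`, which turns a numeric log field into a sample value — no histogram instrumentation needed (the field and label names assume the JSON format shown above):

```logql
# 99th percentile of payment duration over 5-minute windows, per provider
quantile_over_time(0.99,
  {app="payment-service"} | json | unwrap duration_ms [5m]
) by (provider)
```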
Alerting on logs
Loki's ruler component evaluates Prometheus-style alert rules directly on LogQL queries, and the results surface in Grafana:
```yaml
groups:
  - name: log-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({env="production"} |= "error" [5m])) by (app) > 0.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in {{ $labels.app }}"
```
More than 0.5 errors per second for two minutes straight? Alert fires. Simple, effective, no separate tooling required.
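For completeness, a sketch of how the ruler could be wired up in the Loki config — the directory layout and the Alertmanager URL are assumptions, not part of the setup above:

```yaml
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules  # rule files go under a tenant subdirectory, e.g. /loki/rules/fake/
  rule_path: /tmp/loki-rules  # scratch directory for the ruler
  alertmanager_url: http://alertmanager:9093  # assumed Alertmanager endpoint
  enable_api: true
```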
Cost and scaling
This is where the real win shows. A concrete comparison for a setup handling ~50GB of logs per day:
| | ELK | Loki |
|---|---|---|
| RAM | 32-64 GB | 4-8 GB |
| Storage | SSD required | S3/MinIO (cheap) |
| Operational | Cluster management | Almost nothing |
| Cost (cloud) | €400-800/month | €50-100/month |
These aren't marketing numbers. In production environments with dozens of microservices, this is the difference between "we need to cut back on logging" and "we just log everything."
Pitfalls
Not everything is perfect. A few things to watch out for:
Avoid high cardinality labels. Setting user_id or request_id as a Loki label is a recipe for trouble. Those values belong in the log line itself, not in the index. Loki doesn't always warn about this, but performance will collapse.
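To make that concrete (the request ID is hypothetical): filter high-cardinality values at query time instead of baking them into the stream labels:

```logql
# Bad: one stream per request ID — the index explodes
{app="payment-service", request_id="req-7f3a91"}

# Good: request_id stays in the log line, filtered at query time
{app="payment-service"} | json | request_id="req-7f3a91"
```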
Set retention policies. By default, Loki keeps everything. For production, a 30-90 day retention is usually sufficient. On Loki 2.9 with the boltdb-shipper or TSDB index, retention is handled by the compactor (the older table_manager approach only applies to legacy index stores):

```yaml
compactor:
  retention_enabled: true

limits_config:
  retention_period: 720h # 30 days
```
Promtail vs Alloy. Grafana is pushing towards Alloy, its newer collector. Promtail still works fine, but for new setups Alloy is worth considering — it combines metrics and log collection in a single agent.
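As a rough sketch of what the Promtail setup above could look like in Alloy's configuration syntax — component names follow Alloy's documentation, but treat the paths and the Loki URL as assumptions to adapt:

```alloy
// Discover the Docker json-file logs (same path as the Promtail config)
local.file_match "docker" {
  path_targets = [{"__path__" = "/var/lib/docker/containers/*/*-json.log"}]
}

// Tail the discovered files and forward lines to the Loki writer
loki.source.file "docker" {
  targets    = local.file_match.docker.targets
  forward_to = [loki.write.default.receiver]
}

// Push to the same Loki endpoint as before
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```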
Wrapping up
Log aggregation doesn't have to be a massive project. Loki with Promtail and Grafana runs within an hour, costs a fraction of ELK, and scales effortlessly. Combined with structured logging and LogQL, debugging distributed systems becomes drastically faster. No more excuses for not having centralized logs.
