Architecture diagram showing ECS Fargate services sending logs via Fluent Bit to Grafana Loki and Grafana

Implementing Centralized Logging with Grafana Loki in ECS Fargate

How we implemented centralized logging for four ECS Fargate microservices using Grafana Loki and Fluent Bit sidecars, cutting debugging time by 90% and reducing logging costs by 60%.

Saurabh Parmar
4 min read


When you run multiple microservices on ECS Fargate, log aggregation quickly becomes a core operational concern. In this guide, we walk through how we implemented centralized logging with Grafana Loki and Fluent Bit sidecars for a travel platform running four microservices.

The Problem

With several ECS services, logs naturally end up scattered across multiple CloudWatch log groups and streams. Developers were:

  • Jumping between log streams and services
  • Manually correlating timestamps and request flows
  • Lacking a single, searchable view of the system

We needed centralized logging with powerful query capabilities and low operational overhead.

Why We Chose Grafana Loki

We evaluated traditional ELK-style stacks and CloudWatch-only approaches, but settled on Grafana Loki because it offered:

  • Cost-effective storage: Loki indexes labels, not full log content, which is typically up to 10x cheaper than Elasticsearch for similar workloads.
  • Native Grafana integration: Metrics and logs in a single Grafana UI, with easy correlation between dashboards and log queries.
  • LogQL: A query language similar to PromQL, making it intuitive for teams already using Prometheus.
  • Horizontal scaling: Loki’s microservices architecture scales with your microservices footprint.

The result is a logging platform that’s both powerful and economical for ECS Fargate workloads.

High-Level Architecture

ECS Service (app logs) → Fluent Bit Sidecar (log processing) → Loki (storage) → Grafana (queries)

Each ECS task runs a Fluent Bit sidecar (via FireLens) that ships logs to Loki. Grafana is used as the query and visualization layer.

Flow

  1. Application containers write logs to stdout/stderr.
  2. FireLens / Fluent Bit sidecar in the same task reads those logs.
  3. Fluent Bit enriches logs with ECS metadata and forwards them to Loki.
  4. Loki stores logs in object storage (S3) with indices based on labels (see the configuration sketch after this list).
  5. Grafana queries Loki using LogQL for troubleshooting, dashboards, and alerting.
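
Step 4 is worth a closer look, since label-based indexing over S3 is where most of the cost savings come from. Below is a minimal sketch of that part of Loki's configuration, assuming a recent Loki release with the TSDB index and a hypothetical bucket name (loki-chunks); the exact deployment details will differ per environment.

# loki-config.yaml (excerpt): chunks and index both live in S3
common:
  storage:
    s3:
      region: us-east-1
      bucketnames: loki-chunks      # hypothetical bucket name
schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb                   # index holds labels only, not log content
      object_store: s3              # chunks are written to the S3 bucket above
      schema: v13
      index:
        prefix: index_
        period: 24h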

Fluent Bit Sidecar with FireLens

We use AWS FireLens (ECS’s native log router) to run Fluent Bit as a sidecar in each task definition.

Task Definition Snippet

ContainerDefinitions:
  - Name: log-router
    Image: amazon/aws-for-fluent-bit:latest
    Essential: true
    FirelensConfiguration:
      Type: fluentbit
      Options:
        enable-ecs-log-metadata: 'true'
    LogConfiguration:
      LogDriver: awslogs
      Options:
        awslogs-group: /ecs/firelens
        awslogs-region: us-east-1
        awslogs-stream-prefix: firelens

Key points:

  • enable-ecs-log-metadata: 'true' injects ECS metadata (task, cluster, container, etc.) into log records.
  • The log router itself logs to a dedicated CloudWatch group (/ecs/firelens) for troubleshooting the pipeline.

Your application containers then use the awsfirelens log driver so their logs are routed through this sidecar.

Fluent Bit Output Configuration for Loki

We configure Fluent Bit to send all matched logs to Loki:
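With FireLens, one way to express this is to put the Loki output options directly on each application container's log configuration; FireLens turns these options into a Fluent Bit output stanza behind the scenes. The sketch below assumes the loki output plugin shipped with Fluent Bit 1.6+ and a hypothetical internal Loki endpoint (loki.internal); adjust the host, labels, and TLS settings for your environment.

LogConfiguration:
  LogDriver: awsfirelens
  Options:
    Name: loki                           # Fluent Bit's Loki output plugin
    host: loki.internal                  # hypothetical Loki endpoint
    port: '3100'
    labels: job=ecs,service=api-gateway  # static stream labels; keep cardinality low
    label_keys: $container_name          # promote injected ECS metadata to a label
    line_format: json

Each service's task definition sets its own service label value, which is what the LogQL examples below filter on.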

Querying Logs with LogQL

LogQL is Loki's query language, similar to PromQL. Here are some practical queries for ECS workloads:

  • Filter by service: {service="api-gateway"} |= "error"
  • Rate of errors: rate({service=~".+"} |= "error" [5m])
  • JSON parsing: {service="order-service"} | json | level="error"
  • Latency tracking: {service="payment"} | json | duration_ms > 1000
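
Parsed fields can also feed aggregations for dashboards. As a sketch, here is a per-service p95 latency query, assuming the same JSON duration_ms field as in the latency example above:

quantile_over_time(0.95,
  {service=~".+"} | json | unwrap duration_ms [5m]
) by (service)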

Dashboard Setup

Create Grafana dashboards that combine metrics and logs:

  • Service health panel - show error rate with drill-down to logs
  • Request tracing - correlate request IDs across services
  • Log volume trends - monitor logging costs and anomalies
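
As a rough sketch, those panels map to LogQL queries along the following lines; $service and $request_id are assumed Grafana dashboard variables rather than part of the original setup:

# Service health: error rate per service (drill-down filters on {service="$service"} |= "error")
sum by (service) (rate({service=~".+"} |= "error" [5m]))

# Request tracing: follow a single request ID across all services
{service=~".+"} |= "$request_id"

# Log volume trends: bytes ingested per service per hour
sum by (service) (bytes_over_time({service=~".+"} [1h]))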

Alerting on Logs

Set up alerts in Grafana based on log patterns:

  • Error spike detection - alert when error rate exceeds threshold
  • Missing heartbeat - alert when a service stops logging
  • Specific error patterns - alert on "OutOfMemory" or "Connection refused"
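
The same conditions can also be kept as code. Below is a sketch in Loki's ruler format (Prometheus-style rule files evaluated over LogQL); it is an illustration rather than the exact Grafana-managed alerts described here, and the thresholds and service names are placeholders:

groups:
  - name: ecs-log-alerts
    rules:
      - alert: ErrorSpike
        # fire when any service logs errors faster than an example threshold for 5 minutes
        expr: sum by (service) (rate({service=~".+"} |= "error" [5m])) > 1
        for: 5m
        labels:
          severity: warning
      - alert: MissingHeartbeat
        # fire when a service has produced no log lines at all for 10 minutes
        expr: absent_over_time({service="api-gateway"} [10m])
        labels:
          severity: critical
      - alert: OutOfMemory
        # fire on specific error patterns anywhere in the fleet
        expr: sum(count_over_time({service=~".+"} |= "OutOfMemory" [5m])) > 0
        labels:
          severity: critical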

Results

After implementing Grafana Loki for our four ECS microservices:

  • 90% reduction in time to debug production issues
  • 60% cost savings compared to CloudWatch Logs Insights
  • Single pane of glass for metrics and logs in Grafana
  • Proactive alerting catching issues before users report them

Key Takeaways

Grafana Loki with Fluent Bit sidecars is an excellent choice for ECS Fargate logging. The combination of low storage costs, powerful LogQL queries, and native Grafana integration makes it ideal for microservices architectures. Start with basic log aggregation and progressively add dashboards and alerts as your observability needs grow.