Running n8n Queue Mode on ECS With Redis
Single-process n8n is fine for small teams and low-volume workflows. It becomes fragile when long-running jobs, bursty webhooks, and security enrichment tasks all compete inside the same process. Queue mode gives you a cleaner scaling model by separating workflow intake from execution.
On ECS, that usually means one service for the main n8n process, one or more worker services, PostgreSQL for state, and Redis as the queue backend.
Context
Problem: A single n8n process can become a bottleneck when security workflows mix webhook intake, enrichment, and downstream actions.
Approach: Enable queue mode, run workers as separate ECS services, and scale workers based on workload signals.
Outcome: Bursty automation becomes more reliable and easier to operate.
Architecture pattern
A production queue-mode deployment usually has:
- n8n-main for the editor, API, scheduling, and webhook registration.
- n8n-worker for workflow execution.
- Optional dedicated webhook processors for high-volume ingress.
- RDS PostgreSQL shared by the main and worker tasks.
- ElastiCache Redis used by n8n’s queue backend.
Do not scale the main service beyond one task unless your n8n edition and configuration support a multi-main topology. Workers are the safer first scaling target.
ECS service shape
Workers can share the same container image as the main process but use different commands and environment values.
n8n-main
- desired_count: 1 or approved multi-main count
- command: n8n start
- load_balancer: yes
n8n-worker
- desired_count: 2+
- command: n8n worker
- load_balancer: no
Common environment values include EXECUTIONS_MODE=queue, Redis connection settings, PostgreSQL settings, and the same N8N_ENCRYPTION_KEY across all n8n tasks.
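One way to keep those values identical across services is to generate them from a single place. The sketch below builds ECS-style `environment` and `secrets` lists that both the main and worker container definitions can reuse. The `QUEUE_BULL_REDIS_*` and `DB_POSTGRESDB_*` names follow n8n's documented environment variables as best I know them, so verify them against the n8n version you run. The function names, hostnames, and ARNs are hypothetical placeholders.

```python
from typing import Dict, List


def shared_n8n_environment(
    redis_host: str, redis_port: int, db_host: str, db_name: str, db_user: str
) -> List[Dict[str, str]]:
    """Plain (non-secret) settings shared by every n8n task."""
    values = {
        "EXECUTIONS_MODE": "queue",
        "QUEUE_BULL_REDIS_HOST": redis_host,
        "QUEUE_BULL_REDIS_PORT": str(redis_port),
        "DB_TYPE": "postgresdb",
        "DB_POSTGRESDB_HOST": db_host,
        "DB_POSTGRESDB_DATABASE": db_name,
        "DB_POSTGRESDB_USER": db_user,
    }
    # ECS container definitions expect a list of {"name", "value"} pairs.
    return [{"name": key, "value": value} for key, value in values.items()]


def shared_n8n_secrets(encryption_key_arn: str, db_password_arn: str) -> List[Dict[str, str]]:
    """Secret references resolved by ECS at task start (ARNs are placeholders)."""
    return [
        {"name": "N8N_ENCRYPTION_KEY", "valueFrom": encryption_key_arn},
        {"name": "DB_POSTGRESDB_PASSWORD", "valueFrom": db_password_arn},
    ]


# Hypothetical values for illustration only.
env = shared_n8n_environment("redis.internal", 6379, "db.internal", "n8n", "n8n")
secrets = shared_n8n_secrets("arn:example:encryption-key", "arn:example:db-password")

# Both services receive the same lists, so the encryption key cannot drift.
main_container = {"name": "n8n-main", "environment": env, "secrets": secrets}
worker_container = {"name": "n8n-worker", "environment": env, "secrets": secrets}
```

Passing the encryption key and database password as secret references rather than plain environment values keeps them out of the task definition itself.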
Scaling signals
Scale on signals that reflect real work:
- Redis waiting jobs.
- Worker CPU and memory.
- Execution duration.
- Failed executions by workflow.
- Age of oldest queued job.
- Downstream API rate limit errors.
CPU alone can mislead you. A worker may be idle because a downstream service is throttling, or CPU-heavy because one workflow performs expensive parsing. Tie scaling back to queue health.
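As a rough illustration of tying scaling to queue health, the sketch below derives a worker count from queue depth and the age of the oldest job, and refuses to scale up while downstream APIs are returning rate limit errors. Every name and threshold here (`QueueSnapshot`, `jobs_per_worker`, `max_age_s`, the min and max bounds) is an assumption for illustration, not an n8n or ECS setting. Collecting the metrics and applying the result to the ECS service are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    waiting_jobs: int         # jobs waiting in Redis
    oldest_job_age_s: float   # age of the oldest queued job, in seconds
    rate_limit_errors: int    # recent downstream rate limit errors


def desired_workers(
    snapshot: QueueSnapshot,
    current: int,
    *,
    jobs_per_worker: int = 10,   # assumed backlog one worker can absorb
    max_age_s: float = 120.0,    # assumed tolerable queue latency
    min_workers: int = 2,
    max_workers: int = 20,
) -> int:
    """Return a worker count clamped to [min_workers, max_workers]."""
    if snapshot.rate_limit_errors > 0:
        # Downstream is throttling: more workers would add pressure, so hold.
        return max(min_workers, min(current, max_workers))

    target = math.ceil(snapshot.waiting_jobs / jobs_per_worker)
    if snapshot.oldest_job_age_s > max_age_s:
        # Jobs are waiting too long even if the backlog looks small.
        target = max(target, current + 1)
    return max(min_workers, min(target, max_workers))
```

The throttling check comes first on purpose: a deep queue caused by a rate-limited API is not fixed by adding workers.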
Failure modes
- Redis unavailable, causing executions to stall.
- Main and worker tasks using different encryption keys.
- Workers deployed before database migrations complete.
- Too many workers overwhelming EDR, SIEM, or ticketing APIs.
- Webhook bursts filling the queue faster than workers can drain it.
Use deployment ordering, health checks, and workload-specific concurrency controls to avoid turning horizontal scaling into downstream pressure.
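Several of these failure modes can be caught before a rollout. The following is a minimal sketch of a pre-deploy check, assuming you can gather each service's encryption key reference, a migration status flag, and a concurrency budget for the downstream systems. The function and all parameter names are hypothetical, and how you obtain the inputs depends on your pipeline.

```python
from typing import Dict, List


def preflight_problems(
    service_key_refs: Dict[str, str],   # service name -> encryption key reference
    migrations_complete: bool,          # has the main task finished DB migrations?
    worker_count: int,
    per_worker_concurrency: int,        # executions each worker runs in parallel
    downstream_max_concurrent: int,     # assumed budget for EDR/SIEM/ticketing APIs
) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []

    # All n8n tasks must decrypt credentials with the same key.
    if len(set(service_key_refs.values())) > 1:
        problems.append("encryption key reference differs between services")

    # Workers should not start against a schema that is still migrating.
    if not migrations_complete:
        problems.append("database migrations not complete; hold worker rollout")

    # Horizontal scaling multiplies load on downstream APIs.
    total = worker_count * per_worker_concurrency
    if total > downstream_max_concurrent:
        problems.append(
            f"total concurrency {total} exceeds downstream budget {downstream_max_concurrent}"
        )
    return problems
```

A deployment script could call this before updating the worker service and stop if the returned list is not empty.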
Takeaways
Queue mode turns n8n on ECS into a platform you can scale and operate deliberately. The win is not just more workers; it is separating intake, execution, and scaling decisions so each can fail and recover more predictably.