In the name of Allah, the Most Gracious, the Most Merciful
Your checkout flow takes 12 seconds. Users abandon carts. You profile the code and discover the actual payment processing takes 800 milliseconds.
So where do the other 11 seconds go?
- Sending confirmation email: 2 seconds
- Generating PDF receipt: 3 seconds
- Updating inventory in 3 systems: 2 seconds
- Notifying analytics: 1 second
- Sending webhook to partner: 2 seconds
- Logging to audit system: 1 second
You stare at the code. The user is waiting... for an email they'll read in 10 minutes. They're waiting for analytics they'll never see. They're waiting for a PDF they might never download.
Why?
- Don't make users wait for things they don't need — the biggest latency wins come from moving work to background
- Message queues are the bridge — they decouple "request" from "processing"
- Plan for failure — workers crash, messages get redelivered, idempotency is essential
Want the full story? Keep reading.
This post is for you if:
- Your API responses are slow because they do too much work inline
- Users are waiting for operations they don't care about
- You want to understand message queues without the enterprise jargon
- You're building for scale and need to decouple services
The Synchronous Trap
Most developers write code that does things sequentially because that's how we think. Step 1, then step 2, then step 3.
The user waits 11.8 seconds, but only cares about the 0.8s payment result. Everything else can happen after they see "Success!"
Imagine a restaurant where the waiter takes your order, walks to the kitchen, watches the chef cook, waits for the food, brings it to you, then takes the next order. Insane, right? But that's exactly how synchronous code works.
Humans process instructions sequentially. "Do A, then B, then C" maps directly to code.
Line 10 runs before line 11. Stack traces make sense. No race conditions.
For small scale, sync code works fine. The problems appear at scale.
But the user only cares about step 2 (payment) and the final "Success!" message. Everything else can happen after they see the success screen.
The question that changes everything: What MUST happen now vs what can happen later?
What Must Be Synchronous vs What Can Be Async?
This is the key insight. Not everything needs to block the user.
- Synchronous: the user needs the result to continue
- Asynchronous: the user doesn't need to wait for this
If the user doesn't need to see the result RIGHT NOW, it can be async.
The user needs to know the upload succeeded. But thumbnail generation, search indexing, and other processing can happen in the background. Show a placeholder until thumbnails are ready.
Thumbnail generation and search indexing don't block the user's next action. Only saving the original and confirming it worked needs to be synchronous. The rest can happen in background workers.
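The upload flow above can be sketched with Python's in-memory queue.Queue standing in for a real broker (SQS, RabbitMQ, etc.); `save_original`, `handle_upload`, and the task names are illustrative, not a real API:

```python
import queue

# In-memory stand-in for a real broker (SQS, RabbitMQ, Redis, ...).
jobs = queue.Queue()

def save_original(file_bytes: bytes) -> str:
    # Synchronous: the user must know the upload actually succeeded.
    return "upload-123"  # pretend storage key

def handle_upload(file_bytes: bytes) -> dict:
    upload_id = save_original(file_bytes)  # blocking, must succeed
    # Fire and forget: workers pick these up whenever they're ready.
    for task in ("generate_thumbnails", "index_for_search"):
        jobs.put({"task": task, "upload_id": upload_id})
    # Respond immediately; the UI shows a placeholder until thumbnails exist.
    return {"status": "ok", "upload_id": upload_id}
```

The only blocking call is the save; everything the user doesn't need right now becomes a queued job.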
Message Queues: The Bridge Between "Request" and "Processing"
A message queue is a buffer between the code that creates work and the code that does the work. Think of it as a to-do list that multiple workers can pull from.
The producer adds messages to the queue and immediately returns. Workers pull messages and process them whenever they're ready.
Imagine a busy restaurant. The waiter doesn't cook the food — they write the order on a ticket and clip it to the order wheel. The kitchen picks up tickets and cooks them. The waiter is free to take more orders immediately.
That ticket wheel? That's a message queue. It decouples "taking orders" from "making food."
Popular queue technologies include Amazon SQS, RabbitMQ, Redis (lists and streams), and Apache Kafka; the cheat sheet at the end covers how to pick between them.
The Producer-Consumer Pattern
The beauty of message queues is separation of concerns. Your web request code becomes simple:
def checkout(cart):
    charge_payment(cart)    # 800ms
    send_email(cart)        # 2000ms
    generate_pdf(cart)      # 3000ms
    update_inventory(cart)  # 2000ms
    notify_analytics(cart)  # 1000ms
    send_webhook(cart)      # 2000ms
    log_audit(cart)         # 1000ms
    return "Success"        # 11.8s later
def checkout(cart):
    charge_payment(cart)        # 800ms
    order = create_order(cart)  # 100ms
    # Fire and forget - user doesn't wait
    queue.send({
        "type": "order_completed",
        "order_id": order.id
    })
    return "Success"            # 0.9s

# Worker process (runs separately from web server)
def process_message(message):
    if message["type"] == "order_completed":
        send_email(message["order_id"])
        generate_pdf(message["order_id"])
        update_inventory(message["order_id"])
        # ... etc
User sees "Success!" in 0.9 seconds instead of 11.8 seconds. A 92% reduction in perceived latency. Same work gets done, but the user doesn't wait for it.
Async processing pairs beautifully with caching. You can use background workers to warm caches, regenerate expired data, and keep frequently-accessed content fresh — all without blocking user requests.
Your async workers still need database connections. With many workers processing in parallel, connection pools can exhaust quickly. Configure worker concurrency based on your connection limits, or you'll trade API timeouts for database connection errors.
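One simple guard is a bounded semaphore sized to the pool, sketched below (`DB_POOL_SIZE` and `process_with_db` are assumed names, not a specific library's API): workers wait for a free slot instead of opening a connection the pool can't give them.

```python
import threading

# Assumption: the database pool allows at most 10 open connections.
DB_POOL_SIZE = 10
db_slots = threading.BoundedSemaphore(DB_POOL_SIZE)

def process_with_db(message):
    # A worker must hold a slot before touching the database, so
    # concurrent DB work can never exceed the pool size; extra
    # workers simply block here instead of erroring out.
    with db_slots:
        # ... borrow a connection from the pool and do the real work
        return message["order_id"]
```

Most worker frameworks expose this as a concurrency setting; the point is that the number should be derived from your connection limit, not guessed.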
What If Workers Crash?
Moving work to background is great, but now we have new problems. What happens when things go wrong?
- Worker crashes mid-task: the email was half-sent, then the server died.
- Duplicate delivery: the user gets two confirmation emails.
- Queue overflow: messages pile up until memory is exhausted.
- Poison messages: bad data triggers an infinite retry loop.
A Dead Letter Queue is where messages go to die — but in a controlled way. Instead of retrying forever, failed messages are moved aside for human review.
This connects to failure handling patterns — retries with backoff, circuit breakers, and fallbacks all apply to async processing too.
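A minimal sketch of that retry-then-DLQ flow, with a plain list standing in for a real dead letter queue and illustrative names throughout:

```python
import time

MAX_ATTEMPTS = 3
dead_letter = []  # stand-in for a real dead letter queue

def process_with_retries(message, handler, base_delay=0.01):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return handler(message)
        except Exception:
            if attempt == MAX_ATTEMPTS:
                # Give up: park the message for human review
                # instead of retrying forever.
                dead_letter.append(message)
                return None
            # Exponential backoff: wait 1x, 2x, 4x, ... the base delay.
            time.sleep(base_delay * 2 ** (attempt - 1))

def flaky_send(message):
    # Example handler that always fails, to exercise the DLQ path.
    raise RuntimeError("partner API returned 503")
```

Managed queues like SQS implement this for you (redrive policy plus a DLQ); the sketch just makes the mechanics visible.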
Idempotency: Safe to Process Twice
Network issues, crashes, and retries mean the same message might be processed multiple times. Your code must handle this gracefully.
The simplest approach: store processed message IDs and check before processing.
# Using Redis for idempotency tracking
import redis

r = redis.Redis()

def process_order_email(message):
    idempotency_key = f"email:{message['order_id']}"

    # Check if already processed
    if r.get(idempotency_key):
        print(f"Already sent email for {message['order_id']}")
        return  # Skip, acknowledge message

    # Actually send the email
    send_email(
        to=message['user_email'],
        subject="Order Confirmation",
        body=generate_email_body(message['order_id'])
    )

    # Mark as processed (with expiry)
    r.set(idempotency_key, "done", ex=86400)  # 24h
- Generate consistent idempotency keys from message data
- Check BEFORE doing the work, not after
- Set expiry to avoid storing keys forever
- Use atomic operations (Redis SET NX) to avoid race conditions
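The check-then-set above has a small race window: two workers can both pass the check before either marks the key. SET NX closes it by checking and marking in one atomic step. A tiny in-memory stand-in is used below so the example runs anywhere; redis-py's `Redis.set` accepts the same `nx=`/`ex=` keyword arguments with the same semantics:

```python
class FakeRedis:
    # In-memory stand-in for redis-py's Redis client (demo only).
    def __init__(self):
        self.store = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None  # key already exists: another worker won the race
        self.store[key] = value
        return True

r = FakeRedis()

def claim_message(order_id: str) -> bool:
    # SET NX checks and marks in one atomic step, so two workers
    # can never both claim the same message.
    return r.set(f"email:{order_id}", "done", nx=True, ex=86400) is True
```

Only the worker whose `claim_message` returns True sends the email; everyone else skips and acknowledges.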
Backpressure: When Queues Overflow
What happens when producers create messages faster than consumers can process them? The queue grows. And grows. And eventually, something breaks.
- Add consumers: horizontal scaling. If one worker processes 100/sec, ten workers process 1,000/sec.
- Rate-limit producers: slow down input. Return 429 errors or queue locally until space opens.
- Use priority queues: process high-priority messages first. Analytics can wait; payment confirmations can't.
- Shed load: if acceptable, drop old analytics events. Not all messages are equal.
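Rate-limiting producers can be as simple as a bounded queue: accept until full, then tell callers to back off. A sketch with a deliberately tiny limit (the names and status codes are just for illustration):

```python
import queue

MAX_DEPTH = 3  # tiny for the demo; real limits are in the thousands
q = queue.Queue(maxsize=MAX_DEPTH)

def enqueue_or_reject(message) -> int:
    """Return an HTTP-style status: 202 accepted, 429 back off."""
    try:
        q.put_nowait(message)
        return 202
    except queue.Full:
        # Tell the producer to slow down instead of letting the
        # queue grow until something falls over.
        return 429
```

A 429 with a Retry-After hint pushes the backpressure all the way to the client, which is usually healthier than buffering without bound.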
Before scaling horizontally, understand WHY it's slow. Adding workers or raising limits treats the symptom, not the cause. Profile your worker code first: maybe there's a slow database query, an inefficient loop, or an external API bottleneck. You might find a simple fix that's cheaper than scaling.
Practice Mode: Test Your Understanding
Scenario: a user uploads a video. For each step below, decide which must be synchronous and which can run in background workers:
- Save the original file to storage
- Generate 4 different quality versions (480p, 720p, 1080p, 4K)
- Extract a thumbnail every 10 seconds
- Run content moderation AI
- Update the database with video metadata
A key like order_confirmation:{order_id} ensures that no matter how many times the message is retried, each order gets exactly one confirmation email. Keying on the email address alone would block ALL emails to that user; a random UUID defeats the purpose entirely.
Another scenario: a worker keeps hitting ExternalAPIUnavailable because the partner webhook endpoint returned 503, and the partner's API has been down for 2 hours. Retrying forever won't help here; this is exactly the case for a dead letter queue and an alert.
Cheat Sheet
When to Go Async
- Operation takes > 100ms
- User doesn't need result immediately
- Calling unreliable external services
- Can tolerate eventual consistency
- Notifications, reports, cleanup tasks
Reliability Checklist
- Acknowledgment: Only after success
- Idempotency: Safe to process twice
- Dead Letter Queue: After N failures
- Monitoring: Queue depth, lag, DLQ size
- Alerting: Growing queues = problem
Tool Selection
- SQS: Managed, infinite scale, simple
- RabbitMQ: Complex routing, exchanges
- Redis: Already using it? Good enough
- Kafka: Massive scale, event streaming
- Start simple (SQS/Redis), scale later
Decision Framework: Should This Be Async?
1. Does the user need the result to continue? Yes → keep synchronous. No → continue to step 2.
2. Does the operation take more than ~100ms or call an unreliable external service? No → probably fine synchronous. Yes → continue to step 3.
3. Can the work tolerate retries and eventual consistency? No → needs careful error handling (DLQ, alerts). Yes → great async candidate!
4. Going async? Add idempotency, set up monitoring, configure a DLQ.
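The steps above can be folded into one small helper (the function name and return strings are just for illustration):

```python
def should_be_async(user_needs_result_now: bool,
                    duration_ms: float,
                    tolerates_retries: bool) -> str:
    # Mirrors the decision steps, using the >100ms rule of thumb.
    if user_needs_result_now:
        return "keep synchronous"
    if duration_ms <= 100:
        return "probably fine synchronous"
    if not tolerates_retries:
        return "async, but needs careful error handling (DLQ, alerts)"
    return "great async candidate"
```

Run your slowest endpoints through it: anything that lands in the last bucket is a queue candidate for Monday.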
What To Do Monday
- If you're just starting: Use SQS or Redis. Don't start with Kafka unless you need millions of events.
- If you have slow endpoints: Profile them. Find the sync operations that don't need to block. Move them to a queue.
- If you're already using queues: Check your monitoring. Is queue depth stable? Are DLQ messages piling up? Are workers healthy?
Async systems fail differently than sync ones. Failure Handling covers the patterns you need: retries with exponential backoff, circuit breakers to prevent cascade failures, and graceful degradation. These apply directly to your async workers and queues.