Performance & Scaling

Async Processing:
Don't Make Users Wait

Your checkout takes 12.8 seconds. Users abandon carts. The payment itself takes 0.8 seconds. Where do the other 12 seconds go? And why are users waiting for them?

Bahgat Bahgat Ahmed
· February 2026 · 20 min read
User clicks "Buy"
Process payment 0.8s USER CARES
Send email 2s
Generate PDF 3s
Update inventory 3s
Analytics + webhooks 4s
Show "Success!" 12.8s total

In the name of Allah, the Most Gracious, the Most Merciful

Your checkout flow takes 12.8 seconds. Users abandon carts. You profile the code and discover the actual payment processing takes 800 milliseconds.

So where do the other 12 seconds go?

  • Sending confirmation email: 2 seconds
  • Generating PDF receipt: 3 seconds
  • Updating inventory in 3 systems: 3 seconds
  • Notifying analytics: 1 second
  • Sending webhook to partner: 2 seconds
  • Logging to audit system: 1 second

You stare at the code. The user is waiting... for an email they'll read in 10 minutes. They're waiting for analytics they'll never see. They're waiting for a PDF they might never download.

Why?

Quick Summary
  • Don't make users wait for things they don't need — the biggest latency wins come from moving work to the background
  • Message queues are the bridge — they decouple "request" from "processing"
  • Plan for failure — workers crash, messages get redelivered, idempotency is essential

Want the full story? Keep reading.

This post is for you if:

  • Your API responses are slow because they do too much work inline
  • Users are waiting for operations they don't care about
  • You want to understand message queues without the enterprise jargon
  • You're building for scale and need to decouple services

The Synchronous Trap

Most developers write code that does things sequentially because that's how we think. Step 1, then step 2, then step 3.

The Synchronous Checkout: Every Step Blocks
Click
Payment 0.8s
Email 2s
PDF 3s
Inventory 3s
Analytics 4s
Success! 12.8s
User cares about this
User doesn't need to wait

The user waits 12.8 seconds, but only cares about the 0.8s payment result. Everything else can happen after they see "Success!"

Why do we write synchronous code by default?
The Restaurant Analogy

Imagine a restaurant where the waiter takes your order, walks to the kitchen, watches the chef cook, waits for the food, brings it to you, then takes the next order. Insane, right? But that's exactly how synchronous code works.

It's how we think

Humans process instructions sequentially. "Do A, then B, then C" maps directly to code.

It's easier to debug

Line 10 runs before line 11. Stack traces make sense. No race conditions.

"It works"

For small scale, sync code works fine. The problems appear at scale.

But the user only cares about step 2 (payment) and the final "Success!" message. Everything else can happen after they see the success screen.

The question that changes everything: What MUST happen now vs what can happen later?

What Must Be Synchronous vs What Can Be Async?

This is the key insight. Not everything needs to block the user.

The Critical Question: Does the User Need This NOW?
Must Be Synchronous

User needs the result to continue

Payment processing — did it work?
Authentication — can they enter?
Data validation — is input valid?
Core record creation — order exists
Can Be Asynchronous

User doesn't need to wait for this

Confirmation emails — read in 10 min
PDF generation — download later
Analytics — user never sees
Webhooks — partner can wait
Inventory sync — eventually consistent
The Rule

If the user doesn't need to see the result RIGHT NOW, it can be async.

Quick Check
A user uploads a profile photo. Which operation should be synchronous?
A) Generate 5 different thumbnail sizes
B) Save the original photo and return the URL
C) Update the user's profile in the search index
Exactly right

The user needs to know the upload succeeded. But thumbnail generation, search indexing, and other processing can happen in the background. Show a placeholder until thumbnails are ready.

Not quite

Thumbnail generation and search indexing don't block the user's next action. Only saving the original and confirming it worked needs to be synchronous. The rest can happen in background workers.

Part 2
The Solution: Message Queues

Message Queues: The Bridge Between "Request" and "Processing"

A message queue is a buffer between the code that creates work and the code that does the work. Think of it as a to-do list that multiple workers can pull from.

The Message Queue Architecture
Producer
Your web app
send
Message Queue
Queue
Messages wait here
receive
Workers
Process in parallel

The producer adds messages to the queue and immediately returns. Workers pull messages and process them whenever they're ready.

What exactly is a message queue?
The Restaurant Order System

Imagine a busy restaurant. The waiter doesn't cook the food — they write the order on a ticket and clip it to the order wheel. The kitchen picks up tickets and cooks them. The waiter is free to take more orders immediately.

That ticket wheel? That's a message queue. It decouples "taking orders" from "making food."

How It Actually Works
1
Producer creates a message (JSON with task data)
2
Message is added to queue (durable, persisted to disk)
3
Worker pulls next message from queue
4
Worker processes message, then acknowledges completion
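The four steps above can be sketched with Python's standard-library `queue` module. This is an in-memory stand-in, not a real broker — SQS or RabbitMQ persist messages to disk — but `task_done()` plays the same role as acknowledgment: the message only counts as handled after processing finishes.

```python
import json
import queue
import threading

q = queue.Queue()
processed = []

def producer():
    # Step 1: create a message (JSON with task data)
    message = json.dumps({"type": "order_completed", "order_id": 42})
    # Step 2: add it to the queue (in-memory here; real brokers persist it)
    q.put(message)

def worker():
    while True:
        raw = q.get()              # Step 3: pull the next message (blocks)
        if raw is None:            # sentinel value: shut down cleanly
            q.task_done()
            break
        message = json.loads(raw)
        processed.append(message["order_id"])
        q.task_done()              # Step 4: acknowledge only after processing

t = threading.Thread(target=worker)
t.start()
producer()
q.put(None)                        # stop the worker once the queue drains
t.join()
print(processed)                   # [42]
```

Notice the ordering in the worker: the append happens before `task_done()`. With a real broker, acknowledging before processing means a crash loses the message; acknowledging after means a crash causes redelivery — which is why idempotency matters later in this post.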

Popular queue technologies:

Amazon SQS
Managed, infinite scale
RabbitMQ
Full-featured, complex routing
Redis
Simple, if you already use it

The Producer-Consumer Pattern

The beauty of message queues is separation of concerns. Your web request code becomes simple:

Before vs After: The Checkout Example
Before: Synchronous 12.8s
def checkout(cart):
    charge_payment(cart)      # 800ms
    send_email(cart)          # 2000ms
    generate_pdf(cart)        # 3000ms
    update_inventory(cart)    # 3000ms
    notify_analytics(cart)    # 1000ms
    send_webhook(cart)        # 2000ms
    log_audit(cart)           # 1000ms
    return "Success"          # 12.8s later
After: Async 0.9s
def checkout(cart):
    charge_payment(cart)      # 800ms
    order = create_order(cart) # 100ms

    # Fire and forget - user doesn't wait
    queue.send({
        "type": "order_completed",
        "order_id": order.id
    })

    return "Success"          # 0.9s
Meanwhile, background workers process the queue:
# Worker process (runs separately from web server)
def process_message(message):
    if message["type"] == "order_completed":
        send_email(message["order_id"])
        generate_pdf(message["order_id"])
        update_inventory(message["order_id"])
        # ... etc
The Result

User sees "Success!" in 0.9 seconds instead of 12.8 seconds. A 93% reduction in perceived latency. Same work gets done, but the user doesn't wait for it.

Related: Async + Caching

Async processing pairs beautifully with caching. You can use background workers to warm caches, regenerate expired data, and keep frequently-accessed content fresh — all without blocking user requests.

Don't Forget: Workers Need Connections Too

Your async workers still need database connections. With many workers processing in parallel, connection pools can exhaust quickly. Configure worker concurrency based on your connection limits, or you'll trade API timeouts for database connection errors.
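One way to keep workers inside a connection budget is a semaphore sized to the pool. A minimal sketch — the pool size of 5 and the `handle` function are illustrative, and the `sleep` stands in for a query on a pooled connection:

```python
import threading
import time

DB_POOL_SIZE = 5                       # hypothetical database connection limit
db_slots = threading.Semaphore(DB_POOL_SIZE)
lock = threading.Lock()
active = 0
max_concurrent = 0

def handle(message):
    """Process one message without exceeding the connection budget."""
    global active, max_concurrent
    with db_slots:                     # blocks while all pool slots are in use
        with lock:
            active += 1
            max_concurrent = max(max_concurrent, active)
        time.sleep(0.01)               # stand-in for work on a DB connection
        with lock:
            active -= 1

# 20 workers contend for 5 slots; concurrency never exceeds the pool
threads = [threading.Thread(target=handle, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_concurrent <= DB_POOL_SIZE)  # True
```

Most worker frameworks expose this as a concurrency setting; the point is that the number should be derived from your database limits, not picked independently.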

Part 3
Making It Reliable

What If Workers Crash?

Moving work to the background is great, but it creates new problems. What happens when things go wrong?

The Four Questions of Async Reliability
Worker crashes mid-processing?

Email was half-sent, then server died.

Solution: Don't acknowledge until done. Queue re-delivers after timeout.
Process same message twice?

User gets two confirmation emails.

Solution: Idempotency keys. Check if already processed.
Queue fills up?

Messages pile up, memory exhausted.

Solution: Backpressure, scaling workers, or dropping low-priority messages.
Processing always fails?

Bad data, infinite retry loop.

Solution: Dead Letter Queue after N retries. Alert humans.
Deep Dive: Dead Letter Queues (DLQ)

A Dead Letter Queue is where messages go to die — but in a controlled way. Instead of retrying forever, failed messages are moved aside for human review.

Main Queue
Worker tries
3 attempts
Success: Delete
or
Fail: to DLQ
Dead Letter Queue
Human reviews

This connects to failure handling patterns — retries with backoff, circuit breakers, and fallbacks all apply to async processing too.
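In code, the retry-then-DLQ flow is just a bounded loop. A sketch — the three-attempt limit matches the diagram above, and the DLQ here is a plain list standing in for a real broker's dead letter queue:

```python
MAX_ATTEMPTS = 3
dead_letter_queue = []

def process_with_dlq(message, handler):
    """Try a handler up to MAX_ATTEMPTS times; park the message for humans on repeated failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(message)
            return True            # success: acknowledge/delete the message
        except Exception as exc:
            last_error = str(exc)  # remember why it failed
    # all attempts failed: move aside for human review instead of retrying forever
    dead_letter_queue.append({"message": message, "error": last_error})
    return False

# A handler that always fails, like the partner webhook scenario later in this post
def flaky(message):
    raise RuntimeError("ExternalAPIUnavailable: 503")

process_with_dlq({"order_id": 7}, flaky)
print(len(dead_letter_queue))  # 1
```

Real brokers (SQS, RabbitMQ) do the counting and moving for you via redelivery limits, but the shape is the same: bounded retries, then a parking lot with the error attached.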

Idempotency: Safe to Process Twice

Network issues, crashes, and retries mean the same message might be processed multiple times. Your code must handle this gracefully.

Without vs With Idempotency
Without Idempotency
1. Message: "Send welcome email"
2. Worker sends email
3. Worker crashes before ACK
4. Queue re-delivers message
5. Worker sends email AGAIN
User gets 2 emails!
With Idempotency Key
1. Message: "Send email, key=abc123"
2. Worker checks: "Did I process abc123?" No
3. Send email, record abc123 as done
4. Worker crashes before ACK
5. Queue re-delivers message
6. Worker checks: "Did I process abc123?" Yes → Skip
User gets 1 email!
How to implement idempotency

The simplest approach: store processed message IDs and check before processing.

# Using Redis for idempotency tracking
def process_order_email(message):
    idempotency_key = f"email:{message['order_id']}"

    # Check if already processed
    if redis.get(idempotency_key):
        print(f"Already sent email for {message['order_id']}")
        return  # Skip, acknowledge message

    # Actually send the email
    send_email(
        to=message['user_email'],
        subject="Order Confirmation",
        body=generate_email_body(message['order_id'])
    )

    # Mark as processed (with expiry)
    redis.set(idempotency_key, "done", ex=86400)  # 24h
Key points:
  • Generate consistent idempotency keys from message data
  • Check BEFORE doing the work, not after
  • Set expiry to avoid storing keys forever
  • Use atomic operations (Redis SET NX) to avoid race conditions
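The race the last bullet mentions — two workers both passing the GET check before either one SETs — closes if you claim the key atomically *before* doing the work. A sketch of that pattern; a plain dict stands in for Redis here, and the real atomic command is noted in the comment:

```python
store = {}   # stands in for Redis
sent = []

def claim(key):
    # With real Redis this is a single atomic command:
    #   redis.set(key, "done", nx=True, ex=86400)
    # which succeeds only for the first caller.
    if key in store:
        return False
    store[key] = "done"
    return True

def send_email_idempotent(message):
    key = f"email:{message['order_id']}"
    if not claim(key):
        return "skipped"           # someone already sent (or is sending) it
    sent.append(message["order_id"])   # stand-in for actually sending the email
    return "sent"

# The queue redelivers the same message twice
msg = {"order_id": 123, "user_email": "a@example.com"}
print(send_email_idempotent(msg))  # sent
print(send_email_idempotent(msg))  # skipped
print(sent)                        # [123]
```

Note the trade-off: claiming before sending means a crash mid-send drops the email instead of duplicating it. For emails that's usually the right failure mode; for money movement you'd want a more careful state machine.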

Backpressure: When Queues Overflow

What happens when producers create messages faster than consumers can process them? The queue grows. And grows. And eventually, something breaks.

The Backpressure Problem
Producers
1000 msgs/sec
QUEUE GROWING!
5,847
Queue
Consumer
100 msgs/sec
(slow!)
Add More Workers

Horizontal scaling. If one worker processes 100/sec, ten workers process 1000/sec.

Rate Limit Producers

Slow down input. Return 429 errors or queue locally until space opens.

Prioritize Messages

Process high-priority first. Analytics can wait; payment confirmations can't.

Drop Low-Priority

If acceptable, drop old analytics events. Not all messages are equal.
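A bounded queue makes the "rate limit producers" option concrete: once the buffer is full, the producer gets an immediate error it can turn into a 429 or a local retry. A sketch using the standard library — the size limit of 3 is illustrative:

```python
import queue

q = queue.Queue(maxsize=3)   # bounded buffer: backpressure instead of unbounded growth

def try_enqueue(message):
    """Return True if accepted; False means shed load (e.g. return HTTP 429)."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        return False

# Five messages arrive while no consumer is draining the queue
results = [try_enqueue(i) for i in range(5)]
print(results)  # [True, True, True, False, False]
```

Real brokers expose the same idea differently — SQS effectively never fills, so there you watch queue depth and age metrics instead — but for in-process or Redis-backed queues, an explicit bound is the simplest backpressure mechanism.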

Quick Check
Your queue depth is growing steadily. Workers are at 95% CPU. What's the FIRST thing you should check?
A) Add more worker instances
B) Increase the queue size limit
C) Check why processing is slow (profiling)
Smart thinking

Before scaling horizontally, understand WHY it's slow. Maybe there's a slow database query, an inefficient loop, or an external API bottleneck. Scaling bad code just costs more money.

Common mistake

Adding workers or increasing limits is treating the symptom, not the cause. First profile your worker code to understand why it's slow. You might find a simple fix that's cheaper than scaling.

Part 4
Practice & Reference

Practice Mode: Test Your Understanding

Async Processing Scenarios
Scenario 1 of 3
You're building a video upload feature. When a user uploads a video, you need to:
  • Save the original file to storage
  • Generate 4 different quality versions (480p, 720p, 1080p, 4K)
  • Extract a thumbnail every 10 seconds
  • Run content moderation AI
  • Update the database with video metadata
What should happen synchronously during the upload request?
A
Everything — user should see all versions before the page returns
B
Save original + database metadata, then return. Queue the rest.
C
Nothing synchronous — accept the upload and process everything async
Scenario 2 of 3
Your email worker is getting duplicate messages due to network issues. Users are receiving the same order confirmation email 2-3 times. You need to implement idempotency.
What's the best idempotency key for order confirmation emails?
A
The user's email address
B
A random UUID generated when creating the message
C
A combination: order_confirmation:{order_id}
Scenario 3 of 3
Your Dead Letter Queue has 500 messages that failed after 3 retries each. Investigation shows they all have the same error: ExternalAPIUnavailable: Partner webhook endpoint returned 503. The partner's API has been down for 2 hours.
What's the best course of action?
A
Delete the messages — they failed, move on
B
Wait for partner to recover, then replay the DLQ messages
C
Immediately move all messages back to the main queue

Cheat Sheet

Async Processing Quick Reference

When to Go Async

  • Operation takes > 100ms
  • User doesn't need result immediately
  • Calling unreliable external services
  • Can tolerate eventual consistency
  • Notifications, reports, cleanup tasks

Reliability Checklist

  • Acknowledgment: Only after success
  • Idempotency: Safe to process twice
  • Dead Letter Queue: After N failures
  • Monitoring: Queue depth, lag, DLQ size
  • Alerting: Growing queues = problem

Tool Selection

  • SQS: Managed, infinite scale, simple
  • RabbitMQ: Complex routing, exchanges
  • Redis: Already using it? Good enough
  • Kafka: Massive scale, event streaming
  • Start simple (SQS/Redis), scale later

Decision Framework: Should This Be Async?

1
Does the user need to see the result NOW?

Yes → Keep synchronous. No → Continue to step 2.

2
Does it take more than 100ms?

No → Probably fine synchronous. Yes → Continue to step 3.

3
Can it fail without the user knowing?

No → Needs careful error handling (DLQ, alerts). Yes → Great async candidate!

Result: Move it to a background queue!

Add idempotency, set up monitoring, configure DLQ.
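The three questions collapse into a small helper. A sketch — the function and argument names are mine, not from the post:

```python
def should_be_async(user_needs_result_now, duration_ms, can_fail_silently):
    """Apply the three-step decision framework above to one operation."""
    if user_needs_result_now:                 # step 1
        return "keep synchronous"
    if duration_ms <= 100:                    # step 2
        return "probably fine synchronous"
    if not can_fail_silently:                 # step 3
        return "async, but with DLQ + alerting"
    return "great async candidate"

print(should_be_async(False, 2000, True))   # great async candidate
print(should_be_async(True, 800, False))    # keep synchronous
```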

What To Do Monday

  • If you're just starting: Use SQS or Redis. Don't start with Kafka unless you need millions of events.
  • If you have slow endpoints: Profile them. Find the sync operations that don't need to block. Move them to a queue.
  • If you're already using queues: Check your monitoring. Is queue depth stable? Are DLQ messages piling up? Are workers healthy?
Essential Reading: Failure Handling

Async systems fail differently than sync ones. Failure Handling covers the patterns you need: retries with exponential backoff, circuit breakers to prevent cascade failures, and graceful degradation. These apply directly to your async workers and queues.

What to Read Next

The 95% Problem: Understanding DB Connections
Workers need database connections too — don't exhaust your pool
Caching Strategies: Beyond Simple Key-Value
Use async workers to warm caches and keep data fresh
Failure Handling: Timeouts, Retries & Circuit Breakers
Handle failures gracefully in distributed systems
