Building a Rate Limiter in Redis That Actually Holds Up

At some point you'll need a rate limiter. Maybe an API endpoint is getting hammered. Maybe you want to stop a single user from submitting a form fifty times. Maybe you got a bill from your LLM provider and had a small heart attack.

I've reached for express-rate-limit with its default in-memory store, which works fine until you have more than one server process. Then each process keeps its own counter, and with five processes a user gets up to five times the allowed requests. The fix is to centralize state, and Redis is the obvious place for it.

Here's what I've learned building this properly.

Why Fixed Windows Have a Bug

The obvious approach: store a counter per user per time window. Reset it every minute. Simple.

key: rate:user123:2024-03-01T14:00  →  47

The problem is the boundary. Say the limit is 100 requests per minute. A user makes 99 requests at 11:59:59, then 99 more at 12:00:01. They've made 198 requests in two seconds and your rate limiter let all of them through.

This is called the burst problem. Fixed windows allow up to double the limit across a window boundary, which is often fine but sometimes catastrophic.
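To make the boundary problem concrete, here is a minimal sketch of a fixed-window counter that keeps its state in a Map instead of Redis. The class and helper names are mine, purely for illustration, and the timestamps are passed in by hand so the boundary is easy to hit.

```typescript
// Fixed-window counter kept in a Map instead of Redis, for illustration only.
class FixedWindowLimiter {
  private counts = new Map<string, number>()
  private limit: number
  private windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  // Returns true if the request at time `nowMs` is allowed.
  allow(user: string, nowMs: number): boolean {
    // Every timestamp inside the same window maps to the same bucket key.
    const bucket = Math.floor(nowMs / this.windowMs)
    const key = `rate:${user}:${bucket}`
    const count = this.counts.get(key) ?? 0
    if (count >= this.limit) return false
    this.counts.set(key, count + 1)
    return true
  }
}

// Counts how many of `n` requests sent at the same instant get through.
function sendBurst(
  limiter: FixedWindowLimiter,
  user: string,
  nowMs: number,
  n: number
): number {
  let allowed = 0
  for (let i = 0; i < n; i++) {
    if (limiter.allow(user, nowMs)) allowed++
  }
  return allowed
}

const fixed = new FixedWindowLimiter(100, 60_000)
const boundary = 60_000 * 1000 // an arbitrary minute boundary, in ms

const before = sendBurst(fixed, 'user123', boundary - 1_000, 99) // one second before
const after = sendBurst(fixed, 'user123', boundary + 1_000, 99) // one second after

console.log(before + after) // 198: every request passed, two seconds apart
```

The two bursts land in different buckets, so neither one ever sees the other's count.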

Sliding Window Fixes It

Instead of "requests in this minute-long bucket," track "requests in the last 60 seconds from now." Every request gets a timestamp, you keep a sorted set, you count entries within the window.

ZADD rate:user123 1709301234.123 "req-uuid-1"
ZCOUNT rate:user123 (now - 60) now

No boundary burst. The window slides with you.
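Here is the same experiment against a sliding window, again as an in-memory sketch rather than the Redis version. An array of timestamps stands in for the sorted set, and the names are hypothetical.

```typescript
// Sliding-window log in memory: one array of request timestamps per user.
class SlidingWindowLog {
  private log = new Map<string, number[]>()
  private limit: number
  private windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  allow(user: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs
    // Keep only the timestamps that fall inside the last `windowMs`.
    const recent = (this.log.get(user) ?? []).filter((t) => t > cutoff)
    const allowed = recent.length < this.limit
    if (allowed) recent.push(nowMs)
    this.log.set(user, recent)
    return allowed
  }
}

const sliding = new SlidingWindowLog(100, 60_000)
const minuteBoundary = 60_000 * 1000 // an arbitrary minute boundary, in ms

let firstBurst = 0
let secondBurst = 0
for (let i = 0; i < 99; i++) {
  if (sliding.allow('user123', minuteBoundary - 1_000)) firstBurst++
}
for (let i = 0; i < 99; i++) {
  if (sliding.allow('user123', minuteBoundary + 1_000)) secondBurst++
}

console.log(firstBurst, secondBurst) // 99 1
```

The second burst gets exactly one request through, because the first 99 are still inside the 60-second window when it arrives.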

The catch is cleanup. You need to remove old entries or your Redis memory grows forever. Add a ZREMRANGEBYSCORE before counting:

ZREMRANGEBYSCORE rate:user123 -inf (now - 60)
ZCOUNT rate:user123 0 +inf

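Add the ZADD for the new request and there are three operations. Together they are not atomic. Between the ZCOUNT and the ZADD, another request could sneak in: two concurrent requests can both see 99, both pass the check, and both get added. This is where Lua comes in.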
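To see why the gaps between commands matter, here is a small sketch that replays one unlucky interleaving by hand. FakeSortedSet is a made-up in-memory stand-in, not a Redis client; each method plays the role of one command sent as its own round trip.

```typescript
// In-memory stand-in for one sorted set; each method mimics one Redis command.
class FakeSortedSet {
  private scores: number[] = []

  // Plays the role of ZREMRANGEBYSCORE key -inf cutoff
  removeUpTo(cutoff: number): void {
    this.scores = this.scores.filter((s) => s > cutoff)
  }

  // Plays the role of ZCOUNT over what is left
  count(): number {
    return this.scores.length
  }

  // Plays the role of ZADD
  add(score: number): void {
    this.scores.push(score)
  }
}

const limit = 100
const windowSec = 60
const now = 1_709_301_234 // seconds

const set = new FakeSortedSet()
for (let i = 0; i < 99; i++) set.add(now - 1) // 99 requests already in the window

// Clients A and B arrive together. Each sends remove, count, add separately,
// so the server is free to interleave them like this:
set.removeUpTo(now - windowSec) // A trims
set.removeUpTo(now - windowSec) // B trims
const seenByA = set.count() // A sees 99
const seenByB = set.count() // B also sees 99
const allowedA = seenByA < limit
const allowedB = seenByB < limit
if (allowedA) set.add(now)
if (allowedB) set.add(now)

console.log(allowedA, allowedB, set.count()) // true true 101
```

Both clients pass the check and the set ends up one over the limit. Under real concurrency the overshoot grows with the number of requests in flight.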

Making It Atomic with Lua

Redis executes Lua scripts atomically. Nothing else runs while your script runs:


lua
-- sliding_window.lua
local key = KEYS[1]
local window = tonumber(ARGV[1])  -- window size in seconds
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])     -- current timestamp in ms
local request_id = ARGV[4]

-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - (window * 1000))

-- Count remaining entries
local count = redis.call('ZCARD', key)

if count >= limit then
  -- Get oldest entry to calculate reset time (scores come back as strings)
  local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
  local reset_at = tonumber(oldest[2]) + (window * 1000)
  return {0, count, reset_at}
end

-- Add this request
redis.call('ZADD', key, now, request_id)
redis.call('PEXPIRE', key, window * 1000)

return {1, count + 1, -1}

The return values are [allowed, current_count, reset_timestamp]. The reset timestamp is in milliseconds, and it is -1 when the request was allowed. That's enough to set the right response headers.

Calling this from Node.js:


typescript
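import { createClient } from 'redis'
import { readFileSync } from 'fs'
import { randomUUID } from 'crypto'

const client = createClient({ url: process.env.REDIS_URL })
// Without an 'error' listener, a dropped connection crashes the process
client.on('error', (err) => console.error('Redis client error:', err))
await client.connect()

// Top-level await needs an ES module, where __dirname is not defined,
// so resolve the script relative to this file instead
const script = readFileSync(
  new URL('./sliding_window.lua', import.meta.url),
  'utf-8'
)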

interface RateLimitResult {
  allowed: boolean
  current: number
  resetAt: number | null
}

async function checkRateLimit(
  identifier: string,
  options: { window: number; limit: number }
): Promise<RateLimitResult> {
  const key = `rate:${identifier}`
  const now = Date.now()

  const [allowed, current, resetAt] = (await client.eval(script, {
    keys: [key],
    arguments: [
      String(options.window),
      String(options.limit),
      String(now),
      randomUUID(),
    ],
  })) as [number, number, number]

  return {
    allowed: allowed === 1,
    current,
    resetAt: resetAt === -1 ? null : resetAt,
  }
}

Wrapping It as Express Middleware


typescript
import { Request, Response, NextFunction } from 'express'

interface RateLimitOptions {
  window: number   // seconds
  limit: number
  keyFn?: (req: Request) => string
}

export function rateLimit(options: RateLimitOptions) {
  const { window, limit, keyFn } = options

  return async (req: Request, res: Response, next: NextFunction) => {
    const identifier = keyFn
      ? keyFn(req)
      : req.ip ?? 'unknown'

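    let result: RateLimitResult
    try {
      result = await checkRateLimit(identifier, { window, limit })
    } catch (err) {
      // Express 4 does not forward rejected promises from async middleware
      next(err)
      return
    }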

    // Always set headers so clients know their state
    res.setHeader('X-RateLimit-Limit', limit)
    res.setHeader('X-RateLimit-Remaining', Math.max(0, limit - result.current))

    if (result.resetAt) {
      res.setHeader(
        'X-RateLimit-Reset',
        Math.ceil(result.resetAt / 1000)
      )
    }

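    if (!result.allowed) {
      const retryAfter = result.resetAt
        ? Math.max(1, Math.ceil((result.resetAt - Date.now()) / 1000))
        : window
      // Retry-After is the standard header clients check on a 429
      res.setHeader('Retry-After', retryAfter)
      res.status(429).json({
        error: 'Too many requests',
        retryAfter,
      })
      return
    }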

    next()
  }
}

Usage is clean:


typescript
// 100 requests per minute per IP
app.use('/api/', rateLimit({ window: 60, limit: 100 }))

// 5 login attempts per 15 minutes per email
app.post('/auth/login',
  rateLimit({
    window: 900,
    limit: 5,
    keyFn: (req) => `login:${req.body.email}`,
  }),
  loginHandler
)
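One thing to watch with a key built from request data: if the body has no email, every such request shares the key login:undefined, and Alice@Example.com and alice@example.com count as different users. Here is a sketch of a more defensive key function. The loginKey helper and the login-ip prefix are my own naming, and lowercasing the whole address assumes your login treats emails as case-insensitive.

```typescript
// Hypothetical helper: builds the limiter key for a login attempt.
function loginKey(email: unknown, ip: string | undefined): string {
  if (typeof email === 'string' && email.trim() !== '') {
    // Normalize so casing and stray whitespace do not create fresh buckets.
    return `login:${email.trim().toLowerCase()}`
  }
  // No usable email in the body: fall back to limiting by client IP.
  return `login-ip:${ip ?? 'unknown'}`
}

console.log(loginKey(' Alice@Example.com ', '203.0.113.7')) // login:alice@example.com
console.log(loginKey(undefined, '203.0.113.7')) // login-ip:203.0.113.7
```

In the route above it would be wired in as keyFn: (req) => loginKey(req.body?.email, req.ip).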

The Token Bucket Alternative

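Sliding windows are great, but they penalize bursty-but-legitimate usage. An API client might genuinely need to make 20 requests in one second, then nothing for the rest of the minute. With a sliding window, a client that spends its whole limit in a burst gets nothing back until those entries age out of the window.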

The token bucket algorithm handles this better. You have a bucket that fills at a fixed rate. Each request consumes a token. You can burst up to the bucket capacity, but your sustained rate is capped by the refill rate.


lua
-- token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1])      -- max tokens
local refill_rate = tonumber(ARGV[2])   -- tokens per second
local now = tonumber(ARGV[3])           -- current timestamp in seconds (float)
local requested = tonumber(ARGV[4])     -- tokens to consume (usually 1)

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Calculate how many tokens to add since last request.
-- Clamp at zero so a clock that steps backwards never drains the bucket.
local elapsed = math.max(0, now - last_refill)
local new_tokens = math.min(capacity, tokens + (elapsed * refill_rate))

local allowed = 0
if new_tokens >= requested then
  -- Enough tokens: consume them
  new_tokens = new_tokens - requested
  allowed = 1
end

-- HSET accepts multiple field/value pairs since Redis 4.0; HMSET is deprecated
redis.call('HSET', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 1)

-- Redis truncates Lua numbers to integers in replies, so round down explicitly
return {allowed, math.floor(new_tokens)}
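If the refill arithmetic is easier to follow outside Lua, here is an in-memory TypeScript sketch of the same idea. It is a model for reasoning about the behavior, not a replacement for the script: the TokenBucket class is hypothetical, it keeps state in a Map, and time is passed in by hand.

```typescript
interface BucketState {
  tokens: number
  lastRefill: number // seconds
}

// In-memory model of a token bucket, for illustration only.
class TokenBucket {
  private buckets = new Map<string, BucketState>()
  private capacity: number
  private refillRate: number // tokens per second

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity
    this.refillRate = refillRate
  }

  // `nowSec` is the current time in seconds; `requested` is this call's cost.
  take(key: string, nowSec: number, requested = 1): boolean {
    // A key we have never seen starts with a full bucket.
    const state = this.buckets.get(key) ?? {
      tokens: this.capacity,
      lastRefill: nowSec,
    }
    // Refill for the time that has passed, never beyond capacity.
    const elapsed = Math.max(0, nowSec - state.lastRefill)
    const tokens = Math.min(
      this.capacity,
      state.tokens + elapsed * this.refillRate
    )
    const allowed = tokens >= requested
    this.buckets.set(key, {
      tokens: allowed ? tokens - requested : tokens,
      lastRefill: nowSec,
    })
    return allowed
  }
}

const bucket = new TokenBucket(20, 2) // burst of 20, refills 2 tokens per second

let burst = 0
for (let i = 0; i < 25; i++) {
  if (bucket.take('client', 0)) burst++
}

let afterOneSecond = 0
for (let i = 0; i < 3; i++) {
  if (bucket.take('client', 1)) afterOneSecond++
}

console.log(burst, afterOneSecond) // 20 2
```

The client can spend all 20 tokens at once, and one second later exactly two more requests fit. Capacity sets the burst size and the refill rate sets the sustained rate, independently of each other.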

Which Algorithm to Use

Sliding window if:

Your limits are simple (N requests per period)
You want to be strict about the window
Implementation complexity is a concern for your team

Token bucket if:

You want to allow bursts
You're rate limiting expensive operations with different costs (weight each request differently by passing requested > 1)
You're modeling something closer to a real resource budget

Both are production-viable. I tend to reach for sliding window first because it's easier to explain to stakeholders: "you get 100 requests per minute, the window slides" is a cleaner user story than "you start with 100 tokens that refill at 1.67 per second."

One More Thing: What to Do When Redis Is Down

Rate limiters are infrastructure. Redis goes down sometimes. Your choices:

Fail open — allow all requests when Redis is unavailable. Correct choice for most apps; you'd rather serve traffic than block everyone.
Fail closed — block all requests. Correct if you're protecting something expensive or security-critical.
Local fallback — fall back to a per-process in-memory limiter. Imperfect but better than either extreme.

My default is fail open with an alert. The rate limiter failing means something else is wrong; fix that first.


typescript
async function checkRateLimit(
  identifier: string,
  options: { window: number; limit: number }
): Promise<RateLimitResult> {
  try {
    // ... redis call
  } catch (err) {
    console.error('Rate limiter Redis error, failing open:', err)
    // Optionally: report to your error tracking service
    return { allowed: true, current: 0, resetAt: null }
  }
}
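If you prefer the local fallback option, a rough sketch could look like this. LocalFallbackLimiter is a hypothetical per-process fixed-window counter that returns the same RateLimitResult shape (repeated here so the block stands alone). It never evicts idle identifiers and each process enforces the limit separately, so I'd treat it as a stopgap for short outages only.

```typescript
interface RateLimitResult {
  allowed: boolean
  current: number
  resetAt: number | null
}

// Hypothetical per-process fallback: a fixed-window counter held in a Map.
class LocalFallbackLimiter {
  private windows = new Map<string, { start: number; count: number }>()
  private limit: number
  private windowMs: number

  constructor(limit: number, windowSeconds: number) {
    this.limit = limit
    this.windowMs = windowSeconds * 1000
  }

  check(identifier: string, nowMs: number = Date.now()): RateLimitResult {
    let entry = this.windows.get(identifier)
    // Start a fresh window on first sight or once the old one has expired.
    if (!entry || nowMs - entry.start >= this.windowMs) {
      entry = { start: nowMs, count: 0 }
      this.windows.set(identifier, entry)
    }
    if (entry.count >= this.limit) {
      return {
        allowed: false,
        current: entry.count,
        resetAt: entry.start + this.windowMs,
      }
    }
    entry.count++
    return { allowed: true, current: entry.count, resetAt: null }
  }
}

const fallback = new LocalFallbackLimiter(2, 60)
const first = fallback.check('user123', 0)
const second = fallback.check('user123', 0)
const third = fallback.check('user123', 0)
const nextWindow = fallback.check('user123', 60_000)

console.log(first.allowed, second.allowed, third.allowed, nextWindow.allowed)
// true true false true
```

In the catch block above, you would return fallback.check(identifier) instead of the hard-coded allowed: true.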