System Design Fundamentals: Design a Rate Limiter

📌 Question

"Design a rate limiter that allows up to 100 requests per user per minute."

This is a foundational system design interview question that tests your ability to balance correctness, performance, and scalability in distributed systems. It’s especially relevant in backend and API-related roles.


✅ Solution

1. Clarify Requirements

Before diving into the design, clarify the following:

  • Limit: 100 requests per user per minute
  • Behavior: Requests beyond the limit should be rejected (e.g., with HTTP 429 Too Many Requests)
  • Scale: System should handle millions of users and thousands of requests per second
  • Latency: Low overhead, minimal response delay
  • Fault tolerance: Should degrade gracefully under partial failures

2. Choose a Rate Limiting Strategy

There are several algorithmic approaches:

  • Fixed Window Counter: Count the number of requests in each time window (e.g., per minute). Simple but may cause bursts at boundary edges.
  • Sliding Window Log: Store a list of request timestamps and count only those within the last 60 seconds. Accurate but can become memory-heavy.
  • Sliding Window Counter: Divide time into small buckets (e.g., 6 buckets of 10 seconds) and use a rolling sum. More efficient than logging every request.
  • Token Bucket: Refill tokens at a steady rate. Each request consumes one token. Supports bursts and is widely used in practice.

For production systems, Token Bucket or Sliding Window Counter is usually preferred because both balance accuracy against memory and compute cost.
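To make the Token Bucket concrete, here is a minimal single-process sketch. The class name and parameter defaults are illustrative; `capacity=100` with a refill rate of 100/60 tokens per second matches the stated 100-requests-per-minute limit.

```python
import time

class TokenBucket:
    """Token bucket sketch: tokens refill at a steady rate; each request
    consumes one token. Parameters are illustrative, not prescriptive."""

    def __init__(self, capacity=100, refill_rate=100 / 60):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # start full, so an initial burst is allowed
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one bucket is kept per user (e.g., in a dictionary keyed by user ID), and `allow()` is called once per incoming request.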


3. Single-Node In-Memory Design

A simple approach is to maintain a per-user list of timestamps in memory and filter out entries older than 60 seconds. This works well for prototyping or low-traffic, single-instance applications, but counts are not shared across instances and are lost on restart, so it lacks scalability, persistence, and fault tolerance.
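The per-user timestamp approach above is the Sliding Window Log from section 2. A minimal sketch, assuming a single process and using a deque per user so expired entries can be evicted cheaply from the front:

```python
import time
from collections import defaultdict, deque

WINDOW = 60.0   # window length in seconds
LIMIT = 100     # max requests per user per window

# user_id -> timestamps of that user's recent requests (oldest first)
_requests = defaultdict(deque)

def allow(user_id, now=None):
    """Sliding window log: record timestamps, count only the last WINDOW seconds."""
    now = time.monotonic() if now is None else now
    q = _requests[user_id]
    while q and q[0] <= now - WINDOW:
        q.popleft()             # evict timestamps that left the window
    if len(q) < LIMIT:
        q.append(now)
        return True
    return False
```

Memory grows with the number of requests kept per user (up to LIMIT timestamps each), which is exactly the "memory-heavy" drawback noted earlier.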


4. Scalable Distributed Design

For real-world systems, a distributed store like Redis is typically used:

  • Maintain per-user counters in Redis (e.g., user:1234:counter)
  • Use atomic increment operations with expiration (e.g., 60 seconds)
  • Use Lua scripting to ensure atomicity when checking and updating limits
  • For higher scale, Redis can be clustered or sharded

This design is highly performant and supports multi-instance or cloud-based architectures.
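A sketch of the atomic check-and-increment described above, assuming the `redis-py` client is available. The Lua script runs atomically inside Redis: it increments the counter, sets the expiry on first use, and compares against the limit in one step. Key format and defaults are illustrative.

```python
# Lua script executed atomically by Redis:
# KEYS[1] = per-user counter key, ARGV[1] = limit, ARGV[2] = window (seconds)
LUA_CHECK = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
  return 0
end
return 1
"""

def is_allowed(redis_client, user_id, limit=100, window=60):
    """Atomically bump the user's counter and check it against the limit."""
    key = f"user:{user_id}:counter"
    return redis_client.eval(LUA_CHECK, 1, key, limit, window) == 1
```

Without the Lua script, a separate `GET`-then-`INCR` sequence would race under concurrent requests; pushing the logic into Redis removes that window.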


5. System Architecture Overview

The system typically includes:

  • An API Gateway or middleware that performs rate limit checks
  • A central store (e.g., Redis) to track usage
  • Optional observability hooks to log usage and violations

Requests are checked against the user’s counter before being passed to backend services.
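The gateway-side flow can be sketched framework-agnostically. Here `limiter` stands in for any store with an `allow(user_id)` method (such as the designs above), and `forward` is the call into the backend service; both names are hypothetical.

```python
# Minimal middleware sketch: check the user's counter, then forward or reject.
def handle_request(limiter, user_id, forward):
    """Gate a request on the rate limiter before it reaches backend services."""
    if limiter.allow(user_id):
        return forward()  # within limit: pass through to the backend
    # Over limit: reject without touching the backend.
    return {"status": 429, "body": "Too Many Requests"}
```

Observability hooks (logging rejections, emitting metrics) would typically wrap the rejection branch.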


6. Advanced Enhancements

  • Tiered Limits: Offer higher limits for premium users
  • Global IP-based Limits: Prevent abuse from unknown or anonymous traffic
  • Burst Handling: Token Bucket allows short bursts while keeping average usage under control
  • Monitoring & Alerting: Track rate limit rejections and system health
  • Fallback Handling: Add circuit breakers or local caches in case Redis becomes unavailable

7. Trade-Offs and Considerations

| Option | Pros | Cons |
| --- | --- | --- |
| Fixed Window Counter | Easy to implement | Edge burst issues |
| Sliding Window Log | High accuracy | Memory-intensive |
| Sliding Window Counter | Efficient + fairly accurate | Slight complexity |
| Token Bucket | Realistic, allows bursts | Requires refill tracking logic |
| Redis-based approach | Scalable, consistent | Redis is a single point of failure unless clustered |

8. What Interviewers Look For

  • Understanding of trade-offs between accuracy, complexity, and scalability
  • Ability to design for both small and large-scale use cases
  • Awareness of distributed challenges (latency, consistency, resilience)
  • Practicality in choosing tools (e.g., Redis, queues, caching layers)
  • Clear communication of design decisions

✅ Summary

A rate limiter protects backend services from abuse, enforces fair usage policies, and helps maintain availability under heavy load. For interviews, it’s important to:

  • Compare algorithm choices clearly
  • Justify the use of distributed infrastructure (like Redis)
  • Consider user experience, resilience, and extensibility