📌 Question
"Design a rate limiter that allows up to 100 requests per user per minute."
This is a foundational system design interview question that tests your ability to balance correctness, performance, and scalability in distributed systems. It’s especially relevant in backend and API-related roles.
✅ Solution
1. Clarify Requirements
Before diving into the design, clarify the following:
- Limit: 100 requests per user per minute
- Behavior: Requests beyond the limit should be rejected (e.g., with HTTP 429)
- Scale: System should handle millions of users and thousands of requests per second
- Latency: Low overhead, minimal response delay
- Fault tolerance: Should degrade gracefully under partial failures
2. Choose a Rate Limiting Strategy
There are several algorithmic approaches:
- Fixed Window Counter: Count the number of requests in each fixed time window (e.g., per minute). Simple to implement, but a client can send up to twice the limit in a short burst straddling a window boundary.
- Sliding Window Log: Store a list of request timestamps and count only those within the last 60 seconds. Accurate but can become memory-heavy.
- Sliding Window Counter: Divide time into small buckets (e.g., 6 buckets of 10 seconds) and use a rolling sum. More efficient than logging every request.
- Token Bucket: Refill tokens at a steady rate. Each request consumes one token. Supports bursts and is widely used in practice.
For production systems, Token Bucket or Sliding Window Counter are preferred due to their balance between accuracy and efficiency.
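To make the token bucket concrete, here is a minimal single-process sketch in Python. The class name and parameters are illustrative; the capacity and refill rate are chosen to match the 100-requests-per-minute limit, and a production version would also need thread safety and per-user buckets.

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, capacity: int = 100, rate: float = 100 / 60):
        self.capacity = capacity          # burst size (max stored tokens)
        self.rate = rate                  # steady refill rate (tokens per second)
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # each request consumes one token
            return True
        return False
```

Because unused capacity accumulates as tokens, short bursts are allowed while the long-run average stays at the configured rate.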
3. Single-Node In-Memory Design
A simple approach is to maintain a per-user list of timestamps in memory and filter out those older than 60 seconds. This works well for prototyping or low-traffic applications, but lacks scalability, persistence, and fault tolerance.
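A rough sketch of this in-memory sliding window log, assuming a single process and no concurrency control (the `allow` helper name is illustrative):

```python
import time
from collections import defaultdict, deque

WINDOW = 60   # seconds
LIMIT = 100   # max requests per user per window

request_log = defaultdict(deque)  # user_id -> deque of request timestamps

def allow(user_id: str) -> bool:
    now = time.monotonic()
    log = request_log[user_id]
    # Evict timestamps that have aged out of the last 60 seconds.
    while log and now - log[0] > WINDOW:
        log.popleft()
    if len(log) < LIMIT:
        log.append(now)
        return True
    return False
```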
4. Scalable Distributed Design
For real-world systems, a distributed store like Redis is typically used:
- Maintain per-user counters in Redis (e.g., `user:1234:counter`)
- Use atomic increment operations with a matching expiration (e.g., a 60-second TTL)
- Use Lua scripting to ensure atomicity when checking and updating limits
- For higher scale, Redis can be clustered or sharded
This design is highly performant and supports multi-instance or cloud-based architectures.
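As a sketch of this pattern using the redis-py client: the Lua script below implements the fixed-window variant described above (INCR plus EXPIRE, made atomic per key by running server-side). The key format and function names are illustrative, and a local Redis instance is assumed.

```python
import redis

r = redis.Redis()  # assumes a Redis instance on localhost:6379

# INCR the per-user counter; on first increment, attach a 60-second TTL.
# Running this as a Lua script makes the check-and-update atomic per key.
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""
check = r.register_script(RATE_LIMIT_LUA)

def allow(user_id: str, limit: int = 100, window: int = 60) -> bool:
    count = check(keys=[f"user:{user_id}:counter"], args=[window])
    return count <= limit
```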
5. System Architecture Overview
The system typically includes:
- An API Gateway or middleware that performs rate limit checks
- A central store (e.g., Redis) to track usage
- Optional observability hooks to log usage and violations
Requests are checked against the user’s counter before being passed to backend services.
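To show where the check sits in the request path, here is a framework-agnostic middleware sketch; the dict-based request and response shapes are placeholders for whatever gateway or framework is actually in use, and `allow` is any of the checks sketched above.

```python
from http import HTTPStatus

def rate_limited(handler, allow):
    """Wrap a request handler; reject over-limit users before they
    reach backend services."""
    def wrapped(request: dict) -> dict:
        user_id = request.get("user_id", "anonymous")
        if not allow(user_id):
            # Reject with 429 and hint when the client may retry.
            return {"status": int(HTTPStatus.TOO_MANY_REQUESTS),
                    "headers": {"Retry-After": "60"}}
        return handler(request)
    return wrapped
```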
6. Advanced Enhancements
- Tiered Limits: Offer higher limits for premium users
- Global IP-based Limits: Prevent abuse from unknown or anonymous traffic
- Burst Handling: Token Bucket allows short bursts while keeping average usage under control
- Monitoring & Alerting: Track rate limit rejections and system health
- Fallback Handling: Add circuit breakers or local fallbacks in case Redis becomes unavailable (a sketch follows this list)
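A hedged sketch of the fallback idea, assuming the redis-py client: if the central store is unreachable, fall back to a process-local limiter (such as the token bucket sketched earlier) rather than failing closed. The function and parameter names are illustrative.

```python
import redis

def allow_with_fallback(user_id: str, redis_allow, local_bucket) -> bool:
    """Prefer the shared Redis-backed check; degrade to a local,
    approximate limiter when Redis is unreachable."""
    try:
        return redis_allow(user_id)
    except redis.ConnectionError:
        # Fail over to a local decision rather than rejecting
        # (or admitting) all traffic while Redis is down.
        return local_bucket.allow()
```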
7. Trade-Offs and Considerations
| Option | Pros | Cons |
|---|---|---|
| Fixed Window Counter | Easy to implement | Bursts at window boundaries |
| Sliding Window Log | High accuracy | Memory-intensive |
| Sliding Window Counter | Efficient and fairly accurate | Slightly more complex |
| Token Bucket | Realistic; allows bursts | Requires refill-tracking logic |
| Redis-based approach | Scalable, consistent | Redis is a single point of failure unless clustered |
8. What Interviewers Look For
- Understanding of trade-offs between accuracy, complexity, and scalability
- Ability to design for both small and large-scale use cases
- Awareness of distributed challenges (latency, consistency, resilience)
- Practicality in choosing tools (e.g., Redis, queues, caching layers)
- Clear communication of design decisions
✅ Summary
A rate limiter protects backend services from abuse, enforces fair usage policies, and helps maintain availability under heavy load. For interviews, it’s important to:
- Compare algorithm choices clearly
- Justify the use of distributed infrastructure (like Redis)
- Consider user experience, resilience, and extensibility