📌 Question
"Design a rate limiter that allows up to 100 requests per user per minute."
This is a foundational system design interview question that tests your ability to balance correctness, performance, and scalability in distributed systems. It’s especially relevant in backend and API-related roles.
✅ Solution
1. Clarify Requirements
Before diving into the design, clarify the following:
- Limit: 100 requests per user per minute
- Behavior: Requests beyond the limit should be rejected (e.g., with HTTP 429)
- Scale: System should handle millions of users and thousands of requests per second
- Latency: Low overhead, minimal response delay
- Fault tolerance: Should degrade gracefully under partial failures
2. Choose a Rate Limiting Strategy
There are several algorithmic approaches:
- Fixed Window Counter: Count the number of requests in each fixed time window (e.g., per minute). Simple to implement, but a client can send up to twice the limit in a short burst straddling a window boundary.
- Sliding Window Log: Store a list of request timestamps and count only those within the last 60 seconds. Accurate but can become memory-heavy.
- Sliding Window Counter: Divide time into small buckets (e.g., 6 buckets of 10 seconds) and use a rolling sum. More efficient than logging every request.
- Token Bucket: Refill tokens at a steady rate. Each request consumes one token. Supports bursts and is widely used in practice.
For production systems, Token Bucket or Sliding Window Counter are preferred due to their balance between accuracy and efficiency.
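To make the token bucket concrete, here is a minimal single-process sketch in Python. The class name and parameters are illustrative; the capacity and refill rate are chosen to match the 100-requests-per-minute limit, and a production version would also need thread safety and per-user buckets.

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, capacity: int = 100, rate: float = 100 / 60):
        self.capacity = capacity          # burst size (max stored tokens)
        self.rate = rate                  # steady refill rate (tokens per second)
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # each request consumes one token
            return True
        return False
```

Because unused capacity accumulates as tokens, short bursts are allowed while the long-run average stays at the configured rate.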
3. Single-Node In-Memory Design
A simple approach is to maintain a per-user list of timestamps in memory and filter out those older than 60 seconds. This works well for prototyping or low-traffic applications, but lacks scalability, persistence, and fault tolerance.
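A rough sketch of this in-memory sliding window log, assuming a single process and no concurrency control (the `allow` helper name is illustrative):

```python
import time
from collections import defaultdict, deque

WINDOW = 60   # seconds
LIMIT = 100   # max requests per user per window

request_log = defaultdict(deque)  # user_id -> deque of request timestamps

def allow(user_id: str) -> bool:
    now = time.monotonic()
    log = request_log[user_id]
    # Evict timestamps that have aged out of the last 60 seconds.
    while log and now - log[0] > WINDOW:
        log.popleft()
    if len(log) < LIMIT:
        log.append(now)
        return True
    return False
```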
4. Scalable Distributed Design
For real-world systems, a distributed store like Redis is typically used:
- Maintain per-user counters in Redis (e.g., `user:1234:counter`)
- Use atomic increment operations with a matching expiration (e.g., a 60-second TTL)
- Use Lua scripting to ensure atomicity when checking and updating limits
- For higher scale, Redis can be clustered or sharded
This design is highly performant and supports multi-instance or cloud-based architectures.
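As a sketch of this pattern using the redis-py client: the Lua script below implements the fixed-window variant described above (INCR plus EXPIRE, made atomic per key by running server-side). The key format and function names are illustrative, and a local Redis instance is assumed.

```python
import redis

r = redis.Redis()  # assumes a Redis instance on localhost:6379

# INCR the per-user counter; on first increment, attach a 60-second TTL.
# Running this as a Lua script makes the check-and-update atomic per key.
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""
check = r.register_script(RATE_LIMIT_LUA)

def allow(user_id: str, limit: int = 100, window: int = 60) -> bool:
    count = check(keys=[f"user:{user_id}:counter"], args=[window])
    return count <= limit
```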
5. System Architecture Overview
The system typically includes:
- An API Gateway or middleware that performs rate limit checks
- A central store (e.g., Redis) to track usage
- Optional observability hooks to log usage and violations
Requests are checked against the user’s counter before being passed to backend services.
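To show where the check sits in the request path, here is a framework-agnostic middleware sketch; the dict-based request and response shapes are placeholders for whatever gateway or framework is actually in use, and `allow` is any of the checks sketched above.

```python
from http import HTTPStatus

def rate_limited(handler, allow):
    """Wrap a request handler; reject over-limit users before they
    reach backend services."""
    def wrapped(request: dict) -> dict:
        user_id = request.get("user_id", "anonymous")
        if not allow(user_id):
            # Reject with 429 and hint when the client may retry.
            return {"status": int(HTTPStatus.TOO_MANY_REQUESTS),
                    "headers": {"Retry-After": "60"}}
        return handler(request)
    return wrapped
```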
6. Advanced Enhancements
- Tiered Limits: Offer higher limits for premium users
- Global IP-based Limits: Prevent abuse from unknown or anonymous traffic
- Burst Handling: Token Bucket allows short bursts while keeping average usage under control
- Monitoring & Alerting: Track rate limit rejections and system health
- Fallback Handling: Add circuit breakers or local fallbacks in case Redis becomes unavailable (a sketch follows this list)
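A hedged sketch of the fallback idea, assuming the redis-py client: if the central store is unreachable, fall back to a process-local limiter (such as the token bucket sketched earlier) rather than failing closed. The function and parameter names are illustrative.

```python
import redis

def allow_with_fallback(user_id: str, redis_allow, local_bucket) -> bool:
    """Prefer the shared Redis-backed check; degrade to a local,
    approximate limiter when Redis is unreachable."""
    try:
        return redis_allow(user_id)
    except redis.ConnectionError:
        # Fail over to a local decision rather than rejecting
        # (or admitting) all traffic while Redis is down.
        return local_bucket.allow()
```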
7. Trade-Offs and Considerations
| Option | Pros | Cons |
|---|---|---|
| Fixed Window Counter | Easy to implement | Bursts at window boundaries |
| Sliding Window Log | High accuracy | Memory-intensive |
| Sliding Window Counter | Efficient and fairly accurate | Slightly more complex |
| Token Bucket | Realistic; allows bursts | Requires refill-tracking logic |
| Redis-based approach | Scalable, consistent | Redis is a single point of failure unless clustered |
8. What Interviewers Look For
- Understanding of trade-offs between accuracy, complexity, and scalability
- Ability to design for both small and large-scale use cases
- Awareness of distributed challenges (latency, consistency, resilience)
- Practicality in choosing tools (e.g., Redis, queues, caching layers)
- Clear communication of design decisions
✅ Summary
A rate limiter protects backend services from abuse, enforces fair usage policies, and helps maintain availability under heavy load. For interviews, it’s important to:
- Compare algorithm choices clearly
- Justify the use of distributed infrastructure (like Redis)
- Consider user experience, resilience, and extensibility