Redis, Caching, Sessions and Everything In Between – The Complete Guide

Caching is not just one thing. It happens at multiple layers in a system, and it is useful to understand all of them.

Let me tell you something that happened to a friend of mine.

He was building a food delivery app. Small team, good product, things were going well. Then one day, without any warning, the app just started dying. Pages took 8-10 seconds to load. Users were dropping off. The team was panicking.

They checked the servers. CPU was fine. Memory was fine. But the database, the database was on fire. It was getting hammered with thousands of identical queries every single second. The same restaurant pages being fetched over and over. The same menu data being loaded again and again. The same product listings being pulled from disk ten thousand times a minute.

Someone finally said: “Why are we not caching any of this?”

Nobody had an answer.

They added Redis that weekend. Monday morning, database load dropped by 90%. Pages went from 8 seconds to 200 milliseconds. Users came back. The app survived.

This blog is the thing my friend wishes he had read before that weekend.

We are going to cover everything, what caching actually is, what Redis actually is, how sessions work, how OTPs work, how rate limiting works, what goes wrong in production, and how to think about all of this when you are designing a system from scratch.

Real examples. Indian apps. No fluff. Everything from zero.

Let us go.

Quick Recap, Where We Left Off

In Part 4, we went deep on SQL vs NoSQL. We understood that SQL gives you ACID guarantees, Atomicity, Consistency, Isolation, Durability, and is perfect for financial data, order records, and anything where data correctness is non-negotiable. We understood that NoSQL gives you flexibility and horizontal scale, and is perfect for product catalogues, social feeds, and high-velocity data.

We also touched briefly on Redis, mentioned it as a caching layer, mentioned it as a session store. But we never went deep.

That changes today.

Because caching is not just “put Redis in front of your database.” It is an entire discipline. Getting it right is the difference between a system that scales to millions of users and a system that dies at 10,000. Getting it wrong introduces bugs that are genuinely difficult to find and fix.

Let us start from the very beginning.

What Is Caching? The Idea Behind Everything

The Problem Caching Solves

Imagine you run a chai shop near a busy office complex. Every morning at 9 AM, 200 people walk in and every single one of them asks you the same question: “Bhaiya, aaj ka special kya hai?” (“Brother, what is today’s special?”)

You have two choices.

Choice A: Every single time someone asks, you walk to the back of the shop, open your notebook, check what ingredients you have, calculate what special makes sense, come back, and tell them. Two minutes per person. 200 people. You are answering this question for nearly 7 hours straight. The queue goes out the door. Everyone is frustrated.

Choice B: At 8:55 AM, before anyone arrives, you check your notebook once, figure out today’s special, “Masala chai with ginger biscuits”, and write it on a small board outside. Now when 200 people ask the same question, you point at the board. Two seconds per person. Everyone is happy.

That board is a cache.

Caching is the practice of storing the result of an expensive operation so that future requests for the same result can be served from the stored copy instead of recomputing it from scratch.

In backend systems, the “expensive operation” is almost always a database query, something that involves reading from disk, joining multiple tables, processing results, and sending data across a network. These operations take time. Sometimes 100 milliseconds. Sometimes 500 milliseconds. Sometimes more.

If 10,000 users are asking for the same thing, the homepage, a popular product page, a trending restaurant menu, and each request is independently hitting the database, you are solving the same problem 10,000 times when you only needed to solve it once.

Caching says: solve it once, remember the answer, serve the remembered answer to everyone else.

Where Caching Happens

Caching is not just one thing. It happens at multiple layers in a system, and it is useful to understand all of them.

Browser Cache: When you visit a website, your browser saves images, CSS files, and JavaScript locally. The next time you visit, your browser loads these from your own hard drive instead of downloading them again. This is why the second visit to any website feels faster than the first.

CDN Cache (Content Delivery Network): Companies like Cloudflare and Akamai have servers all over the world, in Mumbai, Chennai, Delhi, Singapore, London. When you visit a website, you are served files from the nearest Cloudflare server, not from a data centre in America. The content has been cached geographically close to you. This is why Netflix video starts quickly even though Netflix’s servers are not in your city.

Application Cache: This is the layer we are focused on today. Your backend application, sitting between the user and the database, maintains its own cache of frequently-requested data. Redis is the most common tool for this.

Database Cache: PostgreSQL and MySQL have their own internal caches, they keep frequently-accessed data in memory to avoid re-reading from disk. But this is internal to the database and you have limited control over it.

Each layer serves a different purpose. Today we are going deep on the application cache layer, specifically Redis.

What Is Redis? A Proper Introduction

The Story Behind Redis

In 2009, an Italian developer named Salvatore Sanfilippo was building a real-time analytics tool. He needed a system that could handle a very large number of writes per second. Traditional databases were too slow. So he built his own in-memory data store and open-sourced it. He called it Redis, Remote Dictionary Server.

It took off almost immediately. Within a few years, every major tech company in the world was using it. Today, Redis is one of the most widely deployed pieces of infrastructure in the industry. Instagram uses it. Twitter uses it. GitHub uses it. Swiggy, Zomato, PhonePe, Zepto, all of them use it.

Why did it become so popular so fast? Because it solves a genuine, universal problem, the need for a storage system that is fast enough to serve at the speed of user requests, flexible enough to handle many different kinds of data, and simple enough that engineers can understand and use it quickly.

Why Is Redis So Fast?

This is the most important thing to understand about Redis, and it comes down to one single fact: Redis stores everything in RAM.

Your database, PostgreSQL, MySQL, whatever you use, stores data on disk. When a query runs, it reads from disk, which is slow. Even with modern SSDs, a disk read takes somewhere between 0.1 and 1 millisecond. That does not sound like much, but when you multiply it by thousands of concurrent requests, it adds up enormously.

RAM is a completely different beast. Reading from RAM is roughly 100,000 times faster than reading from a spinning hard disk and hundreds of times faster than even a fast SSD. When Redis looks up a key, it is going to RAM directly. No disk seek. No I/O wait. The result is back in microseconds.

A typical PostgreSQL query, even a simple one with a good index, takes 1-5 milliseconds. A Redis GET on the same data takes 0.1-0.3 milliseconds. That is a 10-50 times speed difference. For a system serving thousands of requests per second, that gap is the difference between a smooth experience and a sluggish one.

Redis Is More Than Just a Key-Value Store

Most people think of Redis as a simple dictionary, you put a key in, you get a value out. And yes, that is what it does at its core. But Redis supports several rich data structures, and each one has specific use cases.

Strings: The simplest type. A key maps to a text or binary value. This is what you use for caching plain data, a serialised JSON object representing a restaurant’s menu, an HTML fragment, a computed result.

SET restaurant:4421 '{"name": "Sharma Ji ka Dhaba", "rating": 4.5, ...}'
GET restaurant:4421

Lists: An ordered sequence of strings. Think of it like an array. You can push items to the front or back and pop from either end. Useful for activity feeds, message queues, recently viewed items.

LPUSH feed:user:9812 "post:1001"
LPUSH feed:user:9812 "post:1002"
LRANGE feed:user:9812 0 9    (get first 10 items)

Sets: An unordered collection of unique strings. You can add items, check membership, and perform union or intersection operations between sets. Useful for tracking which users have seen a notification, what tags a post has, which products are on sale.

SADD sale:today "product:501" "product:302" "product:788"
SISMEMBER sale:today "product:302"    (returns 1, yes, it is on sale)

Sorted Sets: Like a set, but every member has a score, and members are ordered by score. This is one of the most powerful data structures in Redis. Perfect for leaderboards, priority queues, and time-ordered data.

ZADD leaderboard 4500 "user:Revathi"
ZADD leaderboard 7200 "user:priya"
ZADD leaderboard 6100 "user:rohan"
ZRANGE leaderboard 0 2 REV WITHSCORES    (top 3 users)

Hashes: A map of field-value pairs stored within a single key. Like a dictionary inside a dictionary. Perfect for storing objects where you might want to update individual fields without replacing the whole thing.

HSET session:a7f3k2 user_id 9812 cart_count 3 city "Delhi"
HGET session:a7f3k2 user_id    (returns 9812)

HyperLogLog: A probabilistic data structure for counting unique items using very little memory. If you need to count unique visitors to a page and a rough estimate is good enough, HyperLogLog gives you that count using only 12 KB of memory regardless of whether you have a thousand or a hundred million unique visitors.

Pub/Sub: Redis has a publish-subscribe messaging system. One part of your application publishes a message to a channel. Other parts of your application that are subscribed to that channel receive it instantly. We will cover this in detail later.

Streams: A newer, more powerful version of the log concept. Append-only sequences of events. Similar to Kafka but simpler and built right into Redis. Useful for event sourcing, audit logs, and activity tracking.

Understanding these data structures is what separates someone who “uses Redis” from someone who uses Redis well.

Caching With Redis, How It Actually Works

The Cache-Aside Pattern (Lazy Loading)

The most common caching pattern in backend development is called Cache-Aside, also known as Lazy Loading. Here is the idea: your application is responsible for managing the cache. The database and the cache do not talk to each other directly. Your code sits in the middle and decides what goes in the cache and when.

The flow looks like this:

  1. A request comes in.
  2. Your application checks Redis: “Do I have cached data for this?”
  3. If yes (cache hit), return the cached data immediately. Done.
  4. If no (cache miss), go to the database, fetch the data, store it in Redis, return it to the user.
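The four steps above can be sketched in a few lines of Python. This is a toy illustration only: a plain dict stands in for Redis, and fetch_restaurant_from_db is a made-up stub for the real SQL query.

```python
import time

# Stand-ins for Redis and the database, purely for illustration.
cache = {}  # key -> (value, expires_at)

def fetch_restaurant_from_db(restaurant_id):
    # In a real system this would be an expensive multi-join SQL query.
    return {"id": restaurant_id, "name": "Sharma Ji ka Dhaba", "rating": 4.5}

def get_restaurant(restaurant_id, ttl_seconds=300):
    key = f"restaurant:{restaurant_id}"

    # Step 2: check the cache first.
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]  # Step 3: cache hit, return immediately

    # Step 4: cache miss, go to the database...
    data = fetch_restaurant_from_db(restaurant_id)

    # ...store the result in the cache with a TTL, then return it.
    cache[key] = (data, time.time() + ttl_seconds)
    return data
```

Notice that the cache is only populated when someone actually asks for the data, which is exactly why this pattern is also called Lazy Loading.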

Let me trace this through a real example.

You open the Zomato app and tap on a restaurant. Let us say it is “Sharma Ji ka Dhaba” with restaurant ID 4421.

First request (cache miss):

Your phone sends a request to Zomato’s servers. The server checks Redis: GET restaurant:4421. Redis says nothing, the key does not exist. Cache miss.

The server now queries PostgreSQL. This involves joining the restaurants table, the menu_items table, the ratings table, maybe a reviews_summary table. Four or five joins. The database reads from disk, executes the query, returns the result. This takes maybe 150 milliseconds.

The server takes the result, serialises it to JSON, and stores it in Redis: SET restaurant:4421 "{...json data...}" EX 300. The EX 300 part means “expire this key after 300 seconds”, 5 minutes. Then the server sends the data back to your phone.

Second request (cache hit), 30 seconds later:

Another user opens the same restaurant. The server checks Redis: GET restaurant:4421. Redis has it, the key was set 30 seconds ago and has not expired yet. Cache hit. The server returns the data instantly, in about 2 milliseconds. No database involved whatsoever.

Third through ten-thousandth requests, within the next 5 minutes:

Same story. Every request is a cache hit. The database handles zero additional queries for this restaurant during this 5-minute window. That is thousands of database queries saved, thousands of disk reads avoided, thousands of milliseconds of latency eliminated.

After 5 minutes, the TTL expires. Redis deletes the key. The next request gets a cache miss, goes to the database, and the cycle begins again.

This is the Cache-Aside pattern. Simple, effective, and the backbone of most production caching systems.

The Write-Through Pattern

Cache-Aside is lazy, you only populate the cache when someone asks for the data. Write-Through is proactive, whenever you write to the database, you also update the cache at the same time.

A restaurant manager updates their menu. In a Write-Through system, your application does two things simultaneously: writes the new menu to the database, and updates restaurant:4421 in Redis with the new menu data.

The advantage: the cache is always fresh. No stale data, ever.

The disadvantage: you write to the cache for every update, even for data that nobody might request again soon. You are doing extra work upfront. Also, if the cache write succeeds but the database write fails (or vice versa), you now have inconsistent data between your cache and your database, which is a serious problem.

Most systems use Cache-Aside as the default and add targeted Write-Through for high-traffic items where cache freshness really matters.
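A minimal sketch of the Write-Through idea, again with plain dicts standing in for PostgreSQL and Redis (the function names are invented for the example):

```python
db = {}     # stands in for PostgreSQL, the source of truth
cache = {}  # stands in for Redis

def update_menu(restaurant_id, new_menu):
    key = f"restaurant:{restaurant_id}"
    # Write to the database...
    db[restaurant_id] = new_menu
    # ...and update the cache in the same operation, so reads stay fresh.
    cache[key] = new_menu

def get_menu(restaurant_id):
    key = f"restaurant:{restaurant_id}"
    if key in cache:
        return cache[key]  # always fresh, because writes keep it updated
    data = db[restaurant_id]
    cache[key] = data
    return data
```

The failure mode mentioned above is visible here too: if the db write succeeds but the cache write throws, the two dicts silently disagree, which is why real implementations wrap these two steps carefully.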

The Read-Through Pattern

In Read-Through, the cache itself is responsible for going to the database on a miss. Your application talks only to the cache. If the cache does not have something, it fetches it from the database automatically.

This simplifies your application code, you always talk to one place (the cache). But it requires your caching layer to be configured to understand your database, which adds operational complexity. Most teams using Redis implement Cache-Aside in their application code rather than configuring Read-Through, because it gives them more control.

Cache Hit Rate (The Number You Always Need to Watch)

The cache hit rate is the percentage of requests served from cache instead of the database.

If your cache hit rate is 95%, it means 95 out of every 100 requests never touch the database. Your database is only handling 5% of the actual load. For a system getting 100,000 requests per minute, that means only 5,000 go to the database instead of 100,000. That is an enormous difference in database load, cost, and response time.

If your cache hit rate is 30%, you are saving 30% of database queries. You still have significant load on the database. Your cache is providing some benefit but not nearly as much as it could.
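The arithmetic is worth making concrete:

```python
def cache_hit_rate(hits, misses):
    """Fraction of requests served from cache instead of the database."""
    total = hits + misses
    return hits / total if total else 0.0

# 100,000 requests per minute at a 95% hit rate:
# only 5,000 of them ever reach the database.
requests_per_minute = 100_000
hit_rate = 0.95
db_queries = round(requests_per_minute * (1 - hit_rate))
```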

What determines cache hit rate? Two main things.

First, TTL, the longer you keep data in cache, the more likely it is that a subsequent request will find it there. But longer TTL means potentially staler data. It is a trade-off.

Second, the diversity of requests. If your application is like a viral restaurant on Zomato that everyone is viewing at the same time, almost every request will be a cache hit after the first one. But if you have a million different restaurants and requests are spread randomly across all of them, the chance of any two consecutive requests being for the same restaurant is low, and cache hit rates suffer.

In reality, most applications follow what is called a power law distribution, a small percentage of items get the vast majority of traffic. The top 100 restaurants in a city might account for 60% of all restaurant views. The top 10 products on an e-commerce site might account for 50% of all product page views. Caching those hot items gets you enormous benefit, even if the long tail of less popular items gets poor cache hit rates.

TTL: Time To Live, Explained Properly

What TTL Is and How It Works

TTL stands for Time To Live. It is the expiry timer you attach to a key in Redis.

SET restaurant:4421 "{...data...}" EX 300

The EX 300 tells Redis: “This key should exist for 300 seconds. After that, delete it automatically.” When those 300 seconds are up, Redis removes the key with zero intervention from your application. No cleanup job. No cron script. Redis handles it.

This is the fundamental mechanism that prevents your cache from showing stale data forever. Without TTL, a cached item would live in Redis indefinitely, even if the underlying data in the database changed completely. With TTL, you put a maximum age on cached data.
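To make the mechanism concrete, here is a toy Python cache that mimics Redis-style per-key expiry. It is a teaching sketch, not how Redis is implemented internally (Redis uses a mix of lazy expiry on access and active periodic sampling):

```python
import time

class TTLCache:
    """A toy cache with Redis-style per-key expiry, for illustration only."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ex=None):
        # `ex` mirrors Redis's EX option: lifetime in seconds.
        expires_at = time.time() + ex if ex is not None else None
        self._store[key] = (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._store[key]  # expired: behave as if the key never existed
            return None
        return value
```

The key property is in get: once the timer runs out, the key is simply gone, with zero cleanup code on the caller's side.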

Choosing the Right TTL

Choosing TTL is not a technical problem. It is a business problem. The question to ask is: “How old can this data be before it causes a real problem for the user?”

Let me give you a framework using real examples.

Restaurant menu on Zomato. A restaurant menu changes maybe once a month, new dishes added, seasonal specials, a price revision. If a user sees a menu that is 2 hours old, they might order something at a slightly different price than what appears on the menu. That is mildly annoying but not catastrophic. TTL of 1-2 hours is fine. Maybe even a day.

Product price on Flipkart during a normal day. Prices change occasionally, maybe once every few hours for popular products. A user seeing a price that is 5 minutes old is completely acceptable. TTL of 5-10 minutes works.

Product price on Flipkart during a Big Billion Days sale. Flash sales change prices every hour. Inventory might run out in minutes. A user seeing stale “In Stock” status and adding something to cart that is actually sold out is a terrible experience. TTL should be very short, 30 seconds to 1 minute, combined with active invalidation when prices change.

Order status on Swiggy. A user is waiting for their biryani. The order goes through stages: placed → accepted → being prepared → out for delivery → delivered. They are refreshing the app constantly. If the status is 5 minutes stale, they will panic. TTL should be 10-30 seconds at most, or better, invalidate the cache actively every time the status changes.

OTP for login. An OTP is valid for 10 minutes. TTL = 600 seconds exactly. After 10 minutes, the OTP should be gone. TTL handles this perfectly. The OTP never persists beyond its validity period, and you need zero cleanup logic.

City names, cuisine categories, fare tables, train schedules. This data changes maybe once in several months. Cache it for days. There is zero reason to ever query this from the database more than once a day.

The pattern: the more critical and fast-changing the data, the shorter your TTL. The more static and low-stakes the data, the longer your TTL.

Cache Invalidation (The Hardest Problem in Caching)

Why Cache Invalidation Is Hard

There is a famous quote in computer science: “There are only two hard things in computer science: cache invalidation and naming things.”

This is funny precisely because it is true. Cache invalidation sounds boring and mechanical but it is actually a deep distributed systems problem.

Here is the core issue. When you cache data, you now have two copies of that data, one in your database (the truth) and one in Redis (the cached copy). These two copies are only guaranteed to be in sync at the moment the cache was populated. After that, any change to the database makes the cache stale.

The question is: how do you handle that staleness?

TTL-Based Expiry (Let the Clock Do the Work)

The simplest strategy. You set a TTL and accept that your cached data might be stale for up to TTL seconds. When the TTL expires, the next request gets fresh data from the database.

This is the right default for most situations. It requires zero extra code. It requires zero coordination. It is simple to understand and debug.

When does it work well: when the underlying data changes slowly (menus, product descriptions, user profiles, city information), or when a small amount of staleness is genuinely acceptable.

When does it fail: when the data changes frequently and showing stale data causes real harm. Imagine IRCTC caching the number of available seats with a 5-minute TTL. Someone books the last seat. For the next 5 minutes, every other user sees “1 seat available” when there is actually none. They proceed to the payment page, enter their card details, and then get an error at the end. That is a terrible experience, and it is caused entirely by stale cache.

Active Invalidation (Delete When Data Changes)

When the data in the database changes, your application explicitly deletes or updates the corresponding Redis key at the same time.

Here is how it works in practice. A restaurant manager opens the Zomato partner app and updates their menu. Your application code runs two operations:

First: UPDATE menus SET ... WHERE restaurant_id = 4421, update the database. Second: DEL restaurant:4421, delete the cached key from Redis.

The next user who views the restaurant gets a cache miss, goes to the database, gets the fresh menu, and repopulates the cache. From that point forward, the cache reflects the new menu.

This is more reliable than pure TTL-based expiry. Changes are reflected almost immediately instead of waiting for the TTL to expire.

The challenge: you have to do this invalidation in every single place where the data can be changed. If your restaurant menu can be updated through the partner app, through an internal admin panel, through a bulk import tool, and through a customer support interface, all four of those code paths must invalidate the Redis key. Miss any one of them and you have a cache inconsistency bug that will be very hard to find.

This is the real reason cache invalidation is hard. It is not technically complex. It is organisationally complex. As your codebase grows, tracking all the places where a piece of data can change, and ensuring all of them properly invalidate the cache, becomes genuinely difficult.
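In Python terms, the active invalidation pattern looks like this (plain dicts standing in for PostgreSQL and Redis; the function names are illustrative):

```python
db = {}     # stands in for PostgreSQL
cache = {}  # stands in for Redis

def update_menu(restaurant_id, new_menu):
    key = f"restaurant:{restaurant_id}"
    # 1. Update the source of truth.
    db[restaurant_id] = new_menu
    # 2. Delete the cached copy (the equivalent of DEL restaurant:4421).
    cache.pop(key, None)

def get_menu(restaurant_id):
    key = f"restaurant:{restaurant_id}"
    if key in cache:
        return cache[key]
    data = db[restaurant_id]  # cache miss: read fresh data
    cache[key] = data         # repopulate the cache for the next reader
    return data
```

The organisational problem described above shows up here as a rule: every code path that writes to db must go through update_menu (or do its own cache.pop), or the cache drifts out of sync.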

Event-Driven Invalidation (The Cleanest Approach)

The most sophisticated approach, and the one used by most large-scale systems.

Instead of directly invalidating the cache from the code that makes the change, the code publishes an event, “restaurant 4421’s menu was updated”, to a message queue or to Redis Pub/Sub. A separate background worker listens for these events and handles the cache invalidation.

This decouples the code that makes changes from the code that manages the cache. Your partner app code does not need to know anything about Redis. It just publishes an event. The cache manager handles the rest.

This is cleaner and scales better. Adding a new code path that can update restaurant menus? Just make sure it publishes the same event. The cache manager handles it automatically.

The tradeoff is architectural complexity. You now have a message queue, a background worker, and more moving parts to operate and monitor.

For most applications, a combination of Strategy 1 and Strategy 2 is the right starting point, TTL as your safety net, active invalidation for data where freshness really matters. Graduate to event-driven invalidation when your codebase gets large enough that tracking invalidation manually becomes painful.

The Race Condition That Makes Even Active Invalidation Hard

Even if you implement active invalidation perfectly, there is a subtle race condition that can still cause problems.

Here is the sequence that causes it:

  1. User A requests restaurant:4421. Cache miss.
  2. User A queries the database. Gets the menu data.
  3. Meanwhile, the restaurant manager updates the menu. Your code deletes restaurant:4421 from Redis. ✓
  4. User A, not knowing about the deletion, writes the old data to Redis: SET restaurant:4421 "{old menu data}"
  5. Now Redis has the old data again, and this time there might be a long TTL set on it.

The cache is now incorrect indefinitely (until the next invalidation event), and you might not even notice because the code “worked”, it deleted the key, and the key was recreated. But it was recreated with stale data.

This race condition is not extremely common, but it happens in production systems, and it is subtle enough that it can be difficult to diagnose.

Solutions include using versioned keys (the key includes a version number that gets incremented on every update, so old writes to an old version key are harmless), or using conditional writes (only write to cache if the current version matches the expected version), or simply relying on a short TTL as a safety net even when using active invalidation.

This is why cache invalidation is genuinely hard. Not technically complex in any one piece, but deceptively tricky to get perfectly right across an entire system.
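One of the mitigations mentioned above, versioned keys, can be sketched like this. It is an illustration of the idea, not a drop-in implementation:

```python
cache = {}
version = {"restaurant:4421": 1}  # current version, bumped on every update

def current_key(base):
    return f"{base}:v{version[base]}"

def on_menu_update(base):
    # Instead of deleting the old key, bump the version. A reader that
    # started before the update will write to the OLD versioned key,
    # which no new reader ever looks at, so the stale write is harmless.
    version[base] += 1

def write_cache(base, data, as_of_version):
    # The reader writes using the version it saw when it started.
    cache[f"{base}:v{as_of_version}"] = data

def read_cache(base):
    return cache.get(current_key(base))
```

This replays the five-step race above: the stale write still happens, but it lands on a key nobody reads anymore.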

Cache Eviction, What Happens When Redis Runs Out of Memory

The Memory Problem

Redis stores everything in RAM. RAM is finite. You might give your Redis instance 8 GB of memory. As your system grows, more and more keys get cached. Eventually, Redis fills up.

What happens then?

You have two options: Redis starts rejecting writes (returns an error when you try to set a new key), or Redis automatically evicts some existing keys to make room for the new one. Which behaviour you get depends on your eviction policy configuration.

Eviction Policies, Picking What to Throw Away

Redis gives you several eviction policies to choose from. Here are the important ones.

noeviction: When memory is full, Redis refuses any new writes and returns an error. Your application must handle this error gracefully. This is the right choice when Redis is your primary data store and you cannot afford to lose any data. For a pure cache use case, this is wrong, you want Redis to automatically make room rather than start erroring.

allkeys-lru (Least Recently Used): When memory is full, Redis evicts the key that was accessed least recently, across all keys. The logic: if a key has not been accessed in a long time, it is probably not hot data, and it is the least likely to be needed soon. Evicting it is a reasonable trade-off. This is the most commonly used eviction policy for caches.

allkeys-lfu (Least Frequently Used): When memory is full, Redis evicts the key that has been accessed the fewest total times, across all keys. Different from LRU in an important way: LRU might keep a key that was accessed once very recently, while LFU would evict it if it has only ever been accessed once. LFU is better when you have bursts of one-off accesses that should not displace your consistently hot data.

volatile-lru: Same as allkeys-lru, but only evicts keys that have a TTL set. Keys without a TTL are never evicted. Use this when some of your keys are permanent (configuration data, reference data) and should never be evicted, while other keys are cache entries with TTLs that can be.

volatile-ttl: Evicts the key with the shortest TTL first. The logic: if a key was going to expire in 30 seconds anyway, deleting it now is not much of a loss. This is conservative, it avoids evicting long-lived or permanent data.

For a cache use case, allkeys-lru or allkeys-lfu are usually the right defaults. Think of it like a library that has run out of shelf space, it makes sense to put back the books that nobody has touched in the longest time (LRU) or the books that have been checked out the fewest times total (LFU).
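In real Redis you simply set maxmemory and maxmemory-policy allkeys-lru in the configuration. The LRU idea itself is small enough to sketch; the toy class below only illustrates the eviction behaviour, not Redis's actual (sampled, approximate) LRU implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Toy allkeys-lru behaviour: when full, evict the least recently used key."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self._store = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def set(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.max_keys:
            self._store.popitem(last=False)  # evict the least recently used
```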

Sessions, How “Login” Actually Works at Scale

The Problem HTTP Has By Design

Here is something that confuses many beginners: HTTP is stateless.

Every single HTTP request is completely independent. When your phone sends a request to Swiggy’s server, the server receives it with zero memory of any previous requests from you. From the server’s perspective, every request might as well be from a complete stranger.

This creates an obvious problem. You log in to Swiggy. You navigate to the restaurant page. You add items to your cart. You go to checkout. Each of these actions is a separate HTTP request. How does the server know, across all these separate requests, that they are all from the same person who logged in earlier?

This is the session problem. And Redis solves it elegantly.

How Sessions Work, The Full Story

When you successfully log in to Swiggy, here is exactly what happens behind the scenes.

The server verifies your phone number and OTP. Login is successful. Now the server needs to create a record of your login, your session.

The server generates a unique, random session ID. This is a long string of random characters, something like sess_a7f3k2m9p4q8r1s6t3u7. It is long and random enough that nobody can guess it.

The server creates a session object containing everything it needs to know about you: your user ID, your name, what is in your cart, your saved addresses, when you logged in. This session object gets stored in Redis under the session ID as the key.

HSET session:a7f3k2m9p4q8r1s6t3u7
    user_id 9812
    name "Revathi Rana"
    city "Delhi"
    cart_items "3"
    logged_in_at "2024-08-15T09:30:00"

EXPIRE session:a7f3k2m9p4q8r1s6t3u7 604800    (7 days in seconds)

The server then sends the session ID back to your phone, either as a cookie that your browser will automatically include in every subsequent request, or as a token that your app stores and includes in request headers.

Every subsequent request your phone makes to Swiggy includes this session ID. The server extracts the session ID from the cookie or header, does a HGETALL session:a7f3k2m9p4q8r1s6t3u7 in Redis, and instantly has your complete session data, who you are, what you have in your cart, everything. Under a millisecond. No database query needed.

When you log out, the server does DEL session:a7f3k2m9p4q8r1s6t3u7. Your session is gone. The next request with that session ID will find nothing in Redis and redirect you to login.
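The whole lifecycle, create on login, look up on every request, delete on logout, can be sketched like this. An in-memory dict stands in for Redis, and the field names mirror the example above:

```python
import secrets
import time

sessions = {}  # stands in for Redis; session_id -> (data, expires_at)
SESSION_TTL = 7 * 24 * 3600  # 7 days, same as the EXPIRE above

def create_session(user_id, name, city):
    # Long and random enough that nobody can guess it.
    session_id = "sess_" + secrets.token_hex(16)
    sessions[session_id] = (
        {"user_id": user_id, "name": name, "city": city},
        time.time() + SESSION_TTL,
    )
    return session_id  # sent back to the client as a cookie or token

def get_session(session_id):
    entry = sessions.get(session_id)
    if entry is None or entry[1] < time.time():
        return None  # missing or expired: treat as logged out
    return entry[0]

def destroy_session(session_id):
    sessions.pop(session_id, None)  # the equivalent of DEL session:<id>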

Why Redis Is Perfect for Sessions

Think about what sessions need: they need to be fast (checked on literally every API request), they need to expire automatically (nobody wants to be logged in forever, and stale sessions should not accumulate), and the access pattern is always the same (look up a session by its ID, never by content).

This is a textbook key-value use case. Redis handles all three requirements beautifully, microsecond lookups, automatic TTL-based expiry, and simple key-value access pattern.

Sessions and Horizontal Scaling, The Critical Benefit

Here is where sessions in Redis become architecturally important.

Swiggy does not run on one server. They run on dozens or hundreds of application servers behind a load balancer. When you make a request, the load balancer picks any available server to handle it.

If session data were stored inside the memory of the application server, which was actually how some older systems worked, you would have a serious problem. You logged in and your session was created on Server 3. Your next request goes to Server 7 (because the load balancer picked it). Server 7 has no memory of your session. It thinks you are not logged in. You get redirected to the login page. Every. Single. Time. You. Navigate. Anywhere.

This is called the sticky session problem. Old solutions involved configuring the load balancer to always send you to the same server (sticky sessions), which defeats the purpose of having multiple servers.

The Redis solution is clean and simple: session data lives in Redis, a separate service that all application servers connect to. It does not matter which server handles your request, they all look up your session in the same Redis instance. Any server can serve any user. This is called stateless application architecture, and it is how every modern scalable system is built.

OTP Storage, Redis at Its Most Natural

How OTPs Work in Indian Apps

Virtually every Indian app uses OTP (One Time Password) based login. No username, no password, just your phone number and a 6-digit code delivered by SMS.

When you tap “Send OTP,” here is the exact sequence that happens:

Step 1: Your phone sends a request to the server: “Send OTP to +91-9876543210.”

Step 2: The server generates a random 6-digit number. Let us say it generates 482910.

Step 3: The server calls an SMS gateway API, services like MSG91, Twilio, or Exotel, with your phone number and the OTP. The SMS gateway sends the text message to your phone. This is usually done asynchronously so the server does not wait for the SMS to be delivered before responding.

Step 4: The server stores the OTP in Redis:

SET otp:9876543210 482910 EX 600

Key: otp: followed by the phone number. Value: the OTP. TTL: 600 seconds (10 minutes).

Step 5: Your phone receives the SMS. You read the code and type it into the app.

Step 6: Your phone sends a request to the server: “Verify OTP 482910 for +91-9876543210.”

Step 7: The server does GET otp:9876543210. Gets 482910. Compares it with what you submitted. Match! Login successful.

Step 8: The server deletes the OTP: DEL otp:9876543210. This ensures the OTP can only be used once.

If you type a wrong OTP, the server compares, finds a mismatch, and returns an error. The OTP remains in Redis (not deleted) and you can try again.

If 10 minutes pass before you submit the OTP, Redis automatically deletes the key. When the server tries to verify, GET otp:9876543210 returns nothing. The server sees an expired or missing OTP and asks you to request a new one.
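The eight steps collapse into two small functions. A sketch, again assuming a redis-py-style client `r` with `decode_responses=True`; the SMS gateway call is left as a comment because every provider's API differs:

```python
import random

OTP_TTL = 600  # 10 minutes, matching the flow above

def send_otp(r, phone):
    otp = f"{random.randint(0, 999999):06d}"
    r.setex(f"otp:{phone}", OTP_TTL, otp)
    # sms_gateway.send(phone, otp)  # hypothetical call to MSG91/Twilio/Exotel
    return otp

def verify_otp(r, phone, submitted):
    stored = r.get(f"otp:{phone}")
    if stored is None:
        return "expired"       # TTL passed, or no OTP was ever requested
    if stored != submitted:
        return "mismatch"      # wrong code: leave it in Redis for retries
    r.delete(f"otp:{phone}")   # single use: delete on success
    return "ok"
```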

Why This Is the Perfect Redis Use Case

OTP storage ticks every box for a Redis use case.

The data is temporary, an OTP should not exist forever. In fact, it must not exist beyond 10 minutes. TTL handles this with zero extra code.

The data is tiny, a 6-digit number is negligible in size.

The access pattern is pure key-value, you always look up an OTP by the phone number. You never need to query “find all OTPs created in the last hour” or “find all OTPs for users in Delhi.” Simple key-value.

Speed matters, users are staring at a loading spinner waiting for OTP verification. Every millisecond counts.

Storing OTPs in a SQL database would technically work, but you would need a scheduled job to clean up expired OTPs (the table would grow forever otherwise), the table would become a hotspot of writes and deletes, and the access pattern is a complete mismatch for a relational database designed for complex queries.

Redis is genuinely the right tool here, not just a fashionable choice.

Rate Limiting (Protecting Your System With Redis)

Why Rate Limiting Exists

Every API has limits on how many requests it can handle. If a single user (or a bot) sends thousands of requests per minute, they can overwhelm your servers, starve other users of resources, scrape your data, or brute-force their way through login forms.

Rate limiting is the practice of detecting excessive request rates and rejecting requests that exceed a defined threshold.

Common examples: a user can only attempt OTP verification 5 times before being locked out for 30 minutes. A free API key gets 100 calls per day. A user can only send one message per 5 seconds to prevent spam. A search endpoint accepts at most 10 requests per second from the same IP.

How Redis Makes Rate Limiting Simple

Redis makes rate limiting remarkably clean to implement, primarily because of one property: atomic operations.

Here is the simplest rate limiting approach, the fixed window counter.

A request comes in from user ID 9812. Before processing it:

current_count = INCR rate:user:9812
if current_count == 1:
    EXPIRE rate:user:9812 60    (set the window to 60 seconds)

if current_count > 100:
    return 429 Too Many Requests

INCR atomically increments the counter. If the key does not exist, Redis creates it at 0 and then increments to 1. You set the TTL only on the first increment to start the 60-second window. Each subsequent request increments the counter. If the counter goes above 100, you reject the request. One subtlety: if the process crashes between the INCR and the EXPIRE, the key is left with no TTL and the counter never resets. Production implementations often wrap the two commands in a small Lua script so they execute as one atomic unit.

After 60 seconds, Redis automatically deletes the key. The window resets. The user gets their next 100 requests.

The critical word is atomic. INCR in Redis is a single, indivisible operation. Even if 50 requests arrive simultaneously from the same user (in a distributed system with multiple application servers), each INCR is processed one at a time by Redis. The counter will correctly reach 50, not get confused by concurrent increments. This is something that is surprisingly difficult to achieve without Redis.
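As a concrete sketch, here is the fixed window counter wrapped in a function, assuming a redis-py-style client `r` whose `incr` and `expire` map to the Redis commands of the same name:

```python
WINDOW_SECONDS = 60
MAX_REQUESTS = 100

def allow_request(r, user_id):
    """Return True if this request is within the user's rate limit."""
    key = f"rate:user:{user_id}"
    count = r.incr(key)                 # atomic across all app servers
    if count == 1:
        r.expire(key, WINDOW_SECONDS)   # first hit starts the window
    return count <= MAX_REQUESTS
```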

Rate Limiting for Login Protection

Here is how Zepto or Swiggy protect their OTP login from brute force:

attempts = INCR login_attempts:9876543210
if attempts == 1:
    EXPIRE login_attempts:9876543210 1800    (30-minute window)

if attempts > 5:
    return "Too many attempts. Try again in 30 minutes."

Five wrong OTPs and you are locked out for 30 minutes. The lockout expires automatically. No manual unlock needed. No database write. No scheduled job to reset counters.

Rate Limiting for API Tiers

A more sophisticated use: your platform has free and paid tiers.

# Free user: 100 calls per day
daily_count = INCR api:daily:user:9812
if daily_count == 1:
    EXPIRE api:daily:user:9812 86400    (24 hours)

if user.tier == "free" and daily_count > 100:
    return 429 "Upgrade to Pro for more API calls"

if user.tier == "pro" and daily_count > 10000:
    return 429 "Daily limit reached"

This is what every public API, weather APIs, maps APIs, payment APIs, does to enforce usage limits by subscription tier.

Pub/Sub, Real-Time Messaging With Redis

What Is Pub/Sub?

Pub/Sub stands for Publish-Subscribe. It is a messaging pattern where:

Publishers send messages to a named channel without knowing who is listening. Subscribers listen on a channel and receive messages without knowing who sent them.

Redis has this built in. It is simple, fast, and useful for a specific set of problems.

A Real Example, Order Status Updates

You order biryani from Swiggy. Your phone is showing the order tracking screen. As the order status changes, confirmed, being prepared, picked up, out for delivery, your screen updates in real time.

How does this work?

When the restaurant marks the order as “being prepared,” the Swiggy partner app sends an update to Swiggy’s backend. The backend updates the database. It also publishes a message to a Redis channel:

PUBLISH order_updates:order_56789 '{"status": "being_prepared", "eta": "25 mins"}'

Swiggy’s notification service is subscribed to this channel:

SUBSCRIBE order_updates:order_56789

The notification service receives the message instantly and pushes a real-time update to your phone via WebSocket or push notification. Your screen updates without you having to refresh.

This is Pub/Sub. The backend code that updates the order status does not need to know about the notification service. It just publishes to a channel. The notification service does not need to know about the backend, it just listens to the channel. They are completely decoupled.
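A sketch of both sides, assuming redis-py (whose `publish` returns the number of subscribers that received the message, and whose `pubsub()` object yields messages as dictionaries); the channel naming follows the example above:

```python
import json

def publish_order_update(r, order_id, status, eta):
    """Called by the backend after it updates the database."""
    channel = f"order_updates:order_{order_id}"
    payload = json.dumps({"status": status, "eta": eta})
    return r.publish(channel, payload)

def listen_for_updates(r, order_id, handler):
    """Blocking loop run inside the notification service."""
    p = r.pubsub()
    p.subscribe(f"order_updates:order_{order_id}")
    for message in p.listen():
        if message["type"] == "message":
            handler(json.loads(message["data"]))
```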

When to Use Pub/Sub and When Not To

Pub/Sub is perfect for: real-time notifications, live feed updates, broadcasting status changes to multiple interested parties, lightweight coordination between services.

Pub/Sub is not the right tool when: you need messages to persist (if a subscriber is offline when a message is published, it misses the message permanently, Redis Pub/Sub has no message history), you need guaranteed delivery, you need to replay old events, or you have very high message volumes.

For those scenarios, you want Kafka or Redis Streams (a more durable version of Pub/Sub built into Redis). But for simple real-time notification use cases, Redis Pub/Sub is straightforward and effective.

How Redis Scales

Single Node Redis

For most applications, a single Redis instance is more than sufficient. Redis on a single modern server can handle hundreds of thousands of operations per second. If your application is not at a scale where this is a bottleneck, do not over-engineer.

Redis Sentinel, High Availability

The concern with a single Redis instance is the single point of failure. If that one Redis server goes down, your sessions are gone, your cache is empty, your rate limiters are reset.

Redis Sentinel solves this with automatic failover. A typical setup runs one primary (master) and two replicas that continuously copy all data from the primary, alongside Sentinel processes (usually three, so they can form a quorum before declaring the primary dead).

Sentinel monitors the primary. If the primary goes down, Sentinel automatically promotes one of the replicas to become the new primary. This takes a few seconds. Your application (configured to use Sentinel) automatically discovers the new primary and reconnects.

This is high availability for Redis without the complexity of a full cluster. Suitable for most mid-sized applications.

Redis Cluster, Horizontal Scaling

When your dataset grows beyond what fits in a single server’s RAM, or when you need more write throughput than a single node can provide, you need Redis Cluster.

Redis Cluster splits your data across multiple nodes using a concept called hash slots. Redis has 16,384 hash slots in total. Each node in the cluster is responsible for a range of hash slots. When you write a key, Redis computes a hash of the key, maps it to a hash slot, and routes the write to the node responsible for that slot.

Key "restaurant:4421"
    → hash → slot 12890
    → Node 3 is responsible for slots 10923-16383
    → Write goes to Node 3

Reading works the same way. Any client request goes to the correct node for that key.
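The slot computation is simple enough to write out in full. Redis Cluster hashes keys with CRC16 (the XMODEM variant) modulo 16,384; the sketch below skips hash tags (the `{...}` syntax real Redis honours so that related keys land on the same slot):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16,384 hash slots."""
    return crc16_xmodem(key.encode()) % 16384
```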

Adding a new node to a Redis Cluster expands capacity with no downtime. You add the node and trigger a reshard, and Redis migrates some hash slots from existing nodes to the new one while continuing to serve traffic.

Each shard in a Redis Cluster is itself a small replication group (typically 1 primary + 1 replica), providing high availability within each shard.

This is how you scale Redis to handle datasets that do not fit in a single server, the same horizontal scaling principle that makes NoSQL databases scale, applied to your cache layer.

Common Redis Patterns in Indian Apps You Use Daily

Let me walk through specific, real use cases in Indian apps. This is where everything we have discussed becomes concrete.

Swiggy, Driver Location Tracking

Every active delivery partner’s phone sends a GPS location update to Swiggy’s servers every 3-5 seconds. In a city like Bangalore during peak dinner time, there might be 10,000-15,000 active delivery partners. That is potentially 3,000-5,000 location writes per second.

Writing each of these directly to a database would be catastrophically expensive. Databases are not designed for this kind of write volume on data that changes so frequently.

Swiggy uses Redis Sorted Sets for this. The key is something like drivers:zone:bangalore_south. The members are driver IDs. The scores are GPS coordinates encoded as geohashes or timestamps (for time-ordered queries). Redis's built-in geospatial commands (GEOADD, GEOSEARCH) are themselves implemented on top of sorted sets, which is exactly this pattern.

ZADD drivers:zone:bangalore_south 1723124400 "driver:45821"

Every 3-5 seconds, when a driver sends a location update, Redis updates their entry. The real-time map on your screen reads driver positions from Redis. The database (Cassandra, in Swiggy’s case) gets a summary write every 30 seconds, not every 3 seconds. Redis absorbs the high-frequency writes and the database sees a 90% reduction in write volume for this data type.
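A sketch of the write and read paths, assuming a redis-py-style client `r`; the key layout and the 30-second freshness window are illustrative assumptions, not Swiggy's actual schema:

```python
import time

def record_location(r, zone, driver_id, lat, lng):
    """Keep one entry per driver, scored by the time of the last ping."""
    r.zadd(f"drivers:zone:{zone}", {f"driver:{driver_id}": time.time()})
    r.hset(f"driver:pos:{driver_id}", mapping={"lat": lat, "lng": lng})

def active_drivers(r, zone, max_age_seconds=30):
    """Drivers that have pinged within the last max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    return r.zrangebyscore(f"drivers:zone:{zone}", cutoff, "+inf")
```

Because the score is a timestamp, stale drivers simply fall out of the query; a periodic ZREMRANGEBYSCORE can prune them entirely.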

PhonePe, Financial Data and Caching

PhonePe has a very specific challenge. Transaction history and account balance feel like exactly the kind of data you would want to cache, users check it constantly. But it is also financial data where showing a wrong balance is a serious problem.

PhonePe’s approach (like most payment apps) is nuanced. Recent transaction history is cached with a very short TTL, 30 seconds. This is sufficient for the display screen that most users see. The balance displayed on the home screen is cached similarly.

But when you actually initiate a payment, tap “Pay Now,” enter the amount, the application reads the balance directly from the database, not from cache. That one read is always fresh. The payment itself goes through a fully transactional SQL system with zero caching involved.

And when a payment completes, the cache for both the sender’s and recipient’s accounts is immediately invalidated. Active invalidation for financial data. The next time either person opens their app, they see fresh data from the database.

This is the right balance: cache for the common display case, bypass cache for the critical transaction path, invalidate actively on state changes.

Zomato, Restaurant Pages

Restaurant page data is a classic high-read, low-write use case. Thousands of users view restaurant pages. Restaurant data changes infrequently, maybe a few times per week per restaurant.

Zomato caches the full restaurant object (details, menu, ratings summary) in Redis with a TTL of around 30 minutes to 1 hour.

When a restaurant partner updates their menu, the restaurant’s cache key is immediately deleted, active invalidation. The next user to view the restaurant gets fresh data from the database, which repopulates the cache.

For ratings and reviews, which change more frequently as new reviews come in, a shorter TTL is used, maybe 5-10 minutes. Absolute freshness of rating counts is not critical; a rating of 4.3 vs 4.31 makes no practical difference to users.

IRCTC, Careful Caching of Train Data

IRCTC is interesting because the data sensitivity varies dramatically across different types of information.

Train schedules, station names, train route information, class availability descriptions, all of this is static data that changes at most a few times per year. This is cached aggressively with very long TTLs, sometimes days. There is zero reason to query this from the database per request.

Quota availability, how many seats are available in a given quota (general, tatkal, etc.), this is displayed on the search results page. IRCTC caches this with a short TTL, maybe 1-2 minutes. Users see slightly stale availability counts on the search page, which is acceptable, it is just a display.

Actual seat reservation, this never touches the cache. When you click “Book Now” and proceed to payment, IRCTC queries the authoritative database directly for current availability. The actual reservation goes through a fully transactional SQL system. The irony of IRCTC is that their booking system is actually quite robust precisely because they never use caching for the critical path.

Zepto and Blinkit, Hyperlocal Inventory

10-minute delivery is an interesting system design problem. Inventory is hyperlocal, each dark store has its own stock. The question “Is this item available” depends on which dark store is nearest to you.

When you open Zepto and it loads your home screen showing 50 products, it needs to check availability of all 50 items at your nearest dark store in milliseconds. Making 50 database queries would be too slow.

Zepto uses Redis with a data structure like this: for each dark store, a hash containing product IDs and their current stock counts. When you open the app, a single Redis query fetches the availability status for all products at your nearest store.

HMGET store:nagpur_south product:101 product:205 product:308 ...

When a delivery is completed and inventory changes, the relevant dark store’s Redis data is updated in real-time. The product catalogue details (name, description, images, price) come from MongoDB and are cached separately with longer TTLs. The inventory status is the part that needs to be fresh.
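A sketch of the single-round-trip check, assuming a redis-py-style client `r` with `decode_responses=True`; the helper names and key layout are hypothetical, mirroring the HMGET example above:

```python
def available_products(r, store_id, product_ids):
    """One Redis round trip: stock counts for every product on screen."""
    counts = r.hmget(f"store:{store_id}", product_ids)
    return {
        pid: int(count) if count is not None else 0
        for pid, count in zip(product_ids, counts)
    }

def record_sale(r, store_id, product_id, qty=1):
    """Atomically decrement stock when an order is confirmed."""
    return r.hincrby(f"store:{store_id}", product_id, -qty)
```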

Caching Pitfalls, What Goes Wrong in Production

Pitfall 1, Cache Stampede (Thundering Herd)

Picture this. A very popular cached item, say, the Zomato homepage “trending restaurants in Bangalore”, has a TTL of 5 minutes. At exactly the moment the TTL expires, 5,000 users happen to be loading the homepage simultaneously.

All 5,000 requests check Redis. All 5,000 find the key missing. All 5,000 go to the database. The database receives 5,000 identical complex queries at the exact same moment. This is called a cache stampede or thundering herd, and it can bring a database to its knees.

The classic solution is a mutex lock. When the first request finds a cache miss, it acquires a lock (using SETNX, Set if Not Exists in Redis). While holding the lock, it goes to the database and repopulates the cache. All other concurrent requests see the lock, wait briefly, and then retry the cache, this time finding a hit.

Only one database query runs instead of 5,000. The lock expires automatically (with a TTL in case the lock holder crashes), so it never gets stuck permanently.

A simpler probabilistic solution: add random jitter to TTLs. Instead of everyone’s cache expiring at exactly the same moment, each cached item has a slightly different TTL, e.g., 300 seconds plus a random number between 0 and 30. This spreads out the expiry times and makes simultaneous mass expiry less likely.
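A sketch of the mutex approach, assuming a redis-py-style client `r` (`set(..., nx=True, ex=...)` is SETNX plus a TTL in a single atomic command); `rebuild` stands in for the expensive database query:

```python
import time

LOCK_TTL = 10  # seconds; the lock auto-expires if its holder crashes

def get_with_stampede_protection(r, key, rebuild, ttl=300):
    while True:
        value = r.get(key)
        if value is not None:
            return value
        # Cache miss: exactly one caller wins the lock and rebuilds
        if r.set(f"lock:{key}", "1", nx=True, ex=LOCK_TTL):
            value = rebuild()               # the one expensive query
            r.setex(key, ttl, value)
            r.delete(f"lock:{key}")
            return value
        # Everyone else waits briefly, then re-checks the cache
        time.sleep(0.05)
```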

Pitfall 2, Caching Null Results (The Missing Product Attack)

A request comes in for product ID 99999, which does not exist in your database. Cache miss. Database query returns nothing. Your application returns a 404 error.

But here is what many systems forget: they do not cache the “not found” result. The next request for product 99999? Another cache miss. Another database query. Another 404.

A malicious actor (or even just a buggy client making repeated requests for non-existent resources) can exploit this to bypass your cache entirely and hammer your database with queries that will never return anything useful.

The solution: cache null results. When a database query returns nothing, store a sentinel value in Redis with a short TTL.

SET product:99999 "NULL" EX 60

The next request for product 99999 hits the cache, sees “NULL,” and returns a 404 without touching the database. The sentinel expires after 60 seconds so legitimate future lookups (in case the product is later added) will get fresh data.
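In code, caching the miss is a small addition to the cache-aside read path. A sketch, assuming a redis-py-style client `r` with `decode_responses=True` and a hypothetical `db.fetch_product` helper:

```python
NULL_SENTINEL = "NULL"

def get_product(r, db, product_id, ttl=3600, null_ttl=60):
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached == NULL_SENTINEL:
        return None                            # known-missing: skip the DB
    if cached is not None:
        return cached
    row = db.fetch_product(product_id)         # hypothetical DB helper
    if row is None:
        r.setex(key, null_ttl, NULL_SENTINEL)  # cache the miss, briefly
        return None
    r.setex(key, ttl, row)
    return row
```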

Pitfall 3, Cache Pollution (Filling Redis With Useless Data)

Not all data is equally worth caching. If you cache every single database query result indiscriminately, you fill Redis with data that gets accessed once and never again. This “cold” data evicts “hot” data (the things you actually want cached) to make room. Your cache hit rate drops. Your cache becomes less effective.

The discipline is to cache selectively. Cache items that are:

  • Expensive to compute (multiple joins, aggregations)
  • Accessed repeatedly (popular products, trending restaurants, common searches)
  • Not unique per user (shared across many users, not personalised)

Do not cache items that are:

  • Cheap to compute (simple primary key lookups)
  • Unique per user (personalised data that nobody else will request the same version of)
  • Never requested more than once
  • Sensitive (financial data, private user data that should not sit in a shared cache)

Pitfall 4, Cold Start

You just upgraded Redis or flushed all data for maintenance. The cache is empty. Every single incoming request is a cache miss. Your database suddenly receives 100% of the traffic that your cache was previously absorbing. A database that was happily handling 5% of requests at 2ms suddenly gets 100% of requests. It slows down. Response times spike. Users notice.

This is called cold start and it is a real production problem, especially for high-traffic systems.

The solution is cache warming, before fully switching traffic to the new cache, pre-populate it with the most popular data. Write a script that fetches the top 1,000 restaurants in each city, the top 500 products by sales volume, the most common search terms, and loads them all into Redis. When real user traffic arrives, the most commonly requested data is already cached.

For very high-traffic systems, some teams do a gradual traffic shift, route 10% of traffic to the new cache while 90% still goes through the warm old system, then gradually increase the percentage as the new cache warms up organically.

Pitfall 5, Forgetting to Handle Redis Failures

Redis is fast, reliable, and battle-tested. But it can still fail, hardware problems, network issues, out-of-memory crashes, deployments. Your application must handle Redis being unavailable.

The wrong approach: let a Redis connection failure crash your entire application. Now your system is completely down because a caching layer had a problem.

The right approach: wrap every Redis operation in a try-catch. If Redis is unavailable, fall back to the database directly. Log the error. Alert the team. But keep serving users.

try:
    cached_data = redis.get(f"restaurant:{restaurant_id}")
    if cached_data:
        return deserialise(cached_data)
except RedisConnectionError:
    log.error("Redis unavailable, falling back to database")

# Redis was unavailable or the cache was empty, go to the database
data = database.query("SELECT * FROM restaurants WHERE id = %s", restaurant_id)

# Best-effort repopulation: if Redis is still down, serve the data anyway
try:
    redis.setex(f"restaurant:{restaurant_id}", 1800, serialise(data))
except RedisConnectionError:
    pass

return data

Your system becomes slower when Redis is down (every request hits the database), but it does not go down completely. This is called graceful degradation and it is an essential property of well-designed systems.

Redis vs Memcached, Which One and Why

Before Redis became the dominant in-memory cache, Memcached was the industry standard. You will still see it mentioned in older tutorials, older codebases, and older architecture docs.

Memcached is extremely simple. It stores string key-value pairs in memory. That is essentially all it does. No lists, no sets, no sorted sets, no pub/sub, no persistence, no clustering beyond basic client-side sharding.

When Salvatore Sanfilippo built Redis in 2009, it offered richer data structures, persistence options, replication, pub/sub, and (later) a much more capable cluster mode, all while keeping the in-memory speed that made Memcached popular.

Today, there is almost no situation where you would choose Memcached over Redis. Redis does everything Memcached does and far more. The only argument for Memcached is extreme simplicity, if you want the absolute minimum possible tool, but even then, Redis is not significantly more complex to operate.

If you are starting a new project today, use Redis. If you are maintaining an older codebase with Memcached, it is worth budgeting time to migrate when you have the opportunity. The operational advantages of Redis are significant.

Redis as a Primary Database, The Honest Answer

People sometimes ask: “Can I just use Redis for everything and skip the SQL database?”

Technically yes. Redis has persistence options that make it more than just a volatile in-memory store.

RDB snapshots, Redis periodically dumps the entire in-memory dataset to disk as a binary snapshot file. If Redis restarts, it loads this snapshot. The risk: any writes since the last snapshot are lost if Redis crashes.

AOF (Append-Only File), every write command is appended to a log file on disk as it happens. On restart, Redis replays this log to reconstruct the dataset. You can configure AOF to sync to disk every second or on every write. With every-write syncing, you get durability close to what SQL databases offer.

But even with persistence, using Redis as your primary database for most applications is the wrong choice:

RAM is expensive. Much more expensive per gigabyte than disk. A 1 TB dataset in Redis costs dramatically more than a 1 TB dataset in PostgreSQL.

Redis lacks relational query capabilities. No JOINs, no GROUP BY, no complex filtering. For data with relationships (users, orders, products, payments), you would be reimplementing half of SQL in your application code.

SQL databases have stronger durability guarantees by default. WAL-based durability in PostgreSQL is extremely robust.

The right mental model, which I will say one more time because it is that important: Redis is the speed layer. PostgreSQL (or MySQL) is the source of truth. They work together. Redis in front of your database reduces load and latency. Your database holds the authoritative, durable, queryable record of everything. Do not try to replace one with the other.

Frequently Asked Questions

Q: What cache TTL should I use?

Ask: how often does this data change, and how bad is it for the user to see stale data? Restaurant menu → 30-60 minutes. Product prices on sale day → 30-60 seconds with active invalidation. Order status → 10-30 seconds or active invalidation. OTP → exactly 10 minutes. Static reference data → hours or days. There is no universal answer. It depends entirely on your data.

Q: What happens if Redis goes down?

Your application should fall back to the database directly. Wrap Redis calls in try-catch, handle connection errors gracefully, and serve requests from the database with higher latency. The system slows down but does not crash. Also, run Redis with at least one replica and use Sentinel for automatic failover, so Redis going down becomes a rare event.

Q: My cache hit rate is only 40%. What can I do?

Either your TTLs are too short (increase them if staleness is acceptable), or you are caching items that are not requested frequently enough (focus caching on truly hot data). Also check if your cache key design is correct, a small bug in key generation can cause cache misses even for the same logical data.

Q: Should I cache personalised data?

Generally no, or very carefully. If the cached value is different for every user, caching it per user means storing N copies of similar data. This works if the computation is expensive and the data is accessed frequently per user, but most personalised data (recommendations, feed rankings) changes too often and is too unique per user to cache effectively at the application level.

Q: Is Redis Pub/Sub reliable enough for production?

For lightweight real-time notifications and non-critical messaging, yes. For critical messages that must be delivered even if the subscriber is offline, no; use Kafka or Redis Streams instead. Know the limitations and choose accordingly.

The Mental Model You Should Carry

After everything in this blog, here is the clean mental model I want you to leave with.

Your database is the truth. It is durable, consistent, and authoritative. Everything important lives there permanently.

Redis is the speed layer. It sits in front of your database and absorbs the traffic that would otherwise crush it. When the same question gets asked thousands of times, Redis answers it from memory after the first time. Your database only handles the genuinely new questions.

Redis is also your session store, the shared memory that makes stateless, horizontally scaled application servers possible. It is your OTP vault, your rate limiter, your real-time messaging backbone. It plays multiple roles because it is fast, simple, and flexible.

Use it thoughtfully. Cache what is expensive and frequently requested. Set TTLs that match how often the data changes. Add active invalidation for data where correctness matters. Handle Redis failures gracefully. Monitor your cache hit rate and act when it drops.

The teams that use Redis well do not just add it and forget it. They think carefully about what to cache, for how long, and how to handle the edge cases. That thinking is what separates a system that works from a system that scales.

Key Takeaways From This Blog

Caching solves the problem of redundant computation, compute once, remember the result, serve from memory for all subsequent requests.

Redis stores data in RAM, making it 10-50x faster than disk-based databases. This speed is the foundation of everything Redis is used for.

Cache-Aside (Lazy Loading) is the most common caching pattern. Check cache first, go to database on miss, populate cache, serve result.

Cache hit rate is the key metric. A 95% hit rate means your database handles 5% of actual request volume. That is the goal.

TTL controls maximum staleness. Short TTL means fresher data but lower hit rates. Long TTL means higher hit rates but staler data. Choose based on how critical freshness is for each piece of data.

Cache invalidation is genuinely hard. Use TTL as your baseline. Add active invalidation for high-stakes data. Understand the race conditions and design accordingly.

Sessions work by storing a session object in Redis under a random key, giving that key to the client as a cookie or token, and looking it up on every request. This enables stateless application servers that can scale horizontally.

OTP storage is the perfect Redis use case, temporary, tiny data that must expire automatically, accessed by a simple key.

Rate limiting uses atomic Redis counters with TTLs to track request frequency and enforce limits without race conditions.

Pub/Sub enables lightweight real-time messaging between services. Use it for notifications; use Kafka for critical, high-volume event streaming.

Redis eviction policies determine what gets deleted when memory is full. LRU is the right default for most cache use cases.

Production pitfalls, stampedes, null caching, cold starts, cache pollution, Redis failures, are real. Know them before they surprise you in production.

Redis is the speed layer. Your SQL database is the source of truth. They are partners, not alternatives.

What Is Coming in Part 6

The next blog covers:

Microservices, what they actually are versus what a monolith is, when it genuinely makes sense to break a system apart, how services communicate with each other, and why most startups should not start with microservices no matter what Twitter says.

Message Queues and Kafka, asynchronous communication, when you need a queue between services, how Kafka works internally, and why companies like Swiggy and Ola use it for their real-time systems.

Content Delivery Networks (CDNs), how Cloudflare and Akamai serve static assets from servers geographically close to your users, reducing latency dramatically, and when you actually need one.

These three are where system design gets into the most interesting debates. Engineers at the same company disagree strongly about these topics all the time. Understanding them deeply will make you a genuinely better engineer, not just a better interview candidate.

Drop a comment if anything here raised more questions than it answered. I read every single one.

Follow for Part 6.

