You know that feeling when you are watching a cricket match and the commentator says “India needs 47 runs off 18 balls, run rate required is 15.67”?
Nobody sat down with a calculator for that. They did rough maths. Fast maths. Good-enough maths to understand the situation and make a decision in real time.
That is exactly what back-of-envelope estimation is in system design.

And honestly? This skill alone, the ability to sit down before writing a single line of code and figure out how big your problem actually is, is what separates engineers who just write features from engineers who actually build systems.
I remember when I was starting out, I used to just jump into code. Someone would say “build a feature that handles a lot of users” and I would start coding immediately. It took me embarrassingly long to realise that “a lot” means completely different things. A lot for a college project is 50 users. A lot for Swiggy is 5 million concurrent users at dinner time. If you do not estimate first, you will design the wrong system every single time.
This blog covers three things that are absolutely core to system design, back-of-envelope estimation, the CAP theorem, and database scaling. These three topics come up in almost every serious system design interview. But more importantly, they come up in real engineering decisions at real companies every single week.
By the end of this blog, you will not just know what these things are. You will understand why they exist, when they apply, and how to think about them from scratch, even if you are reading about them for the very first time.
Let’s go.
Quick Recap of Part 2 – Where We Left Off
In Part 2 of this series, we covered servers, latency, throughput, vertical scaling, horizontal scaling, load balancers, and auto scaling. We understood that when traffic grows beyond what one machine can handle, you add more machines behind a load balancer. And when traffic spikes unpredictably, auto scaling kicks in and adds or removes those machines automatically.
If you have not read Part 2 yet, I strongly recommend going through it before continuing here. The concepts build on each other.
Now, in Part 2 we kept saying “add more servers when needed.” But how do you know how many servers to add? How do you know when? How do you even know if 5 servers is enough or you need 50?
That is what estimation answers. And that is where we start.
Back-of-Envelope Estimation
Before building anything, a good engineer pauses for a minute and thinks, how big is this actually going to be?
Are we dealing with a few hundred users or millions? Will the system get a few requests per second or thousands? How much data are we talking about every day?
These rough answers help you quickly understand the scale and make better design decisions.
What Even Is This?
Before designing any system, before drawing any architecture diagram, before picking any database, a good engineer does a rough calculation. How big is this thing? How much load will it take? How much data will it generate?
This is called back-of-envelope estimation. The name is literal, engineers in the old days used to scribble these calculations on the back of a paper envelope during meetings. Quick, rough, directional. Good enough to make a smart decision.
In a system design interview, you are typically expected to spend about 5 minutes on this section. No more. The interviewer is not checking whether your numbers are exactly right. They are checking whether you can think at scale. Whether you instinctively know the difference between a system that gets 100 requests per second versus 1 million requests per second, and what that difference means for the design.
The One Table You Must Have in Your Head
This is non-negotiable. Every senior engineer I have met can recall this instantly. You need to know it the same way you know 2 + 2 = 4.
| Power of 2 | Approximate Value | Power of 10 | Storage Name | Short Form |
|---|---|---|---|---|
| 2¹⁰ | 1 Thousand | 10³ | 1 Kilobyte | 1 KB |
| 2²⁰ | 1 Million | 10⁶ | 1 Megabyte | 1 MB |
| 2³⁰ | 1 Billion | 10⁹ | 1 Gigabyte | 1 GB |
| 2⁴⁰ | 1 Trillion | 10¹² | 1 Terabyte | 1 TB |
| 2⁵⁰ | 1 Quadrillion | 10¹⁵ | 1 Petabyte | 1 PB |
The three you will use in literally every single estimation, 1 thousand = 10³, 1 million = 10⁶, 1 billion = 10⁹. Those three. Memorise them like your own phone number.

Also remember this one: there are 86,400 seconds in a day. You will divide by this constantly when converting daily totals into per-second rates.
Three Things to Always Estimate
When I do estimation in interviews or in real design discussions, I focus on exactly three things. Nothing more.
Load estimation, how many reads and writes happen per second or per day?
Storage estimation, how much disk space does this system consume per day? Per year? Over 5 years?
Resource estimation, how many servers and CPU cores do we actually need to handle this load?
Let us do all three using Twitter as the example. Twitter is the most popular case study in system design for a reason, it is both read-heavy and write-heavy, the numbers are interesting, and everyone understands what Twitter does.
Load Estimation, How Many Requests Are We Talking About?
The very first thing you do is establish your anchor number, DAU, or Daily Active Users. Everything else flows from this.
In an interview, you state your assumption clearly and confidently. “I will assume Twitter has 100 million daily active users.”
Now make two more assumptions. These are completely reasonable guesses based on typical social media behaviour:
- Each user posts 10 tweets per day
- Each user reads 1000 tweets per day (scrolling the feed, opening threads)
Calculate writes per day:
100 million users × 10 tweets per user = 1 billion writes per day
Calculate reads per day:
100 million users × 1000 tweets per user = 100 billion reads per day
Now convert reads to per second. Divide by 86,400:
100 billion ÷ 86,400 seconds ≈ 1.15 million reads per second
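If you want to sanity-check the chain above, it fits in a few lines of Python. Every input here is an assumption carried over from the text, not a measured number:

```python
# Back-of-envelope load estimation for a Twitter-like system.
# All four inputs are stated assumptions, not measured data.
DAU = 100_000_000            # daily active users
TWEETS_PER_USER = 10         # writes per user per day
READS_PER_USER = 1_000       # feed reads per user per day
SECONDS_PER_DAY = 86_400

writes_per_day = DAU * TWEETS_PER_USER              # 1 billion
reads_per_day = DAU * READS_PER_USER                # 100 billion
reads_per_second = reads_per_day / SECONDS_PER_DAY

print(f"writes/day: {writes_per_day:,}")
print(f"reads/day:  {reads_per_day:,}")
print(f"reads/sec:  {reads_per_second:,.0f}")       # roughly 1.15 million
print(f"read:write ratio = {reads_per_day // writes_per_day}:1")
```

Change any assumption and the whole picture updates instantly, which is exactly why stating assumptions explicitly matters.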
Stop and look at what just happened. Reads are 100 times more than writes. This is a massively read-heavy system. And that one insight, that reads dominate writes by 100:1, is going to shape your entire architecture. You will build everything around serving reads efficiently. Read replicas. Caching. CDNs. We will get into all of this in the database scaling section.
Think of load estimation like calculating match run rate in cricket. You are not computing the exact figure, you are building a picture. 1.15 million reads per second tells you “this is IPL final crowd traffic, not a colony gully match.” That context shapes every single decision after it.
Storage Estimation, How Much Disk?
A tweet has two components, the text of the tweet itself, and sometimes an attached photo or video. We estimate them separately.
Assumptions:
- A tweet is 200 characters of text
- One character takes 2 bytes of storage
- 10 percent of tweets have a photo
- Each photo is 2 MB
Size of one tweet (text only):
200 characters × 2 bytes = 400 bytes, rounded up to 500 bytes
(Always round up. Buffer is your friend.)
Total tweets per day = 1 billion (from load estimation)
Tweets with photos = 10% of 1 billion = 100 million tweets with photos
Now calculate total storage needed per day:
Text storage = 500 bytes × 1 billion tweets = 500 GB
Photo storage = 2 MB × 100 million tweets = 200 TB
Total = 500 GB + 200 TB ≈ 200 TB per day
Wait, why did I drop the 500 GB entirely? Because when one number is hundreds of times larger than another, the smaller one is essentially noise. 500 GB next to 200 TB is a 400x gap; the text storage does not meaningfully change your design decisions, so you round it away.
So Twitter needs roughly 200 TB of new storage every single day, a fifth of a petabyte. Over 5 years that is 200 TB × 365 × 5 ≈ 365 PB. Over a third of an exabyte.
Now you understand why these companies build their own data centres with thousands of custom servers. No off-the-shelf cloud plan is designed for that.
A rule I always follow: When two numbers differ by a couple of orders of magnitude or more, ignore the smaller one. It keeps your math fast and your estimates clean without losing directional accuracy.
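Here is the same storage arithmetic as a Python sketch, with the rounding kept explicit so you can see the photos dwarfing the text:

```python
# Storage estimation, same assumptions as above, integer arithmetic throughout.
TWEETS_PER_DAY = 1_000_000_000         # from the load estimation
BYTES_PER_TWEET = 500                  # 200 chars x 2 bytes, rounded up
BYTES_PER_PHOTO = 2_000_000            # 2 MB per photo
photos_per_day = TWEETS_PER_DAY // 10  # 10 percent of tweets carry a photo

text_bytes = TWEETS_PER_DAY * BYTES_PER_TWEET    # 500 GB
photo_bytes = photos_per_day * BYTES_PER_PHOTO   # 200 TB

TB = 10**12
print(f"text:   {text_bytes / TB:.1f} TB/day")
print(f"photos: {photo_bytes / TB:.0f} TB/day")
# photos outweigh text by 400x, so the daily total is effectively the photo number
print(f"total:  ~{photo_bytes / TB:.0f} TB/day")
```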
Resource Estimation, How Many Actual Servers Do We Need?
This is where estimation gets really satisfying, because you end up with a specific, actionable number.
Assumptions:
- Our system receives 10,000 requests per second
- Each request takes 10 ms of CPU time to process
Total CPU time consumed per second:
10,000 requests/sec × 10 ms = 100,000 ms of CPU time needed per second
One CPU core can provide 1,000 ms of active processing per second. So:
CPU cores needed = 100,000 ms ÷ 1,000 ms per core = 100 cores
If each server has 4 CPU cores:
Servers needed = 100 cores ÷ 4 cores per server = 25 servers
And there you have it. Put a load balancer in front of 25 servers. That is a real, specific, defensible answer, not a guess. It came from a chain of reasoning.
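The same chain of reasoning as a sketch you can tweak. The four inputs are the assumptions; everything else follows:

```python
import math

# Resource estimation: from requests/sec to a server count.
REQUESTS_PER_SECOND = 10_000
CPU_MS_PER_REQUEST = 10
MS_PER_CORE_PER_SECOND = 1_000   # one core supplies 1000 ms of CPU time each second
CORES_PER_SERVER = 4

cpu_ms_needed = REQUESTS_PER_SECOND * CPU_MS_PER_REQUEST          # 100,000 ms/sec
cores_needed = math.ceil(cpu_ms_needed / MS_PER_CORE_PER_SECOND)  # 100 cores
servers_needed = math.ceil(cores_needed / CORES_PER_SERVER)       # 25 servers

print(f"{cores_needed} cores -> {servers_needed} servers "
      f"of {CORES_PER_SERVER} cores each")
```

The `math.ceil` calls matter: you always round server counts up, because 24.5 servers does not exist.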
Why This Skill Matters Even Outside Interviews
Let me tell you why I actually care about this skill beyond just cracking interviews.
When you are building something real, a side project that might go viral, a startup product, a feature at your company, you make infrastructure decisions. How many servers to provision. Which database tier to pick. How much storage to buy upfront.
If you guess randomly, you either overpay massively for resources you do not need, or you underprovision and crash the moment real traffic hits. Both are embarrassing and expensive.
Back-of-envelope estimation takes 5 minutes and gives you a solid starting point for every decision. It is the engineering equivalent of measuring twice before cutting.
CAP Theorem
Okay, this is the one that confuses people the most. I have seen developers with 3-4 years of experience who still cannot explain this clearly. So let me take it slow.
First, What Is a Distributed System?
When a single database cannot handle the load, you do what we discussed in Part 2, you add more servers. In the context of databases, adding more servers and splitting or replicating data across them is called building a distributed system.
In a distributed database, the same data lives on multiple servers called nodes, often spread across different physical locations or cities.
Imagine Zomato’s database. There is a node in Mumbai, a node in Bangalore, and a node in Delhi. All three nodes store the same restaurant and user data. When a user in Mumbai updates their delivery address, that change gets automatically copied over, or replicated, to the Bangalore and Delhi nodes too. This is called data replication.
Why do this?
Because now users in Bangalore get fast responses from the Bangalore node instead of waiting for a round trip to Mumbai. And if the Mumbai node crashes, Bangalore and Delhi still serve traffic. The system is faster and more resilient.
Sounds perfect, right?
Well. Here is where it gets interesting.
The Moment Things Break Down
Everything works beautifully when all nodes can freely talk to each other. But what happens when they cannot?
Imagine a network cable gets cut between Bangalore and the rest of the country. The Bangalore node is now completely isolated, it cannot send or receive data from Mumbai or Delhi. This is called a network partition. And it is not a theoretical scenario: in large distributed systems, cables get cut, routers fail, data centres lose connectivity. Partitions happen.
Now your system has to make a decision. A hard one.
Choice 1: Keep serving users. The Bangalore node keeps responding to requests. But since it has been cut off, it has not received any updates from Mumbai or Delhi for the past few minutes. So a user in Bangalore might see old data, an outdated menu, a price that changed yesterday. Meanwhile, a user in Delhi sees the correct, latest data. Two users on the same platform, same moment, different data.
The system was available, it kept responding. But it was not consistent, different users saw different data.
Choice 2: Stop serving users until the partition heals. The Bangalore node detects it is out of sync and refuses to respond to requests. Users in Bangalore get a “service unavailable” error. But when the network comes back, every node syncs up and all users see the exact same, correct data.
The system was consistent, data was always correct. But it was not available, it rejected requests during the outage.
You cannot have both at the same time when the network breaks. You have to choose.
This is the CAP theorem.
The Three Letters Explained
C – Consistency
Every single read request, no matter which node handles it, returns the most recent data. All nodes are in perfect sync at all times. If you just posted a tweet and your friend refreshes Twitter, they see it instantly. No delay, no stale data, no “wait a minute and refresh.”
Think of it like a shared Google Doc. When you type something, everyone else who has the document open sees it immediately. Same version, always, for everyone.
A – Availability
The system always responds to every request. It might not return the absolute latest data, it might be a few seconds or minutes behind, but it never returns an error. It never says “I am not available right now.”
Think of WhatsApp. Even when their servers have issues, you can still open the app, see your messages, try to send something. The app is available. Messages might be delivered late, but the system is up.
P – Partition Tolerance
The system continues to function correctly even when the network between nodes fails and they cannot communicate. When a cable gets cut or a data centre loses connectivity, the system does not completely collapse.
What the Theorem Actually Says
In a distributed system, you can only guarantee two of these three properties at the same time. All three together is mathematically impossible.
The valid combinations are:
CP – Consistent and Partition Tolerant. The system stays consistent even during network partitions, but it sacrifices availability. When a partition happens, some nodes refuse to serve requests until they resync. Users might get errors temporarily, but they never get wrong data.
AP – Available and Partition Tolerant. The system stays up and keeps responding even during network partitions, but it sacrifices consistency. Different nodes might return different data until they resync. Users always get a response, but it might be slightly stale.
CA – Consistent and Available, but not Partition Tolerant. This means assuming the network never fails. In a single-server database, this is fine. But in any real distributed system at scale, network failures are inevitable. So CA is basically not a real option for distributed systems.

Since you are building a distributed system precisely because your load is too high for one server, partition tolerance is not optional. P is always in your design. So the real decision you are making, always, is: CP or AP?
Why Can You Not Just Have All Three?
Let me walk through this one more time with a concrete story because I know this is the part that makes people go “but why though.”
Three nodes: A in Mumbai, B in Bangalore, C in Delhi. All three have the same data. Network partition happens, B gets cut off from A and C.
If you pick AP (keep serving, sacrifice consistency): B keeps answering requests. A user in Bangalore queries B and gets restaurant data, but B has not received the past 10 minutes of updates. The user sees yesterday’s menu. A user in Delhi queries A and sees today’s correct menu. Same platform, same restaurant, two different menus showing to two different users. You stayed available. But your data is inconsistent.
If you pick CP (sacrifice availability, keep consistent): B detects it cannot sync with A and C. B shuts down request handling. A user in Bangalore queries B and gets “Service temporarily unavailable.” Frustrating. But when the network heals and B reconnects, everything syncs perfectly. Every user across all nodes sees identical, correct data. You were consistent. But you lost availability for a period.
There is genuinely no way out of this. The network partition forces the choice. This is why the theorem exists, not as a rule someone made up, but as a mathematical reality of distributed systems.
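If it helps, here is the whole dilemma compressed into a toy Python sketch. This is not a real database, just the two behaviours of a partitioned node placed side by side:

```python
# A toy sketch of the CP-vs-AP choice, not a real database. One node,
# cut off from its peers, configured either CP or AP.

class Node:
    def __init__(self, mode):
        self.mode = mode           # "CP" or "AP"
        self.data = "old menu"     # last value replicated before the cable cut
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            # consistent but not available: refuse rather than serve stale data
            return "ERROR: service temporarily unavailable"
        return self.data           # always answers, possibly with stale data

bangalore_ap = Node("AP")
bangalore_cp = Node("CP")
bangalore_ap.partitioned = True    # the cable gets cut
bangalore_cp.partitioned = True

print(bangalore_ap.read())   # old menu -> available but stale
print(bangalore_cp.read())   # error    -> consistent but unavailable
```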
So Which One Do You Pick: CP or AP?
This is the practical question. And the answer is not a shrug and “it depends.” The answer is: ask what the cost of wrong data is.
Choose CP: Consistency, when wrong data causes real, serious damage:
Banking and payments. If your UPI transaction shows deducted from your account on one device but not the other, that is not a bug, that is a catastrophic failure. HDFC, PhonePe, Razorpay, Paytm, all of these run on consistent databases. They will show you an error before showing you a wrong balance.
Stock trading apps like Zerodha or Groww. A wrong stock price shown to two users at the same time can lead to real financial loss. Consistency is non-negotiable.
Inventory management. If two users both see “1 item left in stock” and both buy it successfully, you have just sold the same physical item twice. Now you have to call one customer and say the order is cancelled. That is a terrible experience and a real business problem. Flipkart’s inventory system absolutely has to be consistent.
Hotel and flight booking. If IndiGo shows the same seat as available to two users simultaneously and both book it, that is a double-booking crisis.
Choose AP: Availability, when slightly stale data is completely fine:
Instagram or Twitter feed. If your feed is 30 seconds behind because of a temporary sync issue, nobody notices. Nobody cares. The experience is not meaningfully worse.
YouTube view counts. The counter showing 1,200,000 instead of 1,200,432 affects exactly zero users’ lives.
Swiggy restaurant listings. If a restaurant’s menu takes 2 minutes to reflect the updated price of a dish, customers are not harmed.
Hotstar streaming metadata. If the episode count on a show is slightly off for a few seconds while nodes sync, viewers do not notice or care.

The one question that makes this simple: What is the absolute worst thing that happens if two users see different data for 30 seconds right now?
If the answer involves money, legal issues, double-booking, overselling, or broken trust in financial data, choose CP.
If the answer is “one user sees a like count of 4,200 and another sees 4,198”, choose AP.
| System | CP or AP | Why |
|---|---|---|
| Bank account balance | CP | Wrong balance = financial disaster |
| PhonePe / Razorpay | CP | Payment cannot be inconsistent |
| Zerodha stock prices | CP | Wrong price = real money loss |
| Flipkart inventory | CP | Overselling = cancellations, bad UX |
| Instagram feed | AP | Stale post for 2 seconds = fine |
| Zomato restaurant list | AP | Slightly old menu = acceptable |
| YouTube view count | AP | Off by a few = nobody cares |
| Hotstar metadata | AP | Sync delay = totally fine |
Real databases by type. CP systems: HBase, ZooKeeper, MongoDB in its default strongly consistent setup, MySQL with synchronous replication. AP systems: Cassandra, DynamoDB, CouchDB, Redis in most configurations.
Once this clicks, you will find yourself naturally thinking about CP versus AP every time you pick a database or design a data flow. It becomes instinct.
Database Scaling
Alright, this is the section I personally find most satisfying to explain. Because unlike estimation (which is planning) and CAP theorem (which is theory), database scaling is something you will actually do at your job. Probably multiple times.

Here is the scenario.
Your product launched 8 months ago. Things are going great. Users are growing. Then one morning you open your monitoring dashboard and see database CPU sitting at 85%. Queries that used to take 12 ms are now taking 400 ms. Users are complaining that the app feels slow. Your team is stressed.
What do you do?
The answer is not “immediately shard the database.” The answer is to go through a specific, ordered set of solutions, starting with the simplest ones, and only move to more complex solutions when simpler ones stop working.
I think of this as a ladder. You climb one rung at a time. You do not jump to the top floor just because you heard Netflix lives there.
The golden rule of database scaling: scale only as much as your current problem requires.
A system that is simple is easier to debug, cheaper to run, faster to develop on, and much easier to hire for. Complexity is a cost. Pay it only when forced to.
Your Starting State, One App, One Database
When you launch, you have one application server and one database server. Clean, simple, works perfectly for thousands of users. Your app fires a query, the database returns the result, your user sees the data.
As users grow, queries slow down. CPU climbs. What next? You go up the ladder.
Rung 1: Vertical Scaling (Always, Always Try This First)
I cannot stress this enough. Before you touch your architecture, before you add any new systems, before you write any new routing code, just give your database server a bigger machine.
More RAM. Faster CPU cores. Faster NVMe SSD storage instead of regular spinning disk.
More RAM means the database can hold more of your hot data in memory and serve queries without going to disk at all. Disk reads are thousands of times slower than memory reads. This alone can be a 10x improvement in some cases.
Faster CPU means queries compute results faster. More cores means more queries can run in parallel.
This sounds almost too simple. But it works. Genuinely works. Many companies you have heard of run comfortably on a single very powerful database server for a very long time. Basecamp famously runs a major portion of their infrastructure on just a handful of powerful servers while serving millions of users.
No architecture changes. No new systems to learn. No new failure modes. Just a bigger machine.
Only move to the next rungs when you have genuinely hit the ceiling of what vertical scaling can give you.
Rung 2: Indexing (The Most Underrated Optimisation)
Before changing your architecture at all, look at your slow queries. Are they using indexes? Nine times out of ten, when a database starts getting slow, the first fix is not adding servers, it is adding the right index.
What is a full table scan?
Without an index, every query the database runs has to read every single row in the table until it finds what it needs. If your users table has 50 million rows and you run:
SELECT * FROM users WHERE id = 38291042;
The database reads row 1, checks if id = 38291042, nope. Row 2, nope. Row 3, nope. In the worst case it checks all 50 million rows before finding the match. That is O(N), linear time. As your table grows, the query gets proportionally slower.
What does an index do?
An index builds a separate data structure called a B-tree on the indexed column. A B-tree is a sorted, hierarchical structure that allows binary-search-style lookups. Instead of reading every row, the database jumps directly to the right part of the tree and finds your record in O(log N) time.
For 50 million rows, O(N) means up to 50 million comparisons. O(log N) means about 26 comparisons. That is not a 2x or 5x improvement, that is thousands of times faster.
And adding an index is literally one line of SQL:
CREATE INDEX idx_users_id ON users(id);
That is it. PostgreSQL, MySQL, any relational database will build and maintain the B-tree for you automatically. Every subsequent query on that column uses the index without any changes to your application code.
The real-world analogy I love for this:
Imagine a school library where 50,000 books are arranged in the order they were donated, completely random. You want a book by a specific author. You have to check every single book one by one. That is a full table scan.
Now imagine the librarian creates an author index, a small booklet with every author’s name listed alphabetically, with the shelf number next to each name. You walk in, open the booklet, find the author in seconds, go to that shelf, get the book. The main collection did not change at all. But finding things became dramatically faster. That is indexing.
One important caveat: Indexes slow down writes slightly. Every time you INSERT, UPDATE, or DELETE a row, the database must also update all the B-trees for all indexes on that table. So do not index every column blindly. Only index the columns that your slowest, most frequent queries filter on. Typically that means your primary keys, foreign keys, and the columns used in your WHERE clauses.
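You can watch the scan-to-index switch happen with SQLite's query planner in a few lines of Python. The table and index names here are made up for the demo:

```python
import sqlite3

# Watching the query planner switch from a full scan to an index lookup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(10_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column is the readable plan
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT * FROM users WHERE email = 'user42@example.com'"
before = plan(query)   # e.g. "SCAN users" -> full table scan, O(N)
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)    # e.g. "SEARCH users USING INDEX idx_users_email" -> O(log N)

print(before)
print(after)
```

Same query, no application changes, and the planner quietly switches from scanning every row to walking the B-tree.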
Rung 3: Partitioning (Splitting the Big Table Into Smaller Ones)
After a while, even with perfect indexes, extremely large tables start showing performance issues. Why? Because the index itself gets very large. Searching a 500 MB index file is slower than searching a 50 MB index file. The index is helping, but the index is also getting heavy.
Partitioning solves this by splitting one large table into multiple smaller tables, all within the same database server.
For example, your users table has 30 million rows. You partition it into three:
- users_partition_1: User IDs 1 to 10,000,000
- users_partition_2: User IDs 10,000,001 to 20,000,000
- users_partition_3: User IDs 20,000,001 to 30,000,000
Each partition has its own smaller index. Finding user ID 8,291,042 now means searching only one partition’s index, which is one-third the size of the original index.
The beautiful thing about partitioning in PostgreSQL is that your application code does not need to change at all. You still write:
SELECT * FROM users WHERE id = 8291042;
PostgreSQL figures out which partition that ID belongs to and queries only that partition. This is completely invisible to your application.
The analogy: Think of a post office that handles mail for all of India. Everything in one giant room is chaos, finding a letter for Delhi takes forever because you are searching through mail for Kerala, Assam, Punjab, everyone. Now they split it into zones, North India room, South India room, East India room. Same post office, same building. But now finding Delhi mail means searching only the North India room. Three times faster, minimum.
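The pruning the database does for you internally is conceptually just a binary search over partition boundaries. A toy sketch with hypothetical ranges:

```python
import bisect

# A sketch of partition pruning: a binary search over the inclusive upper
# bounds of three hypothetical partitions of a users table.
BOUNDARIES = [10_000_000, 20_000_000, 30_000_000]
PARTITIONS = ["users_partition_1", "users_partition_2", "users_partition_3"]

def partition_for(user_id):
    # bisect_left finds the first boundary >= user_id, i.e. the one
    # partition whose (smaller) index actually needs to be searched
    return PARTITIONS[bisect.bisect_left(BOUNDARIES, user_id)]

print(partition_for(15_000_000))   # users_partition_2
print(partition_for(25_000_000))   # users_partition_3
```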
Rung 4: Master-Slave Architecture (When Reads Are the Bottleneck)
You have one database handling every single read and write. As scale grows, this becomes your bottleneck. And in almost every real product, reads are far more frequent than writes.
Remember Twitter’s numbers? 100 billion reads per day versus 1 billion writes per day. That is a 100:1 ratio. Reddit is similar. Instagram is similar. Any platform where people consume content far more than they create it will have this pattern.
The solution is master-slave architecture, separating your read traffic from your write traffic across different database nodes.
Here is how it works:
The Master node handles all write operations, every INSERT, UPDATE, and DELETE goes here. There is only one master. It is the single source of truth for all data changes.
The Slave nodes are copies of the master. Every read query, every SELECT, goes to a slave node. You can have as many slave nodes as you need. 3, 5, 10, however many your read traffic requires.
When you write data to the master, it automatically replicates those changes to all slave nodes. This replication usually happens asynchronously, meaning slaves receive the updates a fraction of a second after the master processes them. In some configurations, it can be synchronous, master waits for slaves to confirm before acknowledging the write.
Your load balancer sits in the middle. Write requests go to master. Read requests go to the least-busy slave.

Real example from India: Think of the Hindustan Times or Times of India website. At any given moment, lakhs of users are reading articles. But only a few dozen journalists are writing and publishing new articles. The ratio of readers to writers is enormous. With master-slave, you add 10 slave nodes and multiply your read capacity 10x. Writers still go through one clean master. The system scales beautifully.
When does this make sense? Any time reads dominate writes. News sites, social media feeds, e-commerce product catalogues, restaurant listing pages, blog platforms, all read-heavy, all good candidates for master-slave.
One thing to be aware of: Because replication is usually asynchronous, there is a tiny lag between when data is written to master and when it appears on slaves. Usually this is milliseconds. For most read-heavy use cases, this is completely fine. But for use cases where a user writes something and immediately expects to read it back (like submitting a form and seeing the confirmation), you should route that specific read to the master node to avoid the replication lag.
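The routing itself is simple enough to sketch. Here is a hypothetical application-level router, server names are placeholders, including the read-your-own-writes special case from the paragraph above:

```python
import random

# A hypothetical router for a master-slave setup. The connection names
# are placeholders, not a real driver API.
MASTER = "master-db"
SLAVES = ["slave-db-1", "slave-db-2", "slave-db-3"]

recent_writers = set()   # users who just wrote; their next reads go to master

def route(user_id, is_write):
    if is_write:
        recent_writers.add(user_id)
        return MASTER                # every INSERT/UPDATE/DELETE hits the master
    if user_id in recent_writers:
        return MASTER                # read-your-own-writes: dodge replication lag
    return random.choice(SLAVES)     # everything else spreads across the slaves

print(route(42, is_write=True))      # master-db
print(route(42, is_write=False))     # master-db, because 42 just wrote
print(route(99, is_write=False))     # one of the three slaves
```

A real system would expire entries from `recent_writers` after the replication lag window, a few hundred milliseconds, instead of keeping them forever.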
Rung 5: Multi-Master Architecture (When Writes Also Bottleneck)
So you have master-slave running smoothly. Reads are distributed across many slaves. But now your write traffic is also growing. The single master is getting hammered. CPU on master is climbing. Writes are queueing up.
This is when you consider multi-master architecture, multiple nodes that each accept writes simultaneously.
A very practical pattern in India is geographic multi-master. One master for North India, one for South India. All write requests from users in Delhi, Lucknow, and Chandigarh go to the North master. All write requests from Bangalore, Chennai, and Hyderabad go to the South master. The two masters periodically sync their data with each other.
The benefit: write load is now distributed geographically. Each master handles only half the total write traffic. And users get lower write latency because their data goes to a closer server.
The hard problem, conflict resolution.
What if a user’s record gets updated in both masters at nearly the same millisecond? Both updates were legitimate writes. But now you have two different versions of the same record in two different masters. Which one wins?
You have to define this in your application logic. Some common strategies:
- Last write wins: whichever update has the later timestamp is kept, the other is discarded. Simple but lossy.
- Merge: try to combine both changes intelligently based on the fields that changed. Complex but less lossy.
- Application decides: surface the conflict to the application layer and let business logic resolve it. Most flexible but most work.
There is no universal correct answer. It depends entirely on what data you are storing and what your business rules are.
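Last write wins is the easiest strategy to sketch, and the sketch also shows exactly how it loses data. The timestamps and address values here are hypothetical:

```python
# Last write wins, sketched with (value, timestamp) pairs. Real multi-master
# systems attach a timestamp or version to every write for this purpose.

def last_write_wins(record_a, record_b):
    """Each record is (value, unix_timestamp). The later write survives."""
    return record_a if record_a[1] >= record_b[1] else record_b

# The same user row, updated on both masters during a sync gap:
north = ("address: Delhi", 1_718_000_050)   # written on the North master
south = ("address: Mumbai", 1_718_000_042)  # written on the South master, 8s earlier

winner = last_write_wins(north, south)
print(winner[0])   # address: Delhi -> the Mumbai update is silently discarded
```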
Multi-master adds serious operational complexity. Use it only when a single master is genuinely the verified bottleneck and you have measured it carefully. Do not guess your way into this architecture.
Rung 6: Database Sharding (The Last Resort and I Mean It)
Sharding is the final and most powerful technique. And the most painful to work with.
Here is the core idea: in partitioning (Rung 3), you split a large table into smaller tables within the same server. In sharding, you take those smaller chunks and put them on completely separate servers.
Each server is called a shard. Each shard holds only a portion of your total data.
Example:
- Shard 1 → User IDs 1 to 10 million, hosted on Server A
- Shard 2 → User IDs 10 million to 20 million, hosted on Server B
- Shard 3 → User IDs 20 million to 30 million, hosted on Server C
Now here is the critical difference from partitioning: the database does not handle routing for you. Your application code must know which shard to talk to for every single read and every single write.
If a request comes in for user ID 15,432,218, your app code must determine “that falls in the 10M to 20M range, so query Shard 2 on Server B.” That logic has to be written, tested, maintained, and updated every time your sharding configuration changes.
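That routing logic, in its simplest range-based form, looks something like this. Server names are placeholders; the ranges match the example above:

```python
# The routing your application must now own once you shard.
SHARDS = [
    (10_000_000, "server-a"),   # Shard 1: user IDs 1 to 10M
    (20_000_000, "server-b"),   # Shard 2: 10M to 20M
    (30_000_000, "server-c"),   # Shard 3: 20M to 30M
]

def shard_for(user_id):
    for upper_bound, server in SHARDS:
        if user_id <= upper_bound:
            return server
    raise ValueError(f"user_id {user_id} is outside every shard range")

print(shard_for(15_432_218))   # server-b: falls in the 10M-20M range
```

Every read path, write path, and batch job in your codebase now has to go through something like `shard_for`, and that table of ranges has to stay correct forever.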
The Four Sharding Strategies
Range-based sharding, divide data by value ranges of the sharding key. Shard 1 gets IDs 1-10M, Shard 2 gets 10M-20M. Simple to understand and reason about. Problem: if one range becomes much busier than others, called a hot shard, that shard gets overloaded while others sit idle. Uneven load distribution.
Hash-based sharding: apply a hash function to the sharding key, like HASH(user_id) % number_of_shards. This spreads data evenly across all shards. The problem: if you add a new shard later, the modulo changes, most keys map to different shards, and you need to move huge amounts of existing data between servers. That migration on a live production system is extremely painful and high-risk.
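You can see the redistribution problem with a few lines of Python. This sketch uses plain modulo on the user ID as the “hash” to keep the demo simple, and counts how many keys change shards when the cluster grows from 4 shards to 5; the numbers are purely illustrative.

```python
def shard_index(user_id: int, num_shards: int) -> int:
    # The classic modulo scheme: HASH(user_id) % number_of_shards.
    # (Using the ID directly as its own hash keeps the demo simple.)
    return user_id % num_shards

# Count how many of the first 100,000 IDs land on a different shard
# when the cluster grows from 4 shards to 5.
moved = sum(
    1 for uid in range(100_000)
    if shard_index(uid, 4) != shard_index(uid, 5)
)
# Roughly 80% of keys move, and every one of them is live data to migrate.
```

This is the pain point that consistent hashing was invented to soften, but even then, adding capacity still means moving some data.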
Geographic sharding: put users from one region on one shard and users from another region on another. Indian users on Shard India, US users on Shard US. This reduces latency because data sits physically close to the users who access it. The problem: some regions generate far more traffic than others, so shards end up uneven. And any feature that joins data across regions requires cross-shard queries, which are expensive.
Directory-based sharding: maintain a dedicated lookup table (a directory) that maps each user ID or range to a specific shard. When a query comes in, check the directory first to find the right shard, then query it. Very flexible, because you can reassign data to a different shard without changing any hash function. The problem: the directory service itself becomes a critical component. If it goes down, no queries can be routed. It is a single point of failure and a potential bottleneck.
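A directory can be as simple as a key-value table consulted before every query. This is a toy in-memory sketch with made-up user IDs and shard names; a real directory would live in its own highly available store, for exactly the reasons above.

```python
# Toy directory: user ID -> shard ID, consulted before every single query.
directory = {
    101: "shard-1",
    102: "shard-3",
    103: "shard-2",
}

def route(user_id: int) -> str:
    shard = directory.get(user_id)
    if shard is None:
        raise LookupError(f"user {user_id} not in directory")
    return shard

# Rebalancing is just a directory update: no hash change, no modulo math.
directory[102] = "shard-2"
```

The flexibility is real, but notice that `route` now sits in front of every query your application makes. That is the bottleneck risk in code form.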
Why Sharding Is So Painful: The Real Talk
I want to be direct here because a lot of tutorials romanticise sharding without being honest about the cost.
You write routing logic yourself. Every read, every write, every batch job, every background task in your application must correctly determine which shard to hit. This means extra code, extra test coverage, extra things to break, extra edge cases to handle. And when you add or remove a shard, you update this logic everywhere.
Cross-shard joins are a nightmare. In a normal single-database setup, joining two tables is trivial: one SQL query, and the database handles it. In a sharded system, if your user data is on Shard 1 and that user’s order history is on Shard 3, on a completely different server, joining them means pulling data from two servers over the network and combining it in your application code. This is slow, complex, and easy to get wrong.
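Here is roughly what a cross-shard “join” degenerates into at the application layer. The two dictionaries stand in for two separate database servers; the names and data are invented for illustration.

```python
# Two dictionaries standing in for two separate database servers.
users_shard_1 = {7: {"id": 7, "name": "Asha"}}
orders_shard_3 = [
    {"order_id": 1, "user_id": 7, "amount": 250},
    {"order_id": 2, "user_id": 7, "amount": 120},
]

def user_with_orders(user_id: int) -> dict:
    """What used to be one SQL JOIN: now two round trips plus app-side stitching."""
    user = dict(users_shard_1[user_id])           # network call #1, to Shard 1
    user["orders"] = [o for o in orders_shard_3   # network call #2, to Shard 3
                      if o["user_id"] == user_id]
    return user
```

Every one of those stitching functions is code you write, test, and debug yourself, for a query the database used to execute in one statement.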
Adding or removing shards requires data migration. Moving records from one shard to another while the system is live, without losing data and without causing downtime, is a genuinely hard engineering problem. Teams at large companies dedicate weeks or months to shard migrations.
Consistency gets harder. With data spread across multiple servers, maintaining transactions that span multiple shards requires distributed transaction protocols, which add significant complexity and latency.
Sharding is what companies like Google, Facebook, and Twitter use, because at their scale they have no other choice. But they also have entire teams dedicated to managing shard infrastructure. For most products, you will never need to shard. Use it only when you have genuinely exhausted every other option and your data truly cannot be served by a single database cluster.
The Complete Decision Guide: Which Rung Do You Need?
| Symptom | Solution | Complexity |
|---|---|---|
| Queries are slow | Add indexes to the right columns | Very low |
| Table is massive, indexes feel heavy | Partition the large table | Low |
| Too many read requests | Master-Slave, add read replicas | Medium |
| Write bottleneck on master | Multi-Master setup | High |
| Data cannot fit in one machine | Sharding, truly last resort | Very high |
| Everything above is failing | Reconsider the data model, call for help | — |
And always, always try vertical scaling before any of the above. It is the fastest, safest, and often most effective first response.
Frequently Asked Questions
These are the questions I get asked most often about these topics, so they are worth addressing directly.
Q: What is back-of-envelope estimation in simple words?
It is the process of doing quick, rough calculations to understand the scale of a system before designing it. Think of it like doing a quick budget estimate before committing to a big project: not exact, but close enough to make smart decisions.
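As a concrete illustration, here is the kind of arithmetic this means, written out in Python with entirely made-up inputs: 10 million DAU, each uploading 2 photos a day at about 2 MB each.

```python
# All inputs are invented for illustration.
dau = 10_000_000               # daily active users
photos_per_user_per_day = 2
photo_size_mb = 2

daily_storage_tb = dau * photos_per_user_per_day * photo_size_mb / 1_000_000  # MB -> TB
yearly_storage_pb = daily_storage_tb * 365 / 1_000                            # TB -> PB
# Round aggressively: ~40 TB/day, ~15 PB/year. Directionally right is the goal.
```

In an interview you would do this on a whiteboard in thirty seconds; the point is the habit, not the tooling.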
Q: What is the CAP theorem in simple words?
In a distributed database system, when the network between servers fails, you are forced to choose between two things: either all servers return the same consistent data, or all servers stay available and keep responding. You cannot have both at that moment. The CAP theorem is just the formal statement of this reality.
Q: When should I use sharding?
Only when your data volume has grown beyond what a single server can handle, even after vertical scaling, indexing, partitioning, and master-slave replication are all in place. For most applications, that means you are probably at tens of millions of active users with terabytes of data. If you are not there yet, you almost certainly do not need sharding.
Q: What is the difference between partitioning and sharding?
Partitioning splits a large table into smaller tables within the same database server. The database handles routing transparently, your app code does not change. Sharding splits data across completely separate servers. Your application code must handle routing manually. Same concept, very different complexity levels.
Q: Should I always choose AP over CP for better user experience?
Not necessarily. User experience suffers badly when users see wrong data, wrong balances, oversold items, double bookings. In those cases, CP is the better user experience even if it means occasional “service unavailable” messages. Choose based on what the cost of inconsistency is, not just what keeps the system “up.”
Key Takeaways From This Blog
Back-of-envelope estimation is a habit, not a formula. Memorise the storage units table, always start with DAU, estimate load then storage then resources in that order, and round aggressively. Directional accuracy is the goal, not precision.
CAP theorem is a real trade-off, not just theory. P is always required in distributed systems. So your actual choice is CP or AP. Ask “what is the cost of two users seeing different data for 30 seconds?” and the answer tells you which to pick.
Database scaling is a ladder, not a single decision. Vertical scaling → Indexing → Partitioning → Master-Slave → Multi-Master → Sharding. Climb only as far as your current problem forces you to. The simplest solution that works is always the right solution.
Most systems are read-heavy. Master-slave architecture exists specifically for this. Before doing anything else complex, check your read-to-write ratio. It will tell you a lot about where the bottleneck will be.
Sharding is a last resort. Every engineer who has done a live shard migration will tell you this. The complexity cost is real and ongoing. Do not shard because it sounds impressive. Shard only when everything else has failed.

What Is Coming in Part 4
The next blog covers:
SQL vs NoSQL, when to use a relational database and when to use a document or key-value store. This is one of the most commonly asked and commonly confused topics in interviews and on the job.
Caching, what Redis is, how caching works, what cache invalidation means, and why “just cache everything” is much worse advice than it sounds.
Microservices, what they actually are, when breaking your monolith makes sense, and how the pieces communicate with each other in a microservice architecture.
These three are where system design starts getting into the real engineering debates. The ones where reasonable engineers at the same company disagree strongly. Understanding them deeply will make you genuinely better at your job, not just better at interviews.
Drop a comment if anything here was confusing or if you want me to go deeper on any section. I read every comment and it genuinely shapes what I write next.
Follow for Part 4. If this helped you understand even one concept more clearly, share it with one friend who is also learning system design. That is honestly the best thing you can do.