Database Partitioning and Sharding: Range, Hash and Consistent Hashing

Partitioning vs Sharding: The Distinction

Both terms describe splitting data into smaller pieces, but they operate at different levels:

Partitioning typically refers to splitting data within a single database server into logical segments. The database engine manages the partitions transparently — queries still use the same connection, same SQL, same schema. PostgreSQL's declarative partitioning is the clearest example.

Sharding refers to splitting data across multiple separate database servers (shards). Each shard is an independent database. The application (or a middleware layer) must route queries to the correct shard. Cross-shard queries require application-level coordination.

In practice, the terms are used interchangeably. This post covers both, starting with single-server partitioning and building up to multi-server sharding.

Single-Server Partitioning (PostgreSQL Declarative Partitioning)

PostgreSQL supports table partitioning natively. The database splits a logical table into multiple physical partitions based on a partition key, while presenting a single table interface to queries.

-- Range partitioning: orders split by month
CREATE TABLE orders (
    id          BIGSERIAL,
    customer_id INTEGER NOT NULL,
    amount      NUMERIC(10,2) NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2025_01 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- ... and so on

-- Query: PostgreSQL automatically routes to correct partition(s)
SELECT * FROM orders WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01';
-- Only scans orders_2025_01 — partition pruning eliminates other partitions

-- Drop old data: instant (drop partition = drop table)
DROP TABLE orders_2024_01;  -- instant, no DELETE needed

Benefits of single-server partitioning:

Partition pruning: queries with partition key in WHERE skip irrelevant partitions
Faster maintenance: DROP TABLE on a partition is instant vs DELETE on millions of rows
Vacuum efficiency: VACUUM runs per partition, not the entire table
Index size: indexes are per partition — smaller, faster

List partitioning — partition by discrete values:

CREATE TABLE users (
    id      BIGSERIAL,
    region  VARCHAR(10) NOT NULL,
    name    VARCHAR(100)
) PARTITION BY LIST (region);

CREATE TABLE users_us  PARTITION OF users FOR VALUES IN ('us-east', 'us-west');
CREATE TABLE users_eu  PARTITION OF users FOR VALUES IN ('eu-west', 'eu-central');
CREATE TABLE users_apac PARTITION OF users FOR VALUES IN ('ap-southeast', 'ap-northeast');

Hash partitioning — distribute evenly across N partitions:

CREATE TABLE events (
    id      BIGSERIAL,
    user_id INTEGER NOT NULL,
    payload JSONB
) PARTITION BY HASH (user_id);

CREATE TABLE events_0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Multi-Server Sharding: The Three Strategies

When data must be distributed across multiple physical servers, you need a sharding strategy. The strategy determines how data is assigned to shards and how the application routes queries.

Strategy 1: Range Sharding

Assign each shard a contiguous range of the shard key.

Shard 0: customer_id 1       – 1,000,000
Shard 1: customer_id 1,000,001 – 2,000,000
Shard 2: customer_id 2,000,001 – 3,000,000
Shard 3: customer_id 3,000,001 – ∞

Routing logic:

def get_shard(customer_id: int) -> int:
    if customer_id <= 1_000_000:   return 0
    if customer_id <= 2_000_000:   return 1
    if customer_id <= 3_000_000:   return 2
    return 3

shard_connections = {0: db0, 1: db1, 2: db2, 3: db3}
conn = shard_connections[get_shard(customer_id)]

The hot spot problem:

Range sharding creates a critical weakness: write hot spots. New customers get sequential IDs — all new writes go to the last shard. Shard 3 receives 100% of new customer writes while shards 0–2 are idle.

Time →  New customers streaming in (IDs 3,000,001, 3,000,002, 3,000,003...)

Shard 0: 0% writes   ← cold
Shard 1: 0% writes   ← cold
Shard 2: 0% writes   ← cold
Shard 3: 100% writes ← hot (all new IDs land here)

When range sharding works well: When queries frequently access ranges of the shard key. Time-series data sharded by timestamp is natural — most queries ask for "last 7 days" which lands on one or two shards (sequential access pattern = no cross-shard scatter).

orders sharded by month:
  Shard 0: Jan 2025 orders
  Shard 1: Feb 2025 orders
  Shard 2: Mar 2025 orders

Query: "all orders this month" → hits Shard 2 only ✅
Query: "all orders last 90 days" → hits Shards 0, 1, 2 ✅
Query: "all orders by customer 42 ever" → hits ALL shards ❌ (scatter-gather)

Strategy 2: Hash Sharding

Compute a hash of the shard key modulo the number of shards. Each shard receives a uniform, pseudo-random distribution of keys.

shard = hash(customer_id) % num_shards

customer_id=1:   hash=0xA3B2 → 0xA3B2 % 4 = 2 → Shard 2
customer_id=2:   hash=0x7F91 → 0x7F91 % 4 = 1 → Shard 1
customer_id=3:   hash=0xC401 → 0xC401 % 4 = 1 → Shard 1
customer_id=4:   hash=0x2D88 → 0x2D88 % 4 = 0 → Shard 0

Routing logic:

import hashlib

def get_shard(customer_id: int, num_shards: int) -> int:
    key = str(customer_id).encode()
    hash_val = int(hashlib.md5(key).hexdigest(), 16)
    return hash_val % num_shards

conn = shard_connections[get_shard(customer_id, num_shards=4)]

Benefits:

Even data distribution — no hot spots
Simple routing logic

The resharding problem:

Hash sharding has a fatal flaw when you need to add shards. When num_shards changes from 4 to 5, almost every key moves to a different shard:

With 4 shards:    customer_id=42 → hash % 4 = 2 → Shard 2
With 5 shards:    customer_id=42 → hash % 5 = 4 → Shard 4

~80% of all data must be migrated when adding one shard.
During migration: which shard do you read from?
The answer requires careful dual-read logic and is extremely painful.

Strategy 3: Consistent Hashing

Consistent hashing solves the resharding problem by mapping both data keys and shard nodes onto a circular hash ring. Adding or removing a shard only moves a fraction of the data.

The hash ring (0 to 2^32):

         0
         │
    ┌────┴────┐
    │    ↑    │
2^32┤         ├ 2^30
    │    Ring │
    │         │
    └────┬────┘
         │
        2^31

Nodes placed at positions on the ring:
  Shard A: position 0x1000  (hash("shard-A"))
  Shard B: position 0x5000  (hash("shard-B"))
  Shard C: position 0x9000  (hash("shard-C"))
  Shard D: position 0xD000  (hash("shard-D"))

Data routing:
  hash(customer_id) = 0x3500 → walk clockwise → first node is Shard B at 0x5000
  hash(customer_id) = 0x7000 → walk clockwise → first node is Shard C at 0x9000

Adding a shard:

Add Shard E at position 0x3000 (between Shard A at 0x1000 and Shard B at 0x5000):

Before: keys in range [0x1000, 0x5000) → Shard B
After:  keys in range [0x1000, 0x3000) → Shard E  ← moved
        keys in range [0x3000, 0x5000) → Shard B  ← unchanged

Only keys in [0x1000, 0x3000) move — roughly 1/N of all data,
where N is the total number of shards. All other keys are unaffected.

Virtual nodes (vnodes): A single physical shard is assigned multiple positions on the ring (virtual nodes). This improves distribution uniformity — instead of 4 large arcs (one per shard), each shard owns many small arcs scattered around the ring.

Without vnodes (4 shards, 4 positions):
  Shard A: 1 arc of 25% of ring
  Shard B: 1 arc of 25% of ring
  Problem: uneven distribution if hashes cluster

With vnodes (4 shards, 150 positions each = 600 total):
  Shard A: 150 small arcs scattered across ring
  Shard B: 150 small arcs scattered across ring
  Result: near-uniform distribution across all shards

When a new shard is added:
  Takes a fraction of vnodes from each existing shard
  Data migration is distributed across all existing shards
  → No single shard bears the full migration burden

Used by: Cassandra, DynamoDB, Riak, Amazon S3.

Cross-Shard Query Problem

Sharding eliminates the problem of too much data on one server. It creates a new problem: queries that need data from multiple shards.

Query: "Top 10 customers by total spend (all time)"

Without sharding:
  SELECT customer_id, SUM(amount) FROM orders
  GROUP BY customer_id ORDER BY SUM(amount) DESC LIMIT 10;
  ← One query, one result

With sharding (4 shards):
  Step 1: Send query to all 4 shards (scatter)
    Shard 0: top 10 from its data
    Shard 1: top 10 from its data
    Shard 2: top 10 from its data
    Shard 3: top 10 from its data

  Step 2: Application merges 40 results, re-ranks, returns top 10 (gather)

  Problem: "top 10 from shard 0" may not include the globally top customer
  because that customer's data spans multiple shards.
  Need ALL customers from ALL shards, not just top 10 per shard.
  
  Full scatter-gather: query all shards, merge in application
  → 4× the data transfer, 4× the query work

The shard key rule: Choose a shard key such that the most common queries touch exactly one shard. If your application queries primarily by customer_id, shard by customer_id. If you frequently query by order_date, shard by order_date.

There is always a query pattern that crosses shards. Design for the 90% case, accept that the 10% case requires scatter-gather or a separate denormalized read model.

Resharding: The Painful Inevitable

Every sharding strategy eventually requires adding shards as data grows. Resharding is one of the most operationally dangerous database operations.

The safest resharding approach — double-write + backfill:

Phase 1: Add new shards, start double-writing
  New writes: go to BOTH old and new shard layout
  Old data: still only on old shards

Phase 2: Backfill old data to new layout
  Background job reads from old layout, writes to new layout
  Runs slowly to minimize production impact

Phase 3: Verify consistency
  Compare record counts, checksums between old and new layouts

Phase 4: Switch reads to new layout
  Update routing logic to read from new layout
  Monitor for inconsistencies

Phase 5: Stop double-writing
  Writes now go only to new layout
  Old layout is stale

Phase 6: Decommission old layout
  After verification period, delete old data

Total duration: days to weeks for large datasets
Risk window: Phase 4 — reads switch before full backfill is complete

Tools that manage resharding:

Vitess (YouTube): MySQL sharding middleware with transparent resharding
Citus (PostgreSQL): distributed PostgreSQL with shard rebalancing
PlanetScale: MySQL-compatible with online schema changes and resharding

Choosing a Shard Key

The shard key decision is the most consequential design choice in a sharded system. A poor shard key cannot be changed without a full resharding operation.

Good shard key properties:

✅ High cardinality: many distinct values (user_id: millions of values)
   Low cardinality causes hot shards (status: only 'active'/'inactive' = 2 shards max)

✅ Uniform distribution: values spread evenly across the key space
   Sequential IDs with range sharding → hot spot on latest shard

✅ Queries mostly filter by shard key: minimizes cross-shard scatter
   Shard by customer_id if most queries are "all orders for customer X"

✅ Immutable: the shard key value never changes for a row
   If customer_id changes, the row must migrate to a different shard
   (This essentially never happens with surrogate keys)

❌ Avoid timestamp as sole shard key (range sharding):
   All current writes go to the "today" shard → write hot spot

❌ Avoid low-cardinality columns (status, region with few values):
   Too few shards, uneven distribution

❌ Avoid columns that frequently appear in joins:
   Joining by non-shard-key requires cross-shard scatter

🧭 What's Next

Post 17: CAP Theorem & PACELC — sharding forces you to confront the fundamental trade-offs of distributed systems; CAP and PACELC are the frameworks that make those trade-offs explicit and the decisions around them deliberate

Partitioning vs Sharding: The Distinction

Single-Server Partitioning (PostgreSQL Declarative Partitioning)

Multi-Server Sharding: The Three Strategies

Strategy 1: Range Sharding

Strategy 2: Hash Sharding

Strategy 3: Consistent Hashing

Cross-Shard Query Problem

Resharding: The Painful Inevitable

Choosing a Shard Key

🧭 What's Next

Related

Distributed Transactions: Two-Phase Commit, Saga and When to Use Each

CAP Theorem and PACELC: What They Mean for Real Database Decisions

Failover, Consensus and High Availability: Raft, Paxos and Automatic Failover

Leave a comment

Comments