Distributed Transactions: Two-Phase Commit, Saga and When to Use Each

The Problem: ACID Across Multiple Nodes

Within a single database, ACID transactions are straightforward — the database engine handles atomicity, isolation, and durability internally. When a business operation spans multiple databases or services, ACID guarantees do not extend across the boundary.

Single-database transaction: trivial
─────────────────────────────────────
BEGIN;
  UPDATE inventory SET stock = stock - 1 WHERE product_id = 7;
  INSERT INTO orders (customer_id, product_id, amount) VALUES (42, 7, 99.99);
  UPDATE accounts SET balance = balance - 99.99 WHERE customer_id = 42;
COMMIT;
-- All three succeed or all three roll back. Atomic. ✅

Multi-database operation: hard
─────────────────────────────────────
Step 1: UPDATE inventory.stock     (Inventory DB)
Step 2: INSERT INTO orders         (Orders DB)
Step 3: UPDATE accounts.balance    (Payments DB)

If Step 2 succeeds and Step 3 fails:
  - Inventory is decremented ✅
  - Order is created ✅
  - Payment is not taken ❌
  
No single COMMIT can span three separate databases.
You need a protocol.

Two protocols dominate: Two-Phase Commit (2PC) for synchronous safety, and Saga for asynchronous availability.

Two-Phase Commit (2PC)

2PC is a distributed coordination protocol that achieves atomic commitment across multiple participants. Either all participants commit or all roll back — guaranteed.

The Protocol

A coordinator (transaction manager) orchestrates multiple participants (database nodes or services).

Phase 1 — Prepare (Voting):

Coordinator → all participants: "Prepare to commit transaction T"

Each participant:
  1. Acquires all locks needed for the transaction
  2. Writes all changes to a temporary log (not yet committed)
  3. Ensures it CAN commit (has resources, constraints satisfied)
  4. Responds to coordinator:
     - VOTE YES: "I am prepared, I will commit if you tell me to"
     - VOTE NO:  "I cannot commit, please abort"

Timeline:
  Coordinator ──Prepare──→ Participant A  →  writes to WAL, acquires locks  →  YES
  Coordinator ──Prepare──→ Participant B  →  writes to WAL, acquires locks  →  YES
  Coordinator ──Prepare──→ Participant C  →  constraint violation found      →  NO

Phase 2 — Commit or Abort:

If ALL participants voted YES:
  Coordinator writes COMMIT decision to its own durable log
  Coordinator ──COMMIT──→ all participants
  Each participant: applies changes, releases locks, acknowledges
  Transaction is committed ✅

If ANY participant voted NO:
  Coordinator writes ABORT decision to its own durable log
  Coordinator ──ABORT──→ all participants
  Each participant: discards prepared changes, releases locks
  Transaction is rolled back ❌

Full timeline:
  T=1  Coordinator: BEGIN distributed transaction
  T=2  Coordinator → A, B, C: PREPARE
  T=3  A, B, C: acquire locks, write to WAL, send VOTE YES
  T=4  Coordinator: all YES → write COMMIT to durable log
  T=5  Coordinator → A, B, C: COMMIT
  T=6  A, B, C: apply changes, release locks, send ACK
  T=7  Coordinator: transaction complete

The Blocking Problem

2PC has a fundamental flaw: if the coordinator crashes after Phase 1 but before Phase 2, participants are stuck holding locks indefinitely.

Failure scenario:

T=1  All participants send VOTE YES
T=2  Coordinator writes COMMIT to its own log
T=3  Coordinator crashes ← right here
T=4  Participants are in PREPARED state, holding all locks
T=5  Participants cannot proceed (don't know if COMMIT or ABORT)
T=6  Participants cannot release locks (might need to commit)

Result: System is BLOCKED until coordinator recovers
  - All locks held by this transaction are unavailable
  - Any other transaction touching the same rows is blocked
  - Recovery requires coordinator to restart and re-send Phase 2
  - If coordinator's disk fails: manual intervention required

This is why 2PC is called a blocking protocol — it can block indefinitely if the coordinator fails. The alternative, Three-Phase Commit (3PC), adds a pre-commit phase to eliminate most blocking scenarios but is rarely used in practice due to its complexity and susceptibility to network partitions.

When 2PC Makes Sense

Despite the blocking problem, 2PC is appropriate when:

Strong atomicity is non-negotiable. Financial systems, payment processing, inventory management — where a partial failure means data corruption.
Participants are few and trusted. 2PC latency grows linearly with the number of participants and network round-trips. 2–5 participants across a fast internal network is reasonable; 20 participants across a wide-area network is not.
Participants are long-lived services, not ephemeral microservices. The coordinator must be able to reach every participant during recovery.

2PC in practice:

-- PostgreSQL distributed transaction via PREPARE TRANSACTION
-- (PostgreSQL's implementation of the 2PC prepare phase)

-- On Coordinator (application code):
BEGIN;
  -- Execute operations on multiple databases via dblink or foreign data wrappers
  UPDATE inventory SET stock = stock - 1 WHERE product_id = 7;

-- Phase 1: Prepare
PREPARE TRANSACTION 'txn-order-12345';
-- Transaction is now in prepared state — survives restart

-- Check prepared transactions (should be empty in normal operation)
SELECT * FROM pg_prepared_xacts;

-- Phase 2: Commit (once all participants prepared)
COMMIT PREPARED 'txn-order-12345';
-- or if abort needed:
ROLLBACK PREPARED 'txn-order-12345';

2PC in distributed databases:

CockroachDB, Google Spanner, and YugabyteDB implement 2PC internally and transparently — you write normal SQL, the database handles distributed coordination. This is what "NewSQL" means: SQL semantics across distributed nodes without manual 2PC management.

The Saga Pattern

Saga decomposes a distributed transaction into a sequence of local transactions, each on a single service or database. If a step fails, compensating transactions undo the work done by preceding steps.

Saga trades atomicity for availability. Instead of blocking on coordinator health, each step completes independently. The system remains available even if individual services are slow or temporarily down.

Choreography-Based Saga

Each service listens for events and publishes events when done. No central coordinator.

E-commerce order saga (choreography):

OrderService          InventoryService      PaymentService        NotificationService
     │                      │                    │                       │
  Create order               │                    │                       │
  Publish: OrderCreated      │                    │                       │
     │ ─────────────────────→│                    │                       │
     │                 Reserve stock              │                       │
     │                 Publish: StockReserved      │                       │
     │                      │ ──────────────────→│                       │
     │                      │              Charge payment                 │
     │                      │              Publish: PaymentCharged        │
     │                      │                    │ ─────────────────────→│
     │                      │                    │              Send confirmation
     │                      │                    │              email

If PaymentService fails:
  Publish: PaymentFailed
  InventoryService listens → Release stock (compensating transaction)
  OrderService listens → Cancel order (compensating transaction)

Advantages:

Loose coupling — services do not call each other directly
Services can be developed and deployed independently
No coordinator → no blocking

Disadvantages:

Difficult to track overall transaction state (what step are we on?)
Hard to debug — transaction logic is spread across multiple services
Cyclic dependencies can emerge as event handlers grow complex

Orchestration-Based Saga

A central saga orchestrator explicitly calls each service in sequence and manages compensation on failure.

OrderSaga Orchestrator:

Step 1: Call InventoryService.reserveStock(productId=7, qty=1)
  ├─ Success → proceed to Step 2
  └─ Failure → (nothing to compensate yet) → fail saga

Step 2: Call PaymentService.charge(customerId=42, amount=99.99)
  ├─ Success → proceed to Step 3
  └─ Failure → compensate: InventoryService.releaseStock(productId=7, qty=1)
                            → fail saga

Step 3: Call OrderService.confirmOrder(orderId=12345)
  ├─ Success → saga complete ✅
  └─ Failure → compensate: PaymentService.refund(customerId=42, amount=99.99)
                            InventoryService.releaseStock(productId=7, qty=1)
                            → fail saga

Compensation order is always reverse of execution order:
  Normal:      Step 1 → Step 2 → Step 3
  Compensate:  Step 3's compensation → Step 2's compensation → Step 1's compensation

class OrderSaga:
    def execute(self, customer_id: int, product_id: int, amount: float):
        completed_steps = []

        try:
            # Step 1: Reserve inventory
            reservation = inventory_service.reserve_stock(product_id, qty=1)
            completed_steps.append(('inventory', reservation.id))

            # Step 2: Charge payment
            payment = payment_service.charge(customer_id, amount)
            completed_steps.append(('payment', payment.id))

            # Step 3: Confirm order
            order = order_service.confirm_order(customer_id, product_id, amount)
            completed_steps.append(('order', order.id))

            return order

        except Exception as e:
            # Compensate in reverse order
            self._compensate(completed_steps)
            raise SagaFailedError(f"Saga failed at step: {e}") from e

    def _compensate(self, completed_steps: list):
        for service, resource_id in reversed(completed_steps):
            try:
                if service == 'inventory':
                    inventory_service.release_stock(resource_id)
                elif service == 'payment':
                    payment_service.refund(resource_id)
                elif service == 'order':
                    order_service.cancel_order(resource_id)
            except Exception as comp_error:
                # Compensation itself failed — log for manual intervention
                logger.critical(f"Compensation failed for {service}/{resource_id}: {comp_error}")
                alert_oncall(f"Manual intervention required: {service}/{resource_id}")

Advantages:

Central visibility into saga state — easy to monitor and debug
Explicit control flow — easy to reason about what happens on failure
Easier to add new steps or modify existing ones

Disadvantages:

Orchestrator is a central component — must be highly available
Risk of coupling services to the orchestrator's knowledge of their APIs

Idempotency: The Critical Requirement

Both 2PC and Saga require that each operation be idempotent — calling it multiple times produces the same result as calling it once.

Why? Network failures may cause duplicate calls. A timeout does not mean the operation failed — it means you do not know whether it succeeded. Retrying without idempotency causes double-charges, double-shipments, and duplicate records.

# Non-idempotent: charging twice means two charges
def charge_payment(customer_id: int, amount: float):
    payment = Payment.create(customer_id=customer_id, amount=amount)
    stripe.charge(payment.id, amount)
    return payment

# Idempotent: same idempotency_key = no duplicate charge
def charge_payment(customer_id: int, amount: float, idempotency_key: str):
    # Check if we already processed this exact request
    existing = Payment.find_by_idempotency_key(idempotency_key)
    if existing:
        return existing  # return the already-created payment, no new charge

    payment = Payment.create(
        customer_id=customer_id,
        amount=amount,
        idempotency_key=idempotency_key
    )
    stripe.charge(payment.id, amount, idempotency_key=idempotency_key)
    return payment

# Usage: idempotency_key = stable ID derived from the saga's context
charge_payment(42, 99.99, idempotency_key=f"order-{order_id}-payment")

Compensating transactions must also be idempotent. If the refund call times out and is retried, the second refund call must detect the first refund already happened and return without issuing a second refund.

2PC vs Saga: The Decision

2PC vs Saga:

                    2PC                         Saga
─────────────────────────────────────────────────────────────────
Atomicity           True atomicity              Eventual consistency
                    All or nothing at commit    Compensations may fail

Blocking            Yes — coordinator failure   No — each step independent
                    can block indefinitely

Latency             Higher — requires 2         Lower — each step async
                    synchronous round-trips

Availability        Lower — blocked by          Higher — no coordinator
                    coordinator health          dependency per step

Failure handling    Automatic rollback          Requires compensating
                    (coordinator manages)       transactions (you manage)

Cross-service       Tight coupling required     Loose coupling via events
coupling            (coordinator knows all)     or explicit calls

Debugging           Simpler — one atomic unit   Complex — distributed state

Best for            Few trusted services,       Many microservices,
                    financial atomicity,        long-running workflows,
                    same infrastructure         cross-org boundaries

The practical rule:

Same infrastructure (same company's databases): 2PC via a NewSQL database (CockroachDB, Spanner) that handles it transparently, or an XA transaction manager.
Across microservices or organizations: Saga — 2PC requires tight coupling and direct database access that breaks service encapsulation.
Long-running workflows (minutes to days): Saga always — no coordinator should hold locks for hours.

Chapter 5 Complete

You have finished the Scaling & Distributed Systems chapter:

✅ Post 15: Vertical vs Horizontal Scaling
✅ Post 16: Database Partitioning & Sharding
✅ Post 17: CAP Theorem & PACELC
✅ Post 18: Distributed Transactions: 2PC & Saga

Chapter 6 moves from relational databases to NoSQL — the document, key-value, and wide-column stores that trade SQL semantics for scale, flexibility, and write throughput.

🧭 What's Next — Chapter 6: NoSQL Databases

Post 19: NoSQL Explained — document, key-value, wide-column, and when NOT to use SQL; understanding which NoSQL family fits which problem is the difference between a fast system and a slow one

The Problem: ACID Across Multiple Nodes

Two-Phase Commit (2PC)

The Protocol

The Blocking Problem

When 2PC Makes Sense

The Saga Pattern

Choreography-Based Saga

Orchestration-Based Saga

Idempotency: The Critical Requirement

2PC vs Saga: The Decision

Chapter 5 Complete

🧭 What's Next — Chapter 6: NoSQL Databases

Related

CAP Theorem and PACELC: What They Mean for Real Database Decisions

Database Partitioning and Sharding: Range, Hash and Consistent Hashing

Failover, Consensus and High Availability: Raft, Paxos and Automatic Failover

Leave a comment

Comments