Graphs: From Zero to BFS/DFS in One Post
Why Graphs Feel Hard — and Why They Aren't
Ask candidates to rank topics by difficulty and graphs almost always land near the top. The word itself triggers anxiety: directed, undirected, weighted, cyclic, acyclic, connected components, shortest path, topological order...
Here is the reality: the overwhelming majority of graph interview questions at every major tech company reduce to one of four techniques — BFS, DFS, topological sort, or union-find. The intimidation comes from unfamiliar vocabulary and the visual complexity of diagrams. The actual coding patterns are direct extensions of what you already know from trees.
Trees are graphs. Specifically, a tree is a connected, acyclic, undirected graph with exactly n-1 edges for n nodes. Every tree traversal technique you learned in Post 08 applies to graphs — with one addition: a visited set to prevent infinite loops in cyclic structures.
That's the whole difference. Let's build from there.
Graph Representations
Before any algorithm, choose the right representation. Interviews almost always use adjacency lists.
# Adjacency List — O(V + E) space, O(degree) neighbor lookup
# Use for: sparse graphs, most interview problems
graph = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 4],
    3: [1],
    4: [2],
}
# Build from edge list
def build_graph(n, edges, directed=False):
    graph = {i: [] for i in range(n)}
    for u, v in edges:
        graph[u].append(v)
        if not directed:
            graph[v].append(u)  # undirected: add both directions
    return graph
# Adjacency Matrix — O(V²) space, O(1) edge lookup
# Use for: dense graphs, when you frequently check "is there an edge u→v?"
matrix = [
    [0, 1, 1, 0, 0],  # node 0 connects to 1 and 2
    [1, 0, 0, 1, 0],  # node 1 connects to 0 and 3
    [1, 0, 0, 0, 1],  # node 2 connects to 0 and 4
    [0, 1, 0, 0, 0],  # node 3 connects to 1
    [0, 0, 1, 0, 0],  # node 4 connects to 2
]
# Edge List — simplest form, used as input format
edges = [(0,1), (0,2), (1,3), (2,4)]

When to use which:
Adjacency list: default choice — O(V+E) space, efficient for sparse graphs
Adjacency matrix: when you need O(1) edge existence checks (rare in interviews)
Edge list: usually just the input format; convert to adjacency list before processing
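As a quick sanity check, here is the edge list above converted with build_graph (restated from this section so the snippet runs standalone) — it reproduces exactly the adjacency list shown at the top:

```python
def build_graph(n, edges, directed=False):
    graph = {i: [] for i in range(n)}
    for u, v in edges:
        graph[u].append(v)
        if not directed:
            graph[v].append(u)  # undirected: add both directions
    return graph

edges = [(0, 1), (0, 2), (1, 3), (2, 4)]
graph = build_graph(5, edges)
print(graph)  # {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
```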
The One Addition: The Visited Set
Every graph algorithm is a tree algorithm plus a visited set. The visited set prevents revisiting nodes in cyclic graphs — without it, your traversal loops forever.
# Tree DFS — no visited set needed (trees have no cycles)
def dfs_tree(node):
    if not node:
        return
    process(node)
    dfs_tree(node.left)
    dfs_tree(node.right)
# Graph DFS — visited set required
def dfs_graph(graph, node, visited):
    visited.add(node)
    process(node)
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs_graph(graph, neighbor, visited)
# Graph BFS — visited set required
from collections import deque
def bfs_graph(graph, start):
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        process(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)  # mark BEFORE enqueuing, not after
                queue.append(neighbor)

Critical: Mark nodes visited before enqueuing, not after dequeuing. Marking after dequeuing allows the same node to be enqueued multiple times — each incoming edge adds it to the queue again before it's processed.
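To see why this matters, here is a self-contained sketch (the graph and the counting helper are illustrative, not from the post) that counts total enqueues on a small fully connected graph. Marking on enqueue touches each node once; marking on dequeue lets duplicates pile up:

```python
from collections import deque

def count_enqueues(graph, start, mark_on_enqueue):
    """BFS that counts how many times anything is pushed onto the queue."""
    visited = set()
    queue = deque([start])
    enqueues = 1
    if mark_on_enqueue:
        visited.add(start)
    while queue:
        node = queue.popleft()
        if node in visited and not mark_on_enqueue:
            continue  # duplicate entry — wasted work
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                queue.append(neighbor)
                enqueues += 1
                if mark_on_enqueue:
                    visited.add(neighbor)
    return enqueues

# Fully connected 4-node graph: every node sees every other node
k4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(count_enqueues(k4, 0, mark_on_enqueue=True))   # 4 — each node enqueued once
print(count_enqueues(k4, 0, mark_on_enqueue=False))  # 7 — duplicates from extra edges
```

On dense graphs the gap grows from "a few extra entries" to O(E) wasted queue operations.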
Technique 1: DFS — Connectivity and Exploration
DFS is your go-to when you need to explore all reachable nodes, find connected components, or detect cycles.
Count Connected Components
def count_components(n, edges):
    graph = build_graph(n, edges, directed=False)
    visited = set()
    count = 0
    def dfs(node):
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                dfs(neighbor)
    for node in range(n):
        if node not in visited:
            dfs(node)
            count += 1  # each unvisited starting point = new component
    return count

Number of Islands (Grid DFS)
Grids are implicit graphs — each cell is a node, edges connect adjacent cells. No need to build an explicit graph.
def num_islands(grid):
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    count = 0
    def dfs(r, c):
        if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != '1':
            return
        grid[r][c] = '0'  # mark visited by mutating grid (saves space)
        dfs(r+1, c)
        dfs(r-1, c)
        dfs(r, c+1)
        dfs(r, c-1)
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                dfs(r, c)
                count += 1
    return count

Grid traversal directions — memorize these four:
DIRECTIONS = [(0,1), (0,-1), (1,0), (-1,0)]  # right, left, down, up

# Eight-directional (includes diagonals)
DIRECTIONS_8 = [(0,1),(0,-1),(1,0),(-1,0),(1,1),(1,-1),(-1,1),(-1,-1)]

# Bounds check template
def in_bounds(r, c, rows, cols):
    return 0 <= r < rows and 0 <= c < cols

Cycle Detection in Directed Graph
Directed graphs require tracking a "currently in stack" set (gray nodes) — not just a "visited" set (black nodes).
def has_cycle_directed(graph, n):
    # Three states: 0=unvisited, 1=in stack (gray), 2=done (black)
    state = [0] * n
    def dfs(node):
        state[node] = 1  # mark as in current path
        for neighbor in graph[node]:
            if state[neighbor] == 1:
                return True  # back edge → cycle
            if state[neighbor] == 0 and dfs(neighbor):
                return True
        state[node] = 2  # done processing
        return False
    return any(dfs(i) for i in range(n) if state[i] == 0)

Why two sets? In directed graphs, a visited node is not necessarily in a cycle — you might reach it via a different path. A cycle exists only when you encounter a node that's currently on your recursion stack (gray). Finding a node that's already fully processed (black) is fine — it just means two paths lead to the same node.
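To make the distinction concrete, here is has_cycle_directed (restated so the snippet runs standalone) on a "diamond" — two paths into the same node, no cycle — versus an actual cycle:

```python
def has_cycle_directed(graph, n):
    # Three states: 0=unvisited, 1=in stack (gray), 2=done (black)
    state = [0] * n
    def dfs(node):
        state[node] = 1
        for neighbor in graph[node]:
            if state[neighbor] == 1:
                return True  # back edge → cycle
            if state[neighbor] == 0 and dfs(neighbor):
                return True
        state[node] = 2
        return False
    return any(dfs(i) for i in range(n) if state[i] == 0)

diamond = {0: [1, 2], 1: [3], 2: [3], 3: []}  # two paths reach node 3 — no cycle
loop = {0: [1], 1: [2], 2: [0]}               # 0 → 1 → 2 → 0

print(has_cycle_directed(diamond, 4))  # False — node 3 is black when seen again
print(has_cycle_directed(loop, 3))     # True  — node 0 is gray when seen again
```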
Canonical Problems
Number of Islands (Medium)
Number of Connected Components (Medium)
Course Schedule (Medium)
Technique 2: BFS — Shortest Path in Unweighted Graphs
BFS is optimal for shortest path in unweighted graphs. It guarantees that when a node is first reached, it was reached via the shortest path — because BFS explores in order of increasing distance.
from collections import deque
def shortest_path(graph, start, end):
    if start == end:
        return 0
    visited = {start}
    queue = deque([(start, 0)])  # (node, distance)
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == end:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1  # no path exists
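A quick check on the five-node graph from the representations section (shortest_path restated so the snippet runs standalone):

```python
from collections import deque

def shortest_path(graph, start, end):
    if start == end:
        return 0
    visited = {start}
    queue = deque([(start, 0)])  # (node, distance)
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == end:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1  # no path exists

graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
print(shortest_path(graph, 0, 3))  # 2 — path 0 → 1 → 3
print(shortest_path(graph, 3, 4))  # 4 — path 3 → 1 → 0 → 2 → 4
```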
# Multi-source BFS — start from multiple sources simultaneously
# Useful for: nearest exit, rotting oranges, 0-1 matrix
def multi_source_bfs(grid):
    rows, cols = len(grid), len(grid[0])
    queue = deque()
    visited = set()
    # Enqueue ALL sources at the start
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:  # example: 0 is a source
                queue.append((r, c, 0))
                visited.add((r, c))
    while queue:
        r, c, dist = queue.popleft()
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r+dr, c+dc
            if in_bounds(nr, nc, rows, cols) and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append((nr, nc, dist+1))

Word Ladder — BFS on Implicit Graph
Word Ladder is the canonical BFS problem where the graph is never explicitly built:
def word_ladder(begin_word, end_word, word_list):
    word_set = set(word_list)
    if end_word not in word_set:
        return 0
    queue = deque([(begin_word, 1)])  # (word, steps)
    visited = {begin_word}
    while queue:
        word, steps = queue.popleft()
        for i in range(len(word)):
            for c in 'abcdefghijklmnopqrstuvwxyz':
                new_word = word[:i] + c + word[i+1:]
                if new_word == end_word:
                    return steps + 1
                if new_word in word_set and new_word not in visited:
                    visited.add(new_word)
                    queue.append((new_word, steps + 1))
    return 0

Why BFS and not DFS? DFS finds a path, not the shortest path. BFS guarantees the first time you reach the target, it's via the fewest steps.
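A run on the classic example (word_ladder restated so the snippet runs standalone):

```python
from collections import deque

def word_ladder(begin_word, end_word, word_list):
    word_set = set(word_list)
    if end_word not in word_set:
        return 0
    queue = deque([(begin_word, 1)])  # (word, steps)
    visited = {begin_word}
    while queue:
        word, steps = queue.popleft()
        for i in range(len(word)):
            for c in 'abcdefghijklmnopqrstuvwxyz':
                new_word = word[:i] + c + word[i+1:]
                if new_word == end_word:
                    return steps + 1
                if new_word in word_set and new_word not in visited:
                    visited.add(new_word)
                    queue.append((new_word, steps + 1))
    return 0

# hit → hot → dot → dog → cog: 5 words in the ladder
print(word_ladder("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))  # 5
print(word_ladder("hit", "cog", ["hot", "dot", "dog"]))  # 0 — end word not in list
```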
Canonical Problems
Rotting Oranges (Medium)
01 Matrix (Medium)
Word Ladder (Hard)
Technique 3: Topological Sort
Topological sort orders nodes in a Directed Acyclic Graph (DAG) so that for every directed edge A → B, A appears before B. Use it for any problem involving dependencies, ordering, or scheduling.
Two implementations: Kahn's algorithm (BFS-based) and DFS post-order.
Kahn's Algorithm (BFS + In-Degree)
from collections import deque
def topological_sort_kahn(n, prerequisites):
    graph = {i: [] for i in range(n)}
    in_degree = [0] * n
    for course, prereq in prerequisites:
        graph[prereq].append(course)
        in_degree[course] += 1
    # Start with all nodes that have no incoming edges
    queue = deque([i for i in range(n) if in_degree[i] == 0])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:  # all prerequisites satisfied
                queue.append(neighbor)
    # If order contains all nodes, no cycle exists
    return order if len(order) == n else []

# Course Schedule: can you finish all courses?
def can_finish(num_courses, prerequisites):
    return len(topological_sort_kahn(num_courses, prerequisites)) == num_courses

# Course Schedule II: return a valid order
def find_order(num_courses, prerequisites):
    return topological_sort_kahn(num_courses, prerequisites)

DFS Post-Order Topological Sort
def topological_sort_dfs(graph, n):
    visited = set()
    stack = []
    def dfs(node):
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                dfs(neighbor)
        stack.append(node)  # append AFTER all descendants are processed
    for node in range(n):
        if node not in visited:
            dfs(node)
    return stack[::-1]  # reverse: last finished = topologically first

Kahn's vs DFS: Kahn's is easier to explain and naturally detects cycles (if output length < n, there's a cycle). DFS is more elegant for recursive thinkers. In interviews, reach for Kahn's — it's clearer under pressure.
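A quick run of the Kahn's version on a four-course schedule (function restated so the snippet runs standalone; the output order is deterministic here because only course 0 starts with in-degree zero):

```python
from collections import deque

def topological_sort_kahn(n, prerequisites):
    graph = {i: [] for i in range(n)}
    in_degree = [0] * n
    for course, prereq in prerequisites:
        graph[prereq].append(course)
        in_degree[course] += 1
    queue = deque([i for i in range(n) if in_degree[i] == 0])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    return order if len(order) == n else []

# Course 3 needs 1 and 2; both need 0
print(topological_sort_kahn(4, [(1, 0), (2, 0), (3, 1), (3, 2)]))  # [0, 1, 2, 3]
print(topological_sort_kahn(2, [(1, 0), (0, 1)]))  # [] — cycle, no valid order
```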
Alien Dictionary (Hard Variant)
def alien_order(words):
    """Find character ordering from sorted alien dictionary."""
    # Build graph: for each adjacent pair of words, find first differing char
    graph = {c: set() for word in words for c in word}
    in_degree = {c: 0 for word in words for c in word}
    for i in range(len(words) - 1):
        w1, w2 = words[i], words[i+1]
        min_len = min(len(w1), len(w2))
        # Invalid case: w2 is a proper prefix of w1 (e.g., "abc" before "ab")
        if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
            return ""
        for j in range(min_len):
            if w1[j] != w2[j]:
                if w2[j] not in graph[w1[j]]:
                    graph[w1[j]].add(w2[j])
                    in_degree[w2[j]] += 1
                break
    # Topological sort
    queue = deque([c for c in in_degree if in_degree[c] == 0])
    result = []
    while queue:
        c = queue.popleft()
        result.append(c)
        for neighbor in graph[c]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    return "".join(result) if len(result) == len(in_degree) else ""

Canonical Problems
Course Schedule (Medium)
Course Schedule II (Medium)
Alien Dictionary (Hard)
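Before moving on, a quick check of alien_order on the standard example (function restated so the snippet runs standalone; the answer "wertf" is unique here because only 'w' starts with in-degree zero):

```python
from collections import deque

def alien_order(words):
    """Find character ordering from sorted alien dictionary."""
    graph = {c: set() for word in words for c in word}
    in_degree = {c: 0 for word in words for c in word}
    for i in range(len(words) - 1):
        w1, w2 = words[i], words[i+1]
        min_len = min(len(w1), len(w2))
        # Invalid case: w2 is a proper prefix of w1 (e.g., "abc" before "ab")
        if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
            return ""
        for j in range(min_len):
            if w1[j] != w2[j]:
                if w2[j] not in graph[w1[j]]:
                    graph[w1[j]].add(w2[j])
                    in_degree[w2[j]] += 1
                break
    queue = deque([c for c in in_degree if in_degree[c] == 0])
    result = []
    while queue:
        c = queue.popleft()
        result.append(c)
        for neighbor in graph[c]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    return "".join(result) if len(result) == len(in_degree) else ""

print(alien_order(["wrt", "wrf", "er", "ett", "rttf"]))  # wertf
print(alien_order(["abc", "ab"]))  # "" — longer word sorted before its own prefix
```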
Technique 4: Union-Find (Disjoint Set Union)
Union-Find tracks which nodes belong to the same connected component. It answers "are nodes u and v connected?" in near-O(1) time and merges two components in near-O(1) time — much faster than re-running BFS/DFS when the graph grows dynamically.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # each node is its own parent initially
        self.rank = [0] * n           # rank = approximate tree height
        self.count = n                # number of connected components

    def find(self, x):
        # Path compression: flatten tree as we find root
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # already connected
        # Union by rank: attach smaller tree under larger
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        self.count -= 1
        return True  # successfully merged two components

    def connected(self, x, y):
        return self.find(x) == self.find(y)

Time complexity: With path compression + union by rank, both find and union run in O(α(n)) — inverse Ackermann function, effectively O(1) for all practical n.
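A short usage example (class restated so the snippet runs standalone): three unions on five nodes leave three components, and a redundant union reports False:

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # each node is its own parent initially
        self.rank = [0] * n
        self.count = n                # number of connected components

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # already connected
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        self.count -= 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)

uf = UnionFind(5)
uf.union(0, 1)
uf.union(1, 2)
print(uf.connected(0, 2))  # True — connected transitively through 1
print(uf.count)            # 3 — components {0,1,2}, {3}, {4}
print(uf.union(0, 2))      # False — already in the same component
```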
Number of Islands with Union-Find
def num_islands_uf(grid):
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    uf = UnionFind(rows * cols)
    islands = sum(grid[r][c] == '1' for r in range(rows) for c in range(cols))
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                for dr, dc in [(0,1),(1,0)]:  # only check right and down (avoid duplicates)
                    nr, nc = r+dr, c+dc
                    if in_bounds(nr, nc, rows, cols) and grid[nr][nc] == '1':
                        if uf.union(r*cols + c, nr*cols + nc):
                            islands -= 1  # two islands merged into one
    return islands

Redundant Connection — Find the Edge Creating a Cycle
def find_redundant_connection(edges):
    n = len(edges)
    uf = UnionFind(n + 1)
    for u, v in edges:
        if not uf.union(u, v):
            return [u, v]  # this edge connects two already-connected nodes → cycle
    return []

Accounts Merge
from collections import defaultdict

def accounts_merge(accounts):
    uf = UnionFind(len(accounts))
    email_to_id = {}
    for i, account in enumerate(accounts):
        for email in account[1:]:
            if email in email_to_id:
                uf.union(i, email_to_id[email])  # same email → same person
            else:
                email_to_id[email] = i
    # Group emails by their root component
    components = defaultdict(set)
    for email, i in email_to_id.items():
        components[uf.find(i)].add(email)
    return [[accounts[i][0]] + sorted(emails)
            for i, emails in components.items()]

Canonical Problems
Number of Provinces (Medium)
Redundant Connection (Medium)
Accounts Merge (Medium)
Bonus: Dijkstra's Algorithm (Weighted Shortest Path)
When edges have weights, BFS no longer gives the shortest path. Dijkstra's algorithm uses a min-heap to always process the cheapest unvisited node next.
import heapq
def dijkstra(graph, start, end, n):
    """Shortest path in weighted graph. graph[u] = [(cost, v), ...]"""
    dist = [float('inf')] * n
    dist[start] = 0
    heap = [(0, start)]  # (distance, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry — skip
        if node == end:
            return dist[end]
        for cost, neighbor in graph[node]:
            new_dist = d + cost
            if new_dist < dist[neighbor]:
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist[end] if dist[end] != float('inf') else -1

Complexity: O((V + E) log V) — each node/edge processed once, heap operations O(log V).
When BFS vs Dijkstra:
All edges have equal weight (or no weight) → BFS
Edges have non-negative weights → Dijkstra
Edges can be negative → Bellman-Ford (rare in interviews)
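Here is dijkstra (restated so the snippet runs standalone) on a small weighted graph — the graph is illustrative — where the fewest-edges path is not the cheapest, which is exactly where BFS would go wrong:

```python
import heapq

def dijkstra(graph, start, end, n):
    """Shortest path in weighted graph. graph[u] = [(cost, v), ...]"""
    dist = [float('inf')] * n
    dist[start] = 0
    heap = [(0, start)]  # (distance, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry — skip
        if node == end:
            return dist[end]
        for cost, neighbor in graph[node]:
            new_dist = d + cost
            if new_dist < dist[neighbor]:
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist[end] if dist[end] != float('inf') else -1

graph = {
    0: [(4, 1), (1, 2)],  # (cost, neighbor)
    1: [(1, 3)],
    2: [(1, 1), (5, 3)],
    3: [],
}
# BFS would pick 0 → 1 → 3 (2 edges, total cost 5);
# Dijkstra finds 0 → 2 → 1 → 3 (3 edges, total cost 3)
print(dijkstra(graph, 0, 3, 4))  # 3
```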
The Master Decision Framework
GRAPH PROBLEM RECEIVED
│
▼
Is this a GRID problem (cells as nodes, adjacent cells as edges)?
├─ "Count islands / components" → DFS (mutate grid to mark visited)
├─ "Shortest path / minimum steps" → BFS
└─ "Flood fill / region size" → DFS
│
▼
Is the graph UNWEIGHTED?
├─ "Shortest path between two nodes" → BFS
├─ "Are two nodes connected?" → DFS or Union-Find
├─ "Count connected components" → DFS (iterate all unvisited nodes)
└─ "Detect cycle (undirected)" → Union-Find or DFS
│
▼
Is the graph DIRECTED?
├─ "Detect cycle" → DFS with 3-state (unvisited/in-stack/done)
├─ "Task ordering / dependencies" → Topological Sort (Kahn's)
└─ "All paths from source to target" → DFS + backtracking
│
▼
Does the graph have WEIGHTS?
├─ "Shortest path (non-negative weights)" → Dijkstra
└─ "Minimum spanning tree" → Kruskal (Union-Find) or Prim
│
▼
Is the graph DYNAMIC (edges added over time)?
└─ "Are two nodes in the same component?" → Union-Find
│
▼
Does the problem involve DEPENDENCIES or ORDERING?
└─ Yes → Topological Sort

Complexity Reference
| Algorithm | Time | Space | Use Case |
|---|---|---|---|
| DFS (recursive) | O(V+E) | O(V) call stack | Connectivity, cycle detection, paths |
| BFS | O(V+E) | O(V) queue | Shortest path (unweighted), levels |
| Topological Sort (Kahn's) | O(V+E) | O(V) | DAG ordering, dependency resolution |
| Union-Find (with optimizations) | O(α(V)) per op | O(V) | Dynamic connectivity |
| Dijkstra | O((V+E) log V) | O(V) | Shortest path (non-negative weights) |
The 10 Must-Know Graph Problems
| # | Technique | Difficulty |
|---|---|---|
| 1 | Grid DFS | Medium |
| 2 | DFS + hash map | Medium |
| 3 | Grid DFS | Easy |
| 4 | Multi-source BFS | Medium |
| 5 | Topological Sort / Cycle detect | Medium |
| 6 | Kahn's Algorithm | Medium |
| 7 | Union-Find or DFS | Medium |
| 8 | Union-Find | Medium |
| 9 | BFS on implicit graph | Hard |
| 10 | Topological Sort | Hard |
Solve problems 1–3 first until grid DFS is automatic. Problems 4–8 each isolate one technique. Problems 9 and 10 are the hardest — they require recognizing an implicit graph structure before applying BFS or topological sort.
Bottom Line
Graphs don't require new algorithms — they require applying familiar algorithms (DFS, BFS) to a more general structure, with one addition (the visited set) to handle cycles. Union-Find adds O(1) dynamic connectivity. Topological sort adds dependency ordering.
That's the complete toolkit. Four techniques. Master them and you can solve every graph problem in this post — and most graph problems you'll encounter in real interviews — by asking the decision framework questions and reaching for the right tool.
The candidates who struggle with graphs in interviews haven't failed to learn complex algorithms. They've failed to practice recognizing which of the four techniques a new problem is asking for. Build that recognition now, and graphs stop being the scary topic and start being the one you actually look forward to.
🎉 Chapter 2 Complete
With this post, Chapter 2 — Data Structures is finished. Here's what you've built across all six posts:
| Post | Topic | Core Techniques |
|---|---|---|
| 04 | Arrays & Strings | Prefix Sum, Sliding Window, Two Pointers, Hash Map, In-Place |
| 05 | Hash Maps & Hash Sets | 5 patterns: Lookup, Frequency, Grouping, Visited, Prefix+Hash |
| 06 | Linked Lists | Dummy Head, Reversal, Fast & Slow, Runner, Merging |
| 07 | Stacks & Queues | Matching Pairs, Evaluation, Monotonic Stack, BFS Queue, Deque |
| 08 | Trees & BST | Top-down/Bottom-up DFS, BFS, LCA, Construction, BST ops |
| 09 | Graphs | DFS, BFS, Topological Sort, Union-Find, Dijkstra |
🧭 What's Next: Chapter 3 — Algorithms
Chapter 2 gave you the data structures. Chapter 3 gives you the algorithmic techniques that operate on them:
Post 10: Sorting & Searching — beyond built-in sort
Post 11: Recursion & Backtracking — when to recurse, when to prune
Post 12: Dynamic Programming — memoization, tabulation, state design
Post 13: Greedy Algorithms — when greedy is provably optimal
Post 14: Divide and Conquer — merge sort, binary search variants