Hash Maps & Hash Sets: Turn O(n²) Into O(n) Every Time
The Single Best Return on Your Study Time
If you had to pick one data structure that would improve your interview performance more than any other, it would not be trees. It would not be graphs. It would not be heaps.
It would be the hash map.
Not because hash maps are the most intellectually interesting structure. They're not. But because they are the answer — or part of the answer — to an enormous number of interview problems. Anytime you see a nested loop, ask: "Can a hash map eliminate the inner loop?" The answer is yes far more often than candidates realize.
But hash maps are also misunderstood. Many candidates treat them as magic O(1) boxes and don't understand the conditions under which they degrade — leading to wrong complexity analysis, wrong tradeoff discussions, and subtle bugs when hash map operations fail silently in edge cases.
This post covers everything: how hash maps work under the hood, when O(1) breaks down, hash sets vs hash maps, and the five patterns that appear in the majority of hash-related interview problems.
Under the Hood: How Hash Maps Actually Work
A hash map (called dict in Python, HashMap in Java, unordered_map in C++) is an array with a smart indexing function.
Here's the mechanism:
1. You call map[key] = value
2. The hash map runs a hash function on key, converting it to an integer
3. That integer is reduced to an array index: index = hash(key) % capacity
4. The value is stored at that index in an underlying array

key="alice" → hash("alice") = 92791 → 92791 % 16 = 7 → store at index 7
key="bob"   → hash("bob")   = 47203 → 47203 % 16 = 3 → store at index 3
key="carol" → hash("carol") = 82311 → 82311 % 16 = 7 → COLLISION at index 7

Collisions are inevitable. Two different keys can hash to the same index. The two standard collision resolution strategies are:
Chaining: each array slot holds a linked list of (key, value) pairs; on collision, append to the list
Open addressing: on collision, probe adjacent slots until an empty one is found
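To make the mechanism concrete, here is a minimal chaining-based map in Python. This is an illustrative sketch, not production code; the class name ChainedHashMap and the fixed capacity are ours:

```python
class ChainedHashMap:
    """Minimal chaining hash map: an array of buckets, each a list of pairs."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.buckets = [[] for _ in range(capacity)]  # each slot holds (key, value) pairs

    def _index(self, key):
        return hash(key) % self.capacity  # hash the key, reduce to an array index

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                   # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))        # new key (or collision): append to the chain

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default
```

Real implementations add resizing when the load factor grows and, for open addressing, tombstones for deleted slots.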
Why O(1) Is "Average Case," Not Guaranteed
Most interview candidates say hash map operations are O(1). That's correct on average but wrong in two important scenarios:
Scenario 1 — Hash collisions degrade to O(n)
In the worst case, every key hashes to the same slot. The underlying linked list (in chaining) becomes n elements long. Every lookup requires traversing the entire list: O(n). Modern hash maps use random hash seeds to make this extremely unlikely, but it's theoretically possible — and it matters for complexity analysis.
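You can force this worst case yourself: a key type whose __hash__ always returns the same value sends every entry to one bucket. BadKey below is a deliberately pathological class of our own, purely for demonstration:

```python
class BadKey:
    """Deliberately pathological: every instance hashes to the same bucket."""

    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return 0  # all keys collide

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.x == other.x

# Every insert and lookup must scan one ever-growing chain: O(n) per operation,
# O(n²) to build the dict. Results are still correct, just slow.
d = {BadKey(i): i for i in range(300)}
assert d[BadKey(150)] == 150
```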
Scenario 2 — Rehashing triggers O(n) work
When a hash map exceeds its load factor threshold (typically 0.75), it rehashes: allocates a new array roughly twice as large and reinserts all existing entries. This single operation costs O(n). Like dynamic array resizing, it's amortized — rare enough that average cost per insertion remains O(1) — but it happens, and it means individual insertions occasionally take O(n).
Hash Map: capacity=8, load_factor_threshold=0.75
→ Rehash triggers at 8 × 0.75 = 6 entries
→ New capacity: 16
→ All 6 entries are rehashed and reinserted: O(n)
→ Subsequent insertions are O(1) until next rehash

The correct interview answer: Hash map operations are O(1) amortized average case. Worst case is O(n) due to collisions or rehashing. In practice, with a good hash function and reasonable load factor, they behave as O(1).
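A quick simulation of the doubling schedule above shows why the rehash cost amortizes away: total entry moves stay proportional to n. The function and its parameters are illustrative, not a real hash map implementation:

```python
def simulate_inserts(n, load_factor=0.75, initial_capacity=8):
    """Count how many entry moves rehashing causes across n insertions."""
    capacity, size, moves = initial_capacity, 0, 0
    for _ in range(n):
        if size >= capacity * load_factor:  # threshold exceeded → rehash
            moves += size                   # every existing entry is reinserted
            capacity *= 2
        size += 1
    return moves

# Total rehash work across 10,000 inserts is well under 2n → amortized O(1) per insert
assert simulate_inserts(10_000) < 2 * 10_000
```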
What Makes a Good Hash Function
A hash function must be:
Deterministic — same key always produces same hash
Uniform — distributes keys evenly across the array
Fast — O(1) to compute (ideally)
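A toy polynomial string hash illustrates all three properties. The base and modulus here are arbitrary choices for demonstration, not what Python's built-in hash uses:

```python
def poly_hash(s, base=131, mod=(1 << 61) - 1):
    """Toy polynomial rolling hash: deterministic, O(len(s)), spreads keys well."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

assert poly_hash("alice") == poly_hash("alice")  # deterministic: same key, same hash
assert poly_hash("ab") != poly_hash("ba")        # position-sensitive, unlike a plain sum
```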
In Python, integers, strings, and tuples are hashable. Lists and dicts are not hashable — they're mutable and can't be reliably hashed. This is why you can't use a list as a dict key. When you need a dict key from a list, convert it to a tuple first.
# ❌ TypeError: unhashable type: 'list'
seen = {}
seen[[1, 2, 3]] = True

# ✅ Tuples are hashable
seen = {}
seen[(1, 2, 3)] = True

# ✅ Converting list to tuple for dict key
def group_anagrams(strs):
    groups = {}
    for s in strs:
        key = tuple(sorted(s))  # list → tuple → hashable
        groups.setdefault(key, []).append(s)
    return list(groups.values())

Hash Map vs Hash Set
A hash set is a hash map with no values — just keys. Use a set when you only need to answer "have I seen this before?" and don't need to store any associated data.
# Hash Map: key → value
word_count = {}
for word in words:
    word_count[word] = word_count.get(word, 0) + 1

# Hash Set: membership only
seen = set()
for num in nums:
    if num in seen:  # O(1) lookup
        return True
    seen.add(num)

Space-wise, a set typically uses less memory than an equivalent dict because it stores no values. For membership-only problems, always prefer a set.
The 5 Hash Map Patterns
The following five patterns cover the overwhelming majority of hash-related interview questions.
Pattern 1: Two-Pass Lookup (Classic Two Sum)
Problem shape: Given a collection, find two elements that satisfy a relationship (sum to target, differ by k, etc.)
The insight: On your first encounter with each element, you don't know if its "partner" exists yet. Store it. On subsequent encounters, check if the partner is already stored.
def two_sum(nums, target):
    seen = {}  # value → index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:  # partner already stored
            return [seen[complement], i]
        seen[num] = i  # store for future partners
    return []

Why one pass works: You process left to right. When you reach the element at index j, all elements at indices 0..j-1 are already in the hash map. If the complement exists anywhere to the left, you'll find it. If it's to the right, that element will find you when it's processed. Every pair is checked exactly once.
Generalization: Any "find two elements satisfying condition X" problem — check if the partner of current element exists in the hash map, store current element.
Variants:
# Find pair with difference k (assumes k > 0; k == 0 would need a duplicate check)
def find_pair_with_diff(nums, k):
    num_set = set(nums)
    for num in nums:
        if num + k in num_set and num + k != num:
            return [num, num + k]
    return []

# Find pair whose product equals target (skips zero; target == 0 needs separate handling)
def find_pair_with_product(nums, target):
    seen = set()
    for num in nums:
        if num != 0 and target % num == 0:
            partner = target // num
            if partner in seen:
                return [partner, num]
        seen.add(num)
    return []

Canonical Problems
Pattern 2: Frequency Counter
Problem shape: Count how many times each element appears. Use those counts to check equivalence, find outliers, or solve frequency-based conditions.
from collections import Counter

# Are two strings anagrams?
def is_anagram(s, t):
    return Counter(s) == Counter(t)

# Single-dict version — O(1) space for a fixed alphabet
def is_anagram_v2(s, t):
    if len(s) != len(t):
        return False
    count = {}
    for c in s:
        count[c] = count.get(c, 0) + 1
    for c in t:
        count[c] = count.get(c, 0) - 1
        if count[c] < 0:
            return False
    return True

# Find all elements appearing more than n // 3 times
def majority_element(nums):
    count = Counter(nums)
    n = len(nums)
    return [num for num, freq in count.items() if freq > n // 3]

Counter operations you need to know:
c = Counter("mississippi")
# Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
c.most_common(2)         # [('i', 4), ('s', 4)]
c.most_common()[:-3:-1]  # 2 least common

# Arithmetic
c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)
c1 + c2  # Counter({'a': 4, 'b': 3}) — add counts
c1 - c2  # Counter({'a': 2}) — subtract, keep positives only
c1 & c2  # Counter({'a': 1, 'b': 1}) — min of each count
c1 | c2  # Counter({'a': 3, 'b': 2}) — max of each count

The sliding window + counter combination is one of the most powerful patterns for substring problems:
def find_all_anagrams(s, p):
    """Find all starting indices of anagram substrings of p in s."""
    result = []
    p_count = Counter(p)
    window = Counter(s[:len(p)])
    if window == p_count:
        result.append(0)
    for i in range(len(p), len(s)):
        # Add new char
        window[s[i]] += 1
        # Remove old char
        old_char = s[i - len(p)]
        window[old_char] -= 1
        if window[old_char] == 0:
            del window[old_char]
        # Check
        if window == p_count:
            result.append(i - len(p) + 1)
    return result

Canonical Problems
Valid Anagram (Easy)
Find All Anagrams in a String (Medium)
Pattern 3: Grouping by Canonical Key
Problem shape: Elements that are "equivalent" by some transformation should be grouped together. Build a key that is identical for all equivalent elements, use it as a hash map key.
# Group anagrams: canonical key = sorted characters
def group_anagrams(strs):
    groups = {}
    for s in strs:
        key = tuple(sorted(s))
        groups.setdefault(key, []).append(s)
    return list(groups.values())

# Group by digit sum
def group_by_digit_sum(nums):
    groups = {}
    for num in nums:
        key = sum(int(d) for d in str(num))
        groups.setdefault(key, []).append(num)
    return groups

# Isomorphic strings: map each character to its counterpart, in both directions
def is_isomorphic(s, t):
    s_to_t = {}
    t_to_s = {}
    for cs, ct in zip(s, t):
        if cs in s_to_t and s_to_t[cs] != ct:
            return False
        if ct in t_to_s and t_to_s[ct] != cs:
            return False
        s_to_t[cs] = ct
        t_to_s[ct] = cs
    return True

The key insight: You don't group elements by content — you group them by their canonical form (a transformation that produces the same output for all equivalent inputs). Finding the right canonical form is the creative part of these problems.
# Group strings that become the same after removing vowels
def group_by_consonants(strs):
    VOWELS = set('aeiou')
    groups = {}
    for s in strs:
        key = ''.join(c for c in s if c not in VOWELS)
        groups.setdefault(key, []).append(s)
    return list(groups.values())

# Group numbers that have the same set of digits
def group_by_digit_set(nums):
    groups = {}
    for num in nums:
        key = frozenset(str(num))  # frozenset is hashable, set is not
        groups.setdefault(key, []).append(num)
    return groups

Canonical Problems
Group Anagrams (Medium)
Isomorphic Strings (Easy)
Word Pattern (Easy)
Pattern 4: Seen-Before / Visited Tracking
Problem shape: Track which elements (or states) have been encountered to detect cycles, avoid revisiting, find duplicates, or determine membership.
# Contains duplicate: O(n) time, O(n) space
def contains_duplicate(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

# Happy number: detect cycle using seen set
def is_happy(n):
    def digit_square_sum(x):
        return sum(int(d) ** 2 for d in str(x))
    seen = set()
    while n != 1:
        if n in seen:
            return False  # cycle detected
        seen.add(n)
        n = digit_square_sum(n)
    return True

# Course Schedule: detect a cycle in a directed graph with visited states + recursion stack
def can_finish(num_courses, prerequisites):
    graph = {i: [] for i in range(num_courses)}
    for a, b in prerequisites:
        graph[b].append(a)
    # States: 0=unvisited, 1=visiting (in stack), 2=done
    state = [0] * num_courses
    def has_cycle(node):
        if state[node] == 1:  # back edge → cycle
            return True
        if state[node] == 2:  # already processed
            return False
        state[node] = 1
        for neighbor in graph[node]:
            if has_cycle(neighbor):
                return True
        state[node] = 2
        return False
    return not any(has_cycle(i) for i in range(num_courses) if state[i] == 0)

Seen-before in BFS/DFS: Always maintain a visited set when traversing graphs. Without it, you'll revisit nodes infinitely in cyclic graphs.
from collections import deque

def bfs_shortest_path(graph, start, end):
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == end:
            return dist
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)  # mark before enqueuing, not after
                queue.append((neighbor, dist + 1))
    return -1

Important: Mark nodes as visited before enqueuing, not after dequeuing. Marking after dequeuing can enqueue the same node multiple times if it has multiple incoming edges.
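The difference is observable on a diamond-shaped graph, where two paths lead into the same node. The helper below and its graph are made up purely to count enqueues under each policy:

```python
from collections import deque

def bfs_enqueue_count(graph, start, mark_before=True):
    """Traverse with BFS and count total enqueues under either marking policy."""
    visited = set()
    queue = deque([start])
    enqueues = 1
    if mark_before:
        visited.add(start)
    while queue:
        node = queue.popleft()
        if not mark_before:          # mark-after-dequeue policy
            if node in visited:
                continue
            visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                queue.append(neighbor)
                enqueues += 1
                if mark_before:      # mark-before-enqueue policy
                    visited.add(neighbor)
    return enqueues

# Diamond graph: both 'b' and 'c' point into 'd'
g = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
assert bfs_enqueue_count(g, 'a', mark_before=True) == 4   # each node enqueued once
assert bfs_enqueue_count(g, 'a', mark_before=False) == 5  # 'd' enqueued twice
```

On dense graphs the duplicate enqueues compound, inflating both queue memory and runtime.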
Canonical Problems
Contains Duplicate (Easy)
Longest Consecutive Sequence (Medium)
Course Schedule (Medium)
Pattern 5: Prefix Sum + Hash Map
Problem shape: Count subarrays with a specific sum, find the longest subarray meeting a condition, or detect equilibrium points. This pattern extends prefix sum to handle negative numbers and "count" queries.
This is the pattern most candidates don't have, and the one that most frequently separates Medium-solvers from Hard-solvers.
# Count subarrays summing to k — works with negatives
def subarray_sum_equals_k(nums, k):
    count = 0
    prefix = 0
    seen = {0: 1}  # prefix_sum → frequency
    for num in nums:
        prefix += num
        # prefix - k appeared before → subarray between then and now sums to k
        count += seen.get(prefix - k, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return count

# Longest subarray with sum 0
def longest_zero_sum_subarray(nums):
    prefix = 0
    first_seen = {0: -1}  # prefix_sum → earliest index
    max_len = 0
    for i, num in enumerate(nums):
        prefix += num
        if prefix in first_seen:
            max_len = max(max_len, i - first_seen[prefix])
        else:
            first_seen[prefix] = i  # only store first occurrence
    return max_len

# Contiguous array: find longest subarray with equal 0s and 1s
def find_max_length(nums):
    # Transform: replace 0 with -1, then find longest subarray with sum 0
    prefix = 0
    first_seen = {0: -1}
    max_len = 0
    for i, num in enumerate(nums):
        prefix += 1 if num == 1 else -1
        if prefix in first_seen:
            max_len = max(max_len, i - first_seen[prefix])
        else:
            first_seen[prefix] = i
    return max_len

The core insight: If prefix[i] == prefix[j] for two positions i and j, the subarray from i+1 to j has sum zero (or whatever difference you're looking for). Store the first time you see each prefix sum — that maximizes the subarray length.
# Count subarrays with sum divisible by k
def subarrays_div_by_k(nums, k):
    count = 0
    prefix = 0
    remainder_count = {0: 1}
    for num in nums:
        prefix = (prefix + num) % k
        if prefix < 0:
            prefix += k  # no-op in Python (% is non-negative for k > 0); needed when porting to C++/Java
        count += remainder_count.get(prefix, 0)
        remainder_count[prefix] = remainder_count.get(prefix, 0) + 1
    return count

Canonical Problems
Subarray Sum Equals K (Medium)
Contiguous Array (Medium)
Subarray Sums Divisible by K (Medium)
Python-Specific: defaultdict and OrderedDict
Two Python tools that simplify hash map code in interviews:
from collections import defaultdict, OrderedDict

# defaultdict: no KeyError on missing keys
graph = defaultdict(list)
graph['a'].append('b')  # works without initializing 'a' first

word_count = defaultdict(int)
for word in words:
    word_count[word] += 1  # no .get(word, 0) needed

# OrderedDict: maintains insertion order with O(1) reordering (useful for LRU Cache)
class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.cap:
            self.cache.popitem(last=False)  # remove LRU (front)

The LRU Cache is a classic hard interview problem that combines hash map (O(1) lookup) with ordered eviction (O(1) removal of least-recently-used). OrderedDict solves it elegantly.
When NOT to Use a Hash Map
Hash maps are powerful but not always correct. Three cases where they're the wrong choice:
Case 1: You need sorted order
Hash maps have no ordering. If you need to iterate elements in sorted order, use a sorted array, BST, or SortedList. Inserting into a hash map and sorting the output is O(n log n) — same as sorting upfront, but less clear.
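If sorted iteration is the real requirement, a plain list kept sorted with the standard library's bisect module is often enough. A minimal sketch (the variable names are ours):

```python
import bisect

# Keep keys iterable in sorted order as they arrive:
# O(log n) to locate the slot, O(n) to shift elements on insert.
sorted_keys = []
for k in [5, 1, 4, 2, 3]:
    bisect.insort(sorted_keys, k)

assert sorted_keys == [1, 2, 3, 4, 5]
```

For heavy insert/delete workloads with sorted iteration, a balanced BST or the third-party SortedList mentioned above gives O(log n) for both.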
Case 2: Space is severely constrained
A hash map for n elements takes O(n) space. If the problem asks for O(1) space, you need in-place techniques (cyclic sort, sign-as-flag) instead.
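As a sketch of the sign-as-flag technique, here is the classic in-place duplicate finder. It assumes every value lies in the range 1..n (as in LeetCode's Find All Duplicates in an Array) and mutates the input:

```python
def find_duplicates(nums):
    """Return all values appearing twice, where 1 <= nums[i] <= len(nums).
    O(1) extra space: flip the sign at index |v| - 1 as a 'seen' flag."""
    result = []
    for v in nums:
        idx = abs(v) - 1            # map value to its flag position
        if nums[idx] < 0:           # already flagged → duplicate
            result.append(abs(v))
        else:
            nums[idx] = -nums[idx]  # flag value |v| as seen
    return result

assert sorted(find_duplicates([4, 3, 2, 7, 8, 2, 3, 1])) == [2, 3]
```

The array itself plays the role of the hash set, with the value-to-index mapping as a perfect hash.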
Case 3: Keys are integers in a known range [0, n)
A plain array is faster and simpler. An array of size n acts as a hash map with a perfect hash function (identity). No collisions, no overhead.
# Hash map approach — O(n) space, hash overhead
def count_chars_hashmap(s):
    return Counter(s)

# Array approach — O(1) space (26 fixed slots for lowercase letters), faster
def count_chars_array(s):
    freq = [0] * 26
    for c in s:
        freq[ord(c) - ord('a')] += 1
    return freq

The Master Cheat Sheet
CHOOSE HASH MAP WHEN:
✓ "Find two elements with relationship X" → Two-Pass Lookup
✓ "Count frequency / check equal distribution" → Frequency Counter
✓ "Group equivalent elements together" → Canonical Key Grouping
✓ "Track visited/seen to avoid revisiting" → Seen-Before Set
✓ "Count subarrays / longest subarray with sum K" → Prefix Sum + HashMap
PREFER HASH SET WHEN:
✓ Only need membership check (no stored value)
✓ Uses less memory than an equivalent dict (no values stored)
AVOID HASH MAP WHEN:
✗ Need sorted iteration → use sorted array / BST
✗ Need O(1) space → use in-place array technique
✗ Keys are integers [0, n) → use plain array (faster)
COMMON GOTCHAS:
! Lists are not hashable → convert to tuple first
! frozenset is hashable, set is not
! Mark BFS nodes visited before enqueuing, not after dequeuing
! Prefix sum + hash map handles negatives; sliding window does NOT
! Worst case O(n) due to collisions — say "amortized O(1)" in interviews

The 8 Hash Map Problems Every Candidate Must Know
| # | Problem | Pattern | Difficulty |
|---|---------|---------|------------|
| 1 | Two Sum | Two-Pass Lookup | Easy |
| 2 | Valid Anagram | Frequency Counter | Easy |
| 3 | Contains Duplicate | Seen-Before | Easy |
| 4 | | Frequency Counter | Easy |
| 5 | Group Anagrams | Canonical Key | Medium |
| 6 | Longest Consecutive Sequence | Hash Set + Expand | Medium |
| 7 | Subarray Sum Equals K | Prefix Sum + HashMap | Medium |
| 8 | LRU Cache | HashMap + OrderedDict | Hard |
Solve them in this order. Each one is a slightly more sophisticated version of the previous.
Bottom Line
The hash map is not just a data structure — it's a problem-solving move. When you see O(n²) from nested loops, the hash map question is always: "What can I precompute in one pass so the inner loop becomes a single O(1) lookup?"
The answer, more often than any other technique in this entire series, is: a hash map.
Master the five patterns. Know when O(1) breaks down. Volunteer the complexity analysis before you're asked. The candidates who do these three things in interviews don't just answer hash map questions correctly — they answer them in a way that signals deep understanding of how data structures work.
🧭 What's Next in This Series
You've now covered the two most interview-critical data structures: arrays and hash maps. Next up, the structure that trips up even experienced candidates under pressure:
Post 06: Linked Lists — reversal, cycle detection, the runner technique, and merging, with the mental model that eliminates null pointer errors for good