The Python collections module has been part of the standard library since version 2.4. It provides specialized data structures that solve recurring problems without any external dependency. Yet many developers keep writing counting loops, conditional key initialization, or Point classes with x, y, z fields when Counter, defaultdict, and namedtuple do exactly that, better and more readably.
Here are the six structures I use regularly, with the cases where they actually make a difference.
Counter: counting without a loop
Counter takes any iterable of hashable items and returns a dict subclass that maps each element to its occurrence count. Its printed form lists entries from most to least frequent; iteration itself follows insertion order.
from collections import Counter
fruits = ['apple', 'banana', 'orange', 'banana', 'apple', 'apple']
c = Counter(fruits)
print(c)
# Counter({'apple': 3, 'banana': 2, 'orange': 1})
most_common(n) returns the n most frequent elements in descending order:
c.most_common(2)
# [('apple', 3), ('banana', 2)]
What makes Counter genuinely useful in practice is the arithmetic between two Counter objects. Addition sums the counts, subtraction keeps only positive results, intersection (&) takes the minimum of each count, and union (|) takes the maximum:
c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)
print(c1 + c2) # Counter({'a': 4, 'b': 3})
print(c1 - c2) # Counter({'a': 2})
print(c1 & c2) # Counter({'a': 1, 'b': 1})
print(c1 | c2) # Counter({'a': 3, 'b': 2})
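As a small illustration of why the "keeps only positives" rule is handy, the sketch below checks whether a word can be spelled from a pool of letters. The can_spell helper and the sample strings are made up for this example.

```python
from collections import Counter

def can_spell(word, letters):
    """Return True if `word` can be built from the multiset `letters`."""
    # Subtraction drops zero and negative counts, so anything left over
    # is a letter the pool cannot supply.
    missing = Counter(word) - Counter(letters)
    return not missing

print(can_spell("banana", "aaabbnn"))   # True
print(can_spell("banana", "abn"))       # False
```

An empty Counter is falsy, which is what lets the result of the subtraction double as the answer.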
update() feeds a Counter from a new iterable without starting over:
c1.update(['a', 'a', 'c'])
# Counter({'a': 5, 'b': 1, 'c': 1})
When to use it. Frequency analysis, data histograms, token counting, log statistics. Any situation where you would write if key in d: d[key] += 1 else: d[key] = 1 is a direct candidate for Counter.
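To make the comparison concrete, here is a sketch that counts log levels both ways. The log lines and their "LEVEL message" layout are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical log lines; the "LEVEL message" layout is an assumption.
log_lines = [
    "ERROR disk full",
    "INFO job started",
    "WARNING slow query",
    "ERROR timeout",
    "INFO job finished",
    "ERROR disk full",
]

# Manual version: the pattern Counter replaces.
manual = {}
for line in log_lines:
    level = line.split(maxsplit=1)[0]
    if level in manual:
        manual[level] += 1
    else:
        manual[level] = 1

# Counter version: one expression over a generator.
levels = Counter(line.split(maxsplit=1)[0] for line in log_lines)

print(levels.most_common(1))    # [('ERROR', 3)]
print(levels["DEBUG"])          # 0: a missing key reads as zero, no KeyError
print(dict(levels) == manual)   # True
```

Reading a missing key returns 0 without inserting it, so probing a Counter never changes its contents.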
defaultdict: no more KeyError
defaultdict is a dict subclass that automatically creates a default value when a missing key is accessed. The factory is passed at construction time.
from collections import defaultdict
# list factory: each new key starts with an empty list
groups = defaultdict(list)
groups['fruit'].append('apple')
groups['fruit'].append('banana')
groups['car'] # simple access: creates the key with []
print(groups)
# defaultdict(<class 'list'>, {'fruit': ['apple', 'banana'], 'car': []})
With int as the factory, each new key starts at 0, enabling counting without a prior check:
counter = defaultdict(int)
for word in ['cat', 'dog', 'cat', 'parrot', 'dog', 'cat']:
    counter[word] += 1
# defaultdict(<class 'int'>, {'cat': 3, 'dog': 2, 'parrot': 1})
A very common real-world case: grouping data by key without checking whether the key already exists.
data = [('FR', 'Paris'), ('US', 'New York'), ('FR', 'Lyon'), ('US', 'Chicago')]
groups = defaultdict(list)
for country, city in data:
    groups[country].append(city)
# defaultdict(<class 'list'>, {'FR': ['Paris', 'Lyon'], 'US': ['New York', 'Chicago']})
When to use it. Any time you initialize a dict by testing if key not in d before inserting. defaultdict removes that pattern and makes the code more linear. For simple counting, Counter remains more idiomatic.
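Two details are worth seeing in code: the factory can be any zero-argument callable, and only a subscript read triggers it. The sketch below assumes a made-up stats record to show both.

```python
from collections import defaultdict

# Any zero-argument callable works as the factory; here a lambda builds
# a fresh stats record for each new key (the record layout is made up).
stats = defaultdict(lambda: {"count": 0, "total": 0.0})

for name, amount in [("alice", 10.0), ("bob", 4.5), ("alice", 2.5)]:
    stats[name]["count"] += 1
    stats[name]["total"] += amount

print(stats["alice"])        # {'count': 2, 'total': 12.5}

# Membership tests and .get() do not trigger the factory...
print("carol" in stats)      # False
print(stats.get("carol"))    # None
print(len(stats))            # 2

# ...but a plain subscript read does, which can silently grow the dict.
stats["carol"]
print(len(stats))            # 3
```

When a lookup should stay read-only, prefer `in` or `.get()` over `d[key]`.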
namedtuple: tuples with field names
A regular tuple forces access by index. namedtuple adds field names, making code self-documenting without the overhead of a full class.
from collections import namedtuple
Car = namedtuple('Car', ['brand', 'fuel'])
c = Car("BMW", "petrol")
print(c.brand) # 'BMW'
print(c.fuel) # 'petrol'
print(c[0]) # 'BMW' — index access still works
_asdict() returns a regular dict (an OrderedDict before Python 3.8):
print(c._asdict())
# {'brand': 'BMW', 'fuel': 'petrol'}
_replace() creates a new instance with one or more fields changed. The namedtuple is immutable, so this is the only way to “modify” a value:
c2 = c._replace(fuel="electric")
# Car(brand='BMW', fuel='electric')
_fields exposes field names, useful for introspection:
Point = namedtuple('Point', 'x y z')
p = Point(1, 2, 3)
print(p._fields) # ('x', 'y', 'z')
A namedtuple instance has no __dict__. That is what makes it lighter in memory than a standard dataclass on large collections: a dataclass allocates an attribute dictionary per instance, a namedtuple does not. For more on memory optimization, see Python __slots__ and memory optimization.
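The difference is easy to check directly. The sketch below uses two hypothetical types, PointNT and PointDC, and only inspects whether an instance carries a __dict__; it does not attempt to measure memory.

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple("PointNT", "x y z")

@dataclass
class PointDC:
    x: int
    y: int
    z: int

nt = PointNT(1, 2, 3)
dc = PointDC(1, 2, 3)

print(hasattr(nt, "__dict__"))   # False: fields live in the tuple itself
print(hasattr(dc, "__dict__"))   # True: one attribute dict per instance
print(dc.__dict__)               # {'x': 1, 'y': 2, 'z': 3}
```

A dataclass declared with slots=True (Python 3.10+) drops the per-instance dict as well, so the comparison above applies to the default dataclass form.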
When to use it. Multi-value returns from functions, lightweight tabular data representation, replacing tuples like (x, y) or (id, name, date) where the index order inevitably becomes opaque. For mutable objects with business logic, dataclass is more appropriate.
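A minimal sketch of the multi-value return case, with a hypothetical MinMax type and summarize function:

```python
from collections import namedtuple

# Hypothetical result type for a function that returns several values.
MinMax = namedtuple("MinMax", ["lowest", "highest", "spread"])

def summarize(values):
    """Return the minimum, maximum, and their difference as a MinMax."""
    lo, hi = min(values), max(values)
    return MinMax(lowest=lo, highest=hi, spread=hi - lo)

result = summarize([4, 9, 1, 7])
print(result)            # MinMax(lowest=1, highest=9, spread=8)
print(result.spread)     # 8

# Still a tuple, so existing unpacking code keeps working.
lowest, highest, _ = result
```

Callers that already unpack a plain tuple are unaffected, while new code can use the field names.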
deque: insertions and removals at both ends
A Python list is optimized for operations at the end. list.insert(0, x) and list.pop(0) are O(n) because they shift all elements. deque (double-ended queue) performs those same operations in O(1).
from collections import deque
d = deque([1, 2, 3, 4, 5])
d.append(6)       # O(1) on the right: deque([1, 2, 3, 4, 5, 6])
d.appendleft(0)   # O(1) on the left: deque([0, 1, 2, 3, 4, 5, 6])
d.pop()           # O(1) on the right: deque([0, 1, 2, 3, 4, 5])
d.popleft()       # O(1) on the left: deque([1, 2, 3, 4, 5])
rotate(n) performs a rotation of n steps to the right (negative for the left):
d = deque([1, 2, 3, 4, 5])
d.rotate(2)    # 2 steps to the right: deque([4, 5, 1, 2, 3])
d.rotate(-2)   # 2 steps to the left: deque([1, 2, 3, 4, 5])
maxlen turns the deque into a sliding window. When full, each right-side addition automatically ejects the leftmost element:
history = deque(maxlen=3)
for i in range(6):
    history.append(i)
print(history)
# deque([3, 4, 5], maxlen=3)
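The same mechanism gives a moving average almost for free. The moving_average generator below is an illustrative helper, not part of collections.

```python
from collections import deque

def moving_average(values, size):
    """Yield the mean of the last `size` values once the window is full."""
    window = deque(maxlen=size)
    for value in values:
        window.append(value)          # the oldest value drops out when full
        if len(window) == size:
            yield sum(window) / size

print(list(moving_average([10, 20, 30, 40, 50], size=3)))
# [20.0, 30.0, 40.0]
```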
When to use it. FIFO queue (append + popleft, or appendleft + pop), LIFO stack, sliding window, fixed-size history buffer, BFS (breadth-first search). In a file processing pipeline, deque manages the in-memory buffer while shutil handles the disk side (copy, archive). Prefer list when you need frequent access at arbitrary indexes: indexing a deque is O(1) at the ends but degrades to O(n) toward the middle.
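For the BFS case, a minimal sketch could look like this. The adjacency-dict graph and the bfs_order function are assumptions made for the example.

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order from `start`."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()            # O(1) removal from the left
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)    # O(1) append on the right
    return visited

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(graph, "A"))   # ['A', 'B', 'C', 'D']
```

With a list as the queue, each pop(0) would shift every remaining element, which is the cost deque avoids.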
OrderedDict: when insertion order matters for comparison
Since Python 3.7, regular dicts maintain insertion order. So why does OrderedDict still exist?
Because it behaves differently during equality comparisons. Two dicts with the same key-value pairs but in different orders are equal. Two OrderedDict instances with the same pairs but different orders are not.
from collections import OrderedDict
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(d1 == d2) # True
od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
print(od1 == od2) # False
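One subtlety: the order-sensitive rule only applies when both sides are OrderedDict instances. The short sketch below shows the mixed case, plus an explicit order-sensitive check for plain dicts.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2)])
plain = {"b": 2, "a": 1}

# OrderedDict vs. regular dict: order is ignored.
print(od == plain)                                   # True
# OrderedDict vs. OrderedDict: order is part of equality.
print(od == OrderedDict([("b", 2), ("a", 1)]))       # False
# Comparing item lists is an explicit order-sensitive check for plain dicts.
print(list(od.items()) == list(plain.items()))       # False
```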
move_to_end() is the other distinctive feature: moving a key to the end or start of the dict without recreating the object.
d = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
d.move_to_end('a') # 'a' moves to the end
d.move_to_end('c', last=False) # 'c' moves to the front
When to use it. LRU cache (oldest at head, most recent at tail), protocols where field order in a message carries semantic meaning, tests where you want to verify not just values but order. For ordinary storage, dict is enough.
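The LRU case can be sketched in a few lines with move_to_end() and popitem(last=False). The LRUCache class below is a hypothetical, non-thread-safe illustration, not a production cache.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny least-recently-used cache; not thread-safe."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the oldest entry (head)

    def keys(self):
        return list(self._data)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes the most recent entry
cache.put("c", 3)       # evicts "b", the least recently used
print(cache.keys())     # ['a', 'c']
```

For memoizing function calls, functools.lru_cache in the standard library already covers this; a hand-written version mainly makes sense when you need direct control over the stored entries.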
ChainMap: multiple dicts as a single unified view
ChainMap chains multiple dicts and exposes them as a single view. Lookups traverse the dicts from left to right and stop at the first match. Writes only touch the first dict.
from collections import ChainMap
global_config = {'debug': True, 'timeout': 30, 'retries': 3}
project_config = {'timeout': 60}
local_config = {'debug': False}
config = ChainMap(local_config, project_config, global_config)
print(config['debug']) # False (found in local_config)
print(config['timeout']) # 60 (found in project_config)
print(config['retries']) # 3 (found in global_config)
Writes go to the first dict only; the dicts further down the chain are never modified:
config['timeout'] = 5
print(local_config) # {'debug': False, 'timeout': 5} — updated
print(project_config) # {'timeout': 60} — unchanged
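Deletions follow the same first-dict rule, and the chain stays a live view of its layers. The sketch below uses made-up defaults and overrides dicts to show both behaviors.

```python
from collections import ChainMap

defaults = {"color": "blue", "size": "M"}
overrides = {"color": "red"}
settings = ChainMap(overrides, defaults)

# .maps exposes the underlying dicts, in lookup order.
print(settings.maps)        # [{'color': 'red'}, {'color': 'blue', 'size': 'M'}]

# Deleting a shadowing key reveals the value from the next layer.
del settings["color"]
print(settings["color"])    # blue

# Deleting a key that only lives in a later layer fails.
try:
    del settings["size"]
except KeyError:
    print("size is not in the first mapping")

# The chain is a live view: changes to a layer show up immediately.
defaults["size"] = "L"
print(settings["size"])     # L
```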
new_child() adds a temporary layer on top of the existing chain, useful for a local override without touching other levels:
temp_config = config.new_child({'retries': 1})
print(temp_config['retries']) # 1 (temporary layer)
print(config['retries']) # 3 (original chain unchanged)
When to use it. Layered configuration management (defaults < project config < environment variables < CLI flags), nested execution contexts, variable scoping simulation (similar to how the Python interpreter handles namespaces). It is also a lightweight alternative to {**defaults, **overrides}: the merge expression builds a new dict, whereas ChainMap copies nothing and keeps referencing the original dicts.
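A sketch of the layered configuration case follows. The build_config function, the APP_ prefix convention, and the sample values are all assumptions for illustration.

```python
from collections import ChainMap

def build_config(cli_args, environ, defaults):
    """Layer settings so CLI flags beat environment variables beat defaults."""
    # Hypothetical convention: APP_PORT in the environment maps to "port".
    env_layer = {
        key[len("APP_"):].lower(): value
        for key, value in environ.items()
        if key.startswith("APP_")
    }
    # Drop unset flags so they do not shadow lower layers with None.
    cli_layer = {k: v for k, v in cli_args.items() if v is not None}
    return ChainMap(cli_layer, env_layer, defaults)

config = build_config(
    cli_args={"port": "9000", "host": None},
    environ={"APP_HOST": "example.internal", "APP_DEBUG": "1", "PATH": "/usr/bin"},
    defaults={"host": "localhost", "port": "8000", "debug": "0"},
)

print(config["port"])    # 9000 (CLI flag)
print(config["host"])    # example.internal (environment)
print(config["debug"])   # 1 (environment)

# A flattened copy, handy for logging the effective settings.
snapshot = dict(config)
```

In a real program the first two arguments would typically be vars(parser.parse_args()) from argparse and os.environ; they are passed in explicitly here to keep the example deterministic.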
Summary
| Structure | Replaces | Main advantage |
|---|---|---|
| Counter | counting loop | arithmetic operations, most_common |
| defaultdict | if key not in d | automatic factory, linear code |
| namedtuple | anonymous tuple | named access, immutable, lightweight |
| deque | list as queue/stack | O(1) at both ends, maxlen |
| OrderedDict | dict | order-sensitive comparison, move_to_end |
| ChainMap | dict merge | unified view without copy, isolated writes |
These six structures cover the bulk of what collections offers. All are well documented, the performance-critical ones (deque, defaultdict, OrderedDict) are implemented in C in CPython, and each eliminates a category of boilerplate code.
