Python collections : Counter, defaultdict, deque and the rest

The Python collections module has been part of the standard library since version 2.4. It provides specialized data structures that solve recurring problems without any external dependency. Yet many developers keep writing counting loops, conditional key initialization, or Point classes with x, y, z fields when Counter, defaultdict, and namedtuple do exactly that, better and more readably.

Here are the six structures I use regularly, with the cases where they actually make a difference.

Counter : counting without a loop

Counter takes any iterable and returns a dict subclass that maps each element to its occurrence count. When a Counter is printed, elements are listed from most to least frequent; iteration itself follows insertion order.

from collections import Counter

fruits = ['apple', 'banana', 'orange', 'banana', 'apple', 'apple']
c = Counter(fruits)
print(c)
# Counter({'apple': 3, 'banana': 2, 'orange': 1})

most_common(n) returns the n most frequent elements in descending order:

c.most_common(2)
# [('apple', 3), ('banana', 2)]

What makes it genuinely useful in practice is the set of arithmetic operations between two Counter objects. Addition accumulates counts, subtraction keeps only positive counts, intersection (&) takes the minimum of each count, and union (|) takes the maximum:

c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)

print(c1 + c2)  # Counter({'a': 4, 'b': 3})
print(c1 - c2)  # Counter({'a': 2})
print(c1 & c2)  # Counter({'a': 1, 'b': 1})
print(c1 | c2)  # Counter({'a': 3, 'b': 2})
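
The operators above drop any count that ends up at zero or below. When the deficit itself matters, the subtract() method keeps negative counts. A minimal sketch, with hypothetical stock and order data invented for the illustration:

```python
from collections import Counter

# Hypothetical inventory data, for illustration only
stock = Counter(apple=3, banana=1)
orders = Counter(apple=1, banana=2)

# The - operator discards counts that fall to zero or below
remaining = stock - orders
print(remaining)  # Counter({'apple': 2})

# subtract() works in place and keeps negative counts,
# which exposes the shortfall on 'banana'
balance = Counter(stock)  # copy, so that stock stays untouched
balance.subtract(orders)
print(balance)  # Counter({'apple': 2, 'banana': -1})
```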

update() feeds a Counter from a new iterable without starting over:

c1.update(['a', 'a', 'c'])
# Counter({'a': 5, 'b': 1, 'c': 1})

When to use it. Frequency analysis, data histograms, token counting, log statistics. Any situation where you would write if key in d: d[key] += 1 else: d[key] = 1 is a direct candidate for Counter.
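
As a rough illustration of the log statistics case, the sketch below counts log levels. The log lines and their "LEVEL message" layout are assumptions made up for the example:

```python
from collections import Counter

# Hypothetical log lines; the "LEVEL message" layout is an assumption
log_lines = [
    "ERROR disk full",
    "INFO job started",
    "ERROR disk full",
    "WARNING slow response",
    "INFO job finished",
    "ERROR timeout",
]

# Count the first token (the level) of each line
levels = Counter(line.split()[0] for line in log_lines)

print(levels.most_common(1))   # [('ERROR', 3)]
print(sum(levels.values()))    # 6 lines counted in total

# A missing key returns 0 instead of raising KeyError,
# and reading it does not insert the key
print(levels["DEBUG"])         # 0
print("DEBUG" in levels)       # False
```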

defaultdict : no more KeyError

defaultdict is a dict subclass that automatically creates a default value when a missing key is accessed. The factory is passed at construction time.

from collections import defaultdict

# list factory: each new key starts with an empty list
groups = defaultdict(list)
groups['fruit'].append('apple')
groups['fruit'].append('banana')
groups['car']  # simple access: creates the key with []

print(groups)
# defaultdict(<class 'list'>, {'fruit': ['apple', 'banana'], 'car': []})

With int as the factory, each new key starts at 0, enabling counting without a prior check:

counter = defaultdict(int)
for word in ['cat', 'dog', 'cat', 'parrot', 'dog', 'cat']:
    counter[word] += 1
# defaultdict(<class 'int'>, {'cat': 3, 'dog': 2, 'parrot': 1})

A very common real-world case: grouping data by key without checking whether the key already exists.

data = [('FR', 'Paris'), ('US', 'New York'), ('FR', 'Lyon'), ('US', 'Chicago')]
groups = defaultdict(list)

for country, city in data:
    groups[country].append(city)

# defaultdict(<class 'list'>, {'FR': ['Paris', 'Lyon'], 'US': ['New York', 'Chicago']})

When to use it. Any time you initialize a dict by testing if key not in d before inserting. defaultdict removes that pattern and makes the code more linear. For simple counting, Counter remains more idiomatic.
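
One behavior worth keeping in mind: only indexing with d[key] triggers the factory. Membership tests and .get() leave the dict untouched. The short sketch below, with a hypothetical visits counter, shows the difference:

```python
from collections import defaultdict

visits = defaultdict(int)  # hypothetical page-visit counter
visits["home"] += 1

# Membership tests and .get() do not call the factory
print("about" in visits)    # False
print(visits.get("about"))  # None
print(len(visits))          # 1

# Indexing does: a plain read inserts the key with the default value
print(visits["about"])      # 0
print(len(visits))          # 2

# Converting to a plain dict keeps the content and restores KeyError
frozen = dict(visits)
print(frozen)               # {'home': 1, 'about': 0}
```

Converting to a plain dict before handing the data to other code is one way to avoid keys being created by accidental reads.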

namedtuple : tuples with field names

A regular tuple forces access by index. namedtuple adds field names, making code self-documenting without the overhead of a full class.

from collections import namedtuple

Car = namedtuple('Car', ['brand', 'fuel'])
c = Car("BMW", "petrol")

print(c.brand)   # 'BMW'
print(c.fuel)    # 'petrol'
print(c[0])      # 'BMW'  — index access still works

_asdict() returns a standard dict (since Python 3.8, OrderedDict before that):

print(c._asdict())
# {'brand': 'BMW', 'fuel': 'petrol'}

_replace() creates a new instance with one or more fields changed. The namedtuple is immutable, so this is the only way to “modify” a value:

c2 = c._replace(fuel="electric")
# Car(brand='BMW', fuel='electric')

_fields exposes field names, useful for introspection:

Point = namedtuple('Point', 'x y z')
p = Point(1, 2, 3)
print(p._fields)  # ('x', 'y', 'z')

A namedtuple instance has no __dict__. That is what makes it lighter in memory than a standard dataclass on large collections: a dataclass allocates an attribute dictionary per instance, a namedtuple does not. For more on memory optimization, see Python __slots__ and memory optimization.
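
The absence of __dict__ is easy to check. The sketch below compares a namedtuple with an equivalent dataclass; PointNT and PointDC are names chosen for the illustration:

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple("PointNT", "x y z")

@dataclass
class PointDC:
    x: int
    y: int
    z: int

nt = PointNT(1, 2, 3)
dc = PointDC(1, 2, 3)

# The namedtuple stores its values in the tuple itself
print(hasattr(nt, "__dict__"))  # False

# The default dataclass keeps its attributes in a per-instance dict
print(hasattr(dc, "__dict__"))  # True
print(dc.__dict__)              # {'x': 1, 'y': 2, 'z': 3}
```

Since Python 3.10, @dataclass(slots=True) also removes the per-instance dictionary, which narrows the gap when mutability is needed.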

When to use it. Multi-value returns from functions, lightweight tabular data representation, replacing tuples like (x, y) or (id, name, date) where the index order inevitably becomes opaque. For mutable objects with business logic, dataclass is more appropriate.
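
A possible shape for the multi-value return case is sketched below. The Stats type and the describe() function are hypothetical names, not part of any library:

```python
from collections import namedtuple

# Hypothetical result type, for illustration only
Stats = namedtuple("Stats", ["minimum", "maximum", "mean"])

def describe(values):
    """Return summary statistics as a named tuple."""
    return Stats(min(values), max(values), sum(values) / len(values))

stats = describe([2, 4, 6, 8])

# Callers read fields by name instead of remembering positions
print(stats.mean)     # 5.0
print(stats.maximum)  # 8

# Tuple unpacking still works for callers that prefer it
low, high, mean = stats
print(low, high)      # 2 8
```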

deque : insertions and removals at both ends

A Python list is optimized for operations at the end. list.insert(0, x) and list.pop(0) are O(n) because they shift all elements. deque (double-ended queue) performs those same operations in O(1).

from collections import deque

d = deque([1, 2, 3, 4, 5])
d.append(6)      # O(1) on the right : deque([1, 2, 3, 4, 5, 6])
d.appendleft(0)  # O(1) on the left  : deque([0, 1, 2, 3, 4, 5, 6])
d.pop()          # O(1) on the right : deque([0, 1, 2, 3, 4, 5])
d.popleft()      # O(1) on the left  : deque([1, 2, 3, 4, 5])

rotate(n) performs a rotation of n steps to the right (negative for the left):

d = deque([1, 2, 3, 4, 5])
d.rotate(2)   # 2 steps to the right : deque([4, 5, 1, 2, 3])
d.rotate(-2)  # 2 steps to the left  : deque([1, 2, 3, 4, 5])

maxlen turns the deque into a sliding window. When full, each right-side addition automatically ejects the leftmost element:

history = deque(maxlen=3)
for i in range(6):
    history.append(i)
print(history)
# deque([3, 4, 5], maxlen=3)
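
The same mechanism gives a compact moving average. This is a minimal sketch; moving_average() is a hypothetical helper written for the example:

```python
from collections import deque

def moving_average(values, size):
    """Yield the mean of the last `size` values once the window is full."""
    window = deque(maxlen=size)
    for value in values:
        window.append(value)  # evicts the oldest value when the window is full
        if len(window) == size:
            yield sum(window) / size

print(list(moving_average([10, 20, 30, 40, 50], size=3)))
# [20.0, 30.0, 40.0]
```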

When to use it. FIFO queue (append + popleft, or appendleft + pop), LIFO stack, sliding window, fixed-size history buffer, BFS (breadth-first search). In a file processing pipeline, deque manages the in-memory buffer while shutil handles the disk side (copy, archive). Prefer list when you need frequent access by arbitrary index: indexing a deque is O(1) at either end but degrades to O(n) toward the middle.
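
For the BFS case, the deque serves as the FIFO queue of nodes still to visit. The graph below is a hypothetical adjacency mapping and bfs_order() an illustrative helper:

```python
from collections import deque

# Hypothetical graph stored as an adjacency mapping
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}

def bfs_order(graph, start):
    """Return the nodes reachable from `start` in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()  # O(1), unlike list.pop(0)
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
```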

OrderedDict : when insertion order matters for comparison

Since Python 3.7, regular dicts maintain insertion order. So why does OrderedDict still exist?

Because it behaves differently during equality comparisons. Two dicts with the same key-value pairs but in different orders are equal. Two OrderedDict instances with the same pairs but different orders are not.

from collections import OrderedDict

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(d1 == d2)  # True

od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
print(od1 == od2)  # False

move_to_end() is the other distinctive feature: moving a key to the end or start of the dict without recreating the object.

d = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
d.move_to_end('a')               # 'a' moves to the end
d.move_to_end('c', last=False)   # 'c' moves to the front

When to use it. LRU cache (oldest at head, most recent at tail), protocols where field order in a message carries semantic meaning, tests where you want to verify not just values but order. For ordinary storage, dict is enough.
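
The LRU cache idea can be sketched in a few lines with move_to_end() and popitem(last=False). LRUCache is a hypothetical class written for the illustration, not a standard library API:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: oldest entry at the head, most recent at the tail."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)  # a rewritten key also becomes the most recent
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def keys(self):
        return list(self._data)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # 'a' becomes the most recent entry
cache.put("c", 3)    # evicts 'b', the least recently used
print(cache.keys())  # ['a', 'c']
```

For memoizing function calls, functools.lru_cache from the standard library is usually the simpler choice; a hand-written class like this one mostly makes sense when the cache has to be inspected or modified directly.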

ChainMap : multiple dicts as a single unified view

ChainMap chains multiple dicts and exposes them as a single view. Lookups traverse the dicts from left to right and stop at the first match. Writes only touch the first dict.

from collections import ChainMap

global_config  = {'debug': True, 'timeout': 30, 'retries': 3}
project_config = {'timeout': 60}
local_config   = {'debug': False}

config = ChainMap(local_config, project_config, global_config)

print(config['debug'])    # False  (found in local_config)
print(config['timeout'])  # 60     (found in project_config)
print(config['retries'])  # 3      (found in global_config)

Writes go to the first mapping only; the other dicts in the chain are left untouched:

config['timeout'] = 5
print(local_config)    # {'debug': False, 'timeout': 5}  — updated
print(project_config)  # {'timeout': 60}                 — unchanged

new_child() adds a temporary layer on top of the existing chain, useful for a local override without touching other levels:

temp_config = config.new_child({'retries': 1})
print(temp_config['retries'])  # 1  (temporary layer)
print(config['retries'])       # 3  (original chain unchanged)

When to use it. Layered configuration management (defaults < project config < environment variables < CLI flags), nested execution contexts, variable scoping simulation (similar to how the Python interpreter handles namespaces). It is also a lightweight alternative to {**defaults, **overrides}: the merge builds a new dict, whereas ChainMap only keeps references to the original mappings and copies nothing.
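
The difference between a live view and a merged copy can be sketched as follows. The three layers are hypothetical; in a real program the last two would typically be filled from os.environ and a command-line parser, which is an assumption of this example:

```python
from collections import ChainMap

# Hypothetical configuration layers, for illustration only
defaults = {"host": "localhost", "port": 8080, "debug": False}
env_vars = {"port": 9000}
cli_flags = {"debug": True}

# Highest priority first: the first mapping that has the key wins
settings = ChainMap(cli_flags, env_vars, defaults)
print(settings["host"])   # localhost
print(settings["port"])   # 9000
print(settings["debug"])  # True

# A merged dict is a snapshot; note the reversed order, since the last one wins
merged = {**defaults, **env_vars, **cli_flags}

# A later change to a layer shows up in the view but not in the copy
defaults["host"] = "example.org"
print(settings["host"])   # example.org
print(merged["host"])     # localhost
```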

Summary

Structure	Replaces	Main advantage
`Counter`	counting loop	arithmetic operations, `most_common`
`defaultdict`	`if key not in d`	automatic factory, linear code
`namedtuple`	anonymous tuple	named access, immutable, lightweight
`deque`	`list` as queue/stack	O(1) at both ends, `maxlen`
`OrderedDict`	`dict`	order-sensitive comparison, `move_to_end`
`ChainMap`	dict merge	unified view without copy, isolated writes

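These six structures cover the bulk of what collections offers. In CPython, deque, defaultdict, and OrderedDict are implemented in C, while Counter, namedtuple, and ChainMap are written mostly in Python. All of them are well documented, and each eliminates a category of boilerplate code.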
