Python itertools: building lazy iterator pipelines

itertools is a standard library module that exposes composable building blocks for iteration. Its value is not in replacing a for loop with a cryptically named function, but in processing data streams without ever loading them fully into memory. Its functions return lazy iterators: nothing is computed until you consume the result. That is what lets you chain transformations over millions of elements while holding only a handful of them in memory at any time.
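To make "nothing is computed until you consume the result" concrete, here is a minimal sketch. The noisy_numbers generator and the pulled list are hypothetical names introduced only to record what the source actually produces.

```python
from itertools import islice

pulled = []  # records every value the source actually produced


def noisy_numbers():
    """Hypothetical source: yields 0, 1, 2, ... and logs each value it yields."""
    n = 0
    while True:
        pulled.append(n)
        yield n
        n += 1


# Building the pipeline does no work yet.
squares = map(lambda x: x * x, noisy_numbers())
first_three = islice(squares, 3)
built_without_work = (pulled == [])

# Consuming it pulls exactly as many source values as needed.
result = list(first_three)

print(built_without_work)  # True
print(result)              # [0, 1, 4]
print(pulled)              # [0, 1, 2]
```

The source is infinite, yet only three values were ever generated, because islice stops asking once it has what it needs.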

Here are the functions I actually use, grouped by purpose, along with the pitfalls that waste time.

Infinite iterators: count, cycle, repeat

These three functions produce endless streams. They are only useful with a stopping condition; without one, the loop never terminates.

count(start, step) generates an infinite arithmetic sequence. Handy for numbering without managing a manual counter.

from itertools import count

for i in count(10, 2):
    if i > 20:
        break
    print(i)  # 10, 12, 14, 16, 18, 20

cycle(iterable) repeats an iterable indefinitely. Typical for alternating between resources (round-robin over servers, colors, workers).

from itertools import cycle

colors = cycle(['red', 'green', 'blue'])
for _, color in zip(range(5), colors):
    print(color)  # red, green, blue, red, green

repeat(elem, times) repeats a value. Without times, it is infinite. Its main use is supplying a constant argument to map or starmap.

from itertools import repeat

list(repeat(7, 3))  # [7, 7, 7]
list(map(pow, range(5), repeat(2)))  # [0, 1, 4, 9, 16] — each base squared

The pitfall. count(), cycle(), and repeat() without times never stop. Always bound them with islice, takewhile, a zip over a finite sequence, or a break. list(count()) never returns: it keeps allocating until memory runs out.

Slicing and filtering a stream

islice(iterable, start, stop, step) applies slice-style cutting to any iterator, including infinite ones, without materializing it into a list. It is the go-to tool for bounding a stream.

from itertools import islice, count

list(islice(count(), 5))        # [0, 1, 2, 3, 4]
list(islice(count(), 2, 8, 2))  # [2, 4, 6]
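
Two behaviors of islice are worth seeing once, sketched here with a hypothetical stream variable. It consumes the iterator it is given rather than copying it, so successive calls continue where the previous one stopped. And unlike list slicing, it rejects negative indices.

```python
from itertools import islice

stream = iter(range(10))

first = list(islice(stream, 3))   # [0, 1, 2]
second = list(islice(stream, 3))  # [3, 4, 5], not [0, 1, 2] again
rest = list(stream)               # [6, 7, 8, 9]

# Negative indices would require knowing the length, which a stream does not have.
negative_rejected = False
try:
    islice(range(10), -3, None)
except ValueError:
    negative_rejected = True

print(first, second, rest, negative_rejected)
```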

takewhile(pred, iterable) returns elements while the predicate holds, then stops at the first failure. dropwhile does the opposite: it skips elements while the predicate holds, then returns everything else without re-evaluating.

from itertools import takewhile, dropwhile

data = [2, 3, 8, 1, 9, 4]
list(takewhile(lambda x: x < 5, data))  # [2, 3]       — stops at 8
list(dropwhile(lambda x: x < 5, data))  # [8, 1, 9, 4] — skips 2 and 3, keeps the rest
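
A typical use, sketched with made-up lines and a hypothetical is_comment helper, is separating a leading header from the rest of a file. The second half shows a side effect that is easy to miss: when takewhile reads from a one-shot iterator, the element that fails the predicate is consumed and discarded.

```python
from itertools import dropwhile, takewhile

lines = [
    "# generated file",
    "# do not edit",
    "alpha=1",
    "# inline note",
    "beta=2",
]


def is_comment(line):
    """Hypothetical predicate: True for lines that start with '#'."""
    return line.startswith("#")


header = list(takewhile(is_comment, lines))
# ['# generated file', '# do not edit']
body = list(dropwhile(is_comment, lines))
# ['alpha=1', '# inline note', 'beta=2']  (the later comment is kept)

# On an iterator, the element that stops takewhile is lost.
stream = iter([2, 3, 8, 1, 9, 4])
taken = list(takewhile(lambda x: x < 5, stream))  # [2, 3]
leftover = list(stream)                           # [1, 9, 4], the 8 is gone
```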

filterfalse(pred, iterable) is the complement of filter: it keeps the elements for which the predicate is false. More readable than filter(lambda x: not pred(x), ...).

from itertools import filterfalse

list(filterfalse(lambda x: x % 2, range(10)))  # [0, 2, 4, 6, 8] — the even numbers

compress(data, selectors) filters data according to a second iterable of booleans. Useful when the selection mask is computed elsewhere, separately from the data.

from itertools import compress

names = ['Alice', 'Bob', 'Carol', 'Dan']
active = [True, False, True, False]
list(compress(names, active))  # ['Alice', 'Carol']
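
The mask does not have to be a list. In this sketch it is a generator computed from a hypothetical parallel scores list, so it is evaluated only as compress advances. Note also that compress stops as soon as either input runs out.

```python
from itertools import compress

names = ['Alice', 'Bob', 'Carol', 'Dan']
scores = [82, 47, 91, 60]  # hypothetical data aligned with names

# The mask is a generator: one boolean is computed per element, on demand.
passing = (score >= 60 for score in scores)
selected = list(compress(names, passing))  # ['Alice', 'Carol', 'Dan']

# A short mask silently truncates the result.
truncated = list(compress('ABCDEF', [1, 0, 1]))  # ['A', 'C']
```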

Combining and transforming

chain(*iterables) strings several iterables into a single sequence, without creating an intermediate list. Its chain.from_iterable variant flattens an iterable of iterables and reads the outer iterable lazily, so that outer iterable can itself be a generator.

from itertools import chain

list(chain([1, 2], [3, 4], [5]))               # [1, 2, 3, 4, 5]
list(chain.from_iterable([[1, 2], [3, 4]]))    # [1, 2, 3, 4]
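
The difference between the two forms matters when the outer iterable is itself a stream. In this sketch, pages is a hypothetical endless paginated source. chain(*pages()) would never return, because argument unpacking has to exhaust the generator first, while chain.from_iterable pulls one page at a time.

```python
from itertools import chain, count, islice


def pages():
    """Hypothetical paginated source: page n holds n*3, n*3 + 1, n*3 + 2."""
    for n in count():
        yield [n * 3, n * 3 + 1, n * 3 + 2]


flat = chain.from_iterable(pages())
first_seven = list(islice(flat, 7))  # [0, 1, 2, 3, 4, 5, 6]
```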

accumulate(iterable, func, initial) produces running results. By default it is a cumulative sum, but any binary function works (max, operator.mul, etc.). The keyword-only initial parameter (Python 3.8+) prepends a starting value to the output.

from itertools import accumulate
import operator

list(accumulate([1, 2, 3, 4]))               # [1, 3, 6, 10] — running sum
list(accumulate([1, 2, 3, 4], operator.mul)) # [1, 2, 6, 24] — running product
list(accumulate([3, 1, 4, 1, 5], max))       # [3, 3, 4, 4, 5] — running maximum
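
The initial parameter is not shown above, so here is a small sketch with made-up numbers. The starting value is emitted first, which means the output has one more element than the input.

```python
from itertools import accumulate
import operator

deposits = [10, -3, 5]  # hypothetical account movements

balances = list(accumulate(deposits, initial=100))
# [100, 110, 107, 112]

scaled = list(accumulate([1, 2, 3], operator.mul, initial=10))
# [10, 10, 20, 60]
```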

starmap(func, iterable) applies a function to arguments already grouped into tuples. It is map when the arguments are pre-packed: starmap(f, [(a, b)]) calls f(a, b).

from itertools import starmap

points = [(3, 4), (6, 8), (5, 12)]
list(starmap(lambda x, y: (x**2 + y**2)**0.5, points))  # [5.0, 10.0, 13.0]

pairwise(iterable) (Python 3.10+) returns overlapping consecutive pairs of elements. Perfect for computing differences or comparing each element to the next.

from itertools import pairwise

list(pairwise([1, 2, 3, 4]))  # [(1, 2), (2, 3), (3, 4)]

temps = [10, 13, 12, 18]
[b - a for a, b in pairwise(temps)]  # [3, -1, 6] — successive changes

zip_longest(*iterables, fillvalue) merges several iterables like zip, but aligns on the longest by filling gaps with fillvalue instead of stopping at the shortest.

from itertools import zip_longest

list(zip_longest([1, 2, 3], ['a', 'b'], fillvalue='?'))
# [(1, 'a'), (2, 'b'), (3, '?')]

groupby: the sort-first pitfall

groupby(iterable, key) groups consecutive elements sharing the same key. The important word is consecutive: groupby only groups adjacent runs; it does not sort. On input not sorted by the key, you get fragmented groups.

from itertools import groupby

data = [('FR', 'Paris'), ('US', 'NYC'), ('FR', 'Lyon')]

# Wrong: not sorted by country → 'FR' shows up in two separate groups
for country, group in groupby(data, key=lambda x: x[0]):
    print(country, [v for _, v in group])
# FR ['Paris']
# US ['NYC']
# FR ['Lyon']

# Right: sort first by the same key
data.sort(key=lambda x: x[0])
for country, group in groupby(data, key=lambda x: x[0]):
    print(country, [v for _, v in group])
# FR ['Paris', 'Lyon']
# US ['NYC']

This is the most common mistake with this function: forgetting to sort with the same key that groupby uses. Another subtlety: each group is an iterator that shares the underlying stream with groupby itself, so if you move to the next group without consuming the previous one, its contents are lost. For the key function, operator.itemgetter pairs well with groupby and is typically faster and more readable than a lambda.
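
Both points can be sketched on the same data. The names by_country, ordered and cities are illustrative. The first half uses itemgetter for both the sort and the grouping and materializes each group while it is current. The second half advances groupby by hand to show the previous group going empty.

```python
from itertools import groupby
from operator import itemgetter

data = [('FR', 'Paris'), ('US', 'NYC'), ('FR', 'Lyon')]
by_country = itemgetter(0)
ordered = sorted(data, key=by_country)  # sorted() is stable: Paris stays before Lyon

# Materialize each group while it is the current one.
cities = {
    country: [city for _, city in group]
    for country, group in groupby(ordered, key=by_country)
}
# {'FR': ['Paris', 'Lyon'], 'US': ['NYC']}

# Advancing groupby invalidates the previous group.
groups = groupby(ordered, key=by_country)
_, fr_group = next(groups)
_, us_group = next(groups)   # moving on skips past the unread FR items
fr_left = list(fr_group)     # []
us_left = list(us_group)     # [('US', 'NYC')]
```

Using sorted() instead of data.sort() leaves the original list untouched, which is a design choice rather than a requirement.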

tee: duplicating an iterator, with caution

tee(iterable, n) returns n independent iterators from a single one. It does not copy the data: the iterators share an internal buffer that holds everything the slowest one has not yet consumed.

from itertools import tee

it = iter([1, 2, 3, 4])
a, b = tee(it, 2)
list(a)  # [1, 2, 3, 4]
list(b)  # [1, 2, 3, 4]

Two real pitfalls. First, do not touch the source iterator after tee: any element you pull from it directly never reaches the copies. Second, if one iterator races far ahead of the other, the internal buffer grows to hold the pending elements. Duplicating a stream and then fully consuming the first copy before the second amounts to keeping everything in memory, which defeats the point of laziness; in that case a plain list is simpler. tee is efficient only if the copies advance at roughly the same pace.
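
A minimal sketch of the first pitfall, followed by a pattern where tee works well because the two copies stay one element apart. The variable names and the readings data are illustrative.

```python
from itertools import tee

# Pitfall: pulling from the source after tee hides that element from the copies.
source = iter([1, 2, 3, 4])
a, b = tee(source)
stolen = next(source)   # 1, taken behind tee's back
seen_by_a = list(a)     # [2, 3, 4]
seen_by_b = list(b)     # [2, 3, 4]

# Lockstep use: advance one copy by a single step and zip the two together.
readings = iter([10, 13, 12, 18])
current, following = tee(readings)
next(following, None)   # shift the second copy by one element
deltas = [nxt - cur for cur, nxt in zip(current, following)]
# [3, -1, 6]
```

On Python 3.10+ the second half is what pairwise already does; the tee version is shown only to illustrate copies that advance together.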

Combinatorics: product, permutations, combinations

These functions generate arrangements. They stay lazy, but beware: the number of results explodes fast (factorial or exponential).

product(*iterables, repeat) computes the Cartesian product. It replaces nested for loops.

from itertools import product

list(product([1, 2], ['a', 'b']))   # [(1,'a'), (1,'b'), (2,'a'), (2,'b')]
list(product([0, 1], repeat=3))     # all binary combinations over 3 bits

permutations(iterable, r) generates all ordered arrangements of length r (order matters). combinations(iterable, r) generates unordered subsets of length r (order does not matter). combinations_with_replacement additionally allows repeating the same element.

from itertools import permutations, combinations, combinations_with_replacement

list(permutations('ABC', 2))                  # AB AC BA BC CA CB — 6 arrangements
list(combinations('ABC', 2))                  # AB AC BC          — 3 combinations
list(combinations_with_replacement('ABC', 2)) # AA AB AC BB BC CC — 6 with replacement

The pitfall. permutations(range(10)) produces 3,628,800 tuples. Never wrap these functions in a list() without bounding the input size, or you risk exhausting memory. Iterate directly with a break or an islice when you only need a sample.
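
One way to stay safe, sketched below, is to compute the size of the output before generating anything and to take only a sample with islice. math.perm and math.comb require Python 3.8+.

```python
import math
from itertools import combinations, islice, permutations

# Size of the output, computed without generating a single tuple.
n_perms = math.perm(10)     # 10! = 3628800
n_pairs = math.comb(10, 2)  # 45

# Take a sample lazily instead of building the full list.
sample = list(islice(permutations(range(10)), 2))
# [(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)]

# Counting by iteration agrees with the formula, without storing the pairs.
counted_pairs = sum(1 for _ in combinations(range(10), 2))
```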

Recap

| Function | Role | Keep in mind |
| --- | --- | --- |
| `count` / `cycle` / `repeat` | infinite streams | bound with `islice` or `takewhile` |
| `islice` | lazy slice | works on an infinite iterator |
| `takewhile` / `dropwhile` | cut by a predicate | stops / skips at the first switch |
| `filterfalse` / `compress` | filtering | complement of `filter` / external mask |
| `chain` | concatenate iterables | `from_iterable` to flatten |
| `accumulate` | running results | customizable `func` and `initial` |
| `starmap` | map over tuples of arguments | `f((a, b))` → `f(a, b)` |
| `pairwise` | consecutive pairs | Python 3.10+, ideal for deltas |
| `zip_longest` | zip without truncation | fills with `fillvalue` |
| `groupby` | group runs | sort first by the key |
| `tee` | duplicate an iterator | memory-heavy if desynchronized |
| `product` / `permutations` / `combinations` | combinatorics | output explodes, bound the input |

itertools does not provide anything impossible to write by hand. It provides C-level, tested primitives that compose with each other and preserve laziness end to end. Where a list comprehension materializes everything, an itertools pipeline processes one element at a time. On large volumes, that is the difference between a script that fits in memory and one that saturates it. For complementary data structures, see the collections module.
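
As a closing illustration of that composition, here is a small sketch of an end-to-end pipeline. The batches generator is a hypothetical stand-in for a large chunked source. Each stage hands one element to the next, and only the five requested results are ever computed.

```python
from itertools import accumulate, chain, filterfalse, islice

# Hypothetical chunked source: 250,000 lazy batches of 4 integers each.
batches = (range(start, start + 4) for start in range(0, 1_000_000, 4))

values = chain.from_iterable(batches)                  # 0, 1, 2, 3, 4, ...
odd_only = filterfalse(lambda x: x % 2 == 0, values)   # 1, 3, 5, 7, ...
running = accumulate(odd_only)                         # 1, 4, 9, 16, ...

first_five = list(islice(running, 5))
print(first_five)  # [1, 4, 9, 16, 25]
```

The same logic written with list comprehensions would build a million-element list at each stage before the final slice.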
