Performance

Python itertools: building lazy iterator pipelines

itertools is a standard library module that exposes composable iteration building blocks. Its value is not in replacing a for loop with a cryptically named function, but in processing data streams without ever loading them fully into memory. Every function returns a lazy iterator: nothing is computed until you consume the result. That is what lets you chain transformations over millions of elements with a constant memory footprint. Here are the functions I actually use, grouped by purpose, along with the pitfalls that waste time. ...
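As a rough illustration of the lazy chaining described above, here is a minimal sketch. The `squares` generator and the limit of 200 are made up for the example; only `count`, `takewhile`, and `islice` come from itertools.

```python
from itertools import count, islice, takewhile


def squares():
    """Yield 0, 1, 4, 9, ... one value at a time (hypothetical helper)."""
    for n in count():
        yield n * n


# Building the pipeline computes nothing: each stage only wraps the previous one.
even_squares = (s for s in squares() if s % 2 == 0)
below_limit = takewhile(lambda s: s < 200, even_squares)

# Work happens here, and only for the five values actually requested.
first_five = list(islice(below_limit, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```

The source is infinite, yet the program terminates because each stage pulls values only on demand.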

Python collections: Counter, defaultdict, deque and the rest

The Python collections module has been part of the standard library since version 2.4. It provides specialized data structures that solve recurring problems without any external dependency. Yet many developers keep writing counting loops, conditional key initialization, or Point classes with x, y, z fields when Counter, defaultdict, and namedtuple do exactly that, better and more readably. Here are the six structures I use regularly, with the cases where they actually make a difference. ...
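To make the comparison concrete, here is a small sketch of the three replacements mentioned above. The word list and the `Point` fields are invented sample data, not taken from the article.

```python
from collections import Counter, defaultdict, namedtuple

words = ["red", "blue", "red", "green", "blue", "red"]  # sample data

# Counter replaces the hand-written counting loop.
counts = Counter(words)
print(counts.most_common(2))  # [('red', 3), ('blue', 2)]

# defaultdict replaces "if key not in d: d[key] = []".
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

# namedtuple replaces a small Point class with x, y, z fields.
Point = namedtuple("Point", ["x", "y", "z"])
p = Point(1, 2, 3)
print(p.x, p.y, p.z)  # 1 2 3
```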

Python operator: itemgetter, attrgetter and the art of replacing lambdas

The operator module has been part of Python’s standard library forever, and yet many developers keep writing lambda x: x[0] or lambda obj: obj.name when an operator function would do the same job, faster and more readably. Understanding what this module offers, and how it is implemented, changes the way you write functional code in Python.

What operator contains

operator exposes functions that match the language’s operators. operator.add(2, 3) is the functional equivalent of 2 + 3, and operator.lt(a, b) corresponds to a < b. The point is not to replace operators in ordinary arithmetic code; that would be absurd. The point is being able to pass an operation as an argument to a higher-order function (map, filter, sorted, reduce, functools.partial). ...
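A short sketch of what passing an operation as an argument looks like in practice. The `User` dataclass and the sample values are hypothetical; `itemgetter`, `attrgetter`, and `mul` are the real operator functions.

```python
from dataclasses import dataclass
from functools import reduce
from operator import attrgetter, itemgetter, mul


@dataclass
class User:  # hypothetical model, for illustration only
    name: str
    age: int


pairs = [(3, "c"), (1, "a"), (2, "b")]
users = [User("bob", 41), User("alice", 29)]

# itemgetter(0) stands in for lambda x: x[0]
sorted_pairs = sorted(pairs, key=itemgetter(0))

# attrgetter("name") stands in for lambda obj: obj.name
sorted_names = [u.name for u in sorted(users, key=attrgetter("name"))]

# mul stands in for lambda a, b: a * b
product = reduce(mul, [1, 2, 3, 4])

print(sorted_pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(sorted_names)  # ['alice', 'bob']
print(product)       # 24
```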

Materialized views vs Django cache for slow queries

The instinctive response to a slow reporting endpoint is often to reach for the cache. A @cache_page, a cache.set(), and the problem seems to vanish until the next expiration. This approach has a structural limitation that PostgreSQL materialized views solve at the root.

The problem with cache on analytics endpoints

Django cache stores the result of a Python view. The expensive SQL query still runs again after every cache expiration. For a report built on multiple JOINs and aggregations, this means the first user after each cache miss waits several seconds. ...

Optimizing Django ORM Queries with defer(), only() and Prefetch()

By default, Django loads every field of a model on every query. On a list view showing 50 posts, that means fetching the full content, excerpt, metadata, and translation fields 50 times, even when only the title and date are displayed. Four ORM tools let you control exactly what gets loaded: the QuerySet methods defer(), only(), and values_list(), plus Prefetch() objects passed to prefetch_related(). The result: 2 SQL queries instead of N+2, with only the necessary columns.

Django defer(): Exclude Heavy Fields from the QuerySet

Django defer() tells the ORM to exclude specific fields from the initial query. Excluded fields remain accessible on the instance, but each access triggers an additional query. ...

Django select_for_update(): row-level locking for concurrent transactions

Two concurrent requests read a product’s stock, both see one unit remaining, and both confirm the order. Stock drops to -1. This kind of race condition is nearly impossible to reproduce in development and devastating in production. select_for_update() is Django’s answer: acquire a SQL lock at read time so no other transaction can modify the row before the current operation finishes.

What select_for_update() does in SQL

select_for_update() generates a SELECT ... FOR UPDATE. The lock is acquired as soon as the queryset is evaluated and held until the end of the transaction.atomic() block. Any other transaction that tries to acquire a lock on the same rows is blocked until the lock is released. ...

Python slots: cut instance memory by 40–60% without changing your logic

By default, Python allocates a __dict__ for every instance of a class. Flexible, yes. Cheap, no. When you hold thousands or millions of objects in memory at once, that dictionary overhead adds up fast. __slots__ removes it and replaces per-instance storage with compact internal descriptors. Typical result: 40 to 60 percent less memory per instance.

What Python does without slots

Without any declaration, each instance carries its own __dict__:

    class Point:
        def __init__(self, x: float, y: float) -> None:
            self.x = x
            self.y = y

    p = Point(1.0, 2.0)
    print(p.__dict__)  # {'x': 1.0, 'y': 2.0}

This dictionary allows adding attributes at runtime: ...
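For comparison, a minimal sketch of a slotted class next to a regular one. The class names `PlainPoint` and `SlottedPoint` are invented for the example, and the sizes reported by `sys.getsizeof` vary across Python versions, so no specific numbers are claimed here.

```python
import sys


class PlainPoint:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")  # fixed attribute set, no per-instance __dict__

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


plain = PlainPoint(1.0, 2.0)
slotted = SlottedPoint(1.0, 2.0)

# The slotted instance has no __dict__ at all.
print(hasattr(plain, "__dict__"), hasattr(slotted, "__dict__"))  # True False

# Adding an undeclared attribute is rejected instead of silently stored.
try:
    slotted.z = 3.0
    rejected = False
except AttributeError:
    rejected = True
print(rejected)  # True

# Rough shallow size comparison; exact values depend on the interpreter.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
print(plain_size, slotted_size)
```

`sys.getsizeof` only measures shallow sizes, so for real measurements over many objects something like `tracemalloc` is likely a better fit.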

Renaming Django ORM fields with F() in values()

When exposing data from a Django model to an API or serializer, the model’s field names don’t always match what you want to return. The usual approach: fetch instances, then rename in Python. There’s a better option: let the database do the work using F() inside values().

The problem: model field names dictate output

    class Task(models.Model):
        name = models.CharField(...)
        created_at = models.DateTimeField(...)

If you want to return task_name instead of name, the typical approach is to fetch the data and rename in Python, either with a dict comprehension or inside the serializer. Either way, the transformation happens after the fact, in memory. ...

Django Window Functions vs GROUP BY: Chainable QuerySets

Django ORM gives you two ways to add a computed value across a set of rows: annotate() with a classic aggregation (Max, Count, Sum…) or annotate() with a Window function. On the surface they look similar. In practice, they behave in fundamentally different ways, and picking the wrong one can break your entire filtering chain.

GROUP BY with annotate(): rows that collapse

When you combine values() and annotate() with an aggregation, Django generates a GROUP BY in SQL. The result: rows get merged, and you end up with one row per group. ...

Django in_bulk(): why it beats filter() for bulk lookups

When you have a list of identifiers and want to retrieve the corresponding instances, the usual reflex in Django is filter(pk__in=[...]). It works, in one SQL query. But in_bulk() is an often-overlooked ORM optimization: it returns a dictionary {id: instance} instead of a QuerySet, which fundamentally changes how you access results. Where filter() forces an O(n) traversal to find an object by ID, in_bulk() gives direct O(1) access.

in_bulk() signature and behavior

    QuerySet.in_bulk(id_list=None, *, field_name='pk')

id_list: list of identifiers to retrieve. If omitted (called without arguments), returns all objects in the table. An empty list returns an empty dictionary.

field_name: field used as the dictionary key. Must have unique=True, otherwise Django raises a ValueError.

The generated SQL is a simple WHERE pk IN (...) clause, normally a single query regardless of list size (Django may split very large lists into batches on backends that limit the number of query parameters). ...
