Django ORM gives you two ways to add a computed value across a set of rows: annotate() with a classic aggregation (Max, Count, Sum…) or annotate() with a Window function. On the surface they look similar. In practice, they behave in fundamentally different ways — and picking the wrong one can break your entire filtering chain.
GROUP BY with annotate(): rows that collapse
When you combine values() and annotate() with an aggregation, Django generates a GROUP BY in SQL. The result: rows get merged, and you end up with one row per group.
from django.db.models import Max
def get_latest_dates(self) -> QuerySet:
return self.values('ctr_id').annotate(
latest_date=Max('evt_end_effect_date')
)
Generated SQL:
SELECT ctr_id, MAX(evt_end_effect_date) AS latest_date
FROM events
GROUP BY ctr_id
The result is a dictionary per group — {'ctr_id': 1, 'latest_date': date(2024, 12, 31)} — no longer full model instances, just the aggregated fields.
What you need to understand about chainability: you can still call .filter() or .exclude() afterwards, but the semantics shift completely. Filters apply to the aggregated groups, not to the original rows. You’re no longer filtering individual events — you’re filtering group results.
# ⚠️ This filter applies to groups, not the source rows
self.values('ctr_id').annotate(latest_date=Max('evt_end_effect_date')).filter(
latest_date__gte=date(2024, 1, 1)
)
# SQL: HAVING MAX(evt_end_effect_date) >= '2024-01-01'
# No select_related(), no access to other fields on the row
Window functions: annotate without touching the rows
A Window function computes a value over a partition of rows, but keeps all rows intact. Each row receives its computed value as an additional annotation.
from django.db.models import F, Window
from django.db.models.functions import FirstValue
def with_latest_dates(self) -> QuerySet:
return self.annotate(
latest_date=Window(
expression=FirstValue('evt_end_effect_date'),
partition_by=['ctr_id'],
order_by=F('evt_end_effect_date').desc(),
)
)
Generated SQL:
SELECT *,
FIRST_VALUE(evt_end_effect_date) OVER (
PARTITION BY ctr_id
ORDER BY evt_end_effect_date DESC
) AS latest_date
FROM events
All rows are present. Each one now has latest_date — the most recent date within its ctr_id group. And the QuerySet remains a normal QuerySet.
# ✅ Everything is still possible after a Window annotation
qs = self.with_latest_dates()
qs.filter(status='active') # normal filter (WHERE on non-window column)
qs.select_related('contract') # normal join
qs.exclude(latest_date__isnull=True) # filter on window annotation -> subquery
qs.order_by('ctr_id', '-latest_date')
Django GROUP BY vs Window Function: a visual comparison
GROUP BY Window function
────────────────────────────────── ──────────────────────────────────
ctr_id=1, evt=A → row 1 ctr_id=1, evt=A, latest=A → row 1
ctr_id=1, evt=B → ctr_id=1, evt=B, latest=A → row 2
ctr_id=1, evt=C → MAX(C) ──► C ctr_id=1, evt=C, latest=A → row 3
ctr_id=2, evt=D → row 2 ctr_id=2, evt=D, latest=D → row 4
ctr_id=2, evt=E → MAX(E) ──► E ctr_id=2, evt=E, latest=D → row 5
GROUP BY collapses. Window annotates.
Concrete use case: fetching the most recent row per group
Goal: for each contract, retrieve the event with the most recent end-effect date, with access to all its fields.
With GROUP BY, this is impossible directly — you lose the row’s fields. With Window + RowNumber:
from django.db.models import F, Window
from django.db.models.functions import RowNumber
def get_latest_event_per_contract(self) -> QuerySet:
return (
self.annotate(
row_num=Window(
expression=RowNumber(),
partition_by=['ctr_id'],
order_by=F('evt_end_effect_date').desc(),
)
)
.filter(row_num=1)
.select_related('contract')
)
RowNumber() numbers the rows within each partition, sorted by descending date. filter(row_num=1) keeps only the first one — the most recent. Django can’t add a direct WHERE clause on a Window function (not possible in standard SQL), so it generates a subquery instead: SELECT * FROM (...) "qualify" WHERE "row_num" = 1.
Window functions available in Django
from django.db.models.functions import (
FirstValue, # first value in the partition
LastValue, # last value
Lag, # value N rows back: Lag('field', offset=1)
Lead, # value N rows ahead: Lead('field', offset=1)
NthValue, # nth value: NthValue('field', nth=2)
Rank, # rank with ties (1, 1, 3)
DenseRank, # rank without gaps (1, 1, 2)
RowNumber, # unique row number per partition
CumeDist, # cumulative distribution (0.0 → 1.0)
PercentRank, # relative rank (0.0 → 1.0)
Ntile, # split into N buckets: Ntile(num_buckets=4)
)
GROUP BY or Django Window Function: a decision table
values().annotate(Max(...)) | annotate(Window(...)) | |
|---|---|---|
| SQL | GROUP BY | OVER (PARTITION BY ...) |
| Rows kept | One per group | All |
| Field access | Only those in values() | All |
| Chainability | Filter on groups (HAVING) | Filter via subquery (WHERE on rows) |
select_related() | ❌ | ✅ |
| Use case | Count, sum, global max | Rank, neighbouring value, per-row max |
The simple rule: if you need to keep rows intact and keep filtering normally after your computation, use a Window function. If you only want aggregated results (stats, totals, global maxima), values().annotate() is more direct and easier to read.
Working on Django ORM optimization? I also wrote about Django in_bulk(): why it beats filter() for bulk lookups.