Python dataclasses automatically generate __init__, __repr__, and __eq__ from type annotations. The moment an attribute needs a mutable default value (list, dict, set), you hit a fundamental Python pitfall. field(default_factory=...) is the solution, and understanding why it is necessary changes how you reason about initialization in Python.

The mutable default value trap

In Python, default parameter values are evaluated once at function definition time, not at each call. This is a language property, not a bug.

def append_to(value, lst=[]):
    lst.append(value)
    return lst

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2]  ← the same list is reused
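You can see that the default is created only once by looking at the function's __defaults__ attribute, which holds the already-evaluated default values. A minimal sketch that restates append_to so it runs on its own:

```python
def append_to(value, lst=[]):
    lst.append(value)
    return lst

# The default list was built once, when the def statement ran,
# and is stored on the function object itself.
print(append_to.__defaults__)  # ([],)

first = append_to(1)
second = append_to(2)

# Both calls returned the same object: the stored default.
print(first is second)                     # True
print(append_to.__defaults__[0] is first)  # True
print(append_to.__defaults__)              # ([1, 2],)
```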

Dataclasses explicitly refuse this pattern:

from dataclasses import dataclass

@dataclass
class Cart:
    items: list = []
    # ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory

Python raises a ValueError at class definition time, which is safer than a silent bug. The error message even tells you the fix: use default_factory.
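Because the check runs while the class is being built, the error can be caught around the class statement itself, before any instance exists. A small sketch that captures the message instead of letting it propagate:

```python
from dataclasses import dataclass

try:
    @dataclass
    class Cart:
        items: list = []  # rejected while the decorator processes the class
except ValueError as exc:
    message = str(exc)
    print(message)  # the message recommends default_factory
```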

What would happen without the protection

If Python allowed mutable defaults in dataclasses, all instances would share the same object in memory. The equivalent behavior with a plain class makes it clear:

class Cart:
    items = []  # class attribute, shared across all instances

a = Cart()
b = Cart()
a.items.append("apple")
print(b.items)  # ['apple'] — b is contaminated
print(a.items is b.items)  # True

This is the behavior @dataclass refuses to introduce silently.
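The protection is a heuristic, though. The dataclasses documentation describes the check as rejecting list, dict, and set defaults (any unhashable default since Python 3.11), so a mutable object that happens to be hashable is accepted and shared. A minimal sketch of that gap, where Settings and Job are hypothetical names:

```python
from dataclasses import dataclass

class Settings:
    """Hypothetical mutable class; its instances are hashable by default."""
    def __init__(self):
        self.flags = []

@dataclass
class Job:
    # Accepted without error: the Settings instance is hashable,
    # so the mutable-default check does not flag it.
    settings: Settings = Settings()

a = Job()
b = Job()
a.settings.flags.append("verbose")

print(b.settings.flags)          # ['verbose']
print(a.settings is b.settings)  # True
```

Declaring the field as field(default_factory=Settings) would give each Job its own Settings object.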

The solution: default_factory

field(default_factory=...) takes a zero-argument callable and calls it each time an instance is created without an explicit value for that field:

from dataclasses import dataclass, field

@dataclass
class Cart:
    items: list = field(default_factory=list)

a = Cart()
b = Cart()

a.items.append("apple")
print(b.items)           # []
print(a.items is b.items)  # False

list here is the built-in list type itself, passed as a callable, not an empty list. On each Cart() call that omits items, Python calls list() to create a fresh, independent list.

What @dataclass generates internally

@dataclass generates __init__ via exec() at class definition time. For a field with default_factory, the factory is stored on the Field object, accessible through dataclasses.fields():

import dataclasses
from dataclasses import dataclass, field

@dataclass
class Cart:
    items: list = field(default_factory=list)

f = dataclasses.fields(Cart)[0]
print(f.default)           # <dataclasses._MISSING_TYPE object at 0x...>, i.e. dataclasses.MISSING (no fixed value)
print(f.default_factory)   # <class 'list'>
print(f.default_factory()) # []  — calling the factory directly

The generated __init__ looks roughly like this. The real code uses dataclasses._HAS_DEFAULT_FACTORY as a sentinel (which displays as <factory>):

def __init__(self, items=_HAS_DEFAULT_FACTORY):
    if items is _HAS_DEFAULT_FACTORY:
        self.items = list()   # factory call
    else:
        self.items = items
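The <factory> display can be observed without reading the generated source, by asking inspect for the signature of the class. A minimal sketch:

```python
import inspect
from dataclasses import dataclass, field

@dataclass
class Cart:
    items: list = field(default_factory=list)

# The signature comes from the generated __init__ (without self).
signature = inspect.signature(Cart)
print(signature)  # the default of items is rendered as <factory>

# The default stored on the parameter is the sentinel, not a list.
default = signature.parameters["items"].default
print(repr(default))               # <factory>
print(isinstance(default, list))   # False
```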

The sentinel distinguishes “caller passed nothing” from “caller passed None”, so None remains a legitimate explicit value. Whenever you pass a value at instantiation, the factory is not called:

existing_items = ["bread", "milk"]
c = Cart(items=existing_items)
print(c.items)  # ['bread', 'milk'] — factory was not called
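To check when the factory actually runs, you can swap in a factory that records its calls. In this sketch, counting_list and call_log are hypothetical helpers added only for the demonstration:

```python
from dataclasses import dataclass, field

call_log = []  # one entry per factory call

def counting_list() -> list:
    """Hypothetical factory that records each call before returning a new list."""
    call_log.append("called")
    return []

@dataclass
class Cart:
    items: list = field(default_factory=counting_list)

Cart()                           # argument omitted: the factory runs
existing_items = ["bread", "milk"]
c = Cart(items=existing_items)   # argument given: the factory is skipped
n = Cart(items=None)             # None is an explicit value, not "missing"

print(len(call_log))              # 1
print(n.items)                    # None
print(c.items is existing_items)  # True: the list is stored as-is, not copied
```

The last line is worth noting: an explicitly passed list is stored by reference, so mutating c.items also mutates existing_items.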

default vs default_factory: the rule

For an immutable value, direct assignment is the right syntax:

@dataclass
class Product:
    count: int = 1
    label: str = "unknown"

field() comes into play in two situations: the default must be built per instance (a mutable or dynamically computed value), or a fixed default is combined with field configuration options.

@dataclass
class Product:
    count: int = 1                                     # direct assignment, no field()
    internal_id: int = field(default=1, repr=False)    # excluded from __repr__
    created_by: str = field(default="system", init=False)  # not settable via __init__
    items: list = field(default_factory=list)          # mutable, factory required

repr=False excludes the field from the string produced by the generated __repr__, and therefore from what print() shows. init=False removes the field from the generated __init__ parameters, so every new instance starts with the default. These are behavior options, not value options, and they require the field() wrapper.
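The effect of both options is easy to verify on an instance. A short sketch reusing the Product class from above; the values 3, 42, and "alice" are arbitrary:

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    count: int = 1
    internal_id: int = field(default=1, repr=False)
    created_by: str = field(default="system", init=False)
    items: list = field(default_factory=list)

p = Product(count=3, internal_id=42)
shown = repr(p)
print(shown)          # internal_id is stored but left out of the repr
print(p.internal_id)  # 42

try:
    Product(created_by="alice")  # created_by is not an __init__ parameter
except TypeError as exc:
    rejected = True
    print(exc)
else:
    rejected = False

# init=False only affects __init__; the attribute can still be assigned later.
p.created_by = "alice"
print(p.created_by)   # alice
```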

| Case | Syntax | Example |
| --- | --- | --- |
| Immutable value | Direct assignment | count: int = 1 |
| Immutable value + behavior (repr, init, compare…) | field(default=) | field(default=1, repr=False) |
| Mutable value (list, dict, set, …) | field(default_factory=) | field(default_factory=list) |
| Dynamically computed value | field(default_factory=) with lambda | field(default_factory=lambda: uuid.uuid4().hex) |

Real-world use cases

Auto-generated identifiers:

import uuid
from dataclasses import dataclass, field

@dataclass
class Transaction:
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    amount: float = 0.0

uuid.uuid4().hex returns a UUID without dashes (32 hex characters). Each instance gets a unique identifier without having to pass one explicitly.
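A quick check of those properties, as a minimal sketch; the amount 9.99 is arbitrary:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Transaction:
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    amount: float = 0.0

t1 = Transaction(amount=9.99)
t2 = Transaction(amount=9.99)

print(len(t1.id))      # 32
print(t1.id != t2.id)  # True (a collision is astronomically unlikely)
print(t1 == t2)        # False: id takes part in the generated __eq__
```

One consequence to keep in mind: two transactions with the same amount no longer compare equal, because the generated identifier is part of the comparison.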

Non-trivial default configurations:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AuditReport:
    errors: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)

datetime.now is passed as a function reference, without parentheses. Each report gets the timestamp of its creation, not a value frozen at class definition time.
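The difference the parentheses make can be shown side by side. In this sketch, FrozenStamp and FreshStamp are hypothetical names; note that the first form raises no error, because a datetime is immutable and hashable:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class FrozenStamp:
    # datetime.now() runs once, while the class body executes;
    # every instance then reuses that single value.
    created_at: datetime = datetime.now()

@dataclass
class FreshStamp:
    # datetime.now is called again for each instance that omits created_at.
    created_at: datetime = field(default_factory=datetime.now)

print(FrozenStamp().created_at is FrozenStamp().created_at)  # True

first = FreshStamp()
second = FreshStamp()
print(first.created_at is second.created_at)  # False: two separate calls
```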

Custom factory for initialized objects:

from dataclasses import dataclass, field

def default_config() -> dict:
    return {"debug": False, "timeout": 30, "retries": 3}

@dataclass
class HTTPService:
    url: str
    config: dict = field(default_factory=default_config)

Setting config["debug"] = True on one HTTPService instance does not affect the others: default_config() builds a new dictionary on every call, so each instance that relies on the default owns a separate one.
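That isolation can be checked directly. A minimal sketch in which the example.com URLs are placeholders:

```python
from dataclasses import dataclass, field

def default_config() -> dict:
    return {"debug": False, "timeout": 30, "retries": 3}

@dataclass
class HTTPService:
    url: str
    config: dict = field(default_factory=default_config)

service = HTTPService("https://api.example.com")    # placeholder URL
other = HTTPService("https://files.example.com")    # placeholder URL

service.config["debug"] = True

print(service.config["debug"])         # True
print(other.config["debug"])           # False
print(service.config is other.config)  # False
print(default_config()["debug"])       # False: each call builds a new dict
```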

A note on Python 3.10+ and dataclass(slots=True)

Since Python 3.10, @dataclass(slots=True) automatically generates __slots__. default_factory continues to work normally: the initialization mechanism is the same, only attribute storage changes.

from dataclasses import dataclass, field

@dataclass(slots=True)
class Cart:
    items: list = field(default_factory=list)

This combination is a reasonable choice when you want both per-instance default values and the memory savings of __slots__. For concrete memory benchmarks and inheritance pitfalls, see Python slots: optimizing instance memory.

Key takeaway

default_factory solves a fundamental Python problem: a mutable object used as a default value is shared across all instances of a class. The mechanism is simple (the generated __init__ calls the factory whenever no value is passed), but it guards against a class of bugs that are hard to diagnose, because unexpected state sharing only surfaces when you actually mutate the value. Custom factories go further: they let you initialize attributes with arbitrary logic, which makes @dataclass useful well beyond simple data containers.