Python dataclasses automatically generate __init__, __repr__, and __eq__ from type annotations. The moment an attribute needs a mutable default value (list, dict, set), you hit a fundamental Python pitfall. field(default_factory=...) is the solution, and understanding why it is necessary changes how you reason about initialization in Python.
The mutable default value trap
In Python, default parameter values are evaluated once at function definition time, not at each call. This is a language property, not a bug.
def append_to(value, lst=[]):
    lst.append(value)
    return lst

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] ← the same list is reused
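One way to see that the default is built only once is to look at the function's `__defaults__` attribute, which holds the already-evaluated default objects. The sketch below repeats `append_to` and adds `append_to_safe`, a hypothetical variant (not part of the original example) that uses the common `None` sentinel idiom for comparison.

```python
def append_to(value, lst=[]):
    lst.append(value)
    return lst

# The default list exists before any call is made.
print(append_to.__defaults__)  # ([],)

first = append_to(1)
second = append_to(2)
print(first is second)         # True: both calls returned the same list
print(append_to.__defaults__)  # ([1, 2],)


def append_to_safe(value, lst=None):
    # Hypothetical variant: build a fresh list whenever none is passed.
    if lst is None:
        lst = []
    lst.append(value)
    return lst

print(append_to_safe(1))  # [1]
print(append_to_safe(2))  # [2]
```

The `None` idiom works for plain functions; the rest of this article covers the equivalent for dataclass fields.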
Dataclasses explicitly refuse this pattern:
from dataclasses import dataclass

@dataclass
class Cart:
    items: list = []

# ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory
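@dataclass raises a ValueError at class definition time, which is safer than a silent bug. The error message even tells you the fix: use default_factory. Up to Python 3.10 the check covers list, dict, and set defaults; since Python 3.11 any unhashable default value is rejected.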
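To confirm that the error fires while the class is being built, before any instance exists, you can wrap the class definition itself in a try block. This is only a small demonstration; the `error_message` variable is introduced here for illustration.

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class Cart:
        items: list = []  # rejected while the class is being built
except ValueError as exc:
    error_message = str(exc)

# No Cart() call was ever made; the decorator alone raised.
print(error_message)
```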
What would happen without the protection
If Python allowed mutable defaults in dataclasses, all instances would share the same object in memory. The equivalent behavior with a plain class makes it clear:
class Cart:
    items = []  # class attribute, shared across all instances

a = Cart()
b = Cart()
a.items.append("apple")
print(b.items)             # ['apple'], b is contaminated
print(a.items is b.items)  # True
This is the behavior @dataclass refuses to introduce silently.
The solution: default_factory
field(default_factory=...) takes a zero-argument callable and calls it on each instance creation:
from dataclasses import dataclass, field

@dataclass
class Cart:
    items: list = field(default_factory=list)

a = Cart()
b = Cart()
a.items.append("apple")
print(b.items)             # []
print(a.items is b.items)  # False
list here is the builtin type itself, passed as a callable, not an empty list. On each Cart() call that omits items, the generated __init__ calls list() to create a fresh, independent list.
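Any zero-argument callable can serve as the factory, which makes it easy to observe when it runs. The sketch below swaps `list` for a hypothetical `make_items` function that records each call in `factory_calls` (both names are invented for this illustration).

```python
from dataclasses import dataclass, field

factory_calls = []  # hypothetical log of factory invocations


def make_items() -> list:
    factory_calls.append("called")
    return []


@dataclass
class Cart:
    items: list = field(default_factory=make_items)


print(len(factory_calls))  # 0: defining the class does not call the factory

a = Cart()
b = Cart()
print(len(factory_calls))  # 2: one call per instance
print(a.items is b.items)  # False
```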
What @dataclass generates internally
@dataclass generates __init__ via exec() at class definition time. For a field with default_factory, the factory is stored on the Field object, accessible through dataclasses.fields():
import dataclasses
from dataclasses import dataclass, field

@dataclass
class Cart:
    items: list = field(default_factory=list)

f = dataclasses.fields(Cart)[0]
print(f.default)            # <dataclasses._MISSING_TYPE object ...> (no fixed value)
print(f.default_factory)    # <class 'list'>
print(f.default_factory())  # [], calling the factory directly
The generated __init__ looks roughly like this. The real code uses dataclasses._HAS_DEFAULT_FACTORY as a sentinel (which displays as <factory>):
def __init__(self, items=_HAS_DEFAULT_FACTORY):
    if items is _HAS_DEFAULT_FACTORY:
        self.items = list()  # factory call
    else:
        self.items = items
The sentinel distinguishes “caller passed nothing” from “caller passed None”. This means you can pass an explicit list at instantiation and the factory will not be called:
existing_items = ["bread", "milk"]
c = Cart(items=existing_items)
print(c.items) # ['bread', 'milk'] — factory was not called
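The sentinel can be observed from the outside through `inspect.signature`, without reading the generated source. The following is a small check, assuming the same `Cart` class as above; it also shows that an explicit `None` is stored as-is rather than triggering the factory.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class Cart:
    items: list = field(default_factory=list)


params = inspect.signature(Cart.__init__).parameters
sentinel_repr = repr(params["items"].default)
print(sentinel_repr)           # <factory>

print(Cart().items)            # []: nothing passed, factory called
print(Cart(items=None).items)  # None: explicit None is kept as-is
```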
default vs default_factory: the rule
For an immutable value, direct assignment is the right syntax:
from dataclasses import dataclass

@dataclass
class Product:
    count: int = 1
    label: str = "unknown"
field() only comes into play in two situations: a mutable value, or an immutable value combined with field configuration options.
from dataclasses import dataclass, field

@dataclass
class Product:
    count: int = 1                                          # direct assignment, no field()
    internal_id: int = field(default=1, repr=False)         # excluded from __repr__
    created_by: str = field(default="system", init=False)   # not settable via __init__
    items: list = field(default_factory=list)               # mutable, factory required
repr=False excludes the field from the generated __repr__, and therefore from what print() shows. init=False removes the field from the __init__ parameters, so every new instance starts with the default; the attribute can still be reassigned afterwards. These are behavior options, not value options, and they require the field() wrapper.
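The effect of these two options is easy to verify on the `Product` class above. The snippet below is a minimal check; the `rejected` flag is only there for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    count: int = 1
    internal_id: int = field(default=1, repr=False)
    created_by: str = field(default="system", init=False)
    items: list = field(default_factory=list)


p = Product(count=3)
print(p)  # Product(count=3, created_by='system', items=[])

rejected = False
try:
    Product(created_by="alice")  # init=False: not an __init__ parameter
except TypeError:
    rejected = True
print(rejected)  # True

p.created_by = "alice"  # still an ordinary attribute after construction
print(p.created_by)     # alice
```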
| Case | Syntax | Example |
|---|---|---|
| Immutable value | Direct assignment | count: int = 1 |
| Immutable value + behavior (repr, init, compare…) | field(default=) | field(default=1, repr=False) |
| Mutable value (list, dict, set, …) | field(default_factory=) | field(default_factory=list) |
| Dynamically computed value | field(default_factory=) with lambda | field(default_factory=lambda: uuid.uuid4().hex) |
Real-world use cases
Auto-generated identifiers:
import uuid
from dataclasses import dataclass, field

@dataclass
class Transaction:
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    amount: float = 0.0
uuid.uuid4().hex returns a UUID without dashes (32 hex characters). Each instance gets a unique identifier without having to pass one explicitly.
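A quick check of the `Transaction` class illustrates the point, along with one consequence worth knowing: because `id` takes part in the generated `__eq__`, two transactions with the same amount still compare as different.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Transaction:
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    amount: float = 0.0


t1 = Transaction(amount=10.0)
t2 = Transaction(amount=10.0)

print(len(t1.id))      # 32
print(t1.id != t2.id)  # True (a collision is astronomically unlikely)
print(t1 == t2)        # False: the ids differ, so __eq__ sees different values
```

If equality should ignore the identifier, `field(default_factory=..., compare=False)` is the option to reach for.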
Non-trivial default configurations:
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AuditReport:
    errors: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)
datetime.now is passed as a function reference, without parentheses. Each report gets the timestamp of its creation, not a value frozen at class definition time.
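The difference between passing the function and calling it can be shown side by side. In this sketch, `FrozenStamp` and `FreshStamp` are hypothetical classes made up for the comparison; the first one calls `datetime.now()` with parentheses, which dataclasses accept because a datetime is immutable and hashable.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FrozenStamp:
    # Counter-example: evaluated once, when the class body runs.
    created_at: datetime = datetime.now()


@dataclass
class FreshStamp:
    # Evaluated on every instantiation.
    created_at: datetime = field(default_factory=datetime.now)


f1 = FrozenStamp()
time.sleep(0.05)
f2 = FrozenStamp()
print(f1.created_at == f2.created_at)  # True: same frozen value

r1 = FreshStamp()
time.sleep(0.05)
r2 = FreshStamp()
print(r1.created_at <= r2.created_at)  # True as long as the system clock is not set back
```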
Custom factory for initialized objects:
from dataclasses import dataclass, field

def default_config() -> dict:
    return {"debug": False, "timeout": 30, "retries": 3}

@dataclass
class HTTPService:
    url: str
    config: dict = field(default_factory=default_config)
Setting service.config["debug"] = True on one instance does not affect the others: each call to default_config() builds a new dictionary, so every instance gets its own.
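This isolation holds only because the factory builds a new object on every call. As a hedged illustration, the sketch below assumes a hypothetical module-level template `DEFAULT_CONFIG` (not part of the original example) and contrasts a factory that deep-copies it with one that returns the template itself.

```python
import copy
from dataclasses import dataclass, field

# Hypothetical shared template, invented for this illustration.
DEFAULT_CONFIG = {"debug": False, "timeout": 30, "headers": {"Accept": "*/*"}}


def fresh_config() -> dict:
    return copy.deepcopy(DEFAULT_CONFIG)  # new dict, new nested dict


def shared_config() -> dict:
    return DEFAULT_CONFIG  # pitfall: the same object on every call


@dataclass
class HTTPService:
    url: str
    config: dict = field(default_factory=fresh_config)


@dataclass
class SharedService:
    url: str
    config: dict = field(default_factory=shared_config)


h1, h2 = HTTPService("https://a.example"), HTTPService("https://b.example")
h1.config["headers"]["Accept"] = "application/json"
print(h2.config["headers"]["Accept"])  # */*

s1, s2 = SharedService("https://a.example"), SharedService("https://b.example")
s1.config["debug"] = True
print(s2.config["debug"])       # True: state leaked through the shared dict
print(DEFAULT_CONFIG["debug"])  # True: the template itself was mutated
```

A factory is only as safe as the object it returns, so building the value inside the function (or copying a template) is the part that matters.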
A note on Python 3.10+ and dataclass(slots=True)
Since Python 3.10, @dataclass(slots=True) automatically generates __slots__. default_factory continues to work normally: the initialization mechanism is the same, only attribute storage changes.
from dataclasses import dataclass, field

@dataclass(slots=True)
class Cart:
    items: list = field(default_factory=list)
This combination is a good fit when you want both dynamic default values and the memory savings from __slots__. For concrete memory benchmarks and inheritance pitfalls, see Python slots: optimizing instance memory.
Key takeaway
default_factory solves a fundamental Python problem: mutable objects as default values are shared across all instances of a class. The internal mechanism is simple (a callable call at __init__ time), but it guards against a class of bugs that are hard to diagnose, because unexpected state sharing only surfaces when you actually mutate the value. Custom factories go further: they let you initialize attributes with any logic, turning @dataclass into a tool suited well beyond simple data containers.
