Pre-processing and post-processing¶
Decorator API¶
Data pre-processing and post-processing methods can be registered using the pre_load, post_load, pre_dump, and post_dump decorators.
from marshmallow import Schema, fields, post_load
class UserSchema(Schema):
name = fields.Str()
slug = fields.Str()
@post_load
def slugify_name(self, in_data, **kwargs):
in_data["slug"] = in_data["slug"].lower().strip().replace(" ", "-")
return in_data
schema = UserSchema()
result = schema.load({"name": "Steve", "slug": "Steve Loria "})
result["slug"] # => 'steve-loria'
Passing “many”¶
By default, pre- and post-processing methods receive one object/datum at a time, transparently handling the many parameter passed to the Schema’s dump/load method at runtime.
In cases where your pre- and post-processing methods needs to handle the input collection when processing multiple objects, add pass_many=True to the method decorators.
Your method will then receive the input data (which may be a single datum or a collection, depending on the dump/load call).
Example: Enveloping¶
One common use case is to wrap data in a namespace upon serialization and unwrap the data during deserialization.
from marshmallow import Schema, fields, pre_load, post_load, post_dump
class BaseSchema(Schema):
# Custom options
__envelope__ = {"single": None, "many": None}
__model__ = User
def get_envelope_key(self, many):
"""Helper to get the envelope key."""
key = self.__envelope__["many"] if many else self.__envelope__["single"]
assert key is not None, "Envelope key undefined"
return key
@pre_load(pass_many=True)
def unwrap_envelope(self, data, many, **kwargs):
key = self.get_envelope_key(many)
return data[key]
@post_dump(pass_many=True)
def wrap_with_envelope(self, data, many, **kwargs):
key = self.get_envelope_key(many)
return {key: data}
@post_load
def make_object(self, data, **kwargs):
return self.__model__(**data)
class UserSchema(BaseSchema):
__envelope__ = {"single": "user", "many": "users"}
__model__ = User
name = fields.Str()
email = fields.Email()
user_schema = UserSchema()
user = User("Mick", email="mick@stones.org")
user_data = user_schema.dump(user)
# {'user': {'email': 'mick@stones.org', 'name': 'Mick'}}
users = [
User("Keith", email="keith@stones.org"),
User("Charlie", email="charlie@stones.org"),
]
users_data = user_schema.dump(users, many=True)
# {'users': [{'email': 'keith@stones.org', 'name': 'Keith'},
# {'email': 'charlie@stones.org', 'name': 'Charlie'}]}
user_objs = user_schema.load(users_data, many=True)
# [<User(name='Keith Richards')>, <User(name='Charlie Watts')>]
Field-level pre- and post-processing¶
For field-level processing, pass pre_load and post_load
callables directly to individual fields. This is useful for simple, field-specific
transformations that don’t need access to the full schema data.
Each callable receives the field value and returns a transformed value. You can pass a single callable or a list of callables, which are applied in order.
from marshmallow import Schema, fields
class UserSchema(Schema):
name = fields.Str(pre_load=str.strip)
birthday = fields.Date(post_load=lambda value: value.year)
schema = UserSchema()
result = schema.load({"name": " Steve ", "birthday": "1994-05-12"})
result["name"] # => 'Steve'
result["birthday"] # => 1994
pre_load callables run before the field’s deserialization (and before allow_none is checked),
while post_load callables run after validation and deserialization.
Like validators, pre_load and post_load callables may raise a
ValidationError, which will be
stored under the field’s key in the errors dictionary.
Raising errors in pre-/post-processor methods¶
Pre- and post-processing methods may raise a ValidationError. By default, errors will be stored on the "_schema" key in the errors dictionary.
from marshmallow import Schema, fields, ValidationError, pre_load
class BandSchema(Schema):
name = fields.Str()
@pre_load
def unwrap_envelope(self, data, **kwargs):
if "data" not in data:
raise ValidationError('Input data must have a "data" key.')
return data["data"]
sch = BandSchema()
try:
sch.load({"name": "The Band"})
except ValidationError as err:
err.messages
# {'_schema': ['Input data must have a "data" key.']}
If you want to store and error on a different key, pass the key name as the second argument to ValidationError.
from marshmallow import Schema, fields, ValidationError, pre_load
class BandSchema(Schema):
name = fields.Str()
@pre_load
def unwrap_envelope(self, data, **kwargs):
if "data" not in data:
raise ValidationError(
'Input data must have a "data" key.', "_preprocessing"
)
return data["data"]
sch = BandSchema()
try:
sch.load({"name": "The Band"})
except ValidationError as err:
err.messages
# {'_preprocessing': ['Input data must have a "data" key.']}
Pre-/post-processor invocation order¶
In summary, the processing pipeline for deserialization is as follows:
@pre_load(pass_many=True)methods@pre_load(pass_many=False)methodsField-level
pre_loadcallablesField deserialization (
_deserialize)Field-level
validatecallables and@validatesmethodsField-level
post_loadcallables@validates_schemamethods (schema validators)@post_load(pass_many=False)methods@post_load(pass_many=True)methods
The pipeline for serialization is similar, except that the pass_many=True processors are invoked after the pass_many=False processors and there are no validators.
@pre_dump(pass_many=False)methods@pre_dump(pass_many=True)methodsdump(obj, many)(serialization)@post_dump(pass_many=False)methods@post_dump(pass_many=True)methods
Warning
You may register multiple processor methods on a Schema. Keep in mind, however, that the invocation order of decorated methods of the same type is not guaranteed. If you need to guarantee order of processing steps, you should put them in the same method.
from marshmallow import Schema, fields, pre_load
# YES
class MySchema(Schema):
field_a = fields.Raw()
@pre_load
def preprocess(self, data, **kwargs):
step1_data = self.step1(data)
step2_data = self.step2(step1_data)
return step2_data
def step1(self, data): ...
# Depends on step1
def step2(self, data): ...
# NO
class MySchema(Schema):
field_a = fields.Raw()
@pre_load
def step1(self, data, **kwargs): ...
# Depends on step1
@pre_load
def step2(self, data, **kwargs): ...