Fix: Marshmallow Not Working — Schema Errors, Load vs Dump, and Field Validation

Q: How do I fix "Marshmallow Not Working — Schema Errors, Load vs Dump, and Field Validation"?

How to fix Marshmallow errors — Schema not validated on dump, ValidationError messages format, unknown field handling, missing vs default, post_load object construction, and Marshmallow 3 to 4 migration.

The Error

You define a Marshmallow schema, and validation does not run where you expect it:

from marshmallow import Schema, fields

class UserSchema(Schema):
    name = fields.Str(required=True)
    age = fields.Int(required=True)

schema = UserSchema()
result = schema.dump({"name": "Alice"})   # No age
print(result)   # {"name": "Alice"} — no error, age missing silently

Or invalid input raises, but the error format is not what you expected:

result = schema.load({"name": "Alice", "age": "not a number"})
# Raises ValidationError — but format may differ from expectations

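Or unknown fields silently disappear when the schema (or a shared base class) sets unknown = EXCLUDE:

result = schema.load({"name": "Alice", "age": 30, "extra": "value"})
print(result)   # {"name": "Alice", "age": 30}; "extra" silently dropped
# With Marshmallow 3's default (unknown = RAISE), this call raises ValidationError instead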

Or post_load doesn’t construct objects as expected:

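from marshmallow import post_load

class UserSchema(Schema):
    name = fields.Str(required=True)
    age = fields.Int(required=True)

    @post_load
    def make_user(self, data, **kwargs):
        return User(**data)   # User is your own domain class

result = UserSchema().load({"name": "Alice", "age": 30})
print(type(result))   # User instance, but only if the hook is a method on the schema class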

Or you migrate from Marshmallow 3 to 4 and decorators change behavior.

Marshmallow is the Python serialization library that predates Pydantic — declarative schemas, separate load() (deserialize/validate) and dump() (serialize) methods, post-processing hooks. It’s still heavily used in Flask and Pyramid ecosystems via flask-marshmallow and marshmallow-sqlalchemy. The load/dump asymmetry and unknown-field handling produce specific failure modes that don’t exist in Pydantic’s single-direction model. This guide covers each.

Why This Happens

Marshmallow separates two directions explicitly:

load() — JSON/dict → Python object (validates input)
dump() — Python object → JSON/dict (serializes output)

Only load() runs validators. dump() formats values but doesn’t validate — useful for serializing trusted internal data, surprising when you expect validation everywhere.

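Unknown-field handling also changed between versions. Marshmallow 2 silently ignored unknown fields; Marshmallow 3 raises a ValidationError by default (unknown = RAISE). Many codebases override that default in a shared base schema, so the behavior you see depends on which setting is actually in effect.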

Fix 1: Basic Schema Setup

from datetime import datetime

from marshmallow import Schema, fields, validate

class UserSchema(Schema):
    id = fields.Int(dump_only=True)              # Only on dump (e.g., DB-assigned)
    name = fields.Str(required=True, validate=validate.Length(min=1, max=100))
    email = fields.Email(required=True)
    age = fields.Int(required=True, validate=validate.Range(min=0, max=150))
    is_active = fields.Bool(load_default=True)   # Optional on load with default
    created_at = fields.DateTime(dump_only=True)

schema = UserSchema()

# Load (deserialize and validate)
data = schema.load({"name": "Alice", "email": "alice@example.com", "age": 30})
print(data)   # {"name": "Alice", "email": "alice@example.com", "age": 30, "is_active": True}

# Dump (serialize)
user_obj = {"id": 1, "name": "Alice", "email": "alice@example.com", "age": 30, "is_active": True, "created_at": datetime.now()}
result = schema.dump(user_obj)
print(result)   # {"id": 1, "name": "Alice", ...}

dump_only: field appears only in output. On load it is not accepted as input and is treated as an unknown field.
load_only: field appears only in input (passwords, secrets) and is removed from output.
load_default: default value used during load if the key is missing.
dump_default: default value used during dump if the attribute is missing.

Common Mistake: Using a single default argument (deprecated in Marshmallow 3). Specify load_default and dump_default explicitly — they often differ. For example, id has no load default (auto-generated by DB) but might have a dump default (None for new instances).

For Pydantic comparison and migration patterns, see Pydantic validation error.

Fix 2: Validation Errors

from marshmallow import ValidationError

try:
    schema.load({"name": "Alice", "email": "not-an-email", "age": -5})
except ValidationError as err:
    print(err.messages)
    # With the Fix 1 schema:
    # {
    #   "email": ["Not a valid email address."],
    #   "age": ["Must be greater than or equal to 0 and less than or equal to 150."]
    # }
    print(err.valid_data)
    # {"name": "Alice", "is_active": True}   (fields that passed, plus load defaults)

err.messages is a dict of field → list of error messages. Multi-level for nested schemas.

Custom error messages:

class UserSchema(Schema):
    age = fields.Int(
        required=True,
        validate=validate.Range(min=0, max=150),
        error_messages={"required": "Age is required", "invalid": "Age must be an integer"},
    )

Custom validators:

from marshmallow import Schema, fields, validates, validates_schema, ValidationError

class UserSchema(Schema):
    age = fields.Int(required=True)
    birth_year = fields.Int(required=True)

    @validates("age")
    def validate_age(self, value, **kwargs):
        if value < 0:
            raise ValidationError("Age cannot be negative")
        if value > 150:
            raise ValidationError("Age unreasonably high")

    @validates_schema   # Validate across multiple fields
    def validate_consistency(self, data, **kwargs):
        current_year = 2025
        if data.get("birth_year") and data.get("age"):
            expected_age = current_year - data["birth_year"]
            if abs(data["age"] - expected_age) > 1:
                raise ValidationError(
                    f"Age {data['age']} doesn't match birth_year {data['birth_year']}"
                )

Common Mistake: Raising ValueError from a validator instead of ValidationError. ValueError isn’t caught by Marshmallow’s error handling — it propagates as a raw Python error. Always use marshmallow.ValidationError for validation-related issues.

Fix 3: Unknown Field Handling

from marshmallow import Schema, fields, INCLUDE, EXCLUDE, RAISE

class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()

    class Meta:
        unknown = EXCLUDE   # Drop unknown fields silently (Marshmallow 3's own default is RAISE)
        # Other options:
        # unknown = INCLUDE   (keep unknown fields in the loaded result, unvalidated)
        # unknown = RAISE     (raise ValidationError on unknown fields)

EXCLUDE is a common project-wide setting, and it is surprisingly silent:

schema = UserSchema()
result = schema.load({"name": "Alice", "age": 30, "ssn": "123-45-6789"})
# {"name": "Alice", "age": 30} — ssn silently dropped

This is often what you want (frontend sends extra fields, you don’t care) but it can hide bugs.

Use RAISE for strict validation:

class StrictSchema(Schema):
    name = fields.Str()

    class Meta:
        unknown = RAISE

schema = StrictSchema()
schema.load({"name": "Alice", "extra": "value"})
# ValidationError: {"extra": ["Unknown field."]}

Per-call override:

result = schema.load(data, unknown=RAISE)

Pro Tip: Use RAISE for inbound user input (API request bodies) — catches client bugs (typos, deprecated field names) immediately. Use EXCLUDE for trusted internal data flow where you want forward-compatibility. The contract for an external API should be strict; the contract for internal services can be permissive.

Fix 4: `dump()` Doesn’t Validate

class UserSchema(Schema):
    age = fields.Int(required=True, validate=validate.Range(min=0, max=150))

schema = UserSchema()
result = schema.dump({"age": -10})
print(result)   # {"age": -10} — no validation!

dump() serializes existing data without validation. If you want validation on output too:

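# Manually re-validate by round-tripping the output through load().
# If the schema has dump_only fields, load() rejects them as unknown
# unless the unknown policy is EXCLUDE.
schema.load(schema.dump(user_obj))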

Or use marshmallow’s validate method directly:

errors = schema.validate(data)
if errors:
    print(errors)   # Same format as ValidationError.messages

Common Mistake: Assuming dump() validates and shipping invalid data. The asymmetry is intentional — dumping a database row shouldn’t fail just because the DB has dirty data — but new users expect symmetric behavior. If you need validation on dump, do it explicitly.

Fix 5: `post_load` and Object Construction

from marshmallow import Schema, fields, post_load
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

class UserSchema(Schema):
    name = fields.Str(required=True)
    age = fields.Int(required=True)

    @post_load
    def make_user(self, data, **kwargs):
        return User(**data)

schema = UserSchema()
result = schema.load({"name": "Alice", "age": 30})
print(type(result))   # <class 'User'>
print(result.name)    # Alice

post_load runs after validation — convert the dict to a domain object.

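dump() already reads attributes from objects as well as keys from dicts, so a User instance serializes without a hook. Use pre_dump only when the object needs reshaping first: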

@pre_dump
def to_dict(self, obj, **kwargs):
    if isinstance(obj, User):
        return {"name": obj.name, "age": obj.age}
    return obj

Pure Marshmallow has no Meta.model option that binds a schema to a class:

class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()

    class Meta:
        # No automatic model binding in pure Marshmallow
        # For SQLAlchemy auto-binding, use marshmallow-sqlalchemy
        ...

For SQLAlchemy integration that auto-builds schemas from models, see SQLAlchemy not working.

Fix 6: Nested Schemas

class AddressSchema(Schema):
    street = fields.Str()
    city = fields.Str()

class UserSchema(Schema):
    name = fields.Str(required=True)
    address = fields.Nested(AddressSchema, required=True)
    secondary_addresses = fields.List(fields.Nested(AddressSchema))

data = {
    "name": "Alice",
    "address": {"street": "123 Main", "city": "NYC"},
    "secondary_addresses": [
        {"street": "456 Side St", "city": "Boston"},
    ],
}
result = UserSchema().load(data)

Forward references for self-referential schemas:

class CategorySchema(Schema):
    name = fields.Str()
    children = fields.List(fields.Nested("CategorySchema"))   # String reference

Partial nested loading:

result = UserSchema().load(data, partial=True)
# All required fields become optional

Common Mistake: Using a class reference in fields.Nested(AddressSchema) when the class hasn’t been defined yet (forward reference). Use a string "AddressSchema" for forward references — Marshmallow resolves it at load time. Class references only work for already-defined schemas.

Fix 7: Flask Integration with flask-marshmallow

pip install flask-marshmallow flask-sqlalchemy marshmallow-sqlalchemy

from flask import Flask, request, jsonify
from marshmallow import ValidationError
from flask_sqlalchemy import SQLAlchemy
from flask_marshmallow import Marshmallow

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///app.db"
db = SQLAlchemy(app)
ma = Marshmallow(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100))
    email = db.Column(db.String(100))

class UserSchema(ma.SQLAlchemyAutoSchema):
    class Meta:
        model = User
        load_instance = True   # Returns User instance from load()

user_schema = UserSchema()
users_schema = UserSchema(many=True)

@app.route("/users", methods=["GET"])
def list_users():
    users = User.query.all()
    return jsonify(users_schema.dump(users))

@app.route("/users", methods=["POST"])
def create_user():
    try:
        user = user_schema.load(request.json, session=db.session)
        db.session.add(user)
        db.session.commit()
        return user_schema.dump(user), 201
    except ValidationError as err:
        return jsonify({"errors": err.messages}), 400

SQLAlchemyAutoSchema introspects the model — no need to redeclare fields. load_instance=True builds a model instance directly on load.

For Flask routing patterns that pair with marshmallow, see Flask 404 not found.

Fix 8: `Meta.fields` and `Meta.exclude`

class UserSchema(Schema):
    id = fields.Int()
    name = fields.Str()
    email = fields.Str()
    password_hash = fields.Str()

    class Meta:
        # Only include specific fields
        fields = ("id", "name", "email")
        # Or exclude specific fields
        # exclude = ("password_hash",)

Per-call field selection:

# Dump only specific fields
schema = UserSchema(only=("name", "email"))
schema.dump(user_obj)   # {"name": "...", "email": "..."}

# Exclude fields
schema = UserSchema(exclude=("password_hash",))

Common Mistake: Putting password_hash in the schema but forgetting to exclude it on dump. The hash gets serialized into responses. Always use load_only=True on password fields (they appear in load, but not dump):

class UserSchema(Schema):
    name = fields.Str()
    password = fields.Str(load_only=True)   # Never appears in dump output

dump_only + load_only cleanly separates directions:

class UserSchema(Schema):
    id = fields.Int(dump_only=True)          # DB-generated, no input
    password = fields.Str(load_only=True)     # User input, never output
    name = fields.Str()                        # Bidirectional

Production Incident Lens — API Contracts Drift Silently

The reason a marshmallow schema breaks in production is rarely “wrong code today.” It is “wrong code three releases ago that nobody noticed until a downstream consumer changed.” The blast radius is per-endpoint: every consumer of the affected endpoint either gets stale fields, missing fields, or wrong types. With dump() skipping validation, that drift can ship undetected for weeks.

Incident pattern — the renamed field that wasn’t. A backend developer renamed created_at → creation_time in the model. The schema still serializes both — created_at via dump_only=True mapped to the old DB column, creation_time via attribute="creation_time". Each release picks the one that exists. A frontend deploy starts reading creation_time because backend said it would; backend then reverts the change because of a migration bug; frontend now sees null for every timestamp because the schema is still emitting creation_time but the attribute resolves to nothing. No exception, no log line — just blank dates in the UI.

Mitigation: snapshot the schema in CI.

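# tests/test_schema_contract.py
import json

def test_user_schema_matches_snapshot(snapshot):
    schema = UserSchema()
    sample = {"id": 1, "name": "Alice", "email": "alice@example.com", ...}
    # pytest-snapshot compares str or bytes, so serialize the dumped dict first
    dumped = json.dumps(schema.dump(sample), sort_keys=True, indent=2)
    snapshot.assert_match(dumped, "user_schema.json")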

Use pytest-snapshot or syrupy. Any field rename, type change, or field removal fails the snapshot test — forcing an explicit decision instead of a silent drift.

Incident pattern — the unknown field that broke a deploy. A frontend team adds device_id to login requests. The marshmallow schema has unknown = RAISE. Every login starts failing with {"device_id": ["Unknown field."]} because the backend hasn’t been updated. The frontend team rolled forward; the backend rolled back; the gap was hours.

Mitigation: choose the unknown policy per consumer.

class LoginSchema(Schema):
    username = fields.Str(required=True)
    password = fields.Str(required=True, load_only=True)

    class Meta:
        unknown = EXCLUDE   # Tolerate new fields from clients

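For incoming requests from clients you don’t fully control, EXCLUDE is more resilient. Reserve RAISE for contracts where both sides ship together, such as service-to-service calls; frontend-to-backend is rarely that case. The trade-off is that EXCLUDE gives up the early detection of typos and deprecated field names that RAISE provides, so pick one policy per endpoint deliberately.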

Blast radius checklist for schema changes:

List every endpoint that uses the schema. If you don’t know, your schemas need a registry.
For each, identify the consumers. Mobile apps shipped a year ago can still be in the wild.
Add a snapshot test before merging. If the snapshot updates, the PR description must explain why.
Deprecate before removing. Keep old fields with dump_default=None for at least one release cycle.
Version the API path or media type when breaking the contract is unavoidable.

Common Mistake: Removing a field from a schema during a “cleanup” PR. The PR looks safe — no callers in the repo. But callers outside the repo (mobile clients, third-party integrations, scheduled jobs in another service) silently break. The schema is part of the contract; treat removals like database drops, not like code refactors.

The audit obligation is real: every schema change is a contract change. A common convention is to add a BREAKING: prefix to commit messages for schema removals so the release notes surface them automatically.

Still Not Working?

Marshmallow vs Pydantic

Marshmallow — Mature, separate load/dump, no Pydantic-style class-as-instance pattern. Best for Flask projects, complex serialization workflows.
Pydantic — Single-direction validation, class instances are the data, faster for typical workloads. Best for FastAPI, modern type-driven code.
msgspec — Fastest, less flexible. Best for high-throughput message decoding where every microsecond counts.

For new projects without Flask-marshmallow legacy, Pydantic 2 is usually the right choice. Marshmallow’s strength is its load/dump asymmetry and post-processing hooks — sometimes exactly what you need.

Migrating from Marshmallow 2 to 3 to 4

Marshmallow 3 was a major rewrite (released in 2019), and Marshmallow 4 followed in 2025. Key migration points:

load() returns data directly (no .data attribute in v3+)
MarshalResult namedtuple removed in v3
default argument renamed to dump_default (old name deprecated in 3.13, removed in 4)
missing argument renamed to load_default (old name deprecated in 3.13, removed in 4)
@validates methods receive an extra data_key keyword argument in v4, so declare them with **kwargs

Common Mistake: Following Marshmallow 2 tutorials with v3+. The .data attribute on results doesn’t exist — result = schema.load(data) directly returns the data dict.

Custom Field Types

from marshmallow import Schema, fields, ValidationError

class HexColor(fields.Field):
    def _serialize(self, value, attr, obj, **kwargs):
        return f"#{value:06x}" if isinstance(value, int) else value

    def _deserialize(self, value, attr, data, **kwargs):
        if not isinstance(value, str) or not value.startswith("#"):
            raise ValidationError("Must be a hex color string")
        try:
            return int(value[1:], 16)
        except ValueError:
            raise ValidationError("Invalid hex color")

class ThemeSchema(Schema):
    primary_color = HexColor()

result = ThemeSchema().load({"primary_color": "#FF5733"})
print(result)   # {"primary_color": 16734003}

dump = ThemeSchema().dump({"primary_color": 16734003})
print(dump)   # {"primary_color": "#ff5733"}

_serialize is for dump; _deserialize is for load. Pair them for round-trip conversion.

Polymorphic Schemas

For unions / discriminated types:

from marshmallow import Schema, fields
from marshmallow_oneofschema import OneOfSchema

class CatSchema(Schema):
    name = fields.Str()
    meow_count = fields.Int()

class DogSchema(Schema):
    name = fields.Str()
    bark_volume = fields.Int()

class PetSchema(OneOfSchema):
    type_schemas = {"cat": CatSchema, "dog": DogSchema}

pip install marshmallow-oneofschema

Use when a field can be one of several different types — the schema picks the right sub-schema based on a discriminator field.

Testing Schemas

import pytest
from marshmallow import ValidationError

def test_valid():
    data = {"name": "Alice", "age": 30, "email": "alice@example.com"}
    result = UserSchema().load(data)
    assert result["name"] == "Alice"

def test_invalid_email():
    with pytest.raises(ValidationError) as exc:
        UserSchema().load({"name": "Alice", "age": 30, "email": "invalid"})
    assert "email" in exc.value.messages

def test_required():
    with pytest.raises(ValidationError) as exc:
        UserSchema().load({"name": "Alice"})
    assert "age" in exc.value.messages

Pair these tests with a baseline of representative payloads — real anonymized API requests from staging logs. Synthetic tests pass; real-world payloads catch the unknown-field handling, encoding quirks, and edge cases your schema accidentally tolerated.

Integration with WebArgs (for Flask request parsing)

from webargs import fields
from webargs.flaskparser import use_args

@app.route("/users", methods=["POST"])
@use_args({
    "name": fields.Str(required=True),
    "age": fields.Int(required=True),
})
def create_user(args):
    # args is a dict with validated input
    return jsonify(args)

webargs uses marshmallow under the hood — concise way to validate Flask request bodies/query strings without defining a full Schema class.

Combining with Pydantic in a Codebase

Some teams use Pydantic for new code, Marshmallow for legacy:

# Convert Pydantic to Marshmallow data
pydantic_model.model_dump()   # → dict
marshmallow_schema.dump(pydantic_model.model_dump())   # → output dict

For Pydantic Settings patterns that overlap, see Pydantic Settings not working.

Datetime Serialization Doesn’t Match Frontend Expectations

The classic source of “the date is wrong” tickets. Marshmallow’s fields.DateTime() serializes to ISO 8601 and only includes an offset when the datetime is timezone-aware. A naive datetime in your DB serializes as "2026-05-22T10:30:00" (no Z, no offset); the frontend parses it as local time and displays a different hour to every user.

Fix by explicit format and timezone-aware datetimes:

from datetime import datetime, timezone

class EventSchema(Schema):
    occurred_at = fields.DateTime(format="iso", required=True)

# Make sure you store and pass timezone-aware datetimes
schema = EventSchema()
schema.dump({"occurred_at": datetime.now(timezone.utc)})
# e.g. {"occurred_at": "2026-05-22T10:30:00.123456+00:00"}

For APIs consumed by browsers, force UTC at the schema level and let the client localize. Mixing zones at the schema is how you ship bugs that only appear during daylight saving transitions.

`many=True` Silently Drops Errors for Sub-Items

When loading a list with many=True, a single invalid item makes load() raise a ValidationError for the whole batch. The errors are keyed by list index, and the data that did pass validation is available on err.valid_data.

try:
    result = UserSchema(many=True).load([
        {"name": "Alice", "age": 30, "email": "alice@example.com"},
        {"name": "Bob", "age": "thirty", "email": "bad"},   # Invalid
    ])
except ValidationError as err:
    print(err.messages)
    # {1: {"age": ["Not a valid integer."], "email": ["Not a valid email address."]}}

The error dict is keyed by index, not by item content. If your bulk endpoint silently drops bad items instead of raising, check whether you wrapped the load in a try/except that suppresses ValidationError — that’s how data loss happens at scale.

Schema Performance Cliff on Large Lists

Marshmallow is reasonably fast for small payloads but slows on very large lists — thousands of nested items can take seconds to serialize. The cause is per-field method dispatch through Python.

Mitigations:

Limit the output with only=(...) so dump() touches fewer fields per item. (partial applies to load(), not dump(), so it does not help here.)
For pure speed, switch the hot endpoint to msgspec or orjson with manual schemas.
Pre-compute derived fields outside the schema. A schema method that calls the DB per item is an N+1 query trap: load the related data once, then map by ID inside pre_dump.

If your endpoint serializes hundreds of items per request and CPU profiles show marshmallow at the top, the rewrite is usually faster than tuning. Marshmallow is a correctness tool; it is not optimized for tight loops.
