In event-driven systems, events are contracts.
A service publishes an event. Other services consume it. Those consumers may belong to different teams, different systems, or different parts of the business.
That means an event is not just an internal implementation detail.
Once other systems depend on it, the event schema becomes part of the architecture.
And like every contract, it needs to evolve carefully.
Imagine a service publishes this event:
{
  "eventType": "OrderCreated",
  "eventVersion": 1,
  "orderId": "order_123",
  "customerId": "customer_456",
  "amount": 49.99
}
Several consumers use it:
Payment Service
Inventory Service
Email Service
Analytics Service
CRM integration
Later, the Order Service team decides to rename amount to totalAmount.
The new event becomes:
{
  "eventType": "OrderCreated",
  "eventVersion": 1,
  "orderId": "order_123",
  "customerId": "customer_456",
  "totalAmount": 49.99
}
That looks like a small cleanup.
But every consumer expecting amount may now fail.
A small schema change can break multiple services.
That is why event versioning matters.
One thing that makes event schemas tricky is that events often live longer than normal API responses.
Events may be:
Stored in Kafka topics
Saved in event stores
Written to audit logs
Archived for compliance
Replayed months later
Used to rebuild projections
Consumed by systems you forgot about
An API response is usually consumed immediately.
An event may be consumed now, retried later, replayed next month, or inspected years later.
That means schema changes need to consider not only current consumers, but also old messages that may still exist.
If you replay old events, your current consumer code may need to understand old schemas.
That is easy to forget.
A common mistake is to treat events like database records.
In a database, you can migrate a column.
For example:
Rename amount to total_amount
Then the application is updated to use the new column.
But events are different.
Old events may still exist with the old field name.
New events may use the new field name.
Consumers may need to handle both.
For example:
{
  "eventVersion": 1,
  "amount": 49.99
}
and:
{
  "eventVersion": 2,
  "totalAmount": 49.99
}
If consumers only support the latest shape, replaying old events can break them.
Events are historical facts. You should be careful when changing their meaning.
The safest changes are backward-compatible.
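A backward-compatible change does not break existing consumers: code written against the old schema can still process events produced with the new one. (Schema registry tooling often calls this direction forward compatibility, so check which definition your tools use.)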
The most common example is adding an optional field.
For example:
{
  "eventType": "OrderCreated",
  "eventVersion": 1,
  "orderId": "order_123",
  "customerId": "customer_456",
  "amount": 49.99,
  "currency": "EUR"
}
If old consumers do not care about currency, they can ignore it.
That is usually safe.
Backward-compatible changes include:
Adding optional fields
Adding new event types
Adding metadata fields
Adding fields with sensible defaults
Extending enums carefully when consumers tolerate unknown values
These changes work when consumers are designed to ignore fields they do not understand.
That is a useful rule:
Consumers should ignore unknown fields.
Breaking changes are changes that can break existing consumers.
Examples include:
Renaming a field
Removing a field
Changing a field type
Changing the meaning of a field
Changing which fields are required
Changing the event type name
Changing enum values in an incompatible way
Changing the structure of nested data
For example:
{
  "amount": 49.99
}
to:
{
  "amount": {
    "value": 49.99,
    "currency": "EUR"
  }
}
That may be a better model, but it is a breaking change.
Existing consumers expecting a number will fail.
The dangerous part is that breaking changes are often made with good intentions.
A field name is unclear.
A type is too limited.
The domain model changed.
A new requirement arrived.
The reason may be valid, but consumers still need a safe migration path.
A simple way to make schema evolution explicit is to include an event version.
For example:
{
  "eventId": "evt_123",
  "eventType": "OrderCreated",
  "eventVersion": 1,
  "occurredAt": "2026-06-23T10:00:00Z",
  "orderId": "order_123",
  "amount": 49.99
}
If the schema changes in a breaking way, publish version 2:
{
  "eventId": "evt_456",
  "eventType": "OrderCreated",
  "eventVersion": 2,
  "occurredAt": "2026-06-23T10:05:00Z",
  "orderId": "order_123",
  "totalAmount": {
    "value": 49.99,
    "currency": "EUR"
  }
}
Now consumers can choose how to handle each version.
For example:
If eventVersion is 1:
    read amount
If eventVersion is 2:
    read totalAmount.value and totalAmount.currency
This is not glamorous, but it is honest.
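The version check above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `read_order_total` is hypothetical, and returning `None` for the missing version 1 currency is an assumption about how a consumer might represent "unknown".

```python
from decimal import Decimal
from typing import Any, Dict, Optional, Tuple


def read_order_total(event: Dict[str, Any]) -> Tuple[Decimal, Optional[str]]:
    """Return (value, currency) for an OrderCreated event of a known version."""
    version = event.get("eventVersion")
    if version == 1:
        # Version 1 carries a bare number and no currency information.
        return Decimal(str(event["amount"])), None
    if version == 2:
        total = event["totalAmount"]
        return Decimal(str(total["value"])), total["currency"]
    # Fail loudly instead of guessing at an unknown shape.
    raise ValueError(f"unsupported OrderCreated version: {version!r}")
```

Raising on an unknown version keeps a future version 3 from being read with version 2 assumptions.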
Adding eventVersion is useful, but it does not solve everything by itself.
You still need rules.
For example:
When do we increment the version?
How long do we support old versions?
Can old versions still be replayed?
Who owns the schema?
How are consumers notified?
Are schemas validated in CI?
Can a producer publish unsupported versions?
Without these rules, eventVersion becomes just another field.
Versioning only works when it is part of the engineering process.
When possible, prefer additive changes.
Instead of renaming a field immediately, add the new field while keeping the old one.
For example, start with:
{
  "amount": 49.99
}
Then publish both:
{
  "amount": 49.99,
  "totalAmount": 49.99
}
Consumers can migrate from amount to totalAmount.
After you know all consumers have migrated, you can consider removing the old field in a new event version.
This is less elegant than a clean schema change, but distributed systems often need boring, safe migrations.
A safe migration might look like this:
1. Add new field
2. Publish old and new fields together
3. Update consumers to use new field
4. Monitor usage
5. Stop relying on old field
6. Remove old field only in a new event version
This avoids breaking consumers in one step.
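Steps 2 and 3 of that migration might look like the sketch below. Both function names (`build_order_created`, `read_total`) are hypothetical, and the fallback order in the consumer is an assumption about what a migrated consumer would want during the overlap period.

```python
from typing import Any, Dict


def build_order_created(order_id: str, customer_id: str, total: float) -> Dict[str, Any]:
    """Producer side: publish the old and the new field together."""
    return {
        "eventType": "OrderCreated",
        "eventVersion": 1,
        "orderId": order_id,
        "customerId": customer_id,
        "amount": total,       # deprecated name, kept until consumers have migrated
        "totalAmount": total,  # new name, same meaning and same value
    }


def read_total(event: Dict[str, Any]) -> float:
    """Consumer side: prefer the new field, fall back to the old one."""
    if "totalAmount" in event:
        return event["totalAmount"]
    # Events published before the migration only carry the old field.
    return event["amount"]
```

The fallback branch is what lets the same consumer read events published before the new field existed.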
Changing the meaning of a field is one of the most dangerous schema changes.
For example, imagine this event:
{
  "eventType": "PaymentSucceeded",
  "amount": 49.99
}
At first, amount means the amount paid by the customer.
Later, the producer changes it to mean the merchant net amount after fees.
The field name and type did not change.
Consumers do not crash.
But they may now calculate revenue incorrectly.
This is worse than a visible failure.
The system keeps running, but the business data becomes wrong.
If the meaning changes, treat it as a breaking change.
Use a clearer field name:
grossAmount
netAmount
feeAmount
or publish a new event version.
Schema compatibility is not only about types.
It is also about meaning.
Consumers should generally ignore fields they do not understand.
This makes additive changes safer.
For example, if a producer adds:
{
  "discountCode": "SUMMER10"
}
a consumer that does not need discounts should not fail.
This is a useful compatibility rule:
Be strict about what you produce.
Be tolerant about what you consume.
That does not mean consumers should accept invalid messages silently.
Required fields, invalid types, and unsupported versions should still be handled carefully.
But unknown optional fields should not break the consumer.
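A tolerant consumer along these lines could be sketched as follows. The `OrderCreated` dataclass and `parse_order_created` are illustrative names, and the choice of which fields are required is an assumption taken from the first example event.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class OrderCreated:
    order_id: str
    customer_id: str
    amount: float


def parse_order_created(raw: Dict[str, Any]) -> OrderCreated:
    """Read only the fields this consumer needs; ignore everything else."""
    required = ("orderId", "customerId", "amount")
    missing = [name for name in required if name not in raw]
    if missing:
        # Missing required data is an error, not something to tolerate.
        raise ValueError(f"OrderCreated is missing required fields: {missing}")
    amount = raw["amount"]
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError(f"amount must be a number, got {type(amount).__name__}")
    # Unknown keys such as "discountCode" are never looked at, so they cannot break parsing.
    return OrderCreated(raw["orderId"], raw["customerId"], float(amount))
```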
Adding a required field can be breaking.
For example, if a new consumer expects every OrderCreated event to contain currency, but old events do not have it, replay may fail.
Instead of assuming the field exists, the consumer needs a strategy.
Possible options:
Use a default value
Handle old versions differently
Backfill old events or projections
Reject unsupported versions explicitly
Avoid replaying old events into that consumer
The right choice depends on the business.
For money, guessing a currency may be dangerous.
For a display field, a default may be acceptable.
Again, the schema decision depends on the domain.
Enums look safe, but they can break consumers.
Imagine an event contains:
{
  "paymentStatus": "Succeeded"
}
Consumers may support:
Pending
Succeeded
Failed
Later, the producer adds:
PartiallyRefunded
ChargebackPending
RequiresManualReview
If a consumer does not tolerate unknown enum values, it may crash.
This is common.
When designing event schemas, assume enums may grow.
Consumers should usually have an explicit fallback for unknown values.
For example:
If status is unknown:
    log it
    mark as unsupported
    avoid corrupting state
    alert if needed
Do not silently map unknown statuses to a default if that could hide business meaning.
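One way to express that fallback is sketched below. The names `parse_payment_status` and `apply_status`, the logger name, and the `needsReview` flag are all hypothetical; a real consumer would choose its own way of marking a record as unsupported.

```python
import logging
from enum import Enum
from typing import Any, Dict, Optional

logger = logging.getLogger("payment-consumer")  # hypothetical logger name


class PaymentStatus(Enum):
    PENDING = "Pending"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


def parse_payment_status(raw: str) -> Optional[PaymentStatus]:
    """Return the known status, or None when the value is not recognized."""
    try:
        return PaymentStatus(raw)
    except ValueError:
        logger.warning("unknown paymentStatus %r received", raw)
        return None


def apply_status(state: Dict[str, Any], raw_status: str) -> Dict[str, Any]:
    """Update state for known statuses; flag unknown ones without overwriting."""
    status = parse_payment_status(raw_status)
    if status is None:
        # Keep the last known status and mark the record for follow-up.
        return {**state, "needsReview": True}
    return {**state, "paymentStatus": status.value}
```

The unknown value is never mapped onto an existing status, so the stored state stays truthful while the record is flagged.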
There are different ways to version events.
One approach is a version field:
eventType: OrderCreated
eventVersion: 2
Another approach is versioned event names:
OrderCreatedV1
OrderCreatedV2
Both can work.
I usually prefer keeping the business event name stable and putting the version in metadata:
{
  "eventType": "OrderCreated",
  "eventVersion": 2
}
This keeps the event concept clear.
But for very different meanings, a new event type may be better.
For example:
OrderCreated
OrderSubmitted
OrderConfirmed
Those may not be versions of the same event. They may be different business facts.
Do not use versioning to hide a domain change.
Sometimes the right answer is a new event.
In larger systems, a schema registry can help.
A schema registry stores event schemas and validates compatibility.
It can answer questions like:
Is this new schema backward-compatible?
Which version is currently used?
Can this producer publish this event?
Can this consumer read this schema?
This is common with formats like Avro, Protobuf, and JSON Schema.
A schema registry is useful because it makes event contracts explicit.
But it does not replace good design.
A schema registry can tell you that a field type changed.
It may not fully understand that the business meaning of amount changed from gross to net.
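To make the structural half of that concrete, here is a toy compatibility check. It is not the API of any real schema registry: the schema format (field name mapped to a type name and a required flag) and the function `breaking_changes` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical schema description: field name -> (type name, required flag).
Schema = Dict[str, Tuple[str, bool]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes in `new` that could break consumers written against `old`."""
    problems: List[str] = []
    for name, (old_type, _required) in old.items():
        if name not in new:
            # A rename shows up here as a removal of the old name.
            problems.append(f"field removed: {name}")
        elif new[name][0] != old_type:
            problems.append(f"type changed: {name} ({old_type} -> {new[name][0]})")
    # Fields that exist only in `new` are additive and are not reported.
    return problems
```

A check like this flags the rename of amount to totalAmount, but it reports nothing if amount keeps its name and type while its meaning changes from gross to net.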
Tools help with schema compatibility.
Humans still need to own semantic compatibility.
Another useful practice is consumer-driven contract testing.
The idea is simple:
Consumers define what they need from an event.
Producers are tested against those expectations.
For example, the Email Service may require:
OrderCreated contains orderId
OrderCreated contains customerId
OrderCreated contains customerEmail
The Analytics Service may require:
OrderCreated contains orderId
OrderCreated contains totalAmount
OrderCreated contains currency
If the producer changes the schema in a way that breaks these expectations, tests fail before production.
This is valuable because producers do not always know every consumer.
Contract tests make those dependencies visible.
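A very small version of this idea is sketched below. Dedicated contract-testing tools do much more; here the consumer names, the `CONSUMER_EXPECTATIONS` table, and `contract_violations` are hypothetical, and the check covers only field presence.

```python
from typing import Any, Dict, List

# Hypothetical expectations, each list maintained by the consuming team.
CONSUMER_EXPECTATIONS: Dict[str, List[str]] = {
    "email-service": ["orderId", "customerId", "customerEmail"],
    "analytics-service": ["orderId", "totalAmount", "currency"],
}


def contract_violations(
    sample_event: Dict[str, Any],
    expectations: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map each consumer to the expected fields missing from the sample event."""
    violations: Dict[str, List[str]] = {}
    for consumer, fields in expectations.items():
        missing = [field for field in fields if field not in sample_event]
        if missing:
            violations[consumer] = missing
    return violations
```

Run in the producer's CI against a sample event built by the real producer code, a non-empty result would fail the build and name the affected consumers.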
A common migration problem is that producers and consumers are not deployed at the same time.
For a while, you may have:
New producer
Old consumers
or:
Old producer
New consumers
This means your schema changes must support rolling deployments.
For example, a new producer should not publish a format that old consumers cannot parse, unless those consumers have already been updated or isolated.
A new consumer should not assume fields that old events do not contain.
Distributed systems rarely change all at once.
Schema evolution needs to support mixed versions.
Even after every live consumer has migrated, old schemas may still matter because of replay.
For example, you may need to rebuild a projection from events that are six months old.
Those events may be version 1.
Your current consumer may be written for version 3.
If it cannot handle version 1, replay fails.
You need a strategy:
Keep support for old event versions
Transform old events during replay
Store upcasted events
Use dedicated migration jobs
Limit replay windows
Snapshot projections to reduce replay needs
This is especially important for event-sourced systems, but it also matters for normal event-driven architectures when events are retained for a long time.
Upcasting means transforming an old event schema into a newer shape before the business logic handles it.
For example, old event:
{
  "eventVersion": 1,
  "amount": 49.99
}
can be transformed into:
{
  "eventVersion": 2,
  "totalAmount": {
    "value": 49.99,
    "currency": "EUR"
  }
}
If the old event did not contain currency, the upcaster needs a rule.
For example:
Use EUR for events before a known migration date
Look up currency from the order record
Reject if currency cannot be determined
Upcasting can keep business logic cleaner because handlers can work with a newer internal representation.
But upcasting rules need to be tested carefully.
A bad upcaster can corrupt historical meaning.
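A sketch of such an upcaster, using the first rule (EUR before a known date) and rejecting everything it cannot decide. The cutover date `MULTI_CURRENCY_CUTOVER` is invented for illustration, the function name is hypothetical, and the code assumes version 1 events carry an `occurredAt` timestamp.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Assumption for illustration: every order before this instant was priced in EUR.
MULTI_CURRENCY_CUTOVER = datetime(2026, 1, 1, tzinfo=timezone.utc)


def upcast_order_created(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return the event in version 2 shape without mutating the input."""
    version = event.get("eventVersion")
    if version == 2:
        return event
    if version != 1:
        raise ValueError(f"cannot upcast OrderCreated version {version!r}")
    # Replace the trailing "Z" so fromisoformat also works on older Python versions.
    occurred_at = datetime.fromisoformat(event["occurredAt"].replace("Z", "+00:00"))
    if occurred_at >= MULTI_CURRENCY_CUTOVER:
        # The EUR rule does not apply here, so refuse rather than guess.
        raise ValueError("currency cannot be determined for this version 1 event")
    upcasted = {key: value for key, value in event.items() if key != "amount"}
    upcasted["eventVersion"] = 2
    upcasted["totalAmount"] = {"value": event["amount"], "currency": "EUR"}
    return upcasted
```

Handlers can then be written against version 2 only, while the dated rule stays in one place where it can be tested against real historical events.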
Good event metadata makes schema evolution easier.
Useful metadata includes:
eventId
eventType
eventVersion
occurredAt
publishedAt
producer
producerVersion
correlationId
causationId
tenantId
schemaId
For example:
{
  "eventId": "evt_123",
  "eventType": "OrderCreated",
  "eventVersion": 2,
  "occurredAt": "2026-06-23T10:00:00Z",
  "producer": "order-service",
  "correlationId": "corr_456",
  "payload": {
    "orderId": "order_123",
    "totalAmount": {
      "value": 49.99,
      "currency": "EUR"
    }
  }
}
This metadata helps with debugging, tracing, compatibility checks, and replay.
It also makes events easier to reason about when something breaks.
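A producer might build this envelope in one place so that every event carries the same metadata. The helper below is a hypothetical sketch: `wrap_event` and its parameters are not from any library, and it assumes `occurred_at` is a timezone-aware datetime describing when the business fact happened.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def wrap_event(
    event_type: str,
    event_version: int,
    payload: Dict[str, Any],
    occurred_at: datetime,
    producer: str,
    correlation_id: str,
) -> Dict[str, Any]:
    """Wrap a payload in an envelope with the metadata fields shown above."""
    # Normalize to UTC and use the "Z" suffix seen in the example events.
    timestamp = occurred_at.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "eventId": f"evt_{uuid.uuid4().hex}",  # unique per published event
        "eventType": event_type,
        "eventVersion": event_version,
        "occurredAt": timestamp,
        "producer": producer,
        "correlationId": correlation_id,
        "payload": payload,
    }
```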
Another common mistake is publishing events that directly mirror internal database models.
For example:
{
  "table": "orders",
  "operation": "insert",
  "row": {
    "id": "order_123",
    "status": "P",
    "amt": 49.99
  }
}
This leaks implementation details.
If the database schema changes, the event changes.
Consumers become coupled to the producer's internal model.
A better event describes a business fact:
{
  "eventType": "OrderCreated",
  "orderId": "order_123",
  "status": "PendingPayment",
  "totalAmount": 49.99,
  "currency": "EUR"
}
Domain events should be designed as integration contracts, not as database change notifications.
Change Data Capture can be useful, but if raw database changes become your public event contract, consumers may become tightly coupled to internal storage decisions.
Events should be documented.
At minimum, documentation should explain:
What business fact the event represents
When the event is published
Who owns the event
What each field means
Which fields are required
Which fields are optional
Current version
Compatibility rules
Example payloads
Known consumers
Retention and replay expectations
This does not need to be complicated.
Even a simple event catalog can help.
The important thing is that teams know what they are consuming.
An undocumented event becomes tribal knowledge.
Tribal knowledge becomes production risk.
Schema changes should be tested before production.
Useful tests include:
Producer schema validation
Consumer deserialization tests
Backward compatibility tests
Contract tests
Replay tests with old events
Unknown field tolerance tests
Unknown enum value tests
Unsupported version tests
For example, a consumer should be tested with:
A valid current event
A valid old event
An event with an unknown optional field
An event with an unsupported version
An event missing a required field
This gives confidence that schema evolution will not break the system silently.
A consumer should have a clear strategy for unsupported event versions.
Bad:
Crash with a generic error
Better:
Reject the event with a clear reason
Move it to a dead-letter queue
Log eventType, eventVersion, eventId, and correlationId
Alert if this should not happen
Sometimes the right behavior is to ignore the event.
Sometimes it should be retried later.
Sometimes it needs manual intervention.
But it should not fail mysteriously.
Unsupported versions are operational events too.
They should be visible.
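The "better" behavior can be sketched as a thin wrapper around the real handler. Everything named here is an assumption: `handle_event`, the logger name, the supported version tuple, and the plain list standing in for a real dead-letter queue.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("order-consumer")  # hypothetical logger name
SUPPORTED_VERSIONS = (1, 2)  # assumption: versions this consumer understands


def handle_event(
    event: Dict[str, Any],
    process: Callable[[Dict[str, Any]], None],
    dead_letters: List[Dict[str, Any]],
) -> bool:
    """Process supported versions; reject others visibly. Returns True if processed."""
    version = event.get("eventVersion")
    if version not in SUPPORTED_VERSIONS:
        logger.error(
            "unsupported event version: eventType=%s eventVersion=%s eventId=%s correlationId=%s",
            event.get("eventType"),
            version,
            event.get("eventId"),
            event.get("correlationId"),
        )
        # The list stands in for a real dead-letter queue client.
        dead_letters.append({"reason": "unsupported_version", "event": event})
        return False
    process(event)
    return True
```

The rejected event is kept intact alongside a reason, so it can be inspected or replayed once the consumer supports that version.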
If I had to explain event schema versioning in an interview, I would say:
In event-driven systems, events are contracts between producers and consumers. Once other services depend on an event, changing its schema can break them. This is especially important because events may be stored, retried, or replayed long after they were published.
I would try to make schema changes backward-compatible where possible. Adding optional fields is usually safer than renaming or removing fields. Consumers should ignore unknown fields and handle unknown enum values carefully.
For breaking changes, I would use explicit event versioning, for example eventType plus eventVersion. Consumers can then support multiple versions or reject unsupported versions clearly.
I would also avoid changing the meaning of a field silently, because semantic changes can corrupt business data without causing obvious errors.
In larger systems, I would use schema validation, contract tests, and possibly a schema registry. I would also test replay with old events, because old schemas may still exist in topics, logs, archives, or event stores.
Event schema versioning is not only about JSON, Avro, Protobuf, or field names.
It is about respecting the fact that events are shared contracts.
A producer may change quickly.
Consumers may change slowly.
Old events may still exist.
Replays may happen months later.
Different teams may depend on fields in different ways.
That is why event evolution needs care.
Prefer additive changes.
Version breaking changes.
Do not change meaning silently.
Document your events.
Test consumers against old and new schemas.
And remember that an event is not just a message in a broker.
It is a business fact that other systems may rely on.
This post is part of my Backend Architecture Notes series. In the next post, I will look at observability in event-driven systems, and why logs, metrics, traces, and correlation IDs matter even more when there is no single request path.