
Testing event-driven systems beyond the happy path

Backend Architecture Notes: test duplicates, retries, out-of-order events, and failure paths

November 14, 2025

Testing event-driven systems is different from testing a simple synchronous API.

In a synchronous flow, you often send a request and check the response.

For example:

POST /orders
Expect 201 Created
Check order exists in database

That kind of test is still useful.

But in an event-driven system, the real work may happen after the initial request returns.

A service publishes an event. One consumer processes it. Another consumer updates a projection. Another service retries after a failure. A saga moves to the next state. A dead-letter queue may receive a message that could not be processed.

The system is not only request and response.

It is a flow.

That means testing needs to cover more than the happy path.

The happy path is not enough

The happy path is usually the first test people write.

For example:

Given an order is created
When OrderCreated is published
Then inventory is reserved
And payment is authorized
And the order is confirmed

That test is useful.

But it only proves the system works when everything goes right.

Production is different.

Messages can be duplicated.

Consumers can crash.

Events can arrive late.

External APIs can time out.

Schemas can change.

Retries can fail.

A dead-letter queue can start filling up.

A projection can lag behind.

If the tests only cover the perfect flow, they do not tell you whether the system can survive real conditions.

Event-driven systems should be tested against failure.

Start with the event handler

The smallest useful test is often the event handler test.

A handler receives an event and performs some business action.

For example:

Given an OrderCreated event
When the Inventory consumer handles it
Then inventory is reserved
And InventoryReserved is published

This kind of test is close to a unit test.

It should verify the business logic inside the consumer.

For example:

Does the handler validate the event?
Does it apply the right state change?
Does it publish the right follow-up event?
Does it handle missing data?
Does it ignore events it should not process?

These tests are fast and useful.

But they should not only test clean input.

They should also test unexpected input.
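
A handler test along these lines might look like the following sketch. The InventoryConsumer class, its return values, and the payload fields (sku, quantity) are hypothetical names chosen for illustration, and an in-memory list stands in for the real publisher.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryConsumer:
    """Hypothetical consumer: reserves stock and records follow-up events."""
    stock: dict
    published: list = field(default_factory=list)

    def handle(self, event: dict) -> str:
        # Ignore event types this consumer does not own.
        if event.get("eventType") != "OrderCreated":
            return "ignored"
        payload = event.get("payload") or {}
        sku, qty = payload.get("sku"), payload.get("quantity")
        # Reject events with missing or invalid data instead of guessing.
        if not sku or not isinstance(qty, int) or qty <= 0:
            return "rejected"
        order_id = payload.get("orderId")
        if self.stock.get(sku, 0) < qty:
            self.published.append(
                {"eventType": "InventoryReservationFailed", "orderId": order_id})
            return "failed"
        self.stock[sku] -= qty
        self.published.append(
            {"eventType": "InventoryReserved", "orderId": order_id})
        return "reserved"


def test_reserves_inventory_and_publishes_follow_up():
    consumer = InventoryConsumer(stock={"sku-1": 5})
    event = {"eventType": "OrderCreated",
             "payload": {"orderId": "o-1", "sku": "sku-1", "quantity": 2}}
    assert consumer.handle(event) == "reserved"
    assert consumer.stock["sku-1"] == 3
    assert consumer.published[-1]["eventType"] == "InventoryReserved"


def test_rejects_event_with_missing_data():
    consumer = InventoryConsumer(stock={"sku-1": 5})
    assert consumer.handle({"eventType": "OrderCreated", "payload": {}}) == "rejected"
    assert consumer.published == []
```

The second test is the one people tend to skip. It checks that bad input produces no state change and no follow-up event.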

Test duplicate events

Duplicate messages are normal in at-least-once systems.

So every important consumer should be tested with the same event twice.

For example:

Given PaymentSucceeded for payment pay_123
When the event is processed twice
Then the payment is applied once
And the order is marked paid once
And no duplicate balance update is created

This is one of the most important tests in an event-driven system.

A handler that cannot safely process a duplicate event is fragile.

This is especially important for operations like:

Payments
Balance updates
Inventory reservations
Bonus grants
Invoice creation
Shipment creation
Email sending

The test should prove that the business effect happens once, even if the message is delivered more than once.
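
As a minimal sketch, assuming the consumer deduplicates on the payment ID, the duplicate test can be as small as this. PaymentConsumer and the event field names are hypothetical, and the in-memory set stands in for whatever durable store a real consumer would use.

```python
class PaymentConsumer:
    """Hypothetical consumer that deduplicates on the payment ID."""

    def __init__(self):
        self.processed_payment_ids = set()
        self.balance = 0
        self.paid_orders = []

    def handle(self, event: dict) -> bool:
        payment_id = event["paymentId"]
        if payment_id in self.processed_payment_ids:
            return False  # duplicate delivery: no business effect
        self.processed_payment_ids.add(payment_id)
        self.balance += event["amount"]
        self.paid_orders.append(event["orderId"])
        return True


def test_payment_applied_once_when_delivered_twice():
    consumer = PaymentConsumer()
    event = {"eventType": "PaymentSucceeded", "paymentId": "pay_123",
             "orderId": "o-1", "amount": 50}
    assert consumer.handle(event) is True
    assert consumer.handle(event) is False
    assert consumer.balance == 50
    assert consumer.paid_orders == ["o-1"]
```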

Test idempotency under concurrency

Testing duplicate events one after another is good.

But sometimes duplicates are processed at the same time.

For example, two consumer instances may receive the same message because of a retry, a rebalance, or a bug.

The dangerous case looks like this:

Consumer A checks if payment was processed
Consumer B checks if payment was processed
Both see it was not processed
Both apply the payment

A good idempotency design should survive this.

That usually means the database enforces uniqueness, for example with a unique constraint on the payment ID or an atomic check-and-insert inside a transaction.

A test should verify that.

For example:

Given two workers process the same PaymentSucceeded event concurrently
When both try to apply payment pay_123
Then only one succeeds
And the final balance is correct
And the duplicate is handled safely

This kind of test catches bugs that a normal sequential test misses.
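
Here is one way such a test could be sketched, using SQLite from the Python standard library as a stand-in for the real database. The table names, the account ID, and the apply_payment function are assumptions made for the example. The primary key on payment_id is what rejects the second worker.

```python
import os
import sqlite3
import tempfile
import threading


def apply_payment(db_path: str, payment_id: str, amount: int) -> str:
    # Each worker opens its own connection, like a separate consumer instance.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")
        try:
            # The PRIMARY KEY rejects a second insert for the same payment.
            conn.execute(
                "INSERT INTO processed_payments (payment_id) VALUES (?)",
                (payment_id,))
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account_id = 'acc_1'",
                (amount,))
            conn.execute("COMMIT")
            return "applied"
        except sqlite3.IntegrityError:
            conn.execute("ROLLBACK")
            return "duplicate"
    finally:
        conn.close()


def run_concurrent_duplicates(workers: int = 2):
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "payments.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE processed_payments (payment_id TEXT PRIMARY KEY)")
        setup.execute("CREATE TABLE balances (account_id TEXT PRIMARY KEY, amount INTEGER)")
        setup.execute("INSERT INTO balances VALUES ('acc_1', 0)")
        setup.commit()
        setup.close()

        results = []
        barrier = threading.Barrier(workers)  # release all workers together

        def worker():
            barrier.wait()
            results.append(apply_payment(db_path, "pay_123", 50))

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        check = sqlite3.connect(db_path)
        balance = check.execute("SELECT amount FROM balances").fetchone()[0]
        check.close()
        return sorted(results), balance
```

The test then asserts that run_concurrent_duplicates() returns exactly one "applied", one "duplicate", and a balance of 50. The deduplication insert and the balance update share one transaction, so there is no window where one exists without the other.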

Test out-of-order events

Events do not always arrive in the order you expect.

A consumer may receive:

OrderConfirmed

before:

OrderCreated

Or it may receive:

SubscriptionActivated

after:

SubscriptionCancelled

The consumer should have a clear strategy.

Possible behaviors:

Reject the event
Retry later
Ignore it as stale
Fetch current state from the source service
Move it to a dead-letter queue
Mark the workflow for manual review

The test should verify the expected behavior.

For example:

Given a subscription is already Cancelled
When an older SubscriptionActivated event is received
Then the subscription remains Cancelled
And the event is recorded as stale

This protects the system from old events corrupting newer state.
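
A sketch of the "ignore it as stale" strategy, assuming every event carries an occurredAt timestamp that can be compared. Comparing timestamps is only one option (a per-aggregate version number is another), and SubscriptionState is a hypothetical name.

```python
from datetime import datetime, timezone

STATUS_BY_EVENT = {
    "SubscriptionActivated": "Active",
    "SubscriptionCancelled": "Cancelled",
}


class SubscriptionState:
    """Hypothetical state holder that ignores events older than the last applied one."""

    def __init__(self):
        self.status = None
        self.last_occurred_at = None
        self.stale_events = []

    def handle(self, event: dict) -> str:
        occurred_at = event["occurredAt"]
        # Events at or before the last applied timestamp are treated as stale.
        if self.last_occurred_at is not None and occurred_at <= self.last_occurred_at:
            self.stale_events.append(event["eventId"])
            return "stale"
        self.status = STATUS_BY_EVENT[event["eventType"]]
        self.last_occurred_at = occurred_at
        return "applied"


def test_older_activation_does_not_reopen_cancelled_subscription():
    sub = SubscriptionState()
    activated_at = datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)
    cancelled_at = datetime(2025, 1, 1, 11, 0, tzinfo=timezone.utc)
    sub.handle({"eventId": "e2", "eventType": "SubscriptionCancelled",
                "occurredAt": cancelled_at})
    result = sub.handle({"eventId": "e1", "eventType": "SubscriptionActivated",
                         "occurredAt": activated_at})
    assert result == "stale"
    assert sub.status == "Cancelled"
    assert sub.stale_events == ["e1"]
```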

Test missing events

Ordering problems are not only about late events.

Sometimes an event appears to be missing.

For example:

Current projection version: 3
Incoming event version: 5
Missing version: 4

The consumer should not ignore that silently.

A test can verify the behavior:

Given the projection has processed version 3
When it receives version 5
Then it does not apply version 5 immediately
And it records a version gap
And it schedules a retry or rebuild

This is especially important for projections and read models that depend on event sequence.

If the consumer uses versions, test the version handling.

Do not only test the normal sequence.
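
The version check itself is small. This sketch assumes each event carries a monotonically increasing version per aggregate. OrderProjection, the changes field, and the return strings are hypothetical, and scheduling the retry or rebuild is left to the caller.

```python
class OrderProjection:
    """Hypothetical projection that only applies the next expected version."""

    def __init__(self):
        self.version = 0
        self.state = {}
        self.gaps = []

    def apply(self, event: dict) -> str:
        incoming = event["version"]
        if incoming <= self.version:
            return "duplicate_or_stale"
        if incoming != self.version + 1:
            # Record the missing range so a retry or rebuild can be scheduled.
            self.gaps.append((self.version + 1, incoming - 1))
            return "gap_detected"
        self.state.update(event["changes"])
        self.version = incoming
        return "applied"


def test_version_gap_is_recorded_and_not_applied():
    projection = OrderProjection()
    for v in (1, 2, 3):
        assert projection.apply({"version": v, "changes": {"step": v}}) == "applied"
    result = projection.apply({"version": 5, "changes": {"step": 5}})
    assert result == "gap_detected"
    assert projection.version == 3
    assert projection.state == {"step": 3}
    assert projection.gaps == [(4, 4)]
```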

Test retry behavior

Retries are part of event-driven systems.

If a temporary failure happens, the system should try again.

For example:

Database timeout
External API unavailable
Temporary network error
Broker connection issue

A retry test should verify that the system retries when it should.

For example:

Given the Payment provider times out
When AuthorizePayment is handled
Then the operation is retried with backoff
And the payment is not charged twice

The important part is not only that a retry happens.

The important part is that the retry is safe.

Retries without idempotency can create duplicate business effects.
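
To test both halves (the retry happens, and it is safe), I would use a fake provider that fails a fixed number of times. In this sketch, FakePaymentProvider, the idempotency key parameter, and the backoff numbers are all assumptions. The sleep function is injected so the test records delays instead of waiting.

```python
import time


class ProviderTimeout(Exception):
    pass


class FakePaymentProvider:
    """Hypothetical provider: times out N times, dedupes by idempotency key."""

    def __init__(self, timeouts_before_success):
        self.remaining_timeouts = timeouts_before_success
        self.charges = {}

    def authorize(self, idempotency_key, amount):
        # Worst case: the charge is recorded, then the response is lost.
        # Keying by idempotency key means a retry cannot add a second charge.
        self.charges.setdefault(idempotency_key, amount)
        if self.remaining_timeouts > 0:
            self.remaining_timeouts -= 1
            raise ProviderTimeout()
        return "authorized"


def authorize_with_retry(provider, key, amount, max_attempts=4,
                         base_delay=0.1, sleep=time.sleep):
    for attempt in range(1, max_attempts + 1):
        try:
            return provider.authorize(key, amount)
        except ProviderTimeout:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


def test_retry_uses_backoff_and_charges_once():
    delays = []
    provider = FakePaymentProvider(timeouts_before_success=2)
    result = authorize_with_retry(provider, "pay_123", 50, sleep=delays.append)
    assert result == "authorized"
    assert delays == [0.1, 0.2]
    assert provider.charges == {"pay_123": 50}
```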

Test retry limits

A system should not retry forever by accident.

For failures that do not recover, the message should eventually be moved aside or escalated.

For example:

Given a message has an unsupported schema version
When the consumer tries to process it
Then it fails
And after the retry limit it moves to the dead-letter queue

This test proves that poison messages do not block the system forever.

Retry limits are part of reliability.

Without them, one bad message can create an endless failure loop.

Test dead-letter queue behavior

A dead-letter queue is only useful if the system uses it correctly.

Tests should verify when messages go to the DLQ and what information is preserved.

For example:

Given an invalid OrderCreated event
When processing fails after all retries
Then the event is moved to the dead-letter queue
And the failure reason is stored
And the eventId, eventType, correlationId, and orderId are preserved

This matters because DLQ messages need to be investigated.

If the DLQ only contains a raw payload without context, debugging becomes harder.

A good test checks not only that the message is moved, but also that it is inspectable.
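
A sketch that covers both the retry limit and the DLQ contents. The consume function, the handler, and the dead-letter record layout are hypothetical, and a plain list stands in for the dead-letter queue.

```python
class UnsupportedSchemaVersion(Exception):
    pass


def handle_order_created(event: dict) -> None:
    # Hypothetical handler that only understands eventVersion 1.
    if event.get("eventVersion") != 1:
        raise UnsupportedSchemaVersion(
            f"unsupported eventVersion {event.get('eventVersion')}")


def consume(event: dict, handler, dead_letters: list, max_attempts: int = 3) -> str:
    last_error = None
    for _ in range(max_attempts):
        try:
            handler(event)
            return "processed"
        except Exception as exc:  # a real consumer would classify errors here
            last_error = exc
    # Retry limit reached: keep the context someone needs to investigate.
    dead_letters.append({
        "eventId": event.get("eventId"),
        "eventType": event.get("eventType"),
        "correlationId": event.get("correlationId"),
        "orderId": event.get("payload", {}).get("orderId"),
        "failureReason": f"{type(last_error).__name__}: {last_error}",
        "attempts": max_attempts,
        "originalEvent": event,
    })
    return "dead_lettered"


def test_poison_message_moves_to_dlq_with_context():
    dlq = []
    event = {"eventId": "evt-1", "eventType": "OrderCreated", "eventVersion": 99,
             "correlationId": "corr-1", "payload": {"orderId": "o-1"}}
    assert consume(event, handle_order_created, dlq) == "dead_lettered"
    entry = dlq[0]
    assert entry["attempts"] == 3
    assert entry["failureReason"].startswith("UnsupportedSchemaVersion")
    assert (entry["eventId"], entry["correlationId"], entry["orderId"]) == (
        "evt-1", "corr-1", "o-1")
```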

Test replay

Replay is useful, but dangerous.

A replay may rebuild a projection, recover from a bug, or backfill a new read model.

But replaying events can also trigger duplicate side effects if consumers are not careful.

For example, replay should not accidentally:

Send old customer emails again
Charge payments again
Create duplicate shipments
Grant bonuses again

A replay test might look like this:

Given historical OrderCreated events
When the reporting projection is rebuilt
Then the report state is correct
And no customer notifications are sent

Some consumers are safe to replay.

Some are not.

Tests should make that distinction explicit.
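
One way to make the distinction explicit is to pass a replay flag to every consumer and assert on the side effects. This is a sketch: the replay flag, the two consumer classes, and the event fields are assumptions, not a prescribed design.

```python
class ReportingProjection:
    """Replay-safe: it only derives state from events."""

    def __init__(self):
        self.orders_by_customer = {}

    def handle(self, event: dict, replay: bool = False) -> None:
        if event["eventType"] == "OrderCreated":
            customer = event["customerId"]
            self.orders_by_customer[customer] = (
                self.orders_by_customer.get(customer, 0) + 1)


class NotificationConsumer:
    """Has an external side effect, so it must skip work during replay."""

    def __init__(self):
        self.sent_emails = []

    def handle(self, event: dict, replay: bool = False) -> None:
        if replay:
            return
        if event["eventType"] == "OrderCreated":
            self.sent_emails.append(event["orderId"])


def replay(events, consumers) -> None:
    for event in events:
        for consumer in consumers:
            consumer.handle(event, replay=True)


def test_rebuild_does_not_notify_customers():
    history = [
        {"eventType": "OrderCreated", "orderId": "o-1", "customerId": "c-1"},
        {"eventType": "OrderCreated", "orderId": "o-2", "customerId": "c-1"},
        {"eventType": "OrderCreated", "orderId": "o-3", "customerId": "c-2"},
    ]
    report, notifications = ReportingProjection(), NotificationConsumer()
    replay(history, [report, notifications])
    assert report.orders_by_customer == {"c-1": 2, "c-2": 1}
    assert notifications.sent_emails == []
```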

Test the outbox pattern

If a service uses the Outbox pattern, test the failure windows it is supposed to protect.

For example:

Given an order is created
When the database transaction commits
Then the OrderCreated event exists in the outbox table

Also test the publisher:

Given an unpublished outbox event
When the publisher runs
Then the event is published
And the outbox row is marked as published

But also test the duplicate case:

Given the publisher publishes an event
And crashes before marking it as published
When the publisher restarts
Then the event may be published again
And consumers handle the duplicate safely

The Outbox pattern prevents an event from being lost when the service crashes after the database commit but before the publish.

It does not eliminate duplicates.

Your tests should reflect that.
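
The crash window can be simulated without a real broker. In this sketch the outbox is a list of rows, the broker is a list, and crash_before_mark is a hypothetical test hook that raises between the publish and the "mark as published" step.

```python
class PublisherCrashed(Exception):
    pass


def publish_pending(outbox: list, broker: list, crash_before_mark: bool = False) -> None:
    for row in outbox:
        if row["published"]:
            continue
        broker.append(row["event"])   # step 1: publish
        if crash_before_mark:
            raise PublisherCrashed()  # simulated crash between the two steps
        row["published"] = True       # step 2: mark as published


def consume_all(broker: list, seen_event_ids: set, effects: list) -> None:
    for event in broker:
        if event["eventId"] in seen_event_ids:
            continue  # duplicate from the publisher restart
        seen_event_ids.add(event["eventId"])
        effects.append(event["orderId"])


def test_crash_after_publish_causes_duplicate_that_consumer_absorbs():
    outbox = [{"published": False,
               "event": {"eventId": "evt-1", "eventType": "OrderCreated",
                         "orderId": "o-1"}}]
    broker, seen, effects = [], set(), []
    try:
        publish_pending(outbox, broker, crash_before_mark=True)
    except PublisherCrashed:
        pass
    publish_pending(outbox, broker)  # publisher restarts
    assert len(broker) == 2          # the event was published twice
    assert outbox[0]["published"] is True
    consume_all(broker, seen, effects)
    assert effects == ["o-1"]        # the business effect happened once
```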

Test schema compatibility

Events are contracts.

Changing an event schema can break consumers.

So schema compatibility should be tested.

For example:

Can the consumer parse the current event version?
Can it parse older supported versions?
Does it ignore unknown optional fields?
Does it reject unsupported versions clearly?
Does it handle unknown enum values?

A useful test might be:

Given an OrderCreated v1 event
When the current consumer processes it
Then it still handles the event correctly

Another test:

Given an OrderCreated event with an extra optional field
When the consumer processes it
Then the consumer ignores the unknown field
And processing succeeds

These tests protect rolling deployments and replay.
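
As a sketch, a tolerant parser and its compatibility tests could look like this. The supported versions, the channel enum, and the assumption that locale arrived as an optional field in v2 are all made up for the example.

```python
SUPPORTED_VERSIONS = {1, 2}
KNOWN_CHANNELS = {"web", "mobile"}


class UnsupportedEventVersion(ValueError):
    pass


def parse_order_created(raw: dict) -> dict:
    version = raw.get("eventVersion")
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedEventVersion(f"OrderCreated v{version} is not supported")
    payload = raw["payload"]
    channel = payload.get("channel")
    # Only known keys are read, so extra fields are ignored by construction.
    return {
        "order_id": payload["orderId"],
        # Assumed optional and introduced in v2; v1 events fall back to None.
        "locale": payload.get("locale"),
        # Unknown enum values map to a safe default instead of crashing.
        "channel": channel if channel in KNOWN_CHANNELS else "unknown",
    }


def test_v1_event_still_parses():
    parsed = parse_order_created(
        {"eventVersion": 1, "payload": {"orderId": "o-1", "channel": "web"}})
    assert parsed == {"order_id": "o-1", "locale": None, "channel": "web"}


def test_extra_field_and_unknown_enum_are_tolerated():
    parsed = parse_order_created(
        {"eventVersion": 2,
         "payload": {"orderId": "o-2", "locale": "en-GB",
                     "channel": "kiosk", "giftWrap": True}})
    assert parsed == {"order_id": "o-2", "locale": "en-GB", "channel": "unknown"}


def test_unsupported_version_is_rejected_clearly():
    try:
        parse_order_created({"eventVersion": 99, "payload": {"orderId": "o-3"}})
    except UnsupportedEventVersion as exc:
        assert "v99" in str(exc)
    else:
        raise AssertionError("expected UnsupportedEventVersion")
```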

Test consumer contracts

Consumer-driven contract tests can be very useful.

A consumer defines what it expects from an event.

For example, the Email Service may expect:

OrderCreated contains orderId
OrderCreated contains customerEmail
OrderCreated contains locale

The producer should be tested against those expectations.

That way, if the producer removes or renames a field, the test fails before production.

This is especially useful when multiple teams own different services.

The producer may not know every downstream dependency.

Contract tests make those dependencies visible.
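
Dedicated contract-testing tools exist, but the idea fits in a few lines. This hand-rolled sketch assumes the Email Service publishes its expectations as a field-to-type mapping and the producer runs the check in its own test suite. build_order_created is a hypothetical stand-in for the producer's real serialization code.

```python
# Expectations owned by the consumer (the Email Service).
EMAIL_SERVICE_CONTRACT = {"orderId": str, "customerEmail": str, "locale": str}


def contract_violations(payload: dict, contract: dict) -> list:
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in payload:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(payload[field_name], expected_type):
            problems.append(f"wrong type for field: {field_name}")
    return problems


def build_order_created(order: dict) -> dict:
    # Hypothetical producer code that builds the OrderCreated payload.
    return {
        "orderId": order["id"],
        "customerEmail": order["email"],
        "locale": order.get("locale", "en"),
        "total": order["total"],
    }


def test_producer_satisfies_email_service_contract():
    payload = build_order_created(
        {"id": "o-1", "email": "customer@example.com", "total": 42})
    assert contract_violations(payload, EMAIL_SERVICE_CONTRACT) == []


def test_renamed_field_is_caught():
    payload = {"orderId": "o-1", "email": "customer@example.com", "locale": "en"}
    assert contract_violations(payload, EMAIL_SERVICE_CONTRACT) == [
        "missing field: customerEmail"]
```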

Test sagas as workflows

Sagas need workflow-level tests.

A saga is not just one handler.

It is a sequence of steps with success paths, failure paths, retries, timeouts, and compensating actions.

A happy path test might be:

Create order
Reserve inventory
Authorize payment
Confirm order

But the more important tests are failure paths.

For example:

Given inventory is reserved
When payment fails
Then inventory is released
And the order is cancelled

Another example:

Given payment is authorized
When inventory reservation fails
Then payment authorization is voided
And the order is cancelled

The test should prove that compensation happens correctly.
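
A workflow-level test for the first failure path might be sketched like this. OrderSaga is written as a simple orchestrator with synchronous calls to keep the example short; a real saga would react to events. The fakes and state names are hypothetical.

```python
class PaymentFailed(Exception):
    pass


class FakeInventory:
    def __init__(self):
        self.reserved = set()

    def reserve(self, order_id):
        self.reserved.add(order_id)

    def release(self, order_id):
        self.reserved.discard(order_id)


class FailingPayments:
    def authorize(self, order_id):
        raise PaymentFailed(order_id)


class OrderSaga:
    def __init__(self, inventory, payments):
        self.inventory = inventory
        self.payments = payments
        self.state = "Started"

    def run(self, order_id):
        self.inventory.reserve(order_id)
        self.state = "InventoryReserved"
        try:
            self.payments.authorize(order_id)
        except PaymentFailed:
            self.inventory.release(order_id)  # compensating action
            self.state = "Cancelled"
            return self.state
        self.state = "Confirmed"
        return self.state


def test_payment_failure_releases_inventory_and_cancels_order():
    inventory = FakeInventory()
    saga = OrderSaga(inventory, FailingPayments())
    assert saga.run("o-1") == "Cancelled"
    assert inventory.reserved == set()
```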

Test compensation failures

Compensating actions can also fail.

For example:

Release inventory fails
Refund payment fails
Cancel shipment fails

A mature saga should know what to do.

Possible outcomes:

Retry compensation
Move to manual review
Alert support
Mark saga as compensation failed

A test might look like this:

Given payment was captured
And inventory reservation failed
When refund fails
Then the saga moves to RequiresManualReview
And an alert is created
And the order is not marked as successfully cancelled

This is the kind of test that separates a happy-path workflow from a production workflow.
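
The scenario above translates almost line for line into a test. In this sketch RefundingSaga, the injected refund function, and the alert record are hypothetical, and a list stands in for the alerting system.

```python
class RefundFailed(Exception):
    pass


class RefundingSaga:
    """Hypothetical saga that starts after payment was captured."""

    def __init__(self, refund, alerts):
        self.refund = refund
        self.alerts = alerts
        self.state = "PaymentCaptured"

    def on_inventory_reservation_failed(self, order_id):
        try:
            self.refund(order_id)
        except RefundFailed as exc:
            # Compensation failed: do not pretend the order was cancelled.
            self.state = "RequiresManualReview"
            self.alerts.append(
                {"orderId": order_id, "reason": f"refund failed: {exc}"})
            return self.state
        self.state = "Cancelled"
        return self.state


def failing_refund(order_id):
    raise RefundFailed("provider unavailable")


def test_failed_refund_moves_saga_to_manual_review():
    alerts = []
    saga = RefundingSaga(refund=failing_refund, alerts=alerts)
    assert saga.on_inventory_reservation_failed("o-1") == "RequiresManualReview"
    assert saga.state != "Cancelled"
    assert alerts == [
        {"orderId": "o-1", "reason": "refund failed: provider unavailable"}]
```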

Test timeouts

Not every failure arrives as a clean error event.

Sometimes nothing happens.

For example:

Payment Service does not respond
Inventory reservation event never arrives
External provider times out

A saga should handle timeouts.

For example:

Given the order saga is waiting for PaymentAuthorized
When no response arrives within 15 minutes
Then the saga checks payment status
Or retries the command
Or moves to RequiresManualReview

Timeout tests are important because many real failures are silent.

If you only test explicit failure events, you miss half the problem.
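
Timeout tests are much easier when the clock is passed in rather than read inside the saga. A minimal sketch, assuming the saga is checked periodically by a scheduler and picks the manual review outcome. PaymentWaitSaga and check_timeout are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


class PaymentWaitSaga:
    TIMEOUT = timedelta(minutes=15)

    def __init__(self, waiting_since):
        self.state = "AwaitingPaymentAuthorized"
        self.waiting_since = waiting_since

    def on_payment_authorized(self):
        self.state = "PaymentAuthorized"

    def check_timeout(self, now):
        # "now" is injected so tests do not depend on the wall clock.
        waiting = self.state == "AwaitingPaymentAuthorized"
        if waiting and now - self.waiting_since >= self.TIMEOUT:
            self.state = "RequiresManualReview"
        return self.state


def test_silent_payment_service_triggers_timeout():
    start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    saga = PaymentWaitSaga(waiting_since=start)
    assert saga.check_timeout(start + timedelta(minutes=14)) == "AwaitingPaymentAuthorized"
    assert saga.check_timeout(start + timedelta(minutes=15)) == "RequiresManualReview"


def test_no_timeout_after_response_arrived():
    start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    saga = PaymentWaitSaga(waiting_since=start)
    saga.on_payment_authorized()
    assert saga.check_timeout(start + timedelta(minutes=20)) == "PaymentAuthorized"
```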

Test projections

Projections and read models should be tested too.

For example, a reporting projection may consume:

OrderCreated
PaymentAuthorized
OrderCancelled

The test should verify that the final read model is correct.

But again, test more than the happy path.

For example:

Duplicate OrderCreated
Late OrderCancelled
Unknown event version
Version gap (missing event)
Replay from scratch

A projection should be deterministic.

If you rebuild it from the same event history, you should get the same result.

That is a valuable property to test.
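
Determinism is cheap to check: rebuild twice and compare. In this sketch rebuild_report is a hypothetical pure function over the event history, and the report fields are invented for the example.

```python
def rebuild_report(events) -> dict:
    seen = set()
    report = {"orders": 0, "cancelled": 0, "authorized_total": 0}
    for event in events:
        if event["eventId"] in seen:
            continue  # duplicates must not change the result
        seen.add(event["eventId"])
        kind = event["eventType"]
        if kind == "OrderCreated":
            report["orders"] += 1
        elif kind == "PaymentAuthorized":
            report["authorized_total"] += event["amount"]
        elif kind == "OrderCancelled":
            report["cancelled"] += 1
    return report


HISTORY = [
    {"eventId": "e1", "eventType": "OrderCreated"},
    {"eventId": "e2", "eventType": "PaymentAuthorized", "amount": 40},
    {"eventId": "e3", "eventType": "OrderCancelled"},
]


def test_rebuild_is_deterministic():
    assert rebuild_report(HISTORY) == rebuild_report(HISTORY)
    assert rebuild_report(HISTORY) == {
        "orders": 1, "cancelled": 1, "authorized_total": 40}


def test_duplicate_event_does_not_change_report():
    with_duplicate = HISTORY + [HISTORY[0]]
    assert rebuild_report(with_duplicate) == rebuild_report(HISTORY)
```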

Test observability

Observability should be tested more often than it usually is.

If a message fails, does the system log the right identifiers?

If a saga gets stuck, does the metric change?

If a message goes to the DLQ, does an alert fire?

For example:

Given a consumer fails after all retries
When the message is sent to the DLQ
Then a metric is incremented
And the log includes eventId, correlationId, and business identifiers

This may feel less important than business logic tests.

But during an incident, missing observability becomes a real production problem.

If you rely on correlation IDs to debug the system, tests should prove they are propagated.
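
With Python's standard logging module, a test can attach an in-memory handler and assert on the records. This is a sketch: the logger name, the metric name, and the dead_letter function are assumptions, and a Counter stands in for a real metrics client.

```python
import logging
from collections import Counter

logger = logging.getLogger("orders.consumer")


class RecordingHandler(logging.Handler):
    """Test helper that keeps emitted log records in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def dead_letter(event: dict, reason: str, dlq: list, metrics: Counter) -> None:
    dlq.append(event)
    metrics["dlq_messages_total"] += 1
    # Fields passed via "extra" become attributes on the log record.
    logger.error(
        "message moved to DLQ: %s",
        reason,
        extra={
            "eventId": event["eventId"],
            "correlationId": event["correlationId"],
            "orderId": event["orderId"],
        },
    )


def test_dead_lettering_is_observable():
    handler, dlq, metrics = RecordingHandler(), [], Counter()
    logger.addHandler(handler)
    try:
        dead_letter(
            {"eventId": "evt-1", "correlationId": "corr-1", "orderId": "o-1"},
            "retries exhausted", dlq, metrics)
    finally:
        logger.removeHandler(handler)
    record = handler.records[0]
    assert metrics["dlq_messages_total"] == 1
    assert (record.eventId, record.correlationId, record.orderId) == (
        "evt-1", "corr-1", "o-1")
```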

Integration tests with the broker

Unit tests are not enough.

At some point, you need integration tests with the real broker or a realistic test environment.

These tests verify things like:

Message serialization
Topic or queue configuration
Consumer group behavior
Acknowledgement behavior
Retry configuration
Dead-letter routing
Partition key behavior
Header propagation

Mocks can hide problems.

For example, a mocked broker may not behave like Kafka, RabbitMQ, SQS, or whatever the real system uses.

A good test suite has fast unit tests for handlers and slower integration tests for infrastructure behavior.

Both are useful.

End-to-end tests

End-to-end tests verify that the full business flow works.

For example:

User places order
Order is created
Inventory is reserved
Payment is authorized
Order is confirmed
Confirmation email is scheduled
Support view is updated

These tests are valuable because they catch wiring problems.

But they should be used carefully.

End-to-end tests can be slower and more fragile.

I would use them for the most important business flows, not every edge case.

Most detailed failure cases can be tested at the handler, workflow, or integration level.

Use test fixtures for events

Events should have realistic test fixtures.

For example:

OrderCreated v1
OrderCreated v2
PaymentSucceeded
PaymentFailed
InventoryReserved
InventoryReservationFailed

These fixtures should look like real production events.

They should include metadata:

eventId
eventType
eventVersion
occurredAt
correlationId
causationId
business identifiers
payload

Good fixtures make tests more realistic.

They also help document the event contract.

A bad fixture with only the fields needed for one test may hide schema problems.
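
A small fixture builder keeps the metadata consistent across tests. This is a sketch: make_event, order_created_v2, and the sample payload values are hypothetical, and the envelope simply mirrors the metadata list above.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type, payload, *, version=1, correlation_id=None,
               causation_id=None, occurred_at=None) -> dict:
    """Build a full event envelope so tests never run on a bare payload."""
    return {
        "eventId": str(uuid.uuid4()),
        "eventType": event_type,
        "eventVersion": version,
        "occurredAt": (occurred_at or datetime.now(timezone.utc)).isoformat(),
        "correlationId": correlation_id or str(uuid.uuid4()),
        "causationId": causation_id,
        "payload": payload,
    }


def order_created_v2(**overrides) -> dict:
    payload = {
        "orderId": "ord_1001",
        "customerEmail": "customer@example.com",
        "locale": "en-GB",
        "items": [{"sku": "sku-1", "quantity": 2}],
    }
    payload.update(overrides)  # individual tests tweak only what they need
    return make_event("OrderCreated", payload, version=2)
```

A test then calls order_created_v2(locale="de-DE") and still gets a complete envelope with an eventId, a correlationId, and a timestamp.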

Test data should include ugly cases

Real production data is messy.

Tests should include ugly cases too.

For example:

Missing optional fields
Unknown enum values
Large payloads
Old event versions
Unexpected but valid state transitions
Duplicate business IDs
Events from replay
Events with old timestamps

The point is not to make tests complicated for no reason.

The point is to avoid assuming that every event will look like the clean example in the documentation.

Manual testing is not enough

It is tempting to test event-driven flows manually by clicking through the application and watching logs.

That can help during development.

But it is not enough.

Manual testing rarely covers:

Duplicate delivery
Consumer crash after database write
Broker redelivery
Out-of-order events
Dead-letter queue routing
Retry exhaustion
Concurrent duplicate processing
Replay safety
Compensation failure

These are exactly the cases that matter in production.

They need automated tests.

What I would test first

If I had limited time, I would start with the highest-risk areas.

For an event-driven system, that usually means:

Idempotency for important consumers
Saga failure paths
Outbox publishing
DLQ behavior
Schema compatibility
Projection rebuilds
Observability for stuck workflows

I would not try to test every possible event combination from day one.

But I would make sure the system is safe around money, access, inventory, external side effects, and business-critical workflows.

Risk should drive the testing strategy.

The interview version

If I had to explain testing event-driven systems in an interview, I would say:

Testing event-driven systems requires more than testing the happy path. I would start with unit tests for event handlers, but I would also test failure scenarios like duplicate messages, retries, out-of-order events, missing events, schema changes, and consumer crashes.

Because most message brokers deliver at least once, I would test that important consumers are idempotent. For example, processing the same PaymentSucceeded event twice should not apply the payment twice. I would also test concurrency, because two workers may try to process the same business operation at the same time.

For sagas, I would test the full workflow, including compensation. The happy path is not enough. I want to know what happens when inventory succeeds but payment fails, or when compensation itself fails.

I would also test infrastructure behavior with integration tests: broker configuration, acknowledgements, retries, dead-letter queues, outbox publishing, and schema compatibility. Finally, I would test observability, making sure failed messages include correlation IDs, business identifiers, and useful metrics.

The goal is to prove that the system behaves correctly when messages are duplicated, delayed, retried, replayed, or processed out of order.

Final thought

Event-driven systems fail in different ways than simple request-response systems.

The first version of the system may work perfectly when every service is healthy and every message arrives once.

But production will test different conditions.

Messages will be retried.

Consumers will crash.

Events will arrive late.

Schemas will evolve.

Projections will lag.

External APIs will time out.

Sagas will fail halfway through.

A good test strategy accepts that reality.

Do not only test that the system works when everything goes right.

Test what happens when the system is under stress, when messages are duplicated, when dependencies fail, and when the business process gets stuck.

That is where confidence comes from.

This post is part of my Backend Architecture Notes series. In the next post, I will look at how to choose between synchronous APIs and events, because not every service interaction should be asynchronous.