Eventual consistency without hand-waving

Backend Architecture Notes: designing the in-between states, not hoping they converge

October 9, 2025

Eventual consistency is one of those terms that sounds simple until you have to explain what it means in a real system.

It is also one of the first trade-offs you run into when working with event-driven architecture.

In a synchronous monolith with one database, it is tempting to think about consistency as something immediate. A request comes in, a transaction runs, tables are updated, and when the request returns, the system is in the new state.

In an event-driven system, that is often not how things work.

A service may update its own database and publish an event. Other services react to that event later. Sometimes that means milliseconds later. Sometimes seconds. Sometimes longer, especially when there are retries, outages, backpressure, or delayed consumers.

That delay is where eventual consistency appears.

The system is not inconsistent forever. But it is not immediately consistent everywhere either.

What is eventual consistency?

Eventual consistency means that different parts of the system may temporarily have different views of the same business process, but if no new changes happen and the system keeps processing, those views should eventually converge.

For example, imagine an order flow:

Order Service creates an order
Order Service publishes OrderCreated
Payment Service processes the payment
Inventory Service reserves stock
Email Service sends a confirmation

Immediately after the order is created, the Order Service knows about the order.

But the Payment Service may not have processed payment yet.

The Inventory Service may not have reserved stock yet.

The Email Service may not have sent anything yet.

For a short period of time, the order is in progress: one service has recorded it, and the others have not caught up yet.

That is eventual consistency.

It does not mean the system is random. It does not mean data quality is bad. It means the business process is spread across multiple services and those services do not all update at the exact same moment.
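A minimal sketch of that window, assuming an in-memory queue in place of a real message broker and plain dictionaries in place of each service's database. The names `broker`, `order_db`, `payment_db`, `create_order`, and `payment_consumer_step` are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical in-memory stand-ins for a broker and two services' local stores.
broker = deque()
order_db = {}
payment_db = {}

def create_order(order_id):
    """Order Service: write locally, then publish an event for other services."""
    order_db[order_id] = "PendingPayment"
    broker.append({"type": "OrderCreated", "order_id": order_id})

def payment_consumer_step():
    """Payment Service: handle one queued event, if any. Returns True if it did work."""
    if not broker:
        return False
    event = broker.popleft()
    if event["type"] == "OrderCreated":
        payment_db[event["order_id"]] = "Authorized"
    return True

create_order("order-1")
# The in-between window: the Order Service knows the order, Payment does not yet.
views_before = ("order-1" in order_db, "order-1" in payment_db)

# No new changes arrive and the consumer keeps processing, so the views converge.
while payment_consumer_step():
    pass
views_after = ("order-1" in order_db, "order-1" in payment_db)
```

Here `views_before` is `(True, False)` and `views_after` is `(True, True)`. A real consumer runs in its own process and on its own schedule, which is exactly why the window exists.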

A concrete example

Imagine a user places an order.

The user clicks “Buy”.

The Order Service creates this state:

Order
status: PendingPayment

Then it publishes:

OrderCreated

The Payment Service receives the event and authorizes the payment.

Then it publishes:

PaymentAuthorized

The Order Service receives that event and updates the order:

status: PaymentAuthorized

Then Inventory reserves the stock and publishes:

InventoryReserved

Finally, the order becomes:

status: Confirmed

At no point did one big transaction update every service at once.

Each service made a local change. Each change moved the process forward.

This is a very different model from a single database transaction.
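The Order Service side of this flow could be sketched as a small event handler. This is an illustration under assumptions, not a prescribed design: the `NEXT_STATUS` table and `apply_event` function are hypothetical, and the mapping simply mirrors the walkthrough above.

```python
# Hypothetical mapping: which incoming event moves the order to which status.
NEXT_STATUS = {
    "PaymentAuthorized": "PaymentAuthorized",
    "InventoryReserved": "Confirmed",
}

def apply_event(order, event_type):
    """Return a copy of the order with its status advanced by one event."""
    new_status = NEXT_STATUS.get(event_type)
    if new_status is None:
        # Events this service does not care about leave the order unchanged.
        return order
    return {**order, "status": new_status}

order = {"id": "order-1", "status": "PendingPayment"}
history = [order["status"]]
for event_type in ["PaymentAuthorized", "InventoryReserved"]:
    order = apply_event(order, event_type)
    history.append(order["status"])
```

After the loop, `history` is `["PendingPayment", "PaymentAuthorized", "Confirmed"]`: three local changes, each triggered by a separate event.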

Why not use one big transaction?

In a monolith, you can often use one database transaction:

Begin transaction
Create order
Create payment record
Reserve inventory
Commit transaction

If something fails, you roll back.

That is simple and powerful.
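For comparison, here is roughly what the monolith version looks like with Python's built-in `sqlite3` module. The table layout and the `place_order` function are assumptions made for the example. The `CHECK (stock >= 0)` constraint is there only to force a failure in the last step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY);
    CREATE TABLE payments (order_id TEXT PRIMARY KEY);
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER CHECK (stock >= 0));
    INSERT INTO inventory VALUES ('widget', 0);
""")

def place_order(order_id, sku):
    """Create the order, the payment record, and the reservation in one transaction."""
    try:
        with conn:  # Commits on success, rolls back if any statement raises.
            conn.execute("INSERT INTO orders VALUES (?)", (order_id,))
            conn.execute("INSERT INTO payments VALUES (?)", (order_id,))
            conn.execute(
                "UPDATE inventory SET stock = stock - 1 WHERE sku = ?", (sku,)
            )
        return True
    except sqlite3.IntegrityError:
        return False

# Stock is 0, so the reservation violates the CHECK constraint.
ok = place_order("order-1", "widget")
order_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
payment_rows = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

The reservation fails, so `ok` is `False` and both `order_rows` and `payment_rows` are `0`. The order and payment inserts disappear with the rollback, and no other code had to clean them up.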

But in a microservice architecture, each service often owns its own database.

The Order Service owns orders.

The Payment Service owns payments.

The Inventory Service owns inventory.

The Shipping Service owns shipments.

Once data ownership is split across services, one big database transaction becomes difficult or undesirable. You do not want every service sharing the same database. You also usually do not want distributed transactions (two-phase commit, for example) across many services, because they add operational complexity and couple the services to each other.

So instead of one global transaction, you use a business process made of smaller local transactions.

That is where eventual consistency becomes part of the design.

Eventual consistency is not an excuse

A bad explanation of eventual consistency sounds like this:

It will be consistent eventually.

That is not enough.

Eventually when?

What happens before that?

What does the user see?

What happens if one step fails?

Can the process get stuck?

Can someone retry it?

Can support understand what happened?

Eventual consistency should not be used as a vague excuse for unclear behavior.

It should be designed explicitly.

You need to know the states the business process can be in.

For example:

PendingPayment
PaymentAuthorized
PaymentFailed
InventoryReserved
InventoryReservationFailed
Confirmed
Cancelled
RequiresManualReview

These states matter because they make the in-between moments visible.

Without clear states, eventual consistency becomes confusing. With clear states, it becomes manageable.
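One way to make those states explicit in code is an enum plus a table of allowed transitions. The sketch below is an assumption about how that could look: the `ALLOWED` table and the `transition` function are hypothetical, and which moves are legal is a business decision, not something this example can settle.

```python
from enum import Enum

class OrderStatus(Enum):
    PENDING_PAYMENT = "PendingPayment"
    PAYMENT_AUTHORIZED = "PaymentAuthorized"
    PAYMENT_FAILED = "PaymentFailed"
    INVENTORY_RESERVED = "InventoryReserved"
    INVENTORY_RESERVATION_FAILED = "InventoryReservationFailed"
    CONFIRMED = "Confirmed"
    CANCELLED = "Cancelled"
    REQUIRES_MANUAL_REVIEW = "RequiresManualReview"

S = OrderStatus
# Assumed transition table. States missing as keys are treated as terminal.
ALLOWED = {
    S.PENDING_PAYMENT: {S.PAYMENT_AUTHORIZED, S.PAYMENT_FAILED},
    S.PAYMENT_AUTHORIZED: {S.INVENTORY_RESERVED, S.INVENTORY_RESERVATION_FAILED},
    S.PAYMENT_FAILED: {S.CANCELLED},
    S.INVENTORY_RESERVED: {S.CONFIRMED},
    S.INVENTORY_RESERVATION_FAILED: {S.CANCELLED, S.REQUIRES_MANUAL_REVIEW},
}

def transition(current, target):
    """Return the target status, or raise if the move is not in the table."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

With this in place, `transition(S.PENDING_PAYMENT, S.PAYMENT_AUTHORIZED)` succeeds, while `transition(S.CONFIRMED, S.PENDING_PAYMENT)` raises `ValueError`. An out-of-order or duplicated event then fails loudly instead of silently corrupting the order.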

Model the process, not just the data

One mistake is to only think about the final state.

For example:

Order is confirmed

But in a distributed system, the journey to that final state matters.

The order may be waiting for payment.

The payment may have succeeded, but stock may not be reserved yet.

Stock may be reserved, but shipping may not have started.

Payment may fail after the order was created.

Inventory may be unavailable after payment authorization.

These are not edge cases. They are normal parts of the business flow.

So instead of pretending an order is either “created” or “done”, you model the process:

OrderCreated
PaymentPending
PaymentAuthorized
InventoryPending
InventoryReserved
OrderConfirmed

And for failures:

PaymentFailed
InventoryReservationFailed
OrderCancelled
RefundPending
Refunded

The clearer the process states are, the easier the system becomes to reason about.

What does the user see?

Eventual consistency is not only a backend concern.

It affects the user experience.

Suppose a user places an order and payment processing is asynchronous.

What should the UI show?

Bad:

Something went wrong.

Better:

Your order has been received. Payment confirmation is pending.

Or:

We are processing your order. This usually takes a few seconds.

The important thing is that the UI should reflect the real state of the process.

If the backend is asynchronous, the frontend should not pretend everything is instant.

That does not mean the user needs to understand the architecture. But the product should communicate what is happening in a way that makes sense.
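A small sketch of that idea: the frontend (or the API that feeds it) maps the backend status to honest copy. The `STATUS_MESSAGES` table and `message_for` function are hypothetical, the first two messages are borrowed from the examples above, and the "confirmed" and fallback wording is made up for the illustration.

```python
# Hypothetical status-to-copy mapping.
STATUS_MESSAGES = {
    "PendingPayment": "Your order has been received. Payment confirmation is pending.",
    "PaymentAuthorized": "We are processing your order. This usually takes a few seconds.",
    "Confirmed": "Your order is confirmed.",
}

def message_for(status):
    """Pick user-facing text for a backend status, with a neutral fallback."""
    return STATUS_MESSAGES.get(status, "We are processing your order.")
```

An unknown or new status falls back to a neutral "processing" message. That avoids both a false "done" and a misleading error.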

Temporary inconsistency is normal

Let's say the user opens their account page immediately after making a payment.

One part of the system may already show:

Payment succeeded

Another part may not yet show the updated subscription status.

That can happen if the payment event has been processed, but the subscription projection has not caught up yet.

This is temporary inconsistency.

The question is not “can this happen?”

In an event-driven system, it can.

The better questions are:

Is this acceptable for the business?
How long can this inconsistency last?
Can the user take harmful actions during this window?
Do we need to block certain actions until the state catches up?
Can we show a pending state instead?

Some inconsistencies are harmless.

Some are not.

For example, analytics being a few seconds behind is usually fine.

A player balance being wrong, an order being shipped without payment, or a user getting access to something they did not pay for is not fine.

Different data has different consistency requirements.
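Blocking a harmful action until the state catches up can be as simple as a guard in front of it. The sketch below assumes an order record that carries `payment_status` and `inventory_status` fields. Those field names, and the functions `can_ship` and `shipping_decision`, are hypothetical.

```python
def can_ship(order):
    """Allow shipping only once both upstream facts are known to be true."""
    return (
        order.get("payment_status") == "Authorized"
        and order.get("inventory_status") == "Reserved"
    )

def shipping_decision(order):
    """Return 'ship' when it is safe, otherwise a pending result instead of a guess."""
    return "ship" if can_ship(order) else "pending"

# Payment has been processed, but the inventory update has not arrived yet.
in_flight = {"id": "order-1", "payment_status": "Authorized"}
caught_up = {**in_flight, "inventory_status": "Reserved"}
decisions = (shipping_decision(in_flight), shipping_decision(caught_up))
```

Here `decisions` is `("pending", "ship")`. A missing field is treated the same as "not yet", so an order is never shipped on incomplete information.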

Strong consistency where it matters

Event-driven architecture does not mean everything must be eventually consistent.

Some operations need strong consistency.

For example:

Charging a payment once
Preventing a negative balance
Checking access rights
Reserving limited inventory
Applying a bonus only once

For these cases, you may still need synchronous checks, database constraints, locks, unique indexes, conditional writes, or a single service that owns the critical decision.

A good design does not blindly make everything asynchronous.

It decides where eventual consistency is acceptable and where stronger guarantees are required.

For example, you might publish events after a payment succeeds, but the decision to charge the payment should still be controlled carefully by the Payment Service.

That service should own idempotency, validation, and the payment state.

The event is how the rest of the system learns about the result.
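As one illustration of "charging a payment once", the sketch below uses a primary-key constraint on an idempotency key, with Python's built-in `sqlite3` module standing in for the Payment Service's database. The `charges` table, the key format, and the `charge_once` function are assumptions for the example. A real payment flow would also call a payment provider, which is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE charges (
        idempotency_key TEXT PRIMARY KEY,
        amount_cents INTEGER NOT NULL
    )
""")

def charge_once(idempotency_key, amount_cents):
    """Record a charge; return False if this key was already charged."""
    try:
        with conn:  # Commits on success, rolls back on error.
            conn.execute(
                "INSERT INTO charges VALUES (?, ?)",
                (idempotency_key, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate delivery or retry: the constraint rejects the second insert.
        return False

first = charge_once("order-1-payment", 4999)
second = charge_once("order-1-payment", 4999)  # The same event delivered again.
total = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
```

`first` is `True`, `second` is `False`, and `total` is `1`. The guarantee comes from the database constraint inside one service, not from hoping the message is delivered only once.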

Read models and projections

Eventual consistency often appears when using read models or projections.

A projection is a view of data built from events.

For example, an Order Service may publish events like:

OrderCreated
PaymentAuthorized
InventoryReserved
OrderConfirmed

A Reporting Service may consume those events and build a read model for dashboards.

That dashboard may lag behind the source of truth.

This is often acceptable because reporting does not always need to be updated instantly.

But users and teams need to understand that the projection is not the source of truth. It is a derived view.

If a projection is delayed, the source service may already have the correct state while the read model is still catching up.

That is not necessarily a data bug. It is a property of the architecture.
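A projection can be sketched as a fold over events. In this example the `event_log` list and the `project` function are hypothetical, and slicing the list simulates a consumer that has not read everything yet.

```python
from collections import Counter

# Hypothetical event log, in the order the source service published it.
event_log = [
    {"type": "OrderCreated", "order_id": "a"},
    {"type": "OrderCreated", "order_id": "b"},
    {"type": "PaymentAuthorized", "order_id": "a"},
    {"type": "InventoryReserved", "order_id": "a"},
    {"type": "OrderConfirmed", "order_id": "a"},
]

def project(events):
    """Fold events into a dashboard view: number of orders per latest event type."""
    latest = {}
    for event in events:
        latest[event["order_id"]] = event["type"]
    return Counter(latest.values())

lagging_view = project(event_log[:3])  # The consumer has read only three events.
caught_up_view = project(event_log)
```

The lagging view shows no confirmed orders, while the caught-up view shows one. Both were computed correctly from what the consumer had seen at the time, which is why a stale dashboard is not by itself a data bug.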

Failure makes the model visible

Eventual consistency is easiest to understand when everything works.

It becomes more interesting when something fails.

Imagine this flow:

OrderCreated
PaymentAuthorized
InventoryReservationFailed

Now what?

The system has to decide.

Possible options:

Cancel the order
Refund the payment
Keep the order pending
Ask the user to choose another item
Send the case to manual review

There is no purely technical answer. This is a business decision.

That is why eventual consistency and business process design are connected.

The system needs to know how to move forward when a later step fails.

This is also where the Saga pattern usually appears.

A saga coordinates a long-running process across services using local transactions and compensating actions.

For example:

Payment was authorized
Inventory reservation failed
Release or refund the payment
Cancel the order

That is not a database rollback. It is a business-level correction.
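The shape of that correction can be sketched in a few lines. This is a deliberately simplified, in-process illustration: `run_saga` and the step names are hypothetical, the actions are no-ops, and a real saga spans services and messages and has to survive restarts.

```python
def run_saga(steps):
    """Run steps in order; on failure, compensate the completed steps in reverse."""
    log = []
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            completed.append((name, compensate))
        except RuntimeError:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            break
    return log

def reserve_inventory():
    raise RuntimeError("out of stock")  # Simulated failure for the example.

def no_op():
    pass  # Placeholder for a real local transaction or compensation.

steps = [
    ("create_order", no_op, no_op),        # Compensation would cancel the order.
    ("authorize_payment", no_op, no_op),   # Compensation would release or refund.
    ("reserve_inventory", reserve_inventory, no_op),
]
log = run_saga(steps)
```

The resulting `log` is `["done:create_order", "done:authorize_payment", "failed:reserve_inventory", "compensated:authorize_payment", "compensated:create_order"]`. Nothing is rolled back in the database sense. Each earlier step is corrected by a new, explicit action.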

Avoid hidden inconsistency

One danger is hidden eventual consistency.

That happens when the system is asynchronous, but the product and operations pretend it is synchronous.

For example:

The API returns a plain success, as if the work were finished, before downstream processing is complete.
The UI shows a final state while the backend is still processing.
Support tools do not show pending states.
Failed messages go unnoticed.
There is no way to replay or repair stuck processes.

This creates confusion.

Users think something is done.

Support sees incomplete data.

Engineers have to dig through logs to understand what happened.

A better design makes progress visible.

For example:

Order status: PaymentAuthorized
Payment status: Authorized
Inventory status: ReservationFailed
Next action: RefundPayment

That kind of visibility turns eventual consistency from a mystery into a manageable workflow.
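The "next action" field does not have to be filled in by hand. A support tool could derive it from the statuses it already has. The `next_action` function below is a hypothetical sketch, and apart from `RefundPayment` the action names are invented for the example.

```python
def next_action(payment_status, inventory_status):
    """Derive the next step a support tool could display for an order."""
    if payment_status == "Authorized" and inventory_status == "ReservationFailed":
        return "RefundPayment"
    if payment_status == "Failed":
        return "CancelOrder"
    if payment_status == "Authorized" and inventory_status == "Reserved":
        return "ConfirmOrder"
    # Nothing to do yet: an upstream step has not reported back.
    return "Wait"
```

For the example above, `next_action("Authorized", "ReservationFailed")` returns `"RefundPayment"`. Support then sees what the system intends to do next, not only where it currently is.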

Observability matters

Eventual consistency requires good observability.

You need to know whether the system is actually converging.

Useful signals include:

Queue lag
Consumer processing time
Retry counts
Dead-letter queue size
Oldest unprocessed message age
Number of stuck business processes
Time spent in pending states
Failed state transitions

Technical metrics are useful, but business metrics are just as important.

For example:

How many orders are stuck in PendingPayment?
How many payments succeeded but orders are not confirmed?
How many refunds are pending after inventory failure?
How long does confirmation usually take?

These metrics tell you whether eventual consistency is healthy or broken.
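A business metric like "orders stuck in PendingPayment" can be computed from data the Order Service already has. The sketch assumes each order records when it entered its current status. The `status_since` field, the five-minute threshold, and the `stuck_orders` function are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

def stuck_orders(orders, status, max_age, now):
    """Return ids of orders that have sat in `status` for longer than `max_age`."""
    return [
        o["id"]
        for o in orders
        if o["status"] == status and now - o["status_since"] > max_age
    ]

now = datetime(2025, 10, 9, 12, 0, tzinfo=timezone.utc)
orders = [
    {"id": "a", "status": "PendingPayment", "status_since": now - timedelta(minutes=30)},
    {"id": "b", "status": "PendingPayment", "status_since": now - timedelta(seconds=5)},
    {"id": "c", "status": "Confirmed", "status_since": now - timedelta(hours=2)},
]
stuck = stuck_orders(orders, "PendingPayment", timedelta(minutes=5), now)
```

Here `stuck` is `["a"]`. Order `b` is pending but still within the normal window, and order `c` is old but finished. Alerting on the size of that list is usually more meaningful than alerting on queue depth alone.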

The interview version

If I had to explain eventual consistency in an interview, I would say:

Eventual consistency means that different services may temporarily have different views of the data, but the system is designed so those views converge over time. This often happens in event-driven systems because one service updates its own database and publishes an event, while other services process that event asynchronously.

The important part is to model the business process clearly. Instead of pretending everything is immediately complete, I would use explicit states like PendingPayment, PaymentAuthorized, InventoryReserved, Confirmed, or Cancelled.

I would also think carefully about where eventual consistency is acceptable and where stronger consistency is required. Analytics can usually lag behind. Payment processing, balance updates, and inventory reservation need stricter guarantees.

In production, I would monitor queue lag, retries, dead-letter queues, and stuck business processes. Eventual consistency should be visible and measurable, not just something we hope will work.

Final thought

Eventual consistency is not a weakness by itself.

It is a trade-off.

You accept that not every service updates at the exact same moment in exchange for looser coupling, better scalability, and more resilient asynchronous workflows.

But the trade-off only works if the system is designed honestly.

You need clear states.

You need failure handling.

You need observability.

You need to know which parts of the business can tolerate delay and which parts cannot.

When explained badly, eventual consistency sounds like hand-waving.

When designed well, it is simply a realistic way to model business processes that do not happen all at once.

This post is part of my Backend Architecture Notes series. In the next post, I will look at the Saga pattern and how it helps coordinate long-running workflows across multiple services.