In the previous post, I wrote about the Saga pattern as a way to coordinate long-running business processes across multiple services.
A saga is useful when one business process cannot be handled inside a single database transaction.
For example:
Create order
Reserve inventory
Authorize payment
Confirm order
Start shipping
Each step may belong to a different service. Each service owns its own data. If a later step fails, the system needs to run compensating actions.
That raises an important design question:
Who coordinates the process?
There are two common answers:
Choreography
Orchestration
Both can work. Both have trade-offs. The right choice depends on the complexity of the workflow, the number of services involved, the failure cases, and how much visibility you need.
In a choreography-based saga, there is no central coordinator.
Each service reacts to events and publishes new events.
The process moves forward because services listen to each other.
For example:
Order Service publishes OrderCreated
Inventory Service receives OrderCreated
Inventory Service reserves stock
Inventory Service publishes InventoryReserved
Payment Service receives InventoryReserved
Payment Service authorizes payment
Payment Service publishes PaymentAuthorized
Order Service receives PaymentAuthorized
Order Service confirms the order
No single service is explicitly telling everyone what to do.
The workflow emerges from the event reactions.
That is why it is called choreography. Each service knows its own steps, and the overall process happens because every participant reacts to the right signals.
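To make that flow concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory `EventBus` and a `wire_services` helper, neither of which comes from a real library; a production system would use a message broker, and each handler would live in its own service. The event names are the ones from the example above.

```python
from __future__ import annotations

from typing import Callable


class EventBus:
    """Hypothetical in-memory bus, standing in for a real message broker."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = {}
        self.log: list[str] = []  # event names in publication order

    def subscribe(self, event_name: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event_name, []).append(handler)

    def publish(self, event_name: str, payload: dict) -> None:
        self.log.append(event_name)
        # The publisher does not know (or care) who is listening.
        for handler in self._handlers.get(event_name, []):
            handler(payload)


def wire_services(bus: EventBus, confirmed_orders: set[str]) -> None:
    """Register each service's reaction. No function here sees the whole flow."""

    def inventory_on_order_created(order: dict) -> None:
        # Inventory Service: reserve stock (omitted), then announce the fact.
        bus.publish("InventoryReserved", order)

    def payment_on_inventory_reserved(order: dict) -> None:
        # Payment Service: authorize payment (omitted), then announce the fact.
        bus.publish("PaymentAuthorized", order)

    def order_on_payment_authorized(order: dict) -> None:
        # Order Service: confirm the order.
        confirmed_orders.add(order["order_id"])

    bus.subscribe("OrderCreated", inventory_on_order_created)
    bus.subscribe("InventoryReserved", payment_on_inventory_reserved)
    bus.subscribe("PaymentAuthorized", order_on_payment_authorized)
```

Publishing a single `OrderCreated` event drives the whole chain, yet the sequence is only visible if you read all three handlers together.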
Choreography can feel very natural in event-driven systems.
A service publishes a fact:
OrderCreated
Other services react:
Inventory reserves stock
Payment starts authorization
Email sends a message
Analytics updates reporting
The producer does not need to know who is listening. That keeps services loosely coupled.
This is one of the main benefits of event-driven architecture.
Choreography is attractive because:
There is no central workflow service
Services stay independent
New consumers can be added without changing the producer
The architecture feels loosely coupled
Simple flows are easy to start with
For small workflows, this can work very well.
If the flow is simple and the failure cases are limited, choreography may be enough.
The downside is that the business process becomes distributed across many services.
At first, that may not be a problem.
But as the workflow grows, it can become hard to answer simple questions:
What is the full process?
Which service triggers the next step?
What happens if payment fails?
What happens if inventory reservation fails?
Which service compensates which action?
Where can I see the current state of the workflow?
The logic is no longer in one place.
It is spread across event handlers.
The Order Service reacts to some events. The Inventory Service reacts to others. The Payment Service reacts to others. The Shipping Service reacts to others.
Each service may be easy to understand in isolation, but the full business process becomes harder to see.
This is the main risk of choreography.
You can end up with a system where every service looks clean locally, but the overall workflow is difficult to reason about.
Imagine the order process starts simple:
OrderCreated
InventoryReserved
PaymentAuthorized
OrderConfirmed
That is manageable.
Later, the business adds more rules:
Fraud check before payment
Different payment methods
Partial inventory reservation
Customer loyalty points
Promotional discounts
Manual review for high-value orders
Refunds when inventory fails after payment
Shipping provider selection
Notification preferences
Now the event chain becomes harder to understand.
A possible flow could look like this:
OrderCreated
FraudCheckRequested
FraudCheckPassed
InventoryReservationRequested
InventoryPartiallyReserved
PaymentAuthorizationRequested
PaymentAuthorized
LoyaltyPointsApplied
OrderConfirmed
ShippingRequested
And for failure paths:
FraudCheckFailed
InventoryReservationFailed
PaymentFailed
PaymentStatusUnknown
RefundRequested
RefundFailed
ManualReviewRequired
At this point, choreography can become difficult.
Not because events are bad.
But because the business process has become complex enough that scattering it across many services makes it hard to understand.
In an orchestration-based saga, one component coordinates the workflow.
This component is often called:
Saga orchestrator
Process manager
Workflow service
Workflow engine
The orchestrator knows the steps of the process.
It sends commands to services and waits for results.
For example:
Order Saga Orchestrator receives CreateOrder request
Orchestrator asks Order Service to create the order
Orchestrator asks Inventory Service to reserve stock
Orchestrator asks Payment Service to authorize payment
Orchestrator asks Order Service to confirm the order
Orchestrator asks Shipping Service to start shipping
The services still own their own data and rules.
The orchestrator does not reserve stock directly. It asks the Inventory Service.
The orchestrator does not charge a card directly. It asks the Payment Service.
The orchestrator coordinates the process.
Orchestration makes the business flow explicit.
Instead of reconstructing the process by reading many event handlers, you can look at the orchestrator and see the workflow.
That is useful when the process is complex or business-critical.
With orchestration, it is easier to answer:
Which step are we in?
What happens next?
What happens if this step fails?
What compensating action should run?
How long has this workflow been waiting?
Can we retry this step?
Does this require manual review?
This visibility matters in production.
When an order is stuck, support and engineering need to know where it is stuck.
When a payment succeeded but inventory failed, the system needs to know whether to refund, retry, or escalate.
An orchestrator gives you a natural place to track that state.
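As a rough illustration of what "tracking that state" could mean, here is a small Python sketch. The `SagaState` record, the `find_stuck` helper, and the step names such as `AwaitingPayment` are all hypothetical; a real orchestrator would persist this state in a database rather than in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class SagaState:
    """Hypothetical per-order workflow record kept by the orchestrator."""

    order_id: str
    current_step: str
    updated_at: datetime
    history: list[tuple[str, datetime]] = field(default_factory=list)

    def advance(self, step: str, now: datetime) -> None:
        # Keep the previous step so support can see how the order got here.
        self.history.append((self.current_step, self.updated_at))
        self.current_step = step
        self.updated_at = now

    def waiting_for(self, now: datetime) -> timedelta:
        return now - self.updated_at


# Assumed terminal steps; anything else is considered in progress.
TERMINAL_STEPS = {"OrderConfirmed", "OrderCancelled"}


def find_stuck(sagas: list[SagaState], now: datetime, limit: timedelta) -> list[str]:
    """Return order ids that have waited in a non-terminal step longer than limit."""
    return [
        saga.order_id
        for saga in sagas
        if saga.current_step not in TERMINAL_STEPS and saga.waiting_for(now) > limit
    ]
```

With a record like this, "which step are we in?" and "how long has this workflow been waiting?" become simple queries instead of a search through logs from several services.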
The risk is that the orchestrator becomes too powerful.
A bad orchestrator starts to contain business logic that belongs inside the services.
For example, the orchestrator should not know exactly how inventory is reserved internally.
It should not know payment provider details.
It should not directly update another service's database.
It should not become a god service.
A good orchestrator coordinates.
The individual services still own their decisions.
For example:
Orchestrator sends ReserveInventory
Inventory Service decides whether stock can be reserved
Inventory Service publishes InventoryReserved or InventoryReservationFailed
That keeps ownership clear.
The orchestrator knows the workflow.
The services own the domain rules.
Orchestration often works well with commands and events.
The orchestrator sends a command:
ReserveInventory
The Inventory Service handles that command and publishes an event:
InventoryReserved
or:
InventoryReservationFailed
The orchestrator receives the result and decides the next step.
For example:
If InventoryReserved:
    send AuthorizePayment
If InventoryReservationFailed:
    send CancelOrder
This gives a clean separation.
Commands express intent.
Events report what happened.
The orchestrator moves the process forward based on those results.
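One way to sketch this separation in Python is a pure decision function that maps an incoming event to the next command. The `Command` and `Event` types and the `decide` function are hypothetical names for illustration. The first two transitions come from the example above; `ConfirmOrder` and `ReleaseInventory` are assumed command names that I added to round out the table.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    """Expresses intent: something the orchestrator asks a service to do."""

    name: str
    order_id: str


@dataclass(frozen=True)
class Event:
    """Reports a fact: something a service says has already happened."""

    name: str
    order_id: str


# Event name -> name of the next command the orchestrator sends.
NEXT_COMMAND = {
    "InventoryReserved": "AuthorizePayment",
    "InventoryReservationFailed": "CancelOrder",
    "PaymentAuthorized": "ConfirmOrder",   # assumed command name
    "PaymentFailed": "ReleaseInventory",   # assumed compensating command
}


def decide(event: Event) -> Optional[Command]:
    """Return the next command for this event, or None if nothing should be sent."""
    command_name = NEXT_COMMAND.get(event.name)
    if command_name is None:
        return None
    return Command(command_name, event.order_id)
```

Keeping the decision free of I/O means the whole workflow can be read, and tested, as a table of transitions.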
Choreography gives you loose coupling, but the process can become harder to see.
Orchestration gives you visibility and control, but you introduce a central workflow component.
A simple comparison:
Choreography:
- Services react to events
- No central coordinator
- Good for simple flows
- Very loosely coupled
- Can become hard to trace as complexity grows
Orchestration:
- A central orchestrator coordinates the process
- Central place for saga state
- Good for complex flows
- Easier failure handling
- Risk of creating a god service
Neither option is automatically better.
The question is what kind of workflow you are building.
I would choose choreography when the flow is simple and each reaction is independent.
For example:
UserRegistered
This event might trigger:
Send welcome email
Create CRM contact
Update analytics
Start onboarding checklist
These actions are related, but they do not necessarily form one strict business transaction.
If the analytics update is delayed, it probably does not block the email.
If the CRM integration fails, it probably does not mean the user registration should be cancelled.
This is a good fit for choreography.
The User Service publishes UserRegistered, and other consumers react independently.
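The independence of those consumers can be sketched in a few lines of Python. The `dispatch` function and the three consumer functions below are hypothetical; with a real broker, each consumer would have its own subscription and retry policy rather than sharing one loop.

```python
from __future__ import annotations

from typing import Callable

sent_emails: list[str] = []
analytics_events: list[str] = []


def send_welcome_email(event: dict) -> None:
    sent_emails.append(event["email"])


def create_crm_contact(event: dict) -> None:
    # Simulate an outage in one integration.
    raise RuntimeError("CRM unavailable")


def update_analytics(event: dict) -> None:
    analytics_events.append(event["type"])


def dispatch(event: dict, consumers: dict[str, Callable[[dict], None]]) -> dict[str, str]:
    """Deliver the event to every consumer; one failure does not block the rest."""
    results: dict[str, str] = {}
    for name, consumer in consumers.items():
        try:
            consumer(event)
            results[name] = "ok"
        except Exception as exc:
            # Recorded here for illustration; a broker would retry or dead-letter.
            results[name] = f"failed: {exc}"
    return results
```

Here the CRM failure is recorded, but the welcome email and the analytics update still happen, and the registration itself is never rolled back.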
I would also use choreography when adding side effects to an existing event:
Update search index
Send notification
Update reporting
Invalidate cache
Trigger recommendation update
These are often good event consumers, as long as failure handling is clear.
I would choose orchestration when the workflow has a clear sequence, important failure paths, and compensating actions.
For example:
Create order
Reserve inventory
Authorize payment
Confirm order
Start shipping
This is not just a set of independent reactions.
The order matters.
Failure matters.
If payment fails, inventory may need to be released.
If inventory fails after payment authorization, payment may need to be voided.
If shipping fails, support may need to intervene.
This is a good fit for orchestration.
The process is important enough to deserve a central place where the state and transitions are visible.
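A minimal Python sketch of that idea: the orchestrator runs steps in order and, when one fails, runs the compensations of the already completed steps in reverse. The `Step` type and `run_saga` function are hypothetical, and the sketch assumes compensations themselves succeed; a real implementation would also persist progress and handle retries and timeouts.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One saga step: a forward action plus the action that undoes it."""

    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            log.append(f"{step.name}: failed")
            # The failed step did not complete, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()  # assumed to succeed in this sketch
                log.append(f"{done.name}: compensated")
            return log
        completed.append(step)
        log.append(f"{step.name}: ok")
    return log
```

For example, if `ReserveInventory` succeeds and `AuthorizePayment` raises, the log ends with `ReserveInventory: compensated`, which corresponds to releasing the reserved stock.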
Debugging is one of the biggest practical differences.
In choreography, debugging often means following an event trail across many services.
You ask:
Who consumed this event?
Which service should have published the next event?
Did the consumer fail?
Was the message retried?
Did it go to a dead-letter queue?
Was a later event ignored?
In orchestration, debugging can be more direct.
You ask:
What is the current workflow state?
Which step is waiting?
Which command was sent?
What result came back?
What retry policy is active?
What compensation is pending?
This does not mean orchestration removes complexity.
But it gives you a central model for that complexity.
In production, that can make a big difference.
One useful way to choose between choreography and orchestration is to ask:
Who owns the business process?
If the process is just a set of independent reactions to a fact, choreography may be fine.
But if the process has a clear lifecycle, state transitions, compensations, and business-level success or failure, then something should own that process.
That something can be an orchestrator.
For example, an order lifecycle is not owned by the Payment Service or the Inventory Service alone.
It belongs to the larger order process.
That process may deserve its own workflow state.
Many systems start with accidental choreography.
A service publishes an event.
Another service reacts.
Then another service reacts.
Then someone adds a compensation handler.
Then someone adds a retry handler.
Then support needs to know why something is stuck.
Before long, the system contains a saga, but no one designed it as one.
That is dangerous.
The workflow exists, but it is hidden.
When the process is business-critical, I prefer to make the saga explicit.
Even if the first version is simple, it helps to know where the workflow state lives and who owns the transitions.
In practice, systems often use a mix of both.
A core business workflow may be orchestrated.
Side effects may be choreographed.
For example, an order saga might be orchestrated:
Create order
Reserve inventory
Authorize payment
Confirm order
After OrderConfirmed, other services react independently:
Email Service sends confirmation
Analytics Service updates reporting
Recommendation Service updates customer profile
CRM integration receives order data
This is often a good balance.
Use orchestration for the critical workflow.
Use choreography for independent reactions.
That way the core process remains visible, while the system still benefits from loose coupling.
If I had to explain the difference in an interview, I would say:
In a choreography-based saga, services react to each other's events without a central coordinator. For example, Order Service publishes OrderCreated, Inventory Service reacts and publishes InventoryReserved, then Payment Service reacts and publishes PaymentAuthorized. This keeps services loosely coupled and works well for simple flows, but complex workflows can become hard to understand because the logic is spread across many services.
In an orchestration-based saga, a central workflow or orchestrator coordinates the process. It sends commands like ReserveInventory or AuthorizePayment, waits for events that report success or failure, and decides the next step or compensating action. This gives better visibility and control for complex workflows, but the orchestrator should not become a god service.
For simple independent reactions, I would use choreography. For business-critical workflows with ordering, retries, timeouts, and compensating actions, I would prefer orchestration.
Choreography and orchestration are not enemies.
They are two ways to coordinate distributed work.
Choreography is useful when services can independently react to business events.
Orchestration is useful when the business process itself needs to be explicit, observable, and controlled.
The mistake is choosing one because it sounds more elegant.
The better approach is to look at the workflow.
How many steps are there?
Do they need to happen in order?
What happens when one step fails?
Who owns the overall process?
How will support and engineering debug it at 2 a.m.?
For small independent reactions, choreography keeps things simple.
For complex business workflows, orchestration often makes the system easier to understand.
This post is part of my Backend Architecture Notes series. In the next post, I will look at idempotency, and why it is one of the most important concepts in event-driven systems.