Event-driven architecture in plain English

Backend Architecture Notes: what events solve, and what they cost

October 1, 2025

When preparing for a backend interview, I like to revisit the concepts that sound simple until you have to explain them clearly.

Event-driven architecture is one of those concepts.

Most developers have heard the words before: events, queues, brokers, consumers, producers. But the real value is not in knowing the terminology. The real value is understanding why this architecture exists, what problem it solves, and what kind of complexity it introduces.

This post is the first in a short series of backend architecture notes. I want to explain the concepts in a practical way, without turning them into academic definitions.

What is event-driven architecture?

Event-driven architecture is a style of building software where services communicate by publishing and reacting to events.

An event is something that already happened in the system.

For example:

OrderCreated
PaymentSucceeded
UserRegistered
InvoiceGenerated
GameSessionStarted
PlayerBalanceUpdated

Notice the past tense in every name. That detail is important: an event is a fact.

It does not say:

Create an order
Charge this payment
Send this email

It says:

An order was created
A payment succeeded
An email was sent

The event describes something that has already happened.
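
One way to make "an event is a fact" concrete is to model it as an immutable record. The sketch below is only an illustration: the field names (order_id, customer_id, total_cents, event_id, occurred_at) are my own assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)  # frozen: a recorded fact cannot be edited afterwards
class OrderCreated:
    # Hypothetical business fields for this example.
    order_id: str
    customer_id: str
    total_cents: int
    # Hypothetical metadata: a unique id per event and the time it happened.
    event_id: str = field(default_factory=lambda: str(uuid4()))
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


event = OrderCreated(order_id="ord_1", customer_id="cust_9", total_cents=4999)
```

The name is in the past tense and the object cannot be mutated. If the order changes later, that is a new event, not an edit to this one.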

In a traditional synchronous flow, one service calls another service directly. For example, an Order Service might call the Payment Service, then the Inventory Service, then the Email Service.

That can work well for simple systems. But as the system grows, those direct calls can create tight coupling. The Order Service starts to know too much about everything that needs to happen after an order is created.

In an event-driven system, the Order Service can simply publish an OrderCreated event. Other services can listen for that event and react to it.

For example:

Order Service publishes OrderCreated

Payment Service starts the payment flow
Inventory Service reserves stock
Email Service sends a confirmation email
Analytics Service updates reporting
Fraud Service checks the order

The Order Service does not need to know all those consumers exist. It only publishes the event.

That is the core idea.
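
The publish and react idea can be sketched with a tiny in-process event bus. This is a toy under stated assumptions: EventBus, subscribe, and publish are names I made up for the example, and a real broker would deliver messages asynchronously and across processes.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-process stand-in for a message broker (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher names the fact, never the consumers.
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
reactions: list[str] = []

# Each consumer registers itself; the Order Service is not involved.
bus.subscribe("OrderCreated", lambda e: reactions.append(f"payment flow started for {e['order_id']}"))
bus.subscribe("OrderCreated", lambda e: reactions.append(f"stock reserved for {e['order_id']}"))
bus.subscribe("OrderCreated", lambda e: reactions.append(f"confirmation email sent for {e['order_id']}"))

# The Order Service side: one publish call, no knowledge of who listens.
bus.publish("OrderCreated", {"order_id": "ord_1"})
```

Adding a fourth reaction means adding one more subscribe call. The publish line stays the same.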

Why use events?

The biggest benefit of event-driven architecture is loose coupling.

A service can publish a business fact, and other parts of the system can react independently. This makes it easier to add new behavior without changing the original service.

Imagine you already have this flow:

OrderCreated -> Send confirmation email

Later, the business wants to update analytics whenever an order is created. In a tightly coupled system, you might have to change the Order Service to call the Analytics Service.

In an event-driven system, you can add a new Analytics consumer that listens to OrderCreated.

The existing flow does not need to change.

That is powerful.

Event-driven architecture is especially useful when:

Multiple systems need to react to the same business fact
The work can happen asynchronously
You want to avoid tight coupling between services
You need to scale consumers independently
You want a better audit trail of what happened

This is common in domains like ecommerce, payments, logistics, gaming platforms, banking, SaaS products, and internal business workflows.

A simple example

Take an ecommerce order.

A user places an order. The system needs to do several things:

Create the order
Reserve inventory
Authorize payment
Send confirmation email
Update reporting
Start shipping process

You could build this as one big synchronous flow:

POST /orders
  -> create order
  -> call inventory service
  -> call payment service
  -> call email service
  -> call analytics service
  -> call shipping service
  -> return response

The problem is that the user is now waiting for everything. If the Email Service is slow, order creation becomes slow. If the Shipping Service is temporarily unavailable, should the order fail? If Analytics is down, should the customer be blocked from buying?

Probably not.

With events, the flow can look different:

Order Service creates the order
Order Service publishes OrderCreated
Other services react in their own time

Now the system is more flexible.

The Order Service owns the order. The Payment Service owns payments. The Inventory Service owns stock reservation. Each service can make progress independently.
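
Here is a minimal sketch of "react in their own time", using a thread and an in-memory queue as a stand-in for a broker. The function names and the sentinel-based shutdown are assumptions for the example only.

```python
import queue
import threading

events: queue.Queue = queue.Queue()  # stand-in for a broker topic
processed: list[str] = []


def create_order(order_id: str) -> dict:
    # Saving the order is left out here; the point is that we publish and return.
    events.put({"type": "OrderCreated", "order_id": order_id})
    return {"order_id": order_id, "status": "pending"}


def email_worker() -> None:
    # Runs independently of the request that created the order.
    while True:
        event = events.get()
        if event is None:  # sentinel value: stop the worker
            break
        processed.append(f"confirmation email for {event['order_id']}")


worker = threading.Thread(target=email_worker)
worker.start()

response = create_order("ord_1")  # returns without waiting for the email

events.put(None)  # shut the worker down so the example terminates
worker.join()
```

The response already says "pending" before the email exists. That gap is exactly what the next section is about.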

But this flexibility comes with trade-offs.

The trade-off: eventual consistency

Event-driven systems are often eventually consistent.

That means not every part of the system is updated at the exact same moment.

For a short period of time, the Order Service may know that an order exists, while the Payment Service has not processed the payment yet and the Inventory Service has not reserved stock yet.

This is not automatically bad. It is just a different model.

Instead of pretending the whole process happens instantly, you model the business process as a series of states.

For example:

OrderPending
PaymentAuthorized
InventoryReserved
OrderConfirmed
OrderCancelled

This is often closer to how the real world works.

A business process is not always one instant transaction. It is a flow. Some steps succeed. Some steps fail. Some steps need to be retried. Some steps need manual intervention.

Event-driven architecture forces you to be honest about that.
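
Modelling the process as states can be as simple as an explicit transition table. In this sketch the state names come from the list above, but the allowed transitions (payment first, then inventory) are my assumption. A real order flow may use a different order or run the steps in parallel.

```python
from enum import Enum


class OrderState(Enum):
    PENDING = "OrderPending"
    PAYMENT_AUTHORIZED = "PaymentAuthorized"
    INVENTORY_RESERVED = "InventoryReserved"
    CONFIRMED = "OrderConfirmed"
    CANCELLED = "OrderCancelled"


# Assumed flow: pending -> payment -> inventory -> confirmed, cancel allowed until confirmed.
ALLOWED = {
    OrderState.PENDING: {OrderState.PAYMENT_AUTHORIZED, OrderState.CANCELLED},
    OrderState.PAYMENT_AUTHORIZED: {OrderState.INVENTORY_RESERVED, OrderState.CANCELLED},
    OrderState.INVENTORY_RESERVED: {OrderState.CONFIRMED, OrderState.CANCELLED},
    OrderState.CONFIRMED: set(),
    OrderState.CANCELLED: set(),
}


def transition(current: OrderState, target: OrderState) -> OrderState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = OrderState.PENDING
state = transition(state, OrderState.PAYMENT_AUTHORIZED)
state = transition(state, OrderState.INVENTORY_RESERVED)
state = transition(state, OrderState.CONFIRMED)
```

Rejecting invalid moves is also a cheap defence against late or out-of-order events, for example a cancellation that arrives after the order was already confirmed.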

What gets harder?

Event-driven architecture solves some problems, but it also introduces new ones.

The most important ones are:

Duplicate messages
Out-of-order events
Retry handling
Dead-letter queues
Schema versioning
Observability
Debugging
Eventual consistency

In a synchronous request, you can often follow one call stack.

In an event-driven system, the flow is spread across services, queues, topics, logs, and databases. The system may continue processing long after the original request returned.

That means you need good operational discipline.

You need correlation IDs so you can trace one business flow across services. You need metrics for queue lag, retry counts, processing times, and failed messages. You need logs that include event IDs and business identifiers.

Without that, debugging becomes painful.
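
As a rough sketch of what that looks like in practice: every event carries its own event ID plus a correlation ID that is copied to every follow-up event. The envelope fields and helper names below (new_event, caused_by, log_line) are assumptions for the example, not a standard format.

```python
import json
import uuid
from typing import Optional


def new_event(event_type: str, payload: dict, correlation_id: Optional[str] = None) -> dict:
    return {
        "event_id": str(uuid.uuid4()),  # unique per event
        # Reuse the caller's correlation id, or start a new business flow.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "type": event_type,
        "payload": payload,
    }


def caused_by(parent: dict, event_type: str, payload: dict) -> dict:
    # A follow-up event gets a fresh event_id but keeps the parent's correlation_id.
    return new_event(event_type, payload, correlation_id=parent["correlation_id"])


def log_line(service: str, event: dict) -> str:
    # One JSON object per line, including both technical and business identifiers.
    return json.dumps({
        "service": service,
        "type": event["type"],
        "event_id": event["event_id"],
        "correlation_id": event["correlation_id"],
        "order_id": event["payload"].get("order_id"),
    })


order_created = new_event("OrderCreated", {"order_id": "ord_1"})
payment_succeeded = caused_by(order_created, "PaymentSucceeded", {"order_id": "ord_1"})

lines = [
    log_line("order-service", order_created),
    log_line("payment-service", payment_succeeded),
]
```

Searching the logs for one correlation ID then returns the whole business flow, even though it ran in different services at different times.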

At-least-once delivery and duplicate messages

One of the first lessons in event-driven systems is this:

The same event may be delivered more than once.

Many message brokers use at-least-once delivery. That means the broker tries to make sure a message is not lost, but the trade-off is that the same message can be delivered multiple times.

For example:

Consumer receives PaymentSucceeded
Consumer updates the database
Consumer crashes before acknowledging the message
Broker sends the message again
Consumer processes PaymentSucceeded again

If the handler is not careful, the system may apply the same payment twice.

That is why idempotency matters.

An idempotent operation can safely be executed multiple times and still produce the same result as executing it once.

For example, instead of saying:

Add 50 EUR to the account balance

You might model the operation as:

Apply payment pay_123 once

Then the system can check whether pay_123 has already been processed.

If it has, skip it.

That small design choice can prevent serious production bugs.
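
A minimal in-memory sketch of that check is shown below. The Account class and its fields are hypothetical. In a real system the "already processed" check and the balance update would need to happen atomically, for example in one database transaction with a unique constraint on the payment ID.

```python
class Account:
    def __init__(self) -> None:
        self.balance_cents = 0
        self.applied_payment_ids: set[str] = set()

    def apply_payment(self, payment_id: str, amount_cents: int) -> bool:
        """Apply a payment at most once. Returns False for a duplicate delivery."""
        if payment_id in self.applied_payment_ids:
            return False  # already processed: skip it
        self.balance_cents += amount_cents
        self.applied_payment_ids.add(payment_id)
        return True


account = Account()
first = account.apply_payment("pay_123", 5000)   # first delivery
second = account.apply_payment("pay_123", 5000)  # broker redelivers the same event
```

After both calls the balance is 5000, not 10000. The second delivery is recognised by its ID and ignored.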

Events are not magic

It is tempting to think of events as a way to make systems automatically scalable and resilient.

They can help, but they are not magic.

Bad event-driven systems are very possible.

You can create a system where nobody understands the full business flow. You can create consumers that silently fail. You can create event schemas that break other teams. You can create retry storms that overload your database. You can create duplicate processing bugs that are very hard to fix afterwards.

The architecture only works well when the failure cases are part of the design.

A good event-driven system thinks about questions like:

What happens if the same message arrives twice?
What happens if a consumer is down for one hour?
What happens if an event arrives late?
What happens if a message can never be processed?
How do we replay events safely?
How do we trace one business process across services?
How do we change an event schema without breaking consumers?

These questions are not edge cases. They are the real design work.
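
To make one of those questions concrete, here is a sketch of "what happens if a message can never be processed": retry a bounded number of times, then park the message in a dead-letter list instead of retrying forever. The names and the limit of three attempts are assumptions, and a real consumer would normally wait (with backoff) between attempts.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # assumed limit for this example


def consume(message: dict, handler: Callable[[dict], None], dead_letters: list[dict]) -> bool:
    """Try the handler up to MAX_ATTEMPTS times; dead-letter the message if all fail."""
    last_error = ""
    for _ in range(MAX_ATTEMPTS):
        try:
            handler(message)
            return True
        except Exception as exc:  # broad on purpose: any failure counts as a failed attempt
            last_error = repr(exc)
    # Keep the message and the reason so a human or a replay job can look at it later.
    dead_letters.append({"message": message, "error": last_error, "attempts": MAX_ATTEMPTS})
    return False


def always_fails(message: dict) -> None:
    raise RuntimeError("cannot parse payload")


dead_letters: list[dict] = []
ok = consume({"event_id": "evt_1"}, always_fails, dead_letters)
```

The poison message ends up in dead_letters with its error attached, and the consumer is free to move on to the next message.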

When would I use event-driven architecture?

I would use event-driven architecture when the business process naturally has multiple independent reactions to the same fact.

For example:

An order was created
A payment succeeded
A user registered
A document was uploaded
A game round completed
A subscription was cancelled

These are good events because other parts of the system may care about them.

I would also use it when asynchronous processing makes the user experience or system reliability better.

For example, a user should not have to wait for analytics, emails, reporting, and downstream integrations before getting a response.

But I would not use events everywhere.

If one service simply needs an immediate answer from another service, a synchronous API call may be simpler and better.

For example:

Can this user access this resource?
What is the current price?
Is this token valid?

Those questions often need an immediate response.

Good architecture is not about always choosing events or always choosing APIs. It is about choosing the communication style that matches the business need.

The interview version

If I had to explain event-driven architecture in an interview, I would say:

Event-driven architecture is a way to decouple services by having them publish and react to business events. An event represents something that already happened, like OrderCreated or PaymentSucceeded. Other services can subscribe to those events and react independently.

The benefit is that services are less tightly coupled and can scale independently. It is useful when multiple parts of the system need to react to the same business fact, or when work can happen asynchronously.

The trade-off is that the system becomes eventually consistent and harder to debug. You need to think about duplicate messages, retries, ordering, schema versioning, dead-letter queues, and observability.

In production, I would assume messages can be duplicated, delayed, retried, and sometimes processed out of order. So I would design consumers to be idempotent, use correlation IDs, monitor queue lag and failures, and use patterns like the outbox pattern when reliable publishing is important.
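
Since the outbox pattern is only named above, here is a small sketch of the idea using SQLite from the Python standard library. The table names, columns, and the relay function are assumptions for the example. The core of the pattern is that the business row and the event row are written in the same transaction, and a separate step publishes the event rows afterwards.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)


def create_order(order_id: str) -> None:
    # One transaction: either both rows are committed or neither is.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "pending"))
        conn.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay(publish) -> int:
    """Publish unpublished outbox rows in order and mark them. Returns how many were sent."""
    rows = conn.execute(
        "SELECT id, type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        # A crash between publish and this update means the row is sent again later:
        # at-least-once delivery, which is why consumers must be idempotent.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent: list[tuple] = []
create_order("ord_1")
count = relay(lambda event_type, payload: sent.append((event_type, payload)))
```

This avoids the case where the order is saved but the event is lost, or the event is published for an order that was never committed.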

Final thought

For me, event-driven architecture is not mainly about queues or Kafka or RabbitMQ.

It is about designing systems around business facts.

Something happened. Other parts of the system may care. They should be able to react without the original service knowing everything about them.

That is the strength of the model.

But the moment you choose events, you also choose eventual consistency, retries, duplicates, and more operational complexity.

That is not a reason to avoid event-driven architecture. It is a reason to design it properly.

This post is part of my Backend Architecture Notes series. In the next post, I will look at the difference between events and commands, because that small distinction has a big impact on how you design message-driven systems.