Stop Calling It Event-Driven
The Problem: Everything Is an Event on a Queue
The default "event-driven architecture" looks like this: lots of small services connected by queues, with event objects flowing between them. Events, signals, and queues get lumped together as one concept. "An event is a data structure that sits on a queue and acts as a signal". All three conflated into a single architectural primitive.
This is not necessarily wrong at scale, but it gets adopted by default long before scale justifies it. The complexity cost is real: state scattered across services, no single place to trace what happened and why, distributed debugging as the norm rather than the exception.
You have seen this system. A constellation of services with queues of fat JSON objects flowing between them. Its proponents call it flexible: "you can just read from the queue". In practice it is brittle. Changing a single field means coordinating across five teams to agree on a data model.
All I wanted to do was send a welcome email and a webhook.
The underlying idea is sound. Designing a system around things that happen, rather than procedures that execute, produces code that is easier to extend, easier to test, and easier to reason about. The problem is not the thinking; it is the assumption that the thinking requires a particular infrastructure. "Event-driven" has become synonymous with "queue-driven", and the two are not the same thing.
Untangling the Concepts
There are three distinct things here, each with its own purpose:
- An event is a domain model. A record of something that happened. It has a shape, a meaning, and a place in the domain. It is data, not infrastructure.
- A queue is a storage and access pattern: a list with frontier or claim semantics. It is infrastructure, not domain modelling. (See Your Database is Already a Queue.)
- A signal is an in-process notification that something happened. A function call with a subscriber list. Zero infrastructure.
Treating all three as one thing, "event-driven", leads teams to reach for a queue every time something needs to react to something else. Most of the time, a signal is sufficient.
Events as Domain Models
An event, properly modelled, is a first-class domain record. It captures who performed the action, what changed, when it happened, the prior and new state, and the context that made the change meaningful. It belongs in your domain layer, not your infrastructure.
This distinction matters because of how teams typically encounter events. The first time most codebases model an event, it is as a message format: a JSON blob shaped for a specific consumer, with fields chosen for serialisation convenience rather than domain clarity. That gets the relationship backwards. The event is the domain's authoritative record of what happened. A queue message is one possible projection of that record, shaped for a particular delivery mechanism.
Model events the same way you model any other domain concept: as typed, immutable structures with explicit fields.
```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class OrderStatusChanged:
    order_id: str
    previous_status: OrderStatus  # OrderStatus is your existing domain enum
    new_status: OrderStatus
    changed_by: str
    reason: str
    timestamp: datetime
```
frozen=True is the quiet enforcer here. An event is a record of something that already happened; it should be immutable by construction. If you find yourself mutating an event after creation, you are not recording history; you are editing it.
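The enforcement is concrete, not just conventional. A minimal sketch, using a hypothetical one-field event, of what happens when code tries to edit history:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class OrderShipped:  # illustrative event, not from the example above
    order_id: str

event = OrderShipped(order_id="4821")
try:
    event.order_id = "9999"   # attempting to edit the record...
    frozen_worked = False
except FrozenInstanceError:   # ...is refused by construction
    frozen_worked = True
```

The attempted mutation never silently succeeds; the dataclass machinery raises before any state changes.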
Notice what is missing: no queue_name, no routing_key, no serialization_format. Those are concerns of the delivery mechanism, not the domain. When the time comes to put this on a queue or write it to a log, you project it into whatever shape the consumer needs. The domain model stays clean.
Events recorded as data are the foundation for audit trails, replay, and debugging. They exist independently of whether they ever touch a queue. A well-modelled event can be written to an append-only table, projected onto a queue, or fed into an analytics pipeline; all without changing its shape. The domain said what happened; the infrastructure decides who needs to know.
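The "one record, many projections" idea can be sketched as plain functions. This is a hedged illustration, not a prescribed API; the event uses string statuses and the projection names are invented for the example:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class OrderStatusChanged:
    order_id: str
    previous_status: str
    new_status: str
    changed_by: str
    timestamp: datetime

def to_audit_row(event: OrderStatusChanged) -> dict:
    # Full record for the append-only audit table: the event as history.
    return {
        "order_id": event.order_id,
        "from": event.previous_status,
        "to": event.new_status,
        "by": event.changed_by,
        "at": event.timestamp.isoformat(),
    }

def to_queue_message(event: OrderStatusChanged) -> dict:
    # Thin notification for a queue: type and identifier only.
    return {"type": "order.status_changed", "order_id": event.order_id}

event = OrderStatusChanged("4821", "pending", "paid", "alice",
                           datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Both projections read from the same immutable event; neither forces a field onto the domain model.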
Fat Events, Thin Messages
The domain event should be rich. The message you send does not have to be.
There is a much-underused pattern where the notification carries only a type and an identifier: "order 4821 changed status". The consumer receives the signal, fetches the current state from the source of truth, and works with that. No fat JSON payloads duplicated across queues. No versioning headaches when the event schema evolves. No stale data baked into a message that sat on a queue for thirty seconds while the record was updated again.
This works because the domain event and the delivery message are separate concerns. The event is your rich, immutable record in the domain layer. The message is a notification that something worth knowing about has happened. Keeping them separate means you can change what a consumer reads without changing what a producer writes. It also means the consumer always sees the latest state, not a snapshot from the moment the message was published.
The pattern applies at every level of the escalation ladder. A signal can carry just an ID. A queue message can carry just an ID. Even a broker message can carry just an ID. The richer the payload you put on the wire, the tighter the coupling between producer and consumer, and the more likely you are to end up coordinating across five teams to change a field.
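The consumer side of a thin message is a two-step move: read the ID, then fetch the latest state. A minimal sketch, where `orders_table` stands in for the source of truth (a database in a real system):

```python
# The record was updated *after* the message was published; the thin-message
# consumer still sees the latest state because it fetches, not deserialises.
orders_table = {"4821": {"id": "4821", "status": "paid"}}

def handle(message: dict) -> dict:
    # The message carries only a type and an identifier; current state
    # comes from the source of truth, never from the payload.
    return orders_table[message["order_id"]]

message = {"type": "order.status_changed", "order_id": "4821"}
current = handle(message)
```

Had the status changed twice while the message waited, `handle` would still return the final state, which is usually what a reaction actually wants.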
Signals: The Ignored Power Tool
Signals encapsulate reactions within a process boundary. They coordinate complex operations without distributed infrastructure, and they are simple to implement: the full mechanism is a registry and a loop. If you have used notify(Signal.PROCESSING_STARTED, updated, context) from Your System is a State Machine, you have already used one. That was a signal, not a queue message.
Signals are ideal for audit logging, notifying downstream consumers, triggering secondary writes, billing event capture, and sending notifications. They do have a limitation: they are synchronous and in-process. They are not suitable for heavy processing that needs retries, independent failure handling, or work that should survive a process restart. But for the vast majority of "something happened, now react" cases inside a monolith, a signal is the right tool.
The entire pattern fits in a single file. Here is the whole thing:
```python
from __future__ import annotations  # Order and Context are your existing domain types

from dataclasses import dataclass
from typing import Callable, Type, TypeVar

# --- Domain layer: define your signal types ---
class Signal:
    pass

@dataclass
class OrderCreatedSignal(Signal):
    new_order: Order

@dataclass
class OrderUpdatedSignal(Signal):
    old_order: Order
    new_order: Order

...

# --- Infrastructure: a generic registry wired up at startup ---
S = TypeVar("S", bound=Signal)

class SignalHandlerRegistry:
    def __init__(self, context: Context):
        self._context: Context = context
        self._handlers: dict[Type[Signal], list[Callable[[Signal, Context], None]]] = {}

    def register(self, signal: Type[S], handler: Callable[[S, Context], None]) -> None:
        self._handlers.setdefault(signal, []).append(handler)

    def notify(self, signal: Signal) -> None:
        for handler in self._handlers.get(signal.__class__, ()):
            handler(signal, self._context)

...

# --- Application code: fire a signal after a state transition ---
def application_code():
    ...
    registry.notify(OrderCreatedSignal(order))
```
Signal types live in the domain layer. Each signal is a simple dataclass carrying the data a handler needs. The base Signal class exists purely for the type constraint; it carries no behaviour. You define one signal per meaningful thing that happened, not per consumer.
The registry is generic infrastructure. SignalHandlerRegistry is the entire engine: a dictionary mapping signal types to handler lists, a register method to wire them up, and a notify method that loops through the matching handlers. The Context object gives handlers access to the data layer and other services without coupling them to global state. You wire this up once at application startup.
Calling code stays ignorant of who is listening. The application fires OrderCreatedSignal(order) and moves on. Whether that triggers an audit log write, a notification, or nothing at all is a configuration concern, not a caller concern. Adding a new reaction means registering one more handler; the code that created the order never changes.
Collapsing the Queue Spaghetti
Consider a typical over-engineered order flow. The order service publishes to an "order created" queue. A billing service reads from that queue and publishes to a "billing initiated" queue. A notification service reads from both queues to decide whether to send a welcome email. An audit service reads from yet another queue to write a log entry. Four services, four queues, four deployment pipelines, four sets of retry logic, four places where a message can silently disappear.
Now ask: do any of these actually need to be separate processes? Billing is a database write. The welcome email is an API call to your email provider. The audit log is another database write. None of them are computationally heavy. None of them need an independent deployment lifecycle. None of them need to survive a process restart. The only reason they are separate services is that someone reached for a queue when a function call would have done.
Replace the queues with signals and the entire flow collapses into the order service itself:
```python
# At startup: register handlers for the OrderCreated signal
registry.register(OrderCreatedSignal, create_billing_record)
registry.register(OrderCreatedSignal, send_welcome_email)
registry.register(OrderCreatedSignal, write_audit_log)
```
Three lines of registration. The order service creates the order, fires OrderCreatedSignal(order), and the registry calls each handler in sequence. The billing record is written in the same transaction. The audit log is written alongside it. The email is dispatched in the same process. If any handler fails, you know immediately, in the same call stack, with a real stack trace instead of a dead letter queue.
You have not lost flexibility. Adding a new reaction is still just registering another handler. You have not created a monolithic tangle; each handler is a small, focused function with a single responsibility. What you have lost is four queues, four services, and the operational burden that came with them.
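What one of those handlers looks like in practice is unremarkable, which is the point. A hedged sketch of the audit handler, where `FakeDB`, the `Context` shape, and the table and field names are all stand-ins for your real data layer:

```python
from dataclasses import dataclass, field

@dataclass
class FakeDB:  # stand-in for your real data layer
    rows: list = field(default_factory=list)
    def insert(self, table: str, row: dict) -> None:
        self.rows.append((table, row))

@dataclass
class Context:
    db: FakeDB

@dataclass
class Order:
    id: str

@dataclass
class OrderCreatedSignal:
    new_order: Order

def write_audit_log(signal: OrderCreatedSignal, context: Context) -> None:
    # A small, focused function with a single responsibility: one write.
    context.db.insert("audit_log", {"event": "order_created",
                                    "order_id": signal.new_order.id})

ctx = Context(db=FakeDB())
write_audit_log(OrderCreatedSignal(new_order=Order(id="4821")), ctx)
```

The handler has no knowledge of queues, retries, or other handlers; if it throws, the failure surfaces in the same call stack as the order creation.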
The Escalation Ladder: Signal → Queue → Broker
- Start with signals. If the reaction is lightweight, in-process, and failure means the parent operation also fails, a signal is the right tool. It can even remain the right tool when enqueuing a future work item in another system. Most reactions in a monolith fall here.
- Graduate to a queue when the work is heavy, needs retries, or should not block the parent operation. The queue is a delivery mechanism, not an architectural commitment.
- Graduate to a broker when you genuinely need durable, cross-process delivery: independent services with their own deployment lifecycle consuming the same events. This is the "earn the gadget" threshold. If you can't name the specific constraint that a signal or a simple queue cannot satisfy, you haven't reached it. The operational and cognitive overhead is significant; make sure you are buying something with it.
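The first rung of that graduation often needs no new infrastructure at all: the signal handler stays in-process but only records a work item for a separate worker to claim later. A sketch under those assumptions, where `jobs` stands in for a database-backed job table and the job kind is illustrative:

```python
# The heavy work (invoice generation) moves behind a queue; the signal
# handler itself stays lightweight: it records work, it does not do it.
jobs: list[dict] = []  # stand-in for an append-only jobs table

def enqueue_invoice_generation(order_id: str) -> None:
    jobs.append({"kind": "generate_invoice",
                 "order_id": order_id,
                 "status": "pending"})

enqueue_invoice_generation("4821")
# A separate worker process later claims pending rows, with its own
# retry and failure handling, without blocking the parent operation.
```

The parent operation still fails fast if the enqueue itself fails, while the expensive work gains retries and restart survival.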
Summary
Events are domain models. Queues are infrastructure. Signals are in-process coordination. Three distinct concepts, not one fuzzy one.
Start with signals. Escalate to a queue when concrete constraints demand it. Reach for a broker only when cross-process delivery is genuinely required.
The welcome email that triggered a five-team coordination exercise should have been a signal and a function call. Most of your "event-driven" reactions should be too.