HN Discussion: https://news.ycombinator.com/item?id=19072850
Posted by goostavos (karma: 1502)
Post stats: Points: 139 - Comments: 56 - 2019-02-03T23:15:15Z
I'm going to give it to you straight: event sourcing actually comes with drawbacks. If you've read anything about the topic on the internet, this will surely shock you. After all, it's commonly sold as one big fat bag of sunshine and rainbows. You got some kind of a problem? Turns out it's actually solved by event sourcing. In fact, most of your life troubles up till now were probably directly caused by your lack of event sourcing.
You, having been seduced by the internet, are probably off to start your event sourcing journey and begin living the good life. Well, before you do that, Iʼm here to ruin it for you and tell you that event sourcing is not actually a bag filled with pure joy, but instead a bag filled with mines designed to blow your legs off and leave you to a crippled life filled with pain.
Why would I say such things? Because I'm a guy who previously drank the juice, had the power to make design calls, and took a team down the path of building an event sourced system from scratch. After an aggressive year of deploying a complex application, I've collected a lot of scars, bruises, and lessons learned. Below are my opinions, unexpected hurdles, bad assumptions, and misunderstandings from growing an Event Sourced application.
To be clear, this is not a "you should never event source", or an "event sourcing is the worst thing ever", this is just a collection of the unexpected costs and problems that popped up while putting an event sourcing powered system into production. The bulk of these probably fall under "he obviously didnʼt understand X," or "you should never do Y!" in which case you would be absolutely right. The point of this is that I didnʼt understand the drawbacks or pain points until Iʼd gotten past the "toy" stage.
Without further ado...
The core selling point of Event Sourcing is largely an anti-pattern
(In my humble opinion, of course)
The big Event Sourcing "sell" is the idea that any interested sub-systems can just subscribe to an event stream and happily listen away and do its work. Yʼknow, this picture, that youʼll find in pretty much any Event Sourcing Intro:
[Image: the typical event sourcing example picture. Image via: Microservices with Clojure]
In practice, this manages to somehow simultaneously be both extremely coupled and yet excruciatingly opaque. The idea of keeping a central log against which multiple services can subscribe and publish is insane. You wouldn't let two separate services reach directly into each other's data storage when not event sourcing – you'd pump them through a layer of abstraction to avoid breaking every consumer of your service when it needs to change its data. However, with the event log, we pretend this isn't the case. "Reach right on in there and grab those raw data events," we say. They're immutable "facts," after all. And immutable things don't change, right? (cough)
In effect, the raw event stream subscription setup kills the ability to locally reason about the boundaries of a service. Under "normal" development flows, you operate within the safe, cozy little walls which make up your service. You're free to make choices about implementation and storage and then, when you're ready, deal with how those things get exposed to the outside world. It's one of the core benefits of "services." However, when people are reaching into your data store and reading your events directly, that "black box" property goes out the window. Coordination can't be bolted on later: you have to talk to the people who will be consuming the events you produce to ensure that those events include enough data for the consuming system to make a decision.
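To make the "enough data to make a decision" point concrete, here's a minimal sketch (all event and field names are invented for illustration): a "thin" event forces every consumer to call back into the producer, while a self-contained one lets downstream systems act on the event alone.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ItemPriceChangedThin:
    item_id: str  # consumers must query the producer for everything else

@dataclass(frozen=True)
class ItemPriceChanged:
    item_id: str
    old_price_cents: int   # enough context for a consumer to decide
    new_price_cents: int   # e.g. "notify the customer on a price drop"
    currency: str

def should_notify(event: ItemPriceChanged) -> bool:
    # A downstream system can act on the event alone, no callback needed.
    return event.new_price_cents < event.old_price_cents

print(should_notify(ItemPriceChanged("sku-1", 1000, 800, "USD")))  # True
```

Designing events this way is exactly the up-front coordination the paragraph describes: the shape of the event is negotiated with its consumers, not discovered by them.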
If you fight through the above obstacle and manage to successfully wire a fleet of services together via an event stream, you'll be rewarded with a new problem: opacity. With multiple systems just reading an event stream sans any coordination layer, how these systems actually work and connect together will eventually be completely baffling. You've basically got all the problems that come with Observer-heavy code, but now at the system level. Control becomes inverted in a way that makes it difficult to reason about how data actually flows through the systems, which systems consume or produce which events, or what happens when events are added, removed, or modified.
Now, to be fair, Eric Evans has a talk where he mentions these problems and advocates for solving them via Process Managers or simple Actor-based setups, i.e., introducing a central coordination point which can route events. However, I didn't see that talk until much later. I went in thinking that ledgers would rule the world, and had to slowly discover the need for this meta management layer by painfully bumping into all the bits that don't work in the Event Sourcing setup as commonly sold.
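A process manager in the sense above can be sketched in a few lines (names are illustrative, not from any framework): instead of every service subscribing to the raw stream, one coordinator holds an explicit routing table, so the data flow is visible in a single place.

```python
class ProcessManager:
    """Routes events to explicitly registered handlers."""
    def __init__(self):
        self.routes = {}  # event type name -> list of handlers

    def register(self, event_type, handler):
        self.routes.setdefault(event_type, []).append(handler)

    def dispatch(self, event):
        for handler in self.routes.get(event["type"], []):
            handler(event)

pm = ProcessManager()
shipped = []
pm.register("OrderPlaced", lambda e: shipped.append(e["order_id"]))
pm.dispatch({"type": "OrderPlaced", "order_id": "A1"})
pm.dispatch({"type": "PriceChanged", "item": "x"})  # no route: ignored
print(shipped)  # ['A1']
```

The point isn't the ten lines of code; it's that the routing table is now something you can read, rather than knowledge smeared across N independent subscribers.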
The upstart costs are large
Event Sourcing is not a "Move Fast and Break Things" kind of setup when you're a green field application. It's more of a "Let's All Move Slow and Try Not to Die" sort of setup. For one, you're probably going to be building the core components from scratch. Frameworks in this area tend to be heavyweight, overly prescriptive, and inflexible in terms of tech stacks. If you want to get something up and running in your corporate environment with the tech available to you today, rolling your own is the way to go (and a suggested approach!).
While this path is honestly a ton of fun, it's also super time consuming, and all of it is time not spent making actual forward progress on your application. Entire sprints will be lost to planning out how to deploy things on the infrastructure available, how to ensure streams behave and messages get processed, and how failures will be retried. Then you've got to actually go about implementing it, learning what sucks about your choices, and implementing it again with your newly gained knowledge, until you end up with a solid enough foundation upon which you can actually begin to build the application in question.
And once you're into the implementation stage, you'll realize something else: the sheer volume of plumbing code involved is staggering. Instead of your friendly N-tier setup, you've now got classes for commands, command handlers, command validators, events, aggregates, AND THEN your projections, those model classes, their access classes, custom materialization code, and so on. Getting from zero to working baseline requires significant scaffolding. Now, admittedly, how much this hurts is somewhat language dependent, but if you're in an already verbose language like Java (like I was), your fingers will be tired at the end of each day.
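Even a toy version of that plumbing chain makes the point. The sketch below (all names hypothetical, and in Python rather than the Java I was using) needs five separate pieces just to rename one thing:

```python
from dataclasses import dataclass

@dataclass
class RenameItem:                 # 1. the command
    item_id: str
    new_name: str

def validate(cmd: RenameItem):    # 2. the command validator
    if not cmd.new_name:
        raise ValueError("name required")

@dataclass
class ItemRenamed:                # 3. the event
    item_id: str
    new_name: str

def handle(cmd: RenameItem) -> ItemRenamed:   # 4. the command handler
    validate(cmd)
    return ItemRenamed(cmd.item_id, cmd.new_name)

class ItemNamesProjection:        # 5. one projection over the events
    def __init__(self):
        self.names = {}
    def apply(self, event: ItemRenamed):
        self.names[event.item_id] = event.new_name

proj = ItemNamesProjection()
proj.apply(handle(RenameItem("42", "widget")))
print(proj.names)  # {'42': 'widget'}
```

Now multiply that by every operation in your domain, add persistence and retries, and the "staggering volume" claim stops sounding like hyperbole.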
As a final point on the Getting Started side of things, there's a certain human / political cost involved. Getting an entire development team onboard philosophically is non-trivial. There will be those excited by the idea who read up on it outside work and are down for riding out the growing pains involved in trying alternative development methodologies, and then there will be those who aren't into it at all. However, regardless of which "camp" a person is in, disagreements will still mount as everyone tries to figure out how best to build a maintainable system under a foreign methodology with unclear best practices.
These team problems can additionally creep outside of your immediate development group. Getting tertiary members like UX involved presents its own challenges. Which leads to the unexpected point of...
Event sourcing needs the UI side to play along
This one, while obvious in retrospect, caught me by surprise. If you have a UI, it generally needs to play along with the event driven aspect of the back end. Meaning, it should be task based. However, the bulk of common UI interactions aren't designed that way. They're static and form based. Which means you end up with a massive impedance mismatch between the back-end, which wants small semantic events, and the front-end, which is giving you fat blobs of form data.
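Here's a sketch of that impedance mismatch (field and command names are made up): a form-based UI hands back the whole record, and the back end is left reverse-engineering intent from the diff.

```python
def diff_to_commands(before: dict, after: dict) -> list:
    """Guess semantic commands from a raw form submission."""
    commands = []
    for field in after:
        if before.get(field) != after[field]:
            # Inferring intent from a field change is fragile: was this
            # a correction, a transfer, a typo fix? The form can't say.
            commands.append({"type": f"Change{field.capitalize()}",
                             "value": after[field]})
    return commands

before = {"name": "Bob", "email": "bob@old.example"}
after  = {"name": "Bob", "email": "bob@new.example"}
print(diff_to_commands(before, after))
# [{'type': 'ChangeEmail', 'value': 'bob@new.example'}]
```

A task-based UI would have sent a `ChangeEmail` command in the first place; with a form, you're stuck guessing.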
A common response would be the argument that maybe those heavy form-driven parts of the application shouldn't be written to a ledger at all – let CRUD be CRUD – and that's an interesting argument, which brings me to...
You'll potentially be building two entirely different systems alongside each other
A super common piece of advice in the ES world is that you don't event source everywhere. This is all well and good at the conceptual level, but actually figuring out where and when to draw those architectural boundaries through your system is quite tough in practice.
The core reason is that the requirements that likely led you to Event Sourcing in the first place generally donʼt go away just because some parts of your application are more "CRUD-y". If you still need to audit your data, do you build out a totally different audit strategy for those non-event driven parts, or just reuse the ledger setups youʼve already deployed and tested? What about communication with other systems? Do you build out new communication channels, or reuse the streaming architecture already in place?
There's no clear answer because no path is ideal. Each one comes with its own pain points and drawbacks.
...although this flies in the face of other advice like ["only CRUD when you can afford it"](https://blog.csdn.net/waterboy/article/details/143597)
Past system states from the audit log will often have fidelity problems
(Unless you're willing to go into crazy person territory, of course)
Software changes, requirements change, focuses shift. Those immutable "facts," along with your ability to process them, wonʼt last as long as you expect.
We made it about a month before a shift in focus caused us to hit our first "oh, so these events are no longer relevant, at all?" situation. Once you hit this point, youʼve got a decision to make: what to do with the irrelevant / wrong / outdated events.
Do you keep the now deprecated events in the ledger, but "cast" them up to new events (or no-ops) during materialization, or do you rewrite the ledger itself to remove/cast the old events? The best practices in this area are often debated.
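The "cast on read" option can be sketched like this (event shapes are hypothetical): deprecated events are upgraded, or dropped as no-ops, while replaying, leaving the stored ledger untouched.

```python
def upcast(event):
    """Upgrade deprecated events to current shapes during materialization."""
    if event["type"] == "UserRenamed_v1":          # deprecated shape
        first, _, last = event["name"].partition(" ")
        return {"type": "UserRenamed_v2", "first": first, "last": last}
    if event["type"] == "LegacyAuditPing":         # no longer relevant
        return None                                # treat as a no-op
    return event

def materialize(events):
    state = {}
    for raw in events:
        event = upcast(raw)
        if event and event["type"] == "UserRenamed_v2":
            state["first"], state["last"] = event["first"], event["last"]
    return state

print(materialize([{"type": "UserRenamed_v1", "name": "Ada Lovelace"},
                   {"type": "LegacyAuditPing"}]))
# {'first': 'Ada', 'last': 'Lovelace'}
```

Note that every new event version adds another branch to `upcast`, forever: the casting layer only ever grows.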
Regardless of which path you take, as soon as you take it, youʼve lost the ability to accurately produce the state of your system at the point in time of the rewrite. (unless you have the deep character flaws required to do something completely psychotic, of course).
So, the often sold idea of a "100% accurate audit log" and "easy temporal queries!" ends up suffering from a case of "nope" once you get past the conceptual / toy stage and bump into the real world. If you've sold your magical log idea to stakeholders, this fidelity loss over time could pose issues depending on your domain.
The audit log is often too chatty for direct use
This one is obviously very business / use case dependent, but having a full low-level audit log of every action in the application was often more of a hindrance than a help. Meaning, most of it ends up being pure noise that actually needs to be filtered out, both by end users and by consuming sub-systems. All of those transient "Bob renamed field x to y" events are seldom of interest. If you're showing the audit log to an end user, more often than not, discrete logical states are of far more value than transient intermediates. So, the "free audit log" actually turns into "tedious projection writing." For downstream systems, this chattiness causes similar coordination woes. "When should I actually run?" and "should I care about event X?" were common questions during design meetings. It's all in the class of problems that require either Process Managers or the introduction of queues to solve.
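The "tedious projection writing" ends up looking something like this sketch (event names invented): a projection whose whole job is throwing away the chatter and keeping the discrete logical states users care about.

```python
# Milestone events users want to see; everything else is transient chatter.
MILESTONES = {"DocumentSubmitted", "DocumentApproved"}

def audit_view(events):
    """Collapse a chatty stream down to user-facing milestones."""
    return [e["type"] for e in events if e["type"] in MILESTONES]

stream = [{"type": "DraftAutosaved"},
          {"type": "FieldRenamed"},
          {"type": "DocumentSubmitted"},
          {"type": "FieldRenamed"},
          {"type": "DocumentApproved"}]
print(audit_view(stream))  # ['DocumentSubmitted', 'DocumentApproved']
```

The filter itself is trivial; the tedium is that someone has to decide, event type by event type, which bucket each one falls into, and keep that decision current as the schema evolves.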
The audit log as a debugging tool: considered overhyped
Minor, but worth pointing out: another touted benefit of being ledger based is that it helps with debugging. "If you find a bug in your application, you can replay the log to see how you got into that state!" I've yet to see this play out. 99% of the time, "bad states" were bad events caused by your standard run-of-the-mill human error. No different than any other "how did that get in the database?" style problem. Having a ledger provided little value over the normal debugging intuition you'd apply to a standard db setup. Meaning, if an age field was corrupt, you'd probably know which code to start investigating.
Projections are not actually free
"Youʼre no longer bound to a single table structure", says Event Sourcing. If you need a different view of your data, just materialize the event log in a new way. "Itʼs so easy!"
In practice, this is expensive both in terms of initial development cost and ongoing maintenance. That first extra projection you add doubles the amount of code that touches your event stream. And odds are, youʼll be writing more than one projection. So now you have N things processing this event stream instead of 1 thing. Thereʼs no more DRY from this point forward. If you add, modify, or remove an event type, youʼre on the hook for spreading knowledge of that change to N different places.
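The "N consumers, no more DRY" problem looks like this in miniature (projection names are invented): every projection below has to be revisited whenever an event type is added, changed, or removed.

```python
class OrderCountProjection:
    def __init__(self):
        self.count = 0
    def apply(self, e):
        if e["type"] == "OrderPlaced":
            self.count += 1

class RevenueProjection:
    def __init__(self):
        self.cents = 0
    def apply(self, e):
        if e["type"] == "OrderPlaced":
            self.cents += e["amount_cents"]
        # Adding an "OrderRefunded" event type means editing this class
        # too, and every other projection that should care about refunds.

projections = [OrderCountProjection(), RevenueProjection()]
for event in [{"type": "OrderPlaced", "amount_cents": 500}]:
    for p in projections:          # one stream, N consumers
        p.apply(event)
print(projections[0].count, projections[1].cents)  # 1 500
```

With two projections this is manageable; with ten, a single event schema change becomes a ten-file pull request.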
You'll deal with materialization lag
Once your data grows to the point where you can no longer materialize from the ledger in a reasonable amount of time, youʼll be forced to offload the reads to your materialized projections. And with this step comes materialization lag and the loss of read-after-write consistency.
Information is now either outdated, missing, or just wrong. Newly created data will 404, deleted items will awkwardly stick around, duplicate items will be returned, you get the gist. Basically all the joys of the eventual part of consistency.
Individually, theyʼre not a huge deal, but these are still things you have to spend time solving. Do you bake in a fall-back strategy for reads? Do you spend time adding smarts to the materialization itself in order to make it faster? Do you write logic to allow the caller to request the type of read they want (i.e. ledger, at the cost of latency, or projected, at the cost of consistency)?
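As one example, the last option in that list (letting the caller pick their read) can be sketched like so; the data and names are purely illustrative:

```python
# A ledger plus a projection that hasn't caught up yet.
LEDGER = [{"type": "ItemCreated", "id": "a"},
          {"type": "ItemCreated", "id": "b"}]
PROJECTION = {"a"}   # materializer hasn't seen "b" yet

def read_ids(consistent=False):
    if consistent:
        # Slow path: rebuild the answer from the ledger itself.
        return sorted(e["id"] for e in LEDGER if e["type"] == "ItemCreated")
    # Fast path: read the projection, accepting possible staleness.
    return sorted(PROJECTION)

print(read_ids())                 # ['a']  (item "b" would 404 here)
print(read_ids(consistent=True))  # ['a', 'b']
```

Every caller now carries a latency-vs-consistency decision that simply didn't exist in the single-database world.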
There are a ton of ways to solve it. But you having to solve it is the key thing I'm getting at here. This is time that needs to be accounted for, planned, implemented, and deployed (all at the expense of the thing you're supposed to be solving!).
Finally: You wonʼt really know the pain points until youʼre past the toy level.
This is just the reality of maintaining any long-lived software. Regardless of how much you try to prepare, how much background reading you do, or how many prototypes you build, youʼre doing something totally new. The problems that cause the most pain wonʼt manifest themselves in small test programs. Itʼs only once you have a living, breathing machine, users which depend on you, consumers which you canʼt break, and all the other real-world complexities that plague software projects that the hard problems in event sourcing will rear their heads. And once you hit them, youʼre on your own.
So what now?
Event Sourcing isn't all bad; my complaint with it is just that it is wildly oversold as a cure-all, and the negative side effects are rarely talked about. I still really like the ideas from event sourcing; it's just that putting it into practice caused more pain than I would have otherwise liked.
Whatʼs the take away here? Should I event source or not!?
I think you can generally answer it with some alone time, deep introspection, and two questions:
1. For which core problem is event sourcing the solution?
2. Is what you actually want just a plain old queue?
If you can't answer the first question concretely, or the justification involves vague hand-wavy ideas like "auditability," "flexibility," or something about "read separation": Don't. Those are not problems exclusively solved by event sourcing. A good ol' fashioned history table gets you 80% of the value of a ledger with essentially none of the cost. It won't have first-class change semantics baked in, but those low-level details are mostly worthless anyway and can ultimately be derived at a later date if so required. Similarly, CQRS doesn't require event sourcing. You can have all the power of different projections without putting the ledger at the heart of your system.
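For the skeptical, here's how little a history table costs. This sketch uses SQLite with an invented schema: a trigger copies the old row into a history table on every update, which covers most "auditability" asks with zero event-sourcing machinery.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items_history (
    item_id INTEGER, name TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP);
-- On every update, snapshot the previous state of the row.
CREATE TRIGGER items_audit AFTER UPDATE ON items BEGIN
    INSERT INTO items_history (item_id, name) VALUES (OLD.id, OLD.name);
END;
""")
db.execute("INSERT INTO items (id, name) VALUES (1, 'widget')")
db.execute("UPDATE items SET name = 'gadget' WHERE id = 1")
db.execute("UPDATE items SET name = 'gizmo'  WHERE id = 1")
print(db.execute("SELECT name FROM items_history").fetchall())
# [('widget',), ('gadget',)]
```

Fifteen lines of schema, and you can answer "what was this row before?" without commands, events, aggregates, or projections.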
The latter question is to weed out confused people like myself who thought the Ledgers would rule the world. Look at the interaction points of your systems. If youʼre going full event sourcing, what events are actually going to be produced? Do those downstream systems care about those intermediate states, or will it just be noise that needs to be filtered out? If the end goal is just decoupled processes which communicate via something, event sourcing is not required. Put a queue between those two bad boys and start enjoying the good life.
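And the "just put a queue between them" ending is genuinely this small. A sketch with Python's standard-library queue standing in for whatever broker you'd actually deploy:

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    # Consume messages until the shutdown sentinel arrives.
    while True:
        item = jobs.get()
        if item is None:
            break
        results.append(item.upper())

t = threading.Thread(target=worker)
t.start()
for msg in ["hello", "world"]:
    jobs.put(msg)        # producer and consumer never touch each other
jobs.put(None)           # sentinel: tell the worker to stop
t.join()
print(results)  # ['HELLO', 'WORLD']
```

Two decoupled processes, ordered delivery, no ledger, no projections: if this is all you needed, you never needed event sourcing.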
Don't Let the Internet Dupe You, Event Sourcing Is Hard - Blogomatano (chriskiehl.com)