Why I Chose the Monolith

I stopped product development for three and a half months to collapse a bunch of microservices into a single monolith. Best engineering decision I’ve made.

What I walked into

Developers on my team were burned out from the friction of doing simple things in a system that fought them at every step.

I’d inherited a collection of Node.js services, a React frontend, some Python services, and shared type libraries - each in its own repo, spanning two languages and two type systems. The architecture had been designed by a larger team for traffic volumes that never materialized.

Any change could cascade across services. Triaging meant jumping between separate logging dashboards, guessing at what broke what. A single schema update meant publishing a package, bumping versions, and updating consumers across repos one by one.

The scalability ceiling was impressive. We were nowhere near it and never would be.

Convincing the business to stop shipping

I went back and forth on this for a while before even bringing it up. The technical case was clear, but I wasn’t proposing a refactor - I was asking the business to stop shipping features for months. The optics of telling product “we need to pause for a quarter” are brutal.

I laid out both paths with honest estimates: collapse now and pay 3 to 3.5 months up front, or incrementally improve and drag it out indefinitely while still living with a system too complex for our team size.

I pushed for that transparency deliberately. If things got bumpy mid-migration, I needed stakeholders who understood the reasoning, not stakeholders who’d panic and want to pull the plug. That ended up mattering - there were rough weeks where progress was slow and the pressure to resume feature work was real. Having leadership already bought into the why kept the project alive.

How it went

We ran the new system alongside the old one, shifted traffic gradually, and kept the ability to route back at any point. Services got absorbed one at a time. We consolidated from two languages down to one, which on its own was a huge win - one set of patterns, one build pipeline, one dependency tree instead of maintaining parallel worlds in TypeScript and Python.
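
The traffic shifting worked roughly like a percentage rollout with sticky assignment. As a sketch (hypothetical helper names, not our actual code): hash each user ID to a stable bucket, compare it against a rollout percentage, and dial that number up over time - or back to zero to route everyone to the old system - without redeploying anything.

```typescript
// Sketch of a sticky percentage rollout. A user always lands in the same
// bucket, so they consistently hit the same system as the dial moves.
function bucketFor(userId: string): number {
  let h = 0;
  for (const ch of userId) {
    // Simple deterministic string hash, kept in unsigned 32-bit range.
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h % 100; // stable bucket in [0, 100)
}

function routeToMonolith(userId: string, rolloutPercent: number): boolean {
  // rolloutPercent = 0 sends everyone to the old services;
  // 100 sends everyone to the monolith.
  return bucketFor(userId) < rolloutPercent;
}
```

Because assignment is deterministic, rolling back is just setting the percentage to zero - no state to migrate, no sessions to drain.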

Three and a half months. No catastrophic outages.

The problem the split was hiding

After combining HTTP and WebSocket traffic into the same containers, some boxes started struggling under the load of keeping long-lived connections alive while also handling regular requests.

We fixed it at the infrastructure level - we moved WebSocket traffic onto a dedicated container fleet, same code on both fleets, just different routing. No code changes. But that interaction was always there. It was just invisible when the services happened to be isolated from each other.
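
The routing decision itself is simple. A minimal sketch of what the load balancer does (assumed logic, not our actual config): inspect the handshake headers that every WebSocket connection sends per RFC 6455, and send upgrade requests to the long-lived-connection fleet while plain HTTP stays on the main one.

```typescript
type Fleet = "http" | "websocket";

// Decide which fleet a request belongs to based on its headers.
// A WebSocket handshake arrives as an HTTP request carrying
// "Upgrade: websocket" and a Connection header that includes "Upgrade".
function fleetFor(headers: Record<string, string | undefined>): Fleet {
  const upgrade = headers["upgrade"]?.toLowerCase();
  const connection = headers["connection"]?.toLowerCase() ?? "";
  if (upgrade === "websocket" && connection.includes("upgrade")) {
    return "websocket";
  }
  return "http";
}
```

Since both fleets run identical code, this split is purely an operational knob: the WebSocket fleet can be sized for connection count while the HTTP fleet is sized for request throughput.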

That’s a point in the monolith’s favor. It’s easier to find problems when everything is in one place.

Why this keeps happening

The original team was building what was trendy. Microservices were the prevailing architecture, so that’s what they built. I don’t think they stopped to ask whether their system actually needed independent scaling or whether their team was large enough to justify the overhead.

If you don’t have those problems, you’re paying a distributed systems tax that compounds quietly - every feature takes a little longer, every debug session is a little harder, every new hire ramps up a little slower. It adds up to months of lost velocity before anyone notices.

One thing I didn’t anticipate is how much this matters now that AI coding tools are part of our workflow. A single codebase is dramatically easier for both humans and AI to reason about. That advantage is only going to grow.

There’s probably a scale at which we’d need to split things apart again. I’ll deal with that when it’s real.