You build a startup quickly, scale the tech as far as it goes. You’ve got one codebase, and one scary-as-hell deploy. You’re known for failure at least as well as you’re known for your product.
So you break up your service into microservices. Each can be deployed independently. Each can be owned by a team who understands the whole of the thing, and so the changes therein. Responsibility and accountability are delegated to those teams.
It’s a proven pattern. And very successful.
At Twitter, we run dozens of microservices. We have macaw-users, serving the user information apis. We have macaw-tweets, serving tweets. We have macaw-timelines, serving timelines. It all makes a lot of sense. The tweets service can roll out a deploy, observe issues with their endpoints, roll back and fix – all without bothering the other services.
This is an ideal way to scale an api.
Your mobile apps can use this api. Maybe with JSON, or something more efficient. If an endpoint fails now, a single team is paged and the service recovers. Users might see a momentary error with some actions. The mobile app teams needn’t even know; they worry about their own code deploys, not the reliability of their apis.
A website is a different animal. One web page is the combined output from many of your apis, many of those services. And for speed of delivery, you are rendering those web pages on the server.
For server-side rendered web pages, the output of your apis is combined inside your own datacenters. This is happening on your own servers.
The microservice model starts to fail. If a service you depend upon is experiencing issues, your web pages are seeing that issue and throwing error pages themselves. Your service may fall below SLA before theirs does. Suddenly, you’re a single point of failure, and so primary oncall for all microservices. Farewell sleep.
It gets worse. Since your service is inside the DC, you use a more efficient api layer with compile-time dependencies. When you compile your project, you need to build every other service. Of course your build system has caching, but what does that matter when every backend change forces a recompile on you?
It gets worse. The other services start to focus on mobile. They build business logic into the json formatter. It’s not as crazy as it sounds – of course they use shared objects for json. And checks at the edge are the simplest. But now you’re missing them on web. And, oh, resources are tight. Build it yourself or miss out.
It gets worse. Your integration test suite needs fixture data in code. But the mobile clients just depend on manual QA and json fixtures. Effective integration testing falls away, and deploys gets even more scary.
Development grinds to a halt.
This is where we were. Let me tell you where we went, how we made things better.
We can write, deploy, test and be oncall for our own app, and stop worrying about others.
We can use the new api formats and features as soon as they’re ready.
We did it.
Or did we?
Another route we could have taken is to rethink this from the perspective of the user. To the user it doesn’t matter whether an api call failed in the client or the server. A failure is catchable and can be well handled. We had an SLA on the server because we could easily measure it. Should that be different on the client where we can’t? There’s no doubt in my mind that our service could have been coded more defensively, mitigating the vast majority of pages and alerts.
So why did we not?
Partly because our service isn’t built on a robust web framework. It’s a migration of our ruby-on-rails system to Scala. This was necessary for the complexity of a migration while operating at scale. The framework is optimised towards api development, which again makes sense, as most services are apis. It’s logical that our graphs, monitoring and alerts are also configured this way. APIs are not websites.
And finally it’s because web engineers have a tricky enough job writing good quality modular scalable robust crossbrowser user experiences. We should be allowed our focus and not have to worry about the intricacies of server code in a foreign language.
Try hiring a CSS developer with Scala skills. I’m sure Google has some. Good luck.
So we went PWA and we’re not going back. Was it necessary? Maybe, maybe not. Was it worth it? Hell yes.
I shall sleep well tonight.