Yes, but does it scale?
Scale is a journey that always leads to a distributed system. But we need to evolve an architecture when pressure, not philosophy, demands it.
You get the pleasure of designing the architecture of a new product. You do research, a lot of research. You find the perfect database, you figure out the architecture, and you host a meeting to make your proposal.
Your team hears you out, and they nod in agreement - you’ve done a good job.
But then someone unmutes and you hear a voice.
“Yes, but does it scale?”
That question, that dreaded question, has haunted me for the better part of my career. In meetings, in Slack threads, in RFCs, in interviews, in chats at the bar after work. I hear it with my morning coffee, and I half-expect my wife to ask me the same when I tell her about my day at home.
“Yes, but does it scale?”
Honestly, I don’t know that much about scale.
My credentials go as far as working on a couple of critical pieces of infrastructure during my time in media. One of them was the service that made the paywalling decisions for every request going through, the other was the central content gateway that fed data to multiple publishers.
I was on-call at the time of the Queen’s passing, and I saw the lines in Grafana go up in real time.
But I haven’t worked at Netflix or Google. I don’t know what it’s like to be under a natural DDoS purely because of your scale. I haven’t had to implement my own storage mechanism because no off-the-shelf solution works for me.
The one thing I do know, though, is that this question rarely has a yes or no answer.
The Levels of Scale
The fact that something works today under its current environment doesn’t mean that it will work tomorrow when things change. Most often, when we discuss scalability, we focus on increased load and whether the current architecture can withstand it.
But an architecture is designed with a current level of scale in mind, and going above it is a matter of rearchitecting the entire system, not just changing one piece of it.
And a change in scale naturally leads us to create a distributed system.
When I worked for startups, we always started with a monolithic application that all engineers worked on - one service that handles all your logic, all your requests. One database to store all your data. In the beginning, scalability is a distant term that you don’t think about.
But after an outage, you learn that your single application server has been running at 80%-100% CPU capacity, utilizing most of its memory, and every time your users unintentionally synchronize, it risks toppling over.
“When we discussed this, you said it would scale!”
Yes, with the environment in which you were working, it would have.
When you were doing mostly I/O, Node.js could do a lot of heavy lifting, but you’re no longer doing that. You’re doing computationally heavy work - looping over and parsing large data structures - and the application is starting to get winded.
So you go into your cloud provider’s dashboard, you find the instance size dropdown, and you select a larger one.
You have just scaled vertically - you found a bigger box for your application where it can fit with some room to spare.
A Larger Jump in Scale
You can handle small jumps in scale by pumping more resources into your application.
Whenever possible, I’d always rather take this approach and scale vertically instead of looking for alternatives because it leaves you with a simpler system. A larger instance size for your server or database doesn’t affect its behavior, and it doesn’t change your day-to-day work.
I can’t stress this enough.
Having the entire complexity of your application in a single box and being able to reason about it without including the network in the equation is a blessing that you should take advantage of.
“Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.” ― Edsger Wybe Dijkstra
But there comes a moment when this is no longer feasible. There’s only so much memory and CPU you can give to a single service before it starts giving diminishing returns.
Every system that jumps in scale eventually hits a bottleneck that more CPU or RAM can’t fix.
The Need to Scale Horizontally
Computational power is more accessible than ever, but Moore’s Law is slowing down, and we’re hitting physical and economic limits faster than before.
Let’s go back to our startup that’s building with Node.
If you need to read a big file or do something else that keeps the event loop busy, you can move to a 64-core machine, but a single Node.js process will still run your JavaScript on one core.
You can keep adding RAM, but Node’s V8 garbage collector becomes slower with huge heaps, and your throughput will drop even if you’re pumping memory like coal.
You can rewrite in a more efficient language, you can add more resources, but this will only buy you some time.
It won’t give you better reliability, stability, or speed.
If your monolith runs on one EC2 instance and it dies, users face a full outage. With multiple nodes acting as one behind a load balancer, the others can absorb the traffic while the fallen one restarts.
Developer coordination itself becomes difficult when everyone is working on the same service. If you’re sharing a database, even things like adding tables or indexes become points of contention.
This hurts both your product’s stability and the engineering team’s productivity, creating a spiraling effect that keeps getting worse. At a certain level of scale, scaling vertically is not the best option.
You will need to scale horizontally.
You’d need to add more machines that work in parallel and split the load among them. But even that has its limits, because for larger jumps in scale, you will always need to redesign your system.
You Can’t Scale the Design
That’s why, in the beginning, I said I dread the scale question. No solution works at infinite scale - it will eventually require a redesign. And this is not limited to the vertical-to-horizontal paradigm shift.
In his book Designing Data-Intensive Applications, Martin Kleppmann says this:
“... it is not a one-dimensional label that we can attach to a system: it is meaningless to say ‘X is scalable’ or ‘Y doesn’t scale’. Rather, scalability means considering questions like ‘If the system grows in a particular way, what are our options for coping with that load?’ and ‘How can we add computing resources to handle that additional load?’”
He continues by giving an example of how Twitter scaled to handle tweets from accounts that had a large number of followers.
In the beginning, posting a tweet simply inserted it into a global collection of tweets. When a user opened their timeline, the application looked up the people they follow, found all their tweets, and sorted them by time.
As Twitter grew, they had no trouble handling the larger quantity of write requests, but the system struggled to keep up with the timeline queries.
So they switched to an approach that maintained a cache for every user. Whenever someone tweeted, they looked up all the people who follow them and inserted the new tweet in their respective timeline cache. Then, retrieving the timeline became cheap since it was computed ahead of time.
But to handle this, they had to redesign their system. Simply adding more resources wouldn’t have helped.
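The two approaches can be sketched with plain in-memory data structures - the names and shapes here are illustrative, not Twitter’s actual design:

```javascript
// A toy sketch of the two timeline strategies.
const follows = new Map();   // userId -> array of userIds they follow
const tweets = [];           // global collection: { author, text, ts }
const timelines = new Map(); // userId -> precomputed timeline cache

// Approach 1: compute on read - scan the global collection per request.
function timelineOnRead(userId) {
  const followed = new Set(follows.get(userId) ?? []);
  return tweets
    .filter((t) => followed.has(t.author))
    .sort((a, b) => b.ts - a.ts);
}

// Approach 2: fan-out on write - push each tweet into followers' caches.
function postTweet(author, text, ts) {
  const tweet = { author, text, ts };
  tweets.push(tweet);
  for (const [userId, followed] of follows) {
    if (followed.includes(author)) {
      const cache = timelines.get(userId) ?? [];
      cache.unshift(tweet); // newest first; reads become a cheap lookup
      timelines.set(userId, cache);
    }
  }
}

function timelineOnWrite(userId) {
  return timelines.get(userId) ?? [];
}
```

The trade-off is visible in the code: reads become trivial, but one tweet from an account with millions of followers turns into millions of cache inserts - a new scaling problem created by solving the old one.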
Ask the Right Questions
When you’re discussing scale, ask about a business scenario that’s only present in your domain.
“Will this work for our most followed users?”
That’s a good question. Thinking about the edge cases that may occur in your system can help you identify future problems.
“Will this be able to handle spikes in traffic twice a day?”
That’s another good one. A website that receives large quantities of traffic while people are commuting to and from work needs not only scalability but also elasticity.
“What happens if a third-party API rate-limits us or goes down when we get more traffic?”
That’s a resilience question. It makes you think about timeouts, retries, circuit breakers, and how you can degrade the application without killing it.
You Can’t Architect for a Future You Don’t Know
Talk to an engineer working in a company that’s in the process of transitioning from one level of scale to another, and they will tell you how the engineers who built the monolith had no idea what they were doing.
“Had this been a distributed system from the start, it would’ve scaled without an issue!”
Then they would’ve been able to focus on writing actual features and generating value for the company instead of dealing with outages and detangling logic.
But skipping that step in the evolution of scale usually leads to more problems.
Planning for scalability is difficult.
How do you decide what to scale up front? How do you know which parts of your application need attention? Uber created Schemaless when they faced problems with their storage mechanism, but does this mean they should’ve started with a custom storage mechanism on day one?
Maybe You Don’t Need to Scale
Infinite scale is a foreign idea in most other industries.
There’s no such thing as an infinitely tall building. Everything’s built with a hard limit on how large it can become, but the nature of software is different. We’re not bound by the laws of physics the same way civil engineers are.
But do we always need to handle scale?
Modern games often have queues on launch day, even though they can probably design their infrastructure to handle that load. However, a system designed to handle that initial load would be different from one that can handle normal pressure, and it would be more complex to create.
Would that enormous engineering effort and money spent be worth it for the spike in players during the first couple of days?
Some startups resort to rate-limiting their signups. I’d imagine that it’s partially a hype-generation strategy, but it definitely helps them scale more naturally when they don’t keep the floodgates open.
No need to worry about surprising cloud bills when you’re not scaling at all costs.
Scalability Is About Work, Not Boxes
Something I want to stress is that scaling isn’t limited to the size or number of boxes your application runs on. Adding more components to a system can improve its scalability, but that doesn’t always translate into load balancers and replicas.
You can scale by adding replicas and resources, but you can also scale by reducing work.
A good caching strategy can take you far.
Especially if you’re using a runtime that’s not great at serving static files, like Node, you can rely on a CDN for your assets. If you’re serving infrequently changing content, cache it at the edge and let the CDN soak up large portions of the incoming traffic.
When you change the content, you have to invalidate it in the CDN, but cache invalidation is a whole other topic.
This still makes your product more complex - your assets will need to be uploaded elsewhere, you’ll need to figure out access, and build a pipeline for it. But you will improve your scalability without having to manage more infrastructure yourself.
One Fast Service Won’t Save You
There are multiple aspects to load, and when we’re talking about scalability, we need to think about all the components that make up our product.
You can increase the number of services that are handling requests, but if your database can’t handle that traffic, your services’ scalability is pointless. If you’re making calls to an external service as part of your logic, you need to make sure that they can handle that scale of traffic as well.
Something I’ve seen happen in large companies is one team scaling their part of the system to handle more load, only to drown another team’s services in requests they can’t handle.
The whole product still doesn’t work in this scenario, and this partial scalability is meaningless.
Scale and Conway’s Law
Often, we discuss scalability in the context of a codebase, but the word we should actually be using is extensibility.
Certain implementations don’t scale in larger teams because they’re not organized in a way that allows it - they require developers to modify the same core, which quickly creates complexity hotspots in the codebase.
So scalability in terms of code means whether we can continue to be productive as we add more functionality and more engineers.
Scale means your team is growing. It means people will start stepping on each other’s toes, they’ll have merge conflicts, they’ll have trouble testing their changes on an actual environment, handling deployments, setting up feature flags, and so on.
Some companies find ways to work productively with large monolithic applications, but the level of engineering discipline it requires makes it difficult. So most teams naturally move to a service-oriented architecture, splitting up the domain between each other.
Then these services that you’ve split between the teams will inevitably start communicating with each other, and you will naturally create a distributed system.
My favorite question to ask in interviews was whether microservices are solving a technical or an organizational problem. Sometimes people get too focused on the technical aspects of scale, but forget that Conway’s Law is the driving force behind many of our architectural decisions.
To Wrap It Up…
I remember my first architectural meeting in a corporation after a few years in startups.
“This is better off as a separate service... We’ll use one database, no we’ll need a graph store too...”
I was sitting there, shocked by how quickly they gobbled up the resources that I would’ve been given to build an entire startup with. And that wasn’t even for a flagship feature!
This was the first time I saw people discussing solutions made to scale. We were designing for large traffic, large quantities of data, and a large team.
But this write-up is long enough as it is, so I’ll leave you with a small piece of advice.
People underestimate how far they can go vertically.
A change in scale will force your system to change as well, but these big shifts don’t happen out of nowhere. They take time, they take effort, and vertical scaling will help you in the meantime.