
Why You Need Microservices: Why Your Enterprise Software Will Choke on a Monolith
There’s a common, expensive fairy tale in engineering: “To scale, you must start with microservices.” Real systems don’t start complex; they earn complexity.
Aggrey
Distributed Systems · Spring Boot · Scaling · Service Mesh
7 min read · Oct 17, 2026
In reality, starting with microservices is often just a sophisticated way to ensure your startup dies of operational complexity before it ever finds product-market fit. The most successful systems follow a different path: they don’t start complex; they earn complexity.
This is the story of the Architecture Graduation: seven stages of scaling SparrowX, a distributed systems project that evolves a Twitter-like social platform from a single Spring Boot application into a microservices-based, event-driven, agentic system. Along the way, it explores the engineering trade-offs behind Spring Boot, Kafka, gRPC, Kubernetes, AI agents, and RAG architecture in a production-style backend environment.
Stage 1: The Hardened Monolith (S1)
Don’t mistake “Monolith” for “Simple.” S1 is a single deployable, but it’s surrounded by production-grade infrastructure.
In S1, you optimize for product velocity while preparing for the future by cutting clean seams. You apply Logical CQRS: write-heavy Tweet flows and read-heavy Feed flows live in the same codebase, but they are intentionally separated in structure, contracts, and execution paths. This isn’t about multiple databases yet—it’s about making sure that when the day comes to split, the surgery isn’t fatal.
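A minimal sketch of what such a seam could look like inside one deployable. The names `TweetCommands`, `FeedQueries`, and `InMemoryTweetStore` are hypothetical, and the in-memory list only stands in for the shared relational database; the point is that callers depend on one contract or the other, never on both.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Write-side contract: the only way other modules create tweets.
interface TweetCommands {
    long postTweet(long authorId, String text);
}

// Read-side contract: the only way other modules read a feed.
interface FeedQueries {
    List<String> latestTweets(int limit);
}

// One store, one deployable. Both contracts share the same data for now
// (an assumption of this sketch), but the seam between them is explicit.
final class InMemoryTweetStore implements TweetCommands, FeedQueries {
    private final List<String> tweets = new ArrayList<>();

    @Override
    public synchronized long postTweet(long authorId, String text) {
        tweets.add(authorId + ": " + text);
        return tweets.size(); // hypothetical ID: position in the append-only list
    }

    @Override
    public synchronized List<String> latestTweets(int limit) {
        int from = Math.max(0, tweets.size() - limit);
        List<String> page = new ArrayList<>(tweets.subList(from, tweets.size()));
        Collections.reverse(page); // newest first
        return page;
    }
}
```

A controller that posts tweets would take a `TweetCommands`, and a feed endpoint would take a `FeedQueries`. If the read side later moves to its own service, only the `FeedQueries` implementation has to change.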
- Logical CQRS: separate write paths (Tweet) from read paths (Feed) inside one deployable.
- Latency containment: Resilience4j bulkheads/timeouts so slow reads can’t starve writes.
- Work reduction: Caffeine (L1) + Redis (L2) to shield the relational DB.
- Visibility first: LGTM stack (Loki, Grafana, Tempo, Prometheus) to spot cracks early.
- Async paths: Outbox pattern to push expensive work off the critical request thread.
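To make the latency-containment bullet concrete, here is a dependency-free sketch of the bulkhead idea built on `java.util.concurrent.Semaphore`. It is not Resilience4j's API; `SimpleBulkhead` and its method names are assumptions for illustration, and timeouts are left out.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Caps how many callers may run a guarded operation at the same time.
final class SimpleBulkhead {
    private final Semaphore permits;

    SimpleBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the call if a permit is free; otherwise returns the fallback
    // immediately instead of queueing behind slow work.
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always return the permit, even on failure
        }
    }
}
```

Giving feed reads their own `SimpleBulkhead` means a slow read path can exhaust only its own permits, so tweet writes keep their threads.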
The Point of S1: minimize architectural complexity, survive operational reality
- One deployment pipeline for business logic: fast iteration and tight feedback loops.
- Production-grade guardrails around it: observability, caching, resilience, and async work.
- Clear seams (CQRS boundaries) so future splits are driven by data, not guesswork.
Result: you stay in the monolith as long as humanly possible without being blind, brittle, or one incident away from panic-splitting into microservices.
The User Shift That Drove This Stage: Presence & Expression
At the beginning, users are testing existence, not scale. They want to know if they can speak, if the system responds, and if their actions matter. Writes dominate value, tolerance for failure is high, and product velocity matters more than throughput.
Stage 2: The Timeline Breakout: The Read Spike (S2)
You don’t split the monolith because it’s “cool.” You split it because you’ve hit the Decision Rubric:
- Hotspot Scaling Waste: Timeline needs ~10× more CPU than anything else.
- Noisy Neighbor Risk: Timeline spikes are causing tweet writes to miss their Service Level Objectives (SLOs).
At this point, the system is no longer failing because of bugs. It is failing because of physics. Timeline traffic is read-heavy, bursty, and cache-resistant at the edges, while tweet creation is write-heavy and latency-sensitive. Keeping both on the same execution path guarantees that success in one actively harms the other.
The Timeline is carved out into its own service. This is the most critical pivot because it forces you to define how your data lives. You move from DB joins to service calls, from implicit coupling to explicit contracts, and from accidental contention to intentional isolation.
The User Shift That Drove This Stage: Consumption & Dopamine
Once content exists, users stop creating and start consuming. Scrolling becomes the dominant behavior, amplifying reads far beyond writes and concentrating load into predictable peaks. The feed becomes the system, and anything that slows creation is no longer acceptable.
Stage 3: The Tweet Service: Hydration & Reaction Strain (S3)
After the timeline split, profile hydration fan-out and reaction write amplification become the next dominant load.
With the timeline split out, reads and writes begin colliding in new ways. Timeline construction now requires hydrating each entry with profile data, while increased consumption drives more likes, replies, and retweets back into the system. What used to be a single tweet write becomes a cascade of secondary writes triggered by engagement. Each scroll now fans out in two directions: outward to fetch profile context for dozens of users, and inward as reactions flow back into the core write paths. The system still functions, but latency becomes multiplicative and write contention becomes visible under peak load. At this stage, profile data and reaction writes still live in the core system, but their cost is no longer hidden. The architecture hasn’t failed yet, but the pressure has become structural rather than incidental.
- Hydration fan-out: each timeline page requires profile lookups for dozens of distinct users.
- Reaction amplification: likes, replies, and retweets outnumber original tweet writes by multiples.
- Bidirectional load: reads trigger writes, and writes feed back into reads.
- Latency multiplication: small per-call delays compound across hydration and reaction paths.
The User Shift That Drove This Stage: Reaction & Social Proof
As users engage, validation is expressed through likes, replies, and shares, which in turn become signals of relevance. One write now triggers many secondary writes, creating feedback loops that intensify hot paths. The system must protect creation from the very engagement that gives it value.
Stage 4: The Profile Hydration Layer (S4)
With the timeline isolated, identity stops being an implementation detail and starts behaving like infrastructure.
With Timeline and Tweet paths separated, the system hits a new, unavoidable cost: the Hydration Tax. A timeline is no longer just tweet IDs; it must be enriched with user-facing identity data:
- Display name and avatar
- Verification or reputation signals
- Follow and relationship context
If the Timeline service synchronously calls back into the core system dozens of times per request, you haven’t escaped the monolith. You’ve rebuilt it over the network as a distributed monolith.
The Solution: spin out the Profile Service
- Batch-oriented APIs (e.g. GetProfilesBatch(userIds)) to eliminate N+1 calls
- Aggressively cached profile blobs for globally hot, read-heavy access
- Clear ownership of identity data and the follow graph
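The batch-plus-cache idea above can be sketched as follows. `ProfileBatchClient`, the `batchLoader` function, and the plain `String` standing in for a profile blob are all hypothetical; the sketch is single-threaded and has no cache eviction, so treat it as an illustration of the call pattern rather than a service implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hydrates a page of user IDs with at most one backend round trip.
final class ProfileBatchClient {
    private final Map<Long, String> cache = new HashMap<>();
    private final Function<Set<Long>, Map<Long, String>> batchLoader;
    private int backendCalls = 0;

    ProfileBatchClient(Function<Set<Long>, Map<Long, String>> batchLoader) {
        this.batchLoader = batchLoader;
    }

    Map<Long, String> getProfilesBatch(List<Long> userIds) {
        Map<Long, String> result = new LinkedHashMap<>();
        Set<Long> misses = new LinkedHashSet<>(); // also removes duplicate IDs
        for (Long id : userIds) {
            String cached = cache.get(id);
            if (cached != null) {
                result.put(id, cached);
            } else {
                misses.add(id);
            }
        }
        if (!misses.isEmpty()) {
            backendCalls++; // one call for all misses instead of one per user
            Map<Long, String> loaded = batchLoader.apply(misses);
            cache.putAll(loaded);
            result.putAll(loaded);
        }
        return result;
    }

    int backendCalls() {
        return backendCalls;
    }
}
```

A timeline page that mentions 40 distinct authors costs one backend call on a cold cache and none once those profiles are hot, which is what keeps hydration bounded.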
Result: Timeline hydration becomes bounded and predictable, and profile data turns into globally cacheable infrastructure. An 8-10 person team naturally splits: one sub-team owns identity and social graphs, while another owns feed construction and ranking.
The User Shift That Drove This Stage: Identity & Trust
When content floods the feed, identity becomes the filter. Users increasingly ask who is speaking, clicking profiles to assess credibility, authority, or fame. Profile data turns into globally hot state, and inefficient hydration becomes a tax paid on every interaction.
Stage 5: Searching with Different Physics (S5)
Once identity is infrastructure, the next bottleneck isn’t scale. It’s meaning.
Relational databases are excellent for questions like “Who does User A follow?” They are terrible for questions like “Find tweets about climate that are trending right now in London or San Francisco.” As discovery features grow, Search reveals a different set of laws: relevance, ranking, tokenization, and inverted indexes rather than joins and transactions.
- Query patterns shift from bounded lookups to open-ended exploration.
- Workloads become CPU-heavy (analysis, scoring) and index-heavy (writes + merges).
- Freshness matters, but strict consistency often doesn’t.
The Solution: isolate Search as its own system
- Use Elasticsearch/Solr-style indexing instead of relational querying.
- Consume TweetCreated events (outbox → broker) and update the index asynchronously.
- Treat staleness as acceptable; treat backpressure as mandatory.
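A rough sketch of the last two bullets, assuming a hypothetical `TweetCreated` event and an in-process `SearchIndexer`. A bounded queue stands in for the broker, and a small map of token to tweet IDs stands in for an Elasticsearch/Solr-style inverted index; the single-threaded `drain()` would run on a dedicated indexing thread in a real system.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical event emitted by the write side.
final class TweetCreated {
    final long tweetId;
    final String text;

    TweetCreated(long tweetId, String text) {
        this.tweetId = tweetId;
        this.text = text;
    }
}

final class SearchIndexer {
    private final BlockingQueue<TweetCreated> inbox;
    private final Map<String, Set<Long>> invertedIndex = new HashMap<>();

    SearchIndexer(int capacity) {
        this.inbox = new ArrayBlockingQueue<>(capacity);
    }

    // Backpressure: a full inbox returns false instead of blocking the producer.
    boolean offer(TweetCreated event) {
        return inbox.offer(event);
    }

    // Applies all pending events to the index and returns how many were indexed.
    int drain() {
        int indexed = 0;
        TweetCreated event;
        while ((event = inbox.poll()) != null) {
            for (String token : event.text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    invertedIndex.computeIfAbsent(token, t -> new TreeSet<>()).add(event.tweetId);
                }
            }
            indexed++;
        }
        return indexed;
    }

    // Reads only see what has been drained so far, so results may be stale.
    Set<Long> search(String term) {
        return invertedIndex.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

When `offer` returns false, the producer decides what to do (retry later, leave the row in the outbox, or drop it), so a slow indexer never stalls tweet creation.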
Result: Search can scale, rebuild, or fall behind without taking down the social core. The platform gains a dedicated discovery engine with its own data model, failure modes, and tuning knobs.
The User Shift That Drove This Stage: Meaning & Discovery
At scale, users no longer want more content; they want relevant content. Search, topics, and trends reflect a shift from consumption to sense-making. Discovery workloads follow different physical laws, where eventual consistency is acceptable and relational models begin to fail.
Stage 6: The Fanout Explosion (S6)
At scale, attention becomes more volatile than traffic.
Notifications introduce a new asymmetry into the system. A single action such as a celebrity tweet or a breaking news event can require millions of downstream deliveries. In a shared execution environment, this fanout storm can exhaust database connections, worker threads, and queues meant for core product flows.
- Fanout grows with follower count, not request rate.
- Delivery urgency varies by user and event type.
- Backlogs are acceptable; blocking the write path is not.
The Solution: isolate Notifications and control fanout
- Dedicated workers and queues for notification generation and delivery.
- Priority tiers (e.g. verified users, mentions, direct interactions).
- Aggressive deduplication and rate-limiting per user.
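One way these three controls could fit together, as a sketch. `Notification`, `NotificationQueue`, the string dedup key, and the numeric priority (lower is more urgent) are assumptions for illustration. The per-recipient cap on pending items is a crude stand-in for real time-based rate limiting, and the class is not thread-safe.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

final class Notification {
    final long recipientId;
    final String dedupKey; // e.g. "like:tweet-1"
    final int priority;    // lower value = delivered first

    Notification(long recipientId, String dedupKey, int priority) {
        this.recipientId = recipientId;
        this.dedupKey = dedupKey;
        this.priority = priority;
    }
}

final class NotificationQueue {
    private final PriorityQueue<Notification> queue =
            new PriorityQueue<>(Comparator.comparingInt((Notification n) -> n.priority));
    private final Set<String> seen = new HashSet<>();
    private final Map<Long, Integer> pendingPerRecipient = new HashMap<>();
    private final int maxPendingPerRecipient;

    NotificationQueue(int maxPendingPerRecipient) {
        this.maxPendingPerRecipient = maxPendingPerRecipient;
    }

    // Returns false when the notification is shed (over the cap or a duplicate).
    boolean enqueue(Notification n) {
        int pending = pendingPerRecipient.getOrDefault(n.recipientId, 0);
        if (pending >= maxPendingPerRecipient) {
            return false; // per-recipient cap reached
        }
        if (!seen.add(n.recipientId + "|" + n.dedupKey)) {
            return false; // already queued or delivered for this recipient
        }
        pendingPerRecipient.put(n.recipientId, pending + 1);
        queue.add(n);
        return true;
    }

    // Returns the most urgent pending notification, or null when empty.
    Notification next() {
        Notification n = queue.poll();
        if (n != null) {
            pendingPerRecipient.merge(n.recipientId, -1, Integer::sum);
        }
        return n;
    }
}
```

Delivery workers call `next()` at their own pace, so a celebrity tweet grows this queue instead of consuming the connections and threads that serve the write path.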
Result: Fanout storms are absorbed instead of amplified. Notifications can lag, batch, or shed load during peaks, while core social interactions remain fast, predictable, and protected.
The User Shift That Drove This Stage: Attention & Reach
Notifications introduce attention as a first-class resource. A single action can now fan out to millions, and timeliness matters more than completeness. Fanout must be isolated, prioritized, and allowed to lag without ever blocking the core product.
Stage 7: The Agentic Frontier (S7)
At this point, the system stops serving interactions and starts serving cognition.
Agentic Context Engineering workloads operate under entirely different constraints than social traffic. While a timeline request is measured in milliseconds, an agent may take seconds to reason, retrieve context, call tools, and synthesize an answer. If these workloads share execution paths with the core product, thread pools saturate instantly and tail latencies explode.
- Latency shifts from milliseconds to seconds.
- Cost per request becomes highly asymmetric and unpredictable.
- Throughput matters less than isolation and bounded concurrency.
The Solution: isolate agentic intelligence
- Dedicated compute pools (CPU/GPU) isolated from web and feed traffic.
- Async-first workflows with streaming or callback-based results.
- Explicit orchestration of retrieval, tool calls, and LLM execution.
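A minimal sketch of bounded, async-first execution using only the JDK. `AgentPool` is a hypothetical name, and the `Supplier<String>` stands in for a real agent run (retrieval, tool calls, LLM execution). The pool has a fixed number of workers and a bounded queue, so excess requests fail fast instead of piling up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Dedicated, bounded compute for slow agent tasks, separate from web threads.
final class AgentPool {
    private final ExecutorService executor;

    AgentPool(int workers, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // reject when saturated
    }

    // Returns immediately; the result arrives later through the future.
    CompletableFuture<String> submit(Supplier<String> agentTask) {
        try {
            return CompletableFuture.supplyAsync(agentTask, executor);
        } catch (RejectedExecutionException e) {
            // Pool and queue are full: surface the overload to the caller.
            return CompletableFuture.failedFuture(e);
        }
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

A request handler would attach a callback with `thenAccept` or stream partial results rather than blocking on the future, which keeps web and feed threads free while an agent takes seconds to finish.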
Result: The platform gains a cognitive layer that can reason, summarize, and synthesize at scale without compromising the latency, reliability, or economics of the core social system.
The User Shift That Drove This Stage: Understanding & Leverage
Finally, users seek understanding rather than interaction. Agentic and RAG workloads trade latency for synthesis, operating on entirely different cost and time scales. At this stage, the system is no longer serving clicks—it is serving cognition.
The Golden Rule of Graduation
The journey from Stage 1 to Stage 7 is a series of trade-offs between user needs, product ambition, and technical reality. Every stage introduces pain: stay too long and fragility grows; move too early and complexity explodes. The Principal Engineer / Lead's role is to choose which pain the business can afford. As the system matures, user behavior forces change. Read amplification, fanout storms, and hydration tax are signals of success, not smells. Graduation occurs when debt stops buying speed and starts charging interest. At that point, splitting a module is organizational alignment encoded in software. Scale comes from staying in the monolith until the data proves otherwise.