Scalability is a Function of Your Data Layer
Introduction
The modern scaling landscape is overwhelming. Queues, event buses, microservices, serverless, Kubernetes—the list of "essential" technologies grows longer every year. It's very easy to get lost in the choices, and even easier to feel pressure to adopt whatever patterns are currently deemed "industry standard". Everyone else seems to be doing it, so it must be right.
Right?
The Stack Overflow Anomaly
Consider Stack Overflow. For over a decade, they debugged the world's software problems using a handful of servers and a surprisingly simple architecture. No microservices. No event sourcing. No exotic databases. Just a well-designed system built on proven, boring technology that scaled to tens of millions of users.
How did they achieve this? And more importantly, why does their approach feel like such an anomaly in today's architecture discussions?
The Danger Zone
The answer lies in understanding a critical but often overlooked danger zone: the space between over-optimisation and under-optimisation.
Over-optimise too early, and you'll drown in premature complexity, vendor lock-in, and crushing cognitive overhead. Your team will spend more time wrestling with distributed systems than building features.
Under-optimise, and you'll build fragile and ad-hoc architectures that buckle under the first real load, forcing expensive rewrites that could have been avoided.
There's a third trap worth naming: "career-driven development". We've all seen it—the choice to adopt a hot new technology not because it solves a real problem, but because it looks good on LinkedIn. These decisions are costly, and they're more common than we'd like to admit.
A Different Path
This article offers a different path. It distills the practical essence of scaling into a framework you can actually use: a pathway that works now and evolves with you, focusing on the decisions that matter whilst deferring those that don't.
It's not about picking the perfect technology stack upfront. It's about building systems that give you the freedom to make those choices later, when the constraints are clear and the costs are justified.
The secret? Your data layer.
Data Modelling: The Foundation
Data modelling is often treated as an "also-ran" in system design; something rushed through in favour of picking databases and frameworks. Yet it's arguably the most important architectural decision you'll make. Get the domain model right, and everything else becomes easier. Get it wrong, and you'll spend years fighting technical debt.
Storage-Agnostic Design
The key principle is deceptively simple: model the business problem first.
- Create types that represent meaningful domain concepts
- Think in terms of entities, relationships, and operations
- Critically: Don't think about storage yet
That decision comes later, once you understand how these models will actually be used.
Keep It Simple
This approach draws from Domain-Driven Design, but in simplified form:
- Clear types: Models should be expressed as meaningful, clearly named types within your system
- Extendable but not complex: They should grow without becoming over-engineered
- Business-aligned: Focus on clarity and business alignment, not database schemas
Consider a simple e-commerce system. Here's what storage-agnostic domain modelling looks like:
from __future__ import annotations

from dataclasses import dataclass
from typing import List

# ProductId, OrderId, CustomerId, Money, Address and OrderStatus are small
# value types (identifiers, a money type, an address, a status enum) defined
# alongside these models.

@dataclass(frozen=True)
class Product:
    id: ProductId
    name: str
    price: Money
    inventory_count: int

@dataclass(frozen=True)
class Order:
    id: OrderId
    customer: Customer
    items: List[OrderItem]
    total: Money
    status: OrderStatus  # PENDING, CONFIRMED, SHIPPED, DELIVERED

@dataclass(frozen=True)
class Customer:
    id: CustomerId
    email: str
    shipping_addresses: List[Address]

@dataclass(frozen=True)
class OrderItem:
    product: Product
    quantity: int
    price_at_purchase: Money  # Capture historical pricing
Notice what's missing: no database fields, no save() methods, no ORM annotations. Just clean domain concepts that model the business problem. These models don't know or care whether they'll live in PostgreSQL, MongoDB, or flat files. That's intentional.
Storage decisions come later, after we understand touchpoints.
Data Touchpoints: Understanding Access Patterns
A touchpoint is how your system actually uses the data. Not "what data exists" but rather "how is it accessed". This includes:
- Queries: What data do you read?
- Updates: What data do you write?
- Frequency: How often?
- Latency requirements: How fast must it be?
- Volume: How much data?
Understanding touchpoints is the bridge between domain modelling and technology choices.
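To make this concrete, one lightweight way to capture touchpoints during design is as plain data alongside the domain model. The Touchpoint class and the two example entries below are purely illustrative, not part of any framework:

from dataclasses import dataclass

@dataclass(frozen=True)
class Touchpoint:
    """A named access pattern: what is read or written, how often, and how fast."""
    name: str
    reads: str              # what data the touchpoint reads
    writes: str             # what data it writes
    frequency: str          # how often it runs
    latency_budget_ms: int  # how fast it must respond
    volume: str             # roughly how much data is involved

CHECKOUT = Touchpoint(
    name="checkout",
    reads="Product.inventory_count, Customer",
    writes="Order, OrderItem, Product.inventory_count",
    frequency="tens per second at peak",
    latency_budget_ms=200,
    volume="a handful of rows per request",
)

REVENUE_REPORT = Touchpoint(
    name="monthly revenue by segment",
    reads="all Orders in a date range",
    writes="nothing",
    frequency="a few times per day",
    latency_budget_ms=30_000,
    volume="hundreds of thousands of rows per query",
)

Even a short catalogue like this makes the constraints explicit long before any technology is chosen.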
Why Touchpoints Matter
Touchpoints reveal your system's actual constraints. Different touchpoints have different optimal technologies, and understanding these patterns prevents premature tech choices.
You're not picking technologies based on marketing materials or blog posts; you're picking them based on measured, understood access patterns.
Common Touchpoint Patterns
Transactional touchpoints involve complex updates with low latency and ACID requirements. Consider a LegalCase in a case management system: the access pattern is individual records with frequent small modifications. This implies a need for strong consistency and transaction support.
Analytical or reporting touchpoints are characterised by time-range queries, aggregations, and read-heavy workloads. An AuditReport querying AuditLogItem entries is typical—scan many records, filter by date or other criteria. These touchpoints are optimised for reads and often benefit from denormalized data structures.
Graph or relational touchpoints involve traversal operations and relationship-heavy queries. A GraphQuery exploring interconnected data follows edges and performs multi-hop queries. These can be satisfied either by specialized graph databases or by clever relational modelling with recursive queries and smart indexing.
Beyond these three, other common patterns include high-throughput writes (constant data ingestion), time-series data (temporal queries and trends), and full-text search (text matching and ranking). Each has distinct access characteristics that inform technology choices.
Revisiting Our E-Commerce Example
Let's examine our earlier domain model through the lens of touchpoints:
Product Catalogue:
- Transactional touchpoint: Frequent updates to inventory_count, price changes
- Search touchpoint: Customers browsing by category, filtering by price range
- A single domain model, two very different access patterns
Order Processing:
- Purely transactional: ACID properties required
- Order creation and inventory deduction must be atomic
- Cannot tolerate race conditions during checkout
Customer Analytics:
- Analytical touchpoint: Reporting on order history, revenue by segment
- The same Order model that required strict transactional guarantees now serves aggregate queries across thousands of records
The key observation: One domain model (Order) can have multiple touchpoint types depending on how it's used. This is why premature technology choices are dangerous—you don't yet know all the ways your data will be accessed.
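One way to keep those different touchpoints separate in code is to expose the same domain model through narrowly scoped interfaces, one per access pattern. Here is a minimal sketch building on the domain types above; the OrderStore and OrderAnalytics names are illustrative, not from any particular library:

from __future__ import annotations

from datetime import date
from typing import Dict, Protocol

class OrderStore(Protocol):
    """Transactional touchpoint: read and write individual orders atomically."""
    def get(self, order_id: OrderId) -> Order: ...
    def save(self, order: Order) -> None: ...

class OrderAnalytics(Protocol):
    """Analytical touchpoint: aggregate queries across many orders."""
    def revenue_between(self, start: date, end: date) -> Money: ...
    def order_counts_by_customer(self, start: date, end: date) -> Dict[CustomerId, int]: ...

Application code that confirms orders depends only on OrderStore; reporting code depends only on OrderAnalytics. Both can start life as thin wrappers over the same PostgreSQL database.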
The Key Insight
Touchpoints determine constraints, not technologies.
- You can satisfy most touchpoints with multiple tech choices
- The right choice depends on scale, team expertise, and cost
- A single PostgreSQL instance can handle transactional, analytical, and even graph workloads ... until it can't
The data layer ensures that "until" doesn't become a crisis.
The Data Layer: Your Scaling Insurance
If you're unfamiliar with the concept of a data access layer, I've written about it in depth in How to Build a Data Access Layer.
The Provider Pattern (In Brief)
The core principles (sketched in code after the list):
- Immutable domain models: No ActiveRecord patterns, no embedded queries
- Provider pattern: Well-defined interface accepting and returning domain types
- Dependency injection: Application code doesn't know about storage implementation
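As a rough illustration of those three principles together, reusing the OrderStore interface and domain types from earlier (InMemoryOrderStore and PlaceOrderService are hypothetical names, and this is a sketch of the shape rather than the full pattern from that article):

from __future__ import annotations

from dataclasses import replace
from typing import Dict

class InMemoryOrderStore:
    """One implementation of OrderStore; a PostgreSQL-backed provider would satisfy the same interface."""
    def __init__(self) -> None:
        self._orders: Dict[OrderId, Order] = {}

    def get(self, order_id: OrderId) -> Order:
        return self._orders[order_id]

    def save(self, order: Order) -> None:
        self._orders[order.id] = order

class PlaceOrderService:
    """Application code: depends on the OrderStore interface, injected at construction time."""
    def __init__(self, orders: OrderStore) -> None:
        self._orders = orders

    def confirm(self, order: Order) -> None:
        # Immutable domain model: produce a new Order rather than mutating in place
        confirmed = replace(order, status=OrderStatus.CONFIRMED)
        self._orders.save(confirmed)

The service never imports a database driver; swapping InMemoryOrderStore for a real provider is purely a wiring change.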
Why This Matters for Scaling
The data layer is your abstraction boundary:
- Application code depends on the interface, not the implementation
- You can swap storage technologies without touching application logic
- When that analytical touchpoint outgrows PostgreSQL, you can introduce an analytics database behind the data layer whilst the application remains blissfully unaware (see the sketch after this list)
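Continuing the earlier sketch, that swap might look like the following. PostgresOrderAnalytics, WarehouseOrderAnalytics and build_analytics are placeholder names with their bodies elided; the point is that only the composition root knows which backend is in use:

from __future__ import annotations

from datetime import date

class PostgresOrderAnalytics:
    """Day-one implementation: aggregate queries against the primary database."""
    def __init__(self, conn) -> None:
        self._conn = conn

    def revenue_between(self, start: date, end: date) -> Money:
        ...  # SUM(total) over orders in the date range, straight from PostgreSQL
    # (remaining OrderAnalytics methods elided)

class WarehouseOrderAnalytics:
    """Later implementation: the same OrderAnalytics interface, backed by a dedicated analytics store."""
    def __init__(self, client) -> None:
        self._client = client

    def revenue_between(self, start: date, end: date) -> Money:
        ...  # the same question, answered by the analytics database
    # (remaining OrderAnalytics methods elided)

def build_analytics(settings) -> OrderAnalytics:
    """Composition root: the only place that knows which backend serves this touchpoint."""
    if settings.use_warehouse:
        return WarehouseOrderAnalytics(client=settings.warehouse_client)
    return PostgresOrderAnalytics(conn=settings.primary_db)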
Your Insurance Policy
Think of the data layer as insurance:
- You're not optimising for scale today
- You're creating optionality for tomorrow
- It's your insurance against lock-in—technological, architectural, and strategic
The Scaling Pathway
The core philosophy is simple: defer decisions, not preparation.
- Don't pick technologies for hypothetical scale
- Do create the structure that allows future evolution
- This distinction is everything
A Pattern I've Seen Repeatedly
I've worked on multiple systems where this philosophy was either followed or ignored. The contrast is stark.
The Multi-Database Trap
An early-stage product used MySQL, MongoDB, and other databases for different features. Each choice seemed logical at the time. Different tools for different jobs.
But when analytics requirements emerged (they always do), joining data across systems became a costly nightmare. Months were spent building ETL pipelines and synchronisation layers that a unified data layer would have avoided entirely.
The Boring-Tech Winner
A high-throughput transaction processing system built with a standard web framework and SQL. Nothing exotic.
It achieved low-latency performance by understanding its touchpoints and building a clean data layer from the start. When scale demands grew, the abstraction made evolution straightforward:
- Read replicas were added
- Indexes were optimised
- High-volume touchpoints were migrated to specialized storage
- All without rewriting the application
The lesson: Premature technology diversity costs more than premature optimisation.
Start Simple
Default to boring, proven technology:
- SQLite for side projects and MVPs
- Single PostgreSQL instance for most startups
- Sharded SQL with customer affinity for B2B SaaS products
"Boring technology" is a feature, not a bug.
Why Simple Wins Early
- Lower cognitive overhead whilst iterating on the product
- Well-understood failure modes and debugging
- Easy to hire for with abundant documentation
- Optimisation is cheap when you're small
Adding an index or a read replica is a day's work, not a quarter-long migration project.
When to Evolve
Watch for clear signals, not hypotheticals:
- Latency violations: Queries consistently missing SLAs
- Cost curves: Database costs growing faster than revenue
- Query complexity: Application code contorting itself to work around storage limits
- Operational pain: Backups, replication, or scaling operations becoming frequent fire drills
The Evolution Process
When these signals appear, follow this process:
- Identify the bottleneck touchpoint: Which access pattern is breaking? Be specific.
- Evaluate alternatives: What technologies solve this constraint? Not "what's popular", but "what addresses this specific, measured problem".
- Implement behind the data layer: Swap storage without touching application code.
- Migrate incrementally: Run old and new systems in parallel, achieving very low or even zero downtime (see the sketch after this list).
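One common shape for that parallel run is a small routing provider that writes to both stores and reads from whichever is currently authoritative. Here is a sketch reusing the OrderStore interface from earlier (DualWriteOrderStore is an illustrative name, not a library class):

from __future__ import annotations

class DualWriteOrderStore:
    """Parallel-run wrapper: every write goes to both stores, reads come from the
    store currently treated as authoritative, so a touchpoint can be migrated
    incrementally and cut over with little or no downtime."""

    def __init__(self, old: OrderStore, new: OrderStore, read_from_new: bool = False) -> None:
        self._old = old
        self._new = new
        self._read_from_new = read_from_new

    def get(self, order_id: OrderId) -> Order:
        source = self._new if self._read_from_new else self._old
        return source.get(order_id)

    def save(self, order: Order) -> None:
        self._old.save(order)  # keep the old system complete until cut-over
        self._new.save(order)  # build up and verify the new system in parallel

Once reads from the new store have been verified and historical data backfilled, the wrapper is removed and the old store retired; the application code above it never changes.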
Why This Pathway Wins
- Pay costs only when benefits are clear
- Not locked into early decisions
- Proven by experience: Stack Overflow, GitHub, and Stripe all scaled this way—incrementally, deliberately, and without Hail Mary rewrites
Summary: The Real Competitive Advantage
Most teams think scalability is about picking the right database. They're wrong.
It's about building a system that can evolve faster than the competition.
The Hidden Cost of "Best Practices"
Microservices, event buses, and specialized databases all have their place. But adopting them early trades velocity for theoretical scale.
Your competitors who shipped faster and deferred these costs will beat you to market. And in most cases, they'll reach your scale before you've finished configuring Kubernetes.
The Pathway Gives You Both
The pathway outlined in this article lets you:
- Ship fast with simple, boring tech
- Evolve deliberately when constraints become clear
- Never rewrite because your data layer absorbs the changes
Architectural Martial Arts
This is architectural martial arts:
- You're not fighting scale; you're using its momentum
- The data layer isn't overhead; it's the technique that lets a small team move like a large one
The Final Insight
Most architecture advice tells you what to build. This article tells you what not to build ... yet.
That difference is worth millions in saved engineering time and preserved velocity.
Your competitors aren't beating you with better databases. They're beating you by shipping faster whilst keeping their options open.
A data layer gives you both.
That's the distilled secret to winning the game.
Appendix: Technology Reference
When you do need to evolve, here's a starting point for mapping touchpoints to technologies. This is not prescriptive—your constraints matter more than these labels.
- Transactional (high volume): Sharded or distributed SQL (CitusDB, CockroachDB, Spanner, and other NewSQL systems)
- Analytical/Reporting: ClickHouse, TimescaleDB, or stay with PostgreSQL + read replicas
- Graph traversal: Relational with recursive queries and smart indexing or specialised DBs (Neo4j, JanusGraph + Cassandra)
- Full-text search: Elasticsearch, Meilisearch
- High-throughput writes with predictable/restricted queries: Cassandra, ScyllaDB
- Time-series: TimescaleDB, InfluxDB