Scalability is a Function of Your Data Layer
Introduction
The modern scaling landscape is overwhelming. Queues, event buses, microservices, serverless, Kubernetes—the list of "essential" technologies grows longer every year. It's very easy to get lost in the choices, and even easier to feel pressure to adopt whatever patterns are currently deemed "industry standard". Everyone else seems to be doing it, so it must be right.
Right?
The Stack Overflow Anomaly
Consider Stack Overflow. For over a decade, they debugged the world's software problems using a handful of servers and a surprisingly simple architecture. No microservices. No event sourcing. No exotic databases. Just a well-designed system built on proven, boring technology that scaled to tens of millions of users.
How did they achieve this? And more importantly, why does their approach feel like such an anomaly in today's architecture discussions?
The Danger Zone
The answer lies in understanding a critical but often overlooked danger zone: the space between over-optimisation and under-optimisation.
Over-optimise too early, and you'll drown in premature complexity, vendor lock-in, and crushing cognitive overhead. Your team will spend more time wrestling with distributed systems than building features.
Under-optimise, and you'll build fragile and ad-hoc architectures that buckle under the first real load, forcing expensive rewrites that could have been avoided.
There's a third trap worth naming: "career-driven development". We've all seen it—the choice to adopt a hot new technology not because it solves a real problem, but because it looks good on LinkedIn. These decisions are costly, and they're more common than we'd like to admit.
A Different Path
This article offers a different path. It distills the practical essence of scaling into a framework you can actually use: a pathway that works now and evolves with you, focusing on the decisions that matter whilst deferring those that don't.
It's not about picking the perfect technology stack upfront. It's about building systems that give you the freedom to make those choices later, when the constraints are clear and the costs are justified.
The secret? Your data layer.
Data Modelling: The Foundation
Data modelling is often treated as an "also-ran" in system design; something rushed through in favour of picking databases and frameworks. Yet it's arguably the most important architectural decision you'll make. Get the domain model right, and everything else becomes easier. Get it wrong, and you'll spend years fighting technical debt.
Storage-Agnostic Design
The key principle is deceptively simple: model the business problem first.
- Create types that represent meaningful domain concepts
- Think in terms of entities, relationships, and operations
- Critically: Don't think about storage yet
That decision comes later, once you understand how these models will actually be used.
Keep It Simple
This approach draws from Domain-Driven Design, but in simplified form:
- Clear types: Models should be expressed as meaningful, clearly named types within your system
- Extendable but not complex: They should grow without becoming over-engineered
- Business-aligned: Focus on clarity and business alignment, not database schemas
Consider a simple e-commerce system. Here's what storage-agnostic domain modelling looks like:
from __future__ import annotations

from dataclasses import dataclass
from typing import List

# ProductId, OrderId, CustomerId, Money, Address and OrderStatus are small
# value types (identifiers, a money type, an address, a status enum) defined
# alongside these models.

@dataclass(frozen=True)
class Product:
    id: ProductId
    name: str
    price: Money
    inventory_count: int

@dataclass(frozen=True)
class Order:
    id: OrderId
    customer: Customer
    items: List[OrderItem]
    total: Money
    status: OrderStatus  # PENDING, CONFIRMED, SHIPPED, DELIVERED

@dataclass(frozen=True)
class Customer:
    id: CustomerId
    email: str
    shipping_addresses: List[Address]

@dataclass(frozen=True)
class OrderItem:
    product: Product
    quantity: int
    price_at_purchase: Money  # Capture historical pricing
Notice what's missing: no database fields, no save() methods, no ORM annotations. Just clean domain concepts that model the business problem. These models don't know or care whether they'll live in PostgreSQL, MongoDB, or flat files. That's intentional.
Storage decisions come later, after we understand touchpoints.
Data Touchpoints: Understanding Access Patterns
A touchpoint is how your system actually uses the data. Not "what data exists" but rather "how is it accessed". This includes:
- Queries: What data do you read?
- Updates: What data do you write?
- Frequency: How often?
- Latency requirements: How fast must it be?
- Volume: How much data?
Understanding touchpoints is the bridge between domain modelling and technology choices.
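To make this concrete, one lightweight way to capture touchpoints during design is as plain data alongside the domain model. The Touchpoint class and the two example entries below are purely illustrative, not part of any framework:

from dataclasses import dataclass

@dataclass(frozen=True)
class Touchpoint:
    """A named access pattern: what is read or written, how often, and how fast."""
    name: str
    reads: str              # what data the touchpoint reads
    writes: str             # what data it writes
    frequency: str          # how often it runs
    latency_budget_ms: int  # how fast it must respond
    volume: str             # roughly how much data is involved

CHECKOUT = Touchpoint(
    name="checkout",
    reads="Product.inventory_count, Customer",
    writes="Order, OrderItem, Product.inventory_count",
    frequency="tens per second at peak",
    latency_budget_ms=200,
    volume="a handful of rows per request",
)

REVENUE_REPORT = Touchpoint(
    name="monthly revenue by segment",
    reads="all Orders in a date range",
    writes="nothing",
    frequency="a few times per day",
    latency_budget_ms=30_000,
    volume="hundreds of thousands of rows per query",
)

Even a short catalogue like this makes the constraints explicit long before any technology is chosen.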
Why Touchpoints Matter
Touchpoints reveal your system's actual constraints. Different touchpoints have different optimal technologies, and understanding these patterns prevents premature tech choices.
You're not picking technologies based on marketing materials or blog posts; you're picking them based on measured, understood access patterns.
Common Touchpoint Patterns
Transactional touchpoints involve complex updates with low latency and ACID requirements. Consider a LegalCase in a case management system: the access pattern is individual records with frequent small modifications. This implies a need for strong consistency and transaction support.
Analytical or reporting touchpoints are characterised by time-range queries, aggregations, and read-heavy workloads. An AuditReport querying AuditLogItem entries is typical—scan many records, filter by date or other criteria. These touchpoints are optimised for reads and often benefit from denormalized data structures.
Graph or relational touchpoints involve traversal operations and relationship-heavy queries. A GraphQuery exploring interconnected data follows edges and performs multi-hop queries. These can be satisfied either by specialized graph databases or by clever relational modelling with recursive queries and smart indexing.
Beyond these three, other common patterns include high-throughput writes (constant data ingestion), time-series data (temporal queries and trends), and full-text search (text matching and ranking). Each has distinct access characteristics that inform technology choices.
Revisiting Our E-Commerce Example
Let's examine our earlier domain model through the lens of touchpoints:
Product Catalogue:
- Transactional touchpoint: Frequent updates to inventory_count, price changes
- Search touchpoint: Customers browsing by category, filtering by price range
- A single domain model, two very different access patterns
Order Processing:
- Purely transactional: ACID properties required
- Order creation and inventory deduction must be atomic
- Cannot tolerate race conditions during checkout
Customer Analytics:
- Analytical touchpoint: Reporting on order history, revenue by segment
- The same Order model that required strict transactional guarantees now serves aggregate queries across thousands of records
The key observation: One domain model (Order) can have multiple touchpoint types depending on how it's used. This is why premature technology choices are dangerous—you don't yet know all the ways your data will be accessed.
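One way to keep those different touchpoints separate in code is to expose the same domain model through narrowly scoped interfaces, one per access pattern. Here is a minimal sketch building on the domain types above; the OrderStore and OrderAnalytics names are illustrative, not from any particular library:

from __future__ import annotations

from datetime import date
from typing import Dict, Protocol

class OrderStore(Protocol):
    """Transactional touchpoint: read and write individual orders atomically."""
    def get(self, order_id: OrderId) -> Order: ...
    def save(self, order: Order) -> None: ...

class OrderAnalytics(Protocol):
    """Analytical touchpoint: aggregate queries across many orders."""
    def revenue_between(self, start: date, end: date) -> Money: ...
    def order_counts_by_customer(self, start: date, end: date) -> Dict[CustomerId, int]: ...

Application code that confirms orders depends only on OrderStore; reporting code depends only on OrderAnalytics. Both can start life as thin wrappers over the same PostgreSQL database.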
The Key Insight
Touchpoints determine constraints, not technologies.
- You can satisfy most touchpoints with multiple tech choices
- The right choice depends on scale, team expertise, and cost
- A single PostgreSQL instance can handle transactional, analytical, and even graph workloads ... until it can't
The data layer ensures that "until" doesn't become a crisis.
The Data Layer: Your Scaling Insurance
If you're unfamiliar with the concept of a data access layer, I've written about it in depth in How to Build a Data Access Layer.
The Provider Pattern (In Brief)
The core principles (sketched in code after the list):
- Immutable domain models: No ActiveRecord patterns, no embedded queries
- Provider pattern: Well-defined interface accepting and returning domain types
- Dependency injection: Application code doesn't know about storage implementation
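As a rough illustration of those three principles together, reusing the OrderStore interface and domain types from earlier (InMemoryOrderStore and PlaceOrderService are hypothetical names, and this is a sketch of the shape rather than the full pattern from that article):

from __future__ import annotations

from dataclasses import replace
from typing import Dict

class InMemoryOrderStore:
    """One implementation of OrderStore; a PostgreSQL-backed provider would satisfy the same interface."""
    def __init__(self) -> None:
        self._orders: Dict[OrderId, Order] = {}

    def get(self, order_id: OrderId) -> Order:
        return self._orders[order_id]

    def save(self, order: Order) -> None:
        self._orders[order.id] = order

class PlaceOrderService:
    """Application code: depends on the OrderStore interface, injected at construction time."""
    def __init__(self, orders: OrderStore) -> None:
        self._orders = orders

    def confirm(self, order: Order) -> None:
        # Immutable domain model: produce a new Order rather than mutating in place
        confirmed = replace(order, status=OrderStatus.CONFIRMED)
        self._orders.save(confirmed)

The service never imports a database driver; swapping InMemoryOrderStore for a real provider is purely a wiring change.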
Why This Matters for Scaling
The data layer is your abstraction boundary:
- Application code depends on the interface, not the implementation
- You can swap storage technologies without touching application logic
- When that analytical touchpoint outgrows PostgreSQL, you can introduce an analytics database behind the data layer whilst the application remains blissfully unaware (see the sketch after this list)
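Continuing the earlier sketch, that swap might look like the following. PostgresOrderAnalytics, WarehouseOrderAnalytics and build_analytics are placeholder names with their bodies elided; the point is that only the composition root knows which backend is in use:

from __future__ import annotations

from datetime import date

class PostgresOrderAnalytics:
    """Day-one implementation: aggregate queries against the primary database."""
    def __init__(self, conn) -> None:
        self._conn = conn

    def revenue_between(self, start: date, end: date) -> Money:
        ...  # SUM(total) over orders in the date range, straight from PostgreSQL
    # (remaining OrderAnalytics methods elided)

class WarehouseOrderAnalytics:
    """Later implementation: the same OrderAnalytics interface, backed by a dedicated analytics store."""
    def __init__(self, client) -> None:
        self._client = client

    def revenue_between(self, start: date, end: date) -> Money:
        ...  # the same question, answered by the analytics database
    # (remaining OrderAnalytics methods elided)

def build_analytics(settings) -> OrderAnalytics:
    """Composition root: the only place that knows which backend serves this touchpoint."""
    if settings.use_warehouse:
        return WarehouseOrderAnalytics(client=settings.warehouse_client)
    return PostgresOrderAnalytics(conn=settings.primary_db)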
Your Insurance Policy
Think of the data layer as insurance:
- You're not optimising for scale today
- You're creating optionality for tomorrow
- It's your insurance against lock-in—technological, architectural, and strategic
The Scaling Pathway
The core philosophy is simple: defer decisions, not preparation.
- Don't pick technologies for hypothetical scale
- Do create the structure that allows future evolution
- This distinction is everything
A Pattern I've Seen Repeatedly
I've worked on multiple systems where this philosophy was either followed or ignored. The contrast is stark.
The Multi-Database Trap
An early-stage product used MySQL, MongoDB, and other databases for different features. Each choice seemed logical at the time. Different tools for different jobs.
But when analytics requirements emerged (they always do), joining data across systems became a costly nightmare. Months were spent building ETL pipelines and synchronisation layers that a unified data layer would have avoided entirely.
The Boring-Tech Winner
A high-throughput transaction processing system built with a standard web framework and SQL. Nothing exotic.
It achieved low-latency performance by understanding its touchpoints and building a clean data layer from the start. When scale demands grew, the abstraction made evolution straightforward:
- Read replicas were added
- Indexes were optimised
- High-volume touchpoints were migrated to specialized storage
- All without rewriting the application
The lesson: Premature technology diversity costs more than premature optimisation.
Start Simple
Default to boring, proven technology:
- SQLite for side projects and MVPs
- Single PostgreSQL instance for most startups
- Sharded SQL with customer affinity for B2B SaaS products
"Boring technology" is a feature, not a bug.
Why Simple Wins Early
- Lower cognitive overhead whilst iterating on the product
- Well-understood failure modes and debugging
- Easy to hire for with abundant documentation
- Optimisation is cheap when you're small
Adding an index or a read replica is a day's work, not a quarter-long migration project.
When to Evolve
Watch for clear signals, not hypotheticals:
- Latency violations: Queries consistently missing SLAs
- Cost curves: Database costs growing faster than revenue
- Query complexity: Application code contorting itself to work around storage limits
- Operational pain: Backups, replication, or scaling operations becoming frequent fire drills
The Evolution Process
When these signals appear, follow this process:
- Identify the bottleneck touchpoint: Which access pattern is breaking? Be specific.
- Evaluate alternatives: What technologies solve this constraint? Not "what's popular", but "what addresses this specific, measured problem".
- Implement behind the data layer: Swap storage without touching application code.
- Migrate incrementally: Run old and new systems in parallel, achieving very low or even zero downtime (see the sketch after this list).
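One common shape for that parallel run is a small routing provider that writes to both stores and reads from whichever is currently authoritative. Here is a sketch reusing the OrderStore interface from earlier (DualWriteOrderStore is an illustrative name, not a library class):

from __future__ import annotations

class DualWriteOrderStore:
    """Parallel-run wrapper: every write goes to both stores, reads come from the
    store currently treated as authoritative, so a touchpoint can be migrated
    incrementally and cut over with little or no downtime."""

    def __init__(self, old: OrderStore, new: OrderStore, read_from_new: bool = False) -> None:
        self._old = old
        self._new = new
        self._read_from_new = read_from_new

    def get(self, order_id: OrderId) -> Order:
        source = self._new if self._read_from_new else self._old
        return source.get(order_id)

    def save(self, order: Order) -> None:
        self._old.save(order)  # keep the old system complete until cut-over
        self._new.save(order)  # build up and verify the new system in parallel

Once reads from the new store have been verified and historical data backfilled, the wrapper is removed and the old store retired; the application code above it never changes.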
Why This Pathway Wins
- Pay costs only when benefits are clear
- Not locked into early decisions
- Proven by experience: Stack Overflow, GitHub, and Stripe all scaled this way—incrementally, deliberately, and without Hail Mary rewrites
Summary: The Real Competitive Advantage
Most teams think scalability is about picking the right database. They're wrong.
It's about building a system that can evolve faster than the competition.
The Hidden Cost of "Best Practices"
Microservices, event buses, and specialized databases all have their place. But adopting them early trades velocity for theoretical scale.
Your competitors who shipped faster and deferred these costs will beat you to market. And in most cases, they'll reach your scale before you've finished configuring Kubernetes.
The Pathway Gives You Both
The pathway outlined in this article lets you:
- Ship fast with simple, boring tech
- Evolve deliberately when constraints become clear
- Never rewrite because your data layer absorbs the changes
Architectural Martial Arts
This is architectural martial arts:
- You're not fighting scale; you're using its momentum
- The data layer isn't overhead; it's the technique that lets a small team move like a large one
The Final Insight
Most architecture advice tells you what to build. This article tells you what not to build ... yet.
That difference is worth millions in saved engineering time and preserved velocity.
Your competitors aren't beating you with better databases. They're beating you by shipping faster whilst keeping their options open.
A data layer gives you both.
That's the distilled secret to winning the game.
Appendix: Technology Reference
When you do need to evolve, here's a starting point for mapping touchpoints to technologies. This is not prescriptive—your constraints matter more than these labels.
- Transactional (high volume): Sharded or distributed SQL (CitusDB, CockroachDB, Spanner, and other NewSQL systems)
- Analytical/Reporting: ClickHouse, TimescaleDB, or stay with PostgreSQL + read replicas
- Graph traversal: Relational with recursive queries and smart indexing or specialised DBs (Neo4j, JanusGraph + Cassandra)
- Full-text search: Elasticsearch, Meilisearch
- High-throughput writes with predictable/restricted queries: Cassandra, ScyllaDB
- Time-series: TimescaleDB, InfluxDB