As a technical system grows in scale and complexity, teams require more checks, balances, and standards to keep development manageable and maintain delivery velocity. While practices like feature flags and contract testing are common at this stage, one technique is often adopted too late in the development cycle: a solid data access layer.

Without a formally defined data access layer, data access becomes ad-hoc. It often relies on rapid application development frameworks, direct use of object-relational mappers (ORMs), or database queries embedded directly in application code.

While this approach may be acceptable in small systems where outgrowing a single database is unlikely, it becomes a significant problem for systems that need to scale. As we shall see, these ad-hoc patterns limit flexibility, are difficult to test, and become hard to reason about in the long term.

Code examples are in typed Python, but the principles apply to any language.

A typical example

First, as a counterpoint to the intended pattern, consider a rapid application development staple: the Active Record pattern, called directly from application code.

from __future__ import annotations

from typing import List
from uuid import uuid4

class BlogComment:

    _db = None  # class-level connection, set at application start-up

    def __init__(self, post_id, username, comment):
        self.id = uuid4()
        self.post_id = post_id
        self.username = username
        self.comment = comment

    def save(self):
        self._db.write("blog_comment", self.id, self.username, self.comment)

    @property
    def post(self):
        return self._db.query("blog_post", {"id": self.post_id})

    @classmethod
    def query(cls, params) -> List[BlogComment]:
        return cls._db.query_many("blog_comment", params)

Note that real ORMs are far more sophisticated than this, but the interface is essentially the same. We include a few common Active Record patterns here, such as:

  1. Object properties (such as id and username) that can be modified at any time.
  2. A save method that modifies the database.
  3. A post property that does a database query under the bonnet.
  4. A query method where the function caller may specify ORM parameters.

This style limits the flexibility, testability, and ease of future refactoring of the application's data access code.

We will expand on the reasons why in the following sections.

This article is neutral on the general usefulness of ORMs. Used behind a properly defined data access layer, and regarded as an implementation detail, an ORM can be a very good choice in some circumstances.

Unpredictable queries

The query class method exposes the params parameter, which allows for any kind of filter to be specified in the application code. Think of Java's Hibernate Query Language (HQL), Django's keyword argument filters, or the equivalent in your ORM of choice.

This feature creates a leaky abstraction, where database specifics must be reasoned about in your application. You cannot know what conditions, joins (in the case of SQL databases), and other filters are being applied at query time.

For example, we could join across ten tables without knowing it, as this logic is scattered throughout the application.

This makes it very difficult to modify the data access layer without a codebase-wide search, and it violates the principle of separation of concerns.
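To illustrate how this scattering happens, here is a toy sketch (the class, functions, and filter keys are all illustrative, not from any real ORM) in which two unrelated call sites each embed their own storage-level filters:

```python
# A toy stand-in for an Active Record class, showing how every call
# site chooses its own storage-level filters and ordering.
class BlogComment:
    captured_params = []

    @classmethod
    def query(cls, params):
        cls.captured_params.append(params)  # a real ORM would build SQL here
        return []

# Two unrelated modules, each with its own filter and ordering knowledge:
def comments_for_moderation():
    return BlogComment.query({"flagged": True, "order_by": "created_at"})

def comments_for_user_page(username):
    return BlogComment.query({"username": username, "limit": 10})

comments_for_moderation()
comments_for_user_page("alice")

# The data layer must now honour every shape of params ever passed to it:
assert len(BlogComment.captured_params) == 2
```

Neither call site knows what the other requires of the schema, so no single place in the codebase describes the full set of queries the database must support.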

Hard to mock, fake and test

We have to call the class methods of BlogComment in our application code to access the underlying database, and the post property secretly queries the database to provide a "seamless web of objects" experience to the developer.

Such an interface can make it difficult to create mocks and fakes for these model objects, which can easily break when we change the application code. Even worse, we have to anticipate the incoming queries to the query method, which introduces significant fragility into our unit tests. We can expect a cascade of test failures if we change the filters and queries in the application.
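As a sketch of this fragility (the class and filter values are illustrative), a mock for the query method must mirror the exact parameters the application passes, so any change to those filters breaks the test:

```python
from unittest.mock import patch

class BlogComment:
    @classmethod
    def query(cls, params):
        raise RuntimeError("would hit the real database")

def recent_comments():
    # Application code under test, with storage details baked in.
    return BlogComment.query({"order_by": "created_at", "limit": 5})

with patch.object(BlogComment, "query", return_value=[]) as mock_query:
    assert recent_comments() == []
    # This assertion is coupled to the query internals: changing the
    # limit from 5 to 10 in the application breaks the test.
    mock_query.assert_called_once_with({"order_by": "created_at", "limit": 5})
```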

A Better Way: The Provider Pattern

All data should be modelled as basic, immutable “plain-old python objects” (POPOs), “plain-old Java objects” (POJOs or record classes in newer versions of Java), and other POxOs, with the following properties:

  • Properties must not be mutable or changeable by application code.
  • Models must not contain embedded or "secret" queries.
  • Models may contain other model objects either directly or in immutable collections.

The above conditions more or less completely rule out ActiveRecord-style objects.

Create a well defined data interface

All data must be accessible only via a well-defined data interface with the following properties:

  • It accepts only domain-specific, immutable model types* or other immutable objects, either individually or in collections as function arguments.
  • It returns only domain-specific, immutable data types* or other immutable objects, either individually or in collections as return types.
  • It can have more than one implementation, such as a SQL database and a fake implementation that can be used interchangeably from a correctness point of view.

* Some exceptions are possible, such as simple, domain-specific search criteria objects or data transfer objects in the case of complex atomic writes.
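A search-criteria object of this kind can itself be a small immutable model. As a sketch (the class and field names are illustrative):

```python
from typing import NamedTuple, Optional

class CommentSearchCriteria(NamedTuple):
    """Domain-specific, immutable search criteria (no raw ORM filters)."""
    post_id: Optional[str] = None
    username: Optional[str] = None
    limit: int = 50

criteria = CommentSearchCriteria(username="alice", limit=10)
# Immutable: attempting `criteria.limit = 20` raises AttributeError.
```

The provider interprets these domain-level fields however its backing store requires, keeping storage specifics out of the application.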

Dependency injection

All application code must be able to accept any concrete implementation of the data access interface, via formal or informal dependency injection.

Typically, we can either use a dependency injection framework, or we can simply pass the concrete implementation of the data access layer to the application code when the application is initialised, e.g:

def main():
    blog_post_provider = SQLBlogPostProvider()
    app = MyApplication(blog_post_provider)
    app.listen("0.0.0.0", 80)

This practice has many benefits beyond data access layers: it enables clean code, makes unit testing much easier, provides a natural framework for separating concerns, and centralises configuration in one place.

A good example

An example of immutable model objects and data interfaces without any unexpected behaviour is as follows:

from abc import ABC, abstractmethod
from datetime import datetime
from typing import List, NamedTuple
from uuid import UUID

class BlogComment(NamedTuple):
    id: UUID
    username: str
    comment: str

class BlogPost(NamedTuple):
    id: UUID
    title: str
    content: str
    username: str
    timestamp: datetime
    comments: List[BlogComment]

class BlogPostProvider(ABC):

    @abstractmethod
    def get_one(self, id: UUID) -> BlogPost:
        pass

    @abstractmethod
    def get_for_user(self, username: str) -> List[BlogPost]:
        pass
    
    @abstractmethod
    def submit_post(self, post: BlogPost):
        pass

    @abstractmethod
    def submit_comment(self, post_id: UUID, comment: BlogComment):
        pass

class SQLBlogPostProvider(BlogPostProvider):
    pass  # implementation omitted

class FakeBlogPostProvider(BlogPostProvider):
    pass  # implementation omitted

Note that the providers can use any technology to fulfil the interface contract, and can connect to any kind of SQL or NoSQL database using direct queries, ORMs, or any other framework.

We can then inject the concrete provider implementations in place of the interface in application code, and use this to consume and update data in the application, with the following benefits:

The Benefits

Ease of unit testing

We can easily mock the provider or create a feature-complete in-memory fake version (as hinted at in the code above) for use in unit tests. This implementation can conform completely to the interface, giving us a robust set of artefacts for testing that will not need updating if we decide to change filters or other code in the application.
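A minimal sketch of such a fake, following the interface above (the dict-backed storage is just one possible implementation choice):

```python
from datetime import datetime
from typing import Dict, List, NamedTuple
from uuid import UUID, uuid4

class BlogComment(NamedTuple):
    id: UUID
    username: str
    comment: str

class BlogPost(NamedTuple):
    id: UUID
    title: str
    content: str
    username: str
    timestamp: datetime
    comments: List[BlogComment]

class FakeBlogPostProvider:
    """In-memory fake conforming to the BlogPostProvider interface."""

    def __init__(self):
        self._posts: Dict[UUID, BlogPost] = {}

    def get_one(self, id: UUID) -> BlogPost:
        return self._posts[id]

    def get_for_user(self, username: str) -> List[BlogPost]:
        return [p for p in self._posts.values() if p.username == username]

    def submit_post(self, post: BlogPost):
        self._posts[post.id] = post

    def submit_comment(self, post_id: UUID, comment: BlogComment):
        post = self._posts[post_id]
        # Immutable models: build a new post with the comment appended.
        self._posts[post_id] = post._replace(comments=post.comments + [comment])

# A unit test exercises the fake instead of a real database:
provider = FakeBlogPostProvider()
post = BlogPost(uuid4(), "Hello", "First post", "alice", datetime.now(), [])
provider.submit_post(post)
provider.submit_comment(post.id, BlogComment(uuid4(), "bob", "Nice!"))
assert len(provider.get_one(post.id).comments) == 1
```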

Ease of integration testing

We can test our data access provider separately from the application during integration tests and be confident of its correct function in production.

This neatly bounds the blast-radius of integration tests, and gives us much better "bang for buck" for our combined unit and integration test suites.
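One way to realise this (a sketch; the helper name and simplified model are illustrative) is a contract test that runs the same behavioural checks against any provider implementation, fake or real:

```python
from typing import NamedTuple
from uuid import UUID, uuid4

class BlogPost(NamedTuple):
    id: UUID
    title: str
    username: str

def check_provider_contract(provider):
    """Behavioural checks that any BlogPostProvider implementation must pass.

    Unit tests run this against the fake; integration tests run the same
    function against the SQL-backed provider."""
    post = BlogPost(uuid4(), "Hello", "alice")
    provider.submit_post(post)
    assert provider.get_one(post.id) == post
    assert post in provider.get_for_user("alice")

# Demonstrated here with a minimal in-memory implementation:
class InMemoryProvider:
    def __init__(self):
        self._posts = {}

    def submit_post(self, post):
        self._posts[post.id] = post

    def get_one(self, id):
        return self._posts[id]

    def get_for_user(self, username):
        return [p for p in self._posts.values() if p.username == username]

check_provider_contract(InMemoryProvider())
```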

Enhanced ability to refactor

We can at any point add properties to the model objects, create alternative domain models and smoothly migrate application and data access code from one set of models to another.

One scenario that occurs in scaling companies is the need to move to an entirely new data storage technology (such as moving from a SQL database to a different kind of system). We can achieve this with minimal effort by replacing the provider implementation with a new version based on the new data store.

Possibility of live data store migration

If we adhere to these principles, it is even possible to achieve full migration from one datastore to another with no downtime whatsoever by creating a proxy provider that reads and writes from old and new providers:

class MigrationProxyBlogPostProvider(BlogPostProvider):

    def __init__(
            self,
            old: BlogPostProvider,
            new: BlogPostProvider,
            read_from_new: bool,
    ):
        self._old = old
        self._new = new
        self._read_from_new = read_from_new

    def get_one(self, id: UUID) -> BlogPost:
        if self._read_from_new:
            return self._new.get_one(id)
        else:
            return self._old.get_one(id)

    def get_for_user(self, username: str) -> List[BlogPost]:
        if self._read_from_new:
            return self._new.get_for_user(username)
        else:
            return self._old.get_for_user(username)
    
    def submit_post(self, post: BlogPost):
        self._old.submit_post(post)
        self._new.submit_post(post)

    def submit_comment(self, post_id: UUID, comment: BlogComment):
        self._old.submit_comment(post_id, comment)
        self._new.submit_comment(post_id, comment)

We can use this proxy provider to implement the interface in application code. The sequence of deployment and migration steps is as follows:

  1. Write to old provider only (pre-migration).
  2. Write to both old and new providers, read from old.
  3. Backfill missing data in the new datastore from the old datastore until all data is migrated.
  4. Write to both old and new providers, read from new.
  5. Write to new provider only (migration complete).

This allows a full migration from one datastore to another, assuming we can deploy services without downtime using Kubernetes or similar orchestration systems.
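The steps above can be sketched end to end with two in-memory stores standing in for the old and new providers (the class names and simplified post model here are illustrative stubs, not the full interface):

```python
class DictProvider:
    """In-memory stand-in for a real provider implementation."""
    def __init__(self):
        self.posts = {}

    def submit_post(self, post_id, post):
        self.posts[post_id] = post

    def get_one(self, post_id):
        return self.posts[post_id]

class MigrationProxy:
    def __init__(self, old, new, read_from_new):
        self._old, self._new = old, new
        self._read_from_new = read_from_new

    def submit_post(self, post_id, post):
        self._old.submit_post(post_id, post)  # dual write
        self._new.submit_post(post_id, post)

    def get_one(self, post_id):
        source = self._new if self._read_from_new else self._old
        return source.get_one(post_id)

old, new = DictProvider(), DictProvider()
old.submit_post(1, "pre-migration post")              # step 1: old store only

proxy = MigrationProxy(old, new, read_from_new=False)  # step 2: dual write
proxy.submit_post(2, "dual-written post")

for post_id, post in old.posts.items():               # step 3: backfill
    new.posts.setdefault(post_id, post)

proxy = MigrationProxy(old, new, read_from_new=True)  # step 4: read from new
assert proxy.get_one(1) == "pre-migration post"
```

Each step change is just a configuration or deployment change to the proxy's wiring, so the application code never notices the migration.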

A real benefit in mission-critical systems!

The Investment: Acknowledging the Upfront Cost

The pattern described above is not the path of least resistance. Frameworks that champion the Active Record pattern do so because it offers the seductive illusion of speed. A few commands can generate a model, and you can immediately start creating, reading, and updating records anywhere in your application.

This initial velocity, however, is a form of technical debt. The pain doesn't come from a single bad decision, but from a thousand small, seemingly harmless ones. Each time a developer writes User.find_by(...) directly in a controller, they weave the application logic and the database schema a little tighter. This is a slow death by a thousand cuts, leading to a system where:

  • Engineers fear changing a database column because they can't predict what will break.
  • Unit tests become a tangled mess of mocks for a database you can't control.
  • Performance bottlenecks appear because a "simple" property access is secretly firing off N+1 queries.

The approach advocated here requires more discipline. It's an upfront investment in technical wealth. You write more code initially—defining interfaces, creating immutable models, and implementing providers. The payoff is a system that is decoupled, predictable, and built to last. It's a conscious choice to trade short-term convenience for long-term maintainability and peace of mind.

The True Power of Immutability

A core part of this investment is the principle of immutability. When a mutable object is passed around an application, any part of that application can change its state at any time, creating hidden side effects that are incredibly difficult to debug.

An immutable model, combined with a data provider that returns new instances of that model upon change, makes the flow of data explicit and predictable. You know exactly where and when data changes, because it can only happen through the well-defined interface of your provider. This isn't just a "nice to have"; it's a fundamental principle for building systems you can reason about.
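With NamedTuple models, "change" means building a new instance, for example via the standard `_replace` method (the model here is a simplified sketch):

```python
from typing import NamedTuple

class BlogComment(NamedTuple):
    id: int  # simplified to int for this sketch
    username: str
    comment: str

original = BlogComment(1, "alice", "First!")

# Mutation is impossible; an update produces a new instance instead:
edited = original._replace(comment="First! (edited)")

assert original.comment == "First!"  # the original is untouched
assert edited.comment == "First! (edited)"
```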

Conclusion

The choice between an ad-hoc data access pattern and a disciplined one is not a technical detail; it's a strategic decision about the kind of system you want to build. While the initial convenience of patterns like Active Record is tempting, the long-term cost in complexity, fragility, and fear of change is immense.

By investing in a well-defined data access layer with immutable models, you are buying future flexibility. You are creating a system that can be tested with confidence, refactored without fear, and scaled to meet new challenges. This is a very valuable property for any fast-moving, rapidly scaling organisation. For example, at ComplyAdvantage, this approach has been critical in managing growth and complexity.