Reconsidering the Repository Pattern

I have long considered the repository pattern to be a foundational design pattern for database-connected systems. Recently, however, I've started to reconsider this opinion. Read on to find out why.

Remember that design patterns are found by capturing established good practice - they don't fall fully formed from the brow of a fevered architect.

The Repository design pattern was formulated at a time when writing database-connected systems involved a lot of difficult and repetitive coding. Connection strings, authentication and connection management; SQL query generation through string concatenation; commands, queries, recordsets and parameters - there was a lot of code to write.

Isolating all of that complexity behind the wall of a convenient interface, keeping the complexity of database access away from any actual business logic, made a lot of sense.

Smart developers have always been productively lazy - given the chance to write the same thing a dozen times, most will find a way to simplify things through good design and reuse. Early attempts at code reuse within data access layers led to a number of different approaches. Some of these were very simple, some were complex and mature (1) and, I'm sure, some were unmitigated disasters.

(1) Back in 1997 I was working with a mature library known simply as "Mocom Data Access" (or MDA). Even then, it was mature, cross platform, performant and robust. Bypassing the MDA with hand generated SQL was seldom necessary, but could be done easily when required. It had some features (such as the ability to tell if an association had been loaded) that modern ORMs, with their devotion to POCO objects, can't easily achieve.

Enter the modern Object Relational Mapper, or ORM.

For new projects we start in 2011, we can drop in any one of a large number of mature ORM libraries. These libraries take care of our connection strings, authentication and connection management. They handle SQL query generation, creating command objects and parameters for our queries and transforming recordsets returned from the database into our domain objects.

Hang on, doesn't that description seem somewhat familiar?

Modern object relational mappers take care of wrapping up (almost) all the complexity that we originally wrapped up by creating repositories.

Doesn't this mean that the ORM could, in effect, be our repository? Why should we wrap up the ORM and incur the cost of creating and maintaining another abstraction layer if the ORM is already doing this job for us?

There's a broader lesson here too. The abstractions we employ in our applications have to deliver significant value, else the overhead of defining and maintaining them ends up as a net-negative for our project. Every abstraction we employ needs to pass this threshold of viability - any abstraction whose costs exceed the benefits is hampering our work, not benefiting it.

Comments

I actually laughed out loud when you made the comparison between Repository and ORMs, that's a really good observation! Somewhere deep inside I must have realized this myself, as I work with ORMs quite a bit and I've naturally bastardized the Repository pattern to mean, "Business Logic that is tightly concerned with consuming the DAL."

Some might argue that it's a bit on the extreme side in terms of granularity, but it feels natural and elegant to me and I've used it this way in several projects.

I'm curious what you think of this approach, and would love to hear your feedback.

Cheers
jason

Getting rid of the Repository layer is a great way to lock yourself into a (supported) RDBMS. If you are working on a small project where the (supported) RDBMS will be the only data store that is used, then go for it. But what if you are working on an enterprise system? Most likely, not only will the (supported) RDBMS be used, but other data sources as well (flat files, XML, web services, document databases, non-supported RDBMSs), and interchangeably. So a repository pattern will have to be implemented to handle these different sources. After all, we don't want our domain code to care about what the data source is, do we?

Andrew, you touch on a very good point. If you have a system that accesses data from a variety of disparate sources, then something akin to the Repository pattern can be very useful.

I believe the key is to consider the value we gain from the abstraction - if it delivers a lot of value, we should use it; if it doesn't, we shouldn't.

FWIW, I spend most of my time maintaining and enhancing a moderate sized enterprise system - a couple of man-decades of development effort in size. It uses the repository pattern to very good effect and I've never seriously considered ripping it out.

That said, systems architecture needs to be "right sized" to the problem at hand. I've seen plenty of smaller systems that use the Repository pattern for no other reason than "That's the way we do things here", handicapping the system with a heavyweight abstraction that gets in the way of getting the job done.

One final thought - insulating our domain code from the data source needn't involve the Repository pattern; there are other ways to solve that particular problem.

Let me first say that I know that this post is more than a year old so sorry if my comment is completely irrelevant to your current opinion.

First, let's reiterate that good old advice from the classic "Design Patterns", which will probably never get old:

"Depend upon Abstractions. Do not depend upon concretions."

If you've ever heard of the SOLID principles, that is what the D stands for. What does it mean? That your objects should probably depend upon an interface that makes sense to them, and not on a very concrete type that specifies irrelevant things like the kind of technology used.

Because what is a repository, really? It's a contract. It specifies a set of operations to retrieve and store the data your program uses. In object-oriented software, that would probably be an interface of some kind, and it would be defined very near the objects it stores and retrieves. So if I had an "Order" class, I would have an OrderRepository interface in the very same package/module/library, because using one without the other wouldn't make much sense, and they are extremely coupled to each other. The repository interface is part of your domain, because a lot of domains are about remembering some information and using it later. Using a repository makes that explicit: "Hey, there is a bunch of orders stored somewhere, and you can query them in such and such a way". I once read a book (Growing Object-Oriented Software, Guided by Tests) where they didn't even call their repository "Repository". They called it an "AuctionHouse" and it had a method "getOngoingAuctions()". Made perfect sense (to me at least).
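To make the "contract next to the domain object" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the fields on `Order`, the method names on `OrderRepository`, and the `customer_spend` function are hypothetical and are not taken from the book or from any library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    """Domain object (fields are illustrative assumptions)."""
    order_id: int
    customer: str
    total: float


class OrderRepository(Protocol):
    """The contract the domain depends on; it says nothing about storage."""

    def add(self, order: Order) -> None: ...

    def for_customer(self, customer: str) -> list[Order]: ...


def customer_spend(orders: OrderRepository, customer: str) -> float:
    """Hypothetical domain logic, written against the contract only."""
    return sum(order.total for order in orders.for_customer(customer))
```

Any object with matching `add` and `for_customer` methods satisfies the protocol, so `customer_spend` never needs to know whether the orders live in a relational database, a flat file or a web service.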

Now an ORM is most certainly not that nice abstraction that allows the developer not to think about persistence while tinkering in the domain. Quite the contrary. It's like yelling "HEY BY THE WAY I AM PERSISTED IN A RELATIONAL DATABASE, GET IT?". Sure, it's better than a mess of queries, connection strings and whatnot, but it's still way too much for most of your objects to know about. If you use your ORM directly, you couple your domain to a specific technology that has no relation to it. You used to couple it to a specific database technology; now you couple it to a specific ORM technology. You sure have reduced the complexity, and that's great, but the coupling problem is still there.

Because here is the biggest problem with using either a database directly or an ORM: it's a big, fat dependency on an external library you don't control. You don't want every package in your software to depend upon it, because that makes it very brittle with respect to any change in that library. New version with some incompatibility? Awesome, now half your program is broken. Cool new feature? Can't use it, it won't work with the existing code. By the way, some Orders now come from that web service produced by the company we just acquired? Let's change every class in the domain! Fun fun fun... External dependencies should be hidden as much as possible, because they change and break and you don't control them, and you don't want those changes to ripple through your software.

Finally, here is the last straw against using an ORM directly: unit tests. They should be fast and isolated from each other. If your classes depend upon an ORM, it's going to be a pain to mock/stub. And if you don't mock it, then your tests will sometimes fail because of the ORM, and that's no fun to debug. If you're using TDD it's going to be even more painful, because you'll have to initialise your ORM very often as you write tests, and you'll kind of lose that quick and productive red-green-refactor loop.
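As a rough illustration of the testing point, the sketch below swaps the database for an in-memory fake. The names (`InMemoryOrderRepository`, `qualifies_for_discount`, the 100.0 threshold) are made up for this example; the only claim is that a rule written against a small repository contract can be tested without initialising any ORM.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    customer: str
    total: float


class InMemoryOrderRepository:
    """Test double: a dict stands in for the database."""

    def __init__(self) -> None:
        self._orders: dict[int, Order] = {}

    def add(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def for_customer(self, customer: str) -> list[Order]:
        return [o for o in self._orders.values() if o.customer == customer]


def qualifies_for_discount(orders, customer: str, threshold: float = 100.0) -> bool:
    """Hypothetical business rule under test; it only calls for_customer()."""
    return sum(o.total for o in orders.for_customer(customer)) >= threshold


def test_discount_rule() -> None:
    repo = InMemoryOrderRepository()
    repo.add(Order(1, "ada", 60.0))
    repo.add(Order(2, "ada", 45.0))
    repo.add(Order(3, "bob", 20.0))
    assert qualifies_for_discount(repo, "ada")      # 105.0 meets the threshold
    assert not qualifies_for_discount(repo, "bob")  # 20.0 does not


test_discount_rule()
```

The test builds its whole world in a few lines and touches no connection, session or mapping configuration, which is what keeps the red-green-refactor loop quick.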

...

So basically: ORMs are great for reducing complexity, not so great for reducing coupling. I'll still use them. My persistence module/package/library will be very small and easy to understand thanks to them. But it will still be a separate module whose changes happen in isolation from the rest of the software.
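A small persistence module of that kind might look roughly like the sketch below. To keep it self-contained it uses Python's standard `sqlite3` module as a stand-in for an ORM, which is an assumption of this example; with a real ORM the method bodies would be shorter still. The class name, table layout and `Order` fields are all hypothetical.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    customer: str
    total: float


class SqliteOrderRepository:
    """The only class that knows which storage technology is in use."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            "order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
        )

    def add(self, order: Order) -> None:
        self._conn.execute(
            "INSERT INTO orders (order_id, customer, total) VALUES (?, ?, ?)",
            (order.order_id, order.customer, order.total),
        )

    def for_customer(self, customer: str) -> list[Order]:
        rows = self._conn.execute(
            "SELECT order_id, customer, total FROM orders "
            "WHERE customer = ? ORDER BY order_id",
            (customer,),
        )
        # Map raw rows back into domain objects before they leave the module.
        return [Order(*row) for row in rows]


repo = SqliteOrderRepository(sqlite3.connect(":memory:"))
repo.add(Order(1, "ada", 60.0))
repo.add(Order(2, "bob", 20.0))
print(repo.for_customer("ada"))
```

If the storage library changes, this class changes with it, but callers that only use `add` and `for_customer` should not have to.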

Of course, if your domain is all about taking some data from a data store and shoving it into a web page or any other view in an MVC app, by all means use your ORM directly in your controller. The only logic you care about is how that data is displayed, and that logic is already isolated in the view. But here's some food for thought. Your controller parses a request, fetches/stores some data, and passes it to another object (the view). Isn't that, like, the essence of a repository?

It may be a stupid question to ask, but wouldn't the repository layer now be tightly coupled to the ORM layer? If something changes in the ORM, don't we have to change and maintain the repository layer? It looks like in one case we would have to change the domain-specific layer, and in the other case we would have to change the repository layer. The change could be kept to a minimum, but it is not avoidable.

Am I right or am I just being stupid?