ORMs model databases, databases don’t model objects

I was discussing the merits of a couple different ways to abstract some data common to multiple models at work, and my coworkers asked some questions that forced me to articulate exactly why I have such strong opposition to things like “abstract models.” My conclusion, basically, is that I think ORMs serve as models of the database, not as arbitrary app-level objects, and should be designed as such.

Let’s make up some models here to demonstrate what was going on. We start with a model for cats, of course, called Cat, of course, that looks like this*:

class Cat(Model):
    # cat-specific
    fur = ChoiceField('short', 'medium', 'long')
    breed = CharField()
    # vitals
    heartrate = IntegerField()
    core_temp = FloatField()

but now we’ve decided we want to model frogs, too (god knows why!).

We have a couple different ways of modeling this. Coming at it from the code viewpoint, you might have something like this:

class AnimalMixin(object):
    heartrate = Integerfield()
    core_temp = FloatField()

class Cat(Model, AnimalMixin):
    fur = ...

class Frog(Model, AnimalMixin):
    jump_height = ...

This is the kind of thing that your average OO programmer would reach for (well, perhaps AnimalMixin would merely be a base class called Animal) in app code. And, in that context, it’s probably even defensible!** But this is an extremely code-centric view of things that basically ignores the database. What happens in this case? What tables and columns get created?

Well, it sort of depends. In Django, anyway, there’s a couple ways to do table inheritance: there’s a way where the base class just adds fields to all inherited classes; i.e. you would have a Cat table with heartrate and core_temp columns, and you would have a Frog table with heartrate and core_temp columns. The other way is to do a bunch of joins; i.e. you have a table with heartrate and core_temp columns, and your Cat class has a foreign key back to that table, but in the code you just have access to cat.heartrate and it doesn’t feel like a join.

These two options both have significant drawbacks: the first duplicates columns across multiple tables, and the second turns innocent-looking field access in your code into joins! It is highly non-obvious from reading the code whether an attribute access is going to reference a column of the object (fast, already in memory) or a join to a different table (slow, goes to the DB unless prefetched).

I always sort of flinch when I see these approaches used or recommended, but I’ve never really clarified my thinking around why until today. The issue is that in both cases you’re trying to hide the database from the programmer, and that is not a good thing! Don’t get me wrong, there’s lots of aspects of the database I want hidden: connection strings spring to mind, spelling out long JOIN statements, etc, but the fundamental structure of the tables is extremely relevant to thinking about your application.

So enter the third way of doing this, which gives you slightly-less-convenient app code in exchange for significantly-more-obvious database patterns:

class Vitals(Model):
    heartrate = IntegerField()
    core_temp = FloatField()

class Cat(Model):
    vitals = ForeignKey('Vitals')
    fur = ...

class Frog(Model):
    vitals = ForeignKey('Vitals')
    jump_height = ...

Now what’s a join vs what’s a field is obvious at first glance. If you call my_frog.vitals.heartrate you know you’re going through a foreign key, and if you just access my_frog.jump_height, it’s obvious that was an existing field. Obvious to someone who has some experience with Django, anyway.

I think the difference is which you think of first when you’re coming up with persistent models: your database and the data in it, or your app code and the operations of that data. I give the data primacy, and I think the code should model that. The alternative approach, where the code should be maximally convenient and the database should be molded to make it work, makes me frustrated. Give me nice, clean, well-organized data and I’ll swallow a lot of code duplication for it.

The ORM is in service of the database, not the other way around, and don’t you forget it.

* All examples made up and not checked for correctness.

** I find inheritance loathsome. Do not @ me.

Advertisements
This entry was posted in computer science, software. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s