Marco.org

Jan 30 2009

On database joins

@codinghorror:

basically, it’s like this: JOINS ARE EXPENSIVE. Period. Really, really expensive. I’m not kidding.

You’re right. A lot of people overuse joins because they mask a lot of the underlying complexity of what the database actually has to do to achieve what you just asked of it. And they work perfectly well if your tables are small, there isn’t much concurrent access, and performance doesn’t matter.

Unfortunately, for a popular website, your tables are huge, there’s tons of concurrent access, and performance really matters.

Joins also hurt scaling efforts later: what if you move one high-traffic table to a separate database server? No more joins against it.

Don’t underestimate the benefits of some denormalization and avoiding joins.

Fortunately, they’re really quite easy to avoid. For the common case of matching a relational table to its parent objects, we do a simple two-query substitution like this:

$user_ids = $following->query_return_column_array(
    'SELECT user_id FROM ?table WHERE following_id = ?i', $this->id
);
$followed_users = $user->find(
    'SELECT * FROM ?table WHERE id IN ?ai', $user_ids
);

I don’t think there’s a single join in all of Tumblr’s or Instapaper’s code.

Creative Commons License All original content is licensed under the Creative Commons Attribution 3.0 U.S. License except that which is quoted from elsewhere or attributed to others. In short, you may reproduce, reblog, and modify my content, but you must provide proper attribution.