Blog categories
SQL: Theta vs. ANSI JOIN syntax
Posted by: Ben Lancaster on the: 31 Jul 2006Good database design should adhere to the third normal form, or "3NF". In a nutshell, relating bits of data to other bits of data with minimal memory/storage overheads. In order to utilise a 3NF database effectively, table joins (linking and extracting data from two different internal sources) are an absolute must. In standard Structured Query Language syntax, there are two main join-types: theta and ANSI. Here's an example of each, using the common book/author database example:
# This is a theta join example
SELECT
b.`title`,
CONCAT_WS(' ',a.`firstname`,a.`surname`) AS `author`
FROM
`books` b, `authors` a
WHERE
b.`author_id` = a.`author_id`
Here you can see that the table join takes place in the WHERE clause, so we only pull out records where the books table's author_id foreign key column matches the authors table's primary key of the same name.
The equivalent as an ANSI join is as follows:
SELECT
b.`title`,
CONCAT_WS(' ',a.`firstname`,a.`surname`) AS `author`
FROM
`books` b
INNER JOIN # "INNER" is not required, however it is considered best practice
`authors` a ON b.`author_id` = a.`author_id`
Like the theta example, this will pull out everything from books, where there's an author to go with it. Now suppose we haven't yet stored the author data for a given book, but we still want to extract a list of all of the books in our example - with an ANSI join is nice and easy, just swap the word INNER for LEFT, meaning that the left-hand table (books) takes precidence and does not depend upon there being a corresponding record (or set of) from the joined table.
So how would you replicate that with a theta join? Let's take a look:
SELECT
b.`title`,
CONCAT_WS(' ',a.`firstname`,a.`surname`) AS `author`
FROM
`books` b,
`authors` a
WHERE
b.`author_id` = a.`author_id`
OR a.`author_id` IS NULL
Not as elegant, is it? It's also far easier to create accidental cartesian joins using the theta method, which is not a good situation to be in, especially when your row count goes into the hundreds on either table - the result is very CPU/Memory hungry and can even bring a user's browser, and even a server, to a grinding halt. Theta style joins in general (and in my experience) are notably resource intenstive compared to ANSI equivalents.
It's also worth noting that in versions of MySQL prior to 5.x, it was acceptible to mix join types, i.e. use theta and ANSI joins in one query, like so:
SELECT
b.`title`,
CONCAT_WS(' ',a.`firstname`,a.`surname`) AS `author`
FROM
`reading_list` rl,
`books` b
# Here's the ANSI join...
INNER JOIN
`authors` a ON b.`author_id` = a.`author_id`
# ...and here's the theta join
WHERE
rl.`book_id` = b.`book_id`
I hope that it goes without saying that this is very bad programming practice; it's illogical use two different tools (simultaneously) to do the same job! So much so that the chaps at MySQL have opted to flat-out deny mixed join types in the same query.
For further reading, I can thoroughly recommend Jon Stephens/Chad Russels article on the subject, taken from the book "Beginning MySQL Database Design and Optimization: From Novice to Professional" (ISBN 1-59059-332-4).
There are no comments for this post
Latest News
Site Foundry reaches its first birthday // 20 Sep 2007
After two full point versions and a number of upgrades under the...