Sunday, June 14, 2009

Know SQL - Predicates on the WHERE or OUTER JOIN Clauses

Writing good SQL is fundamental in a high performing application free of bugs. SQL has been around as the standard database query language since it left QBE by the roadside in the 1980's, and the core syntax hasn't changed much. Database developers (of any rdbms) should fully understand all ANSI aspects of the language.

I was with some developers recently who hadn't appreciated the difference between a filter predicate on a WHERE and a LEFT OUTER JOIN clause. Essentially the difference is whether the predicate is working on the join or on the resultset. If a predicate is on the right side table of a LEFT OUTER JOIN clause, then it will work on the right hand records only. If the same predicate is on the WHERE clause, the entire record (left and right side) will be filtered out. This can affect the number of records returned.

Here is an example of the difference. The first query has a filter on the WHERE clause, so that any objects joined with columns that have a scale<>0 will be entirely filtered out. That is, not all objects with a name like '%sys%' will be returned.

Select o.* from sys.objects o
left outer join sys.columns c on o.object_id=c.object_id
where o.name like '%sys%'
and c.scale <> 0

This second query has had the columns predicate moved to the LEFT OUTER JOIN clause. This is logically very different since the predicate scale<>0 is working on the join. In this query all objects with a name like '%sys%' will be returned. It’s just a matter of whether they are joined with columns.
Select o.* from sys.objects o
left outer join sys.columns c on o.object_id=c.object_id
and c.scale <> 0
where o.name like '%sys%'


I have used sys.objects and sys.columns, since they are tables (well views actually, but that doesn't matter) in the master database (of either SQL Server 2005 or SQL Server 2008) so you can cut and paste the queries into your query editor and visualise the difference.

This is just one subtlety of the SQL language. If it is news to you and you are a database developer, I suggest that you spend some time learning the SQL language more thoroughly. The good news is that any time you invest in the SQL language will be useful to you for a long time, since it does not appear to threatened by a new language anytime soon. (As much as I love MDX, I can't see it replacing SQL in the rdbms.)

4 comments:

Unknown said...

Very impressive list of examples and dashboards. I have referenced your blog as being a high value resource. You can see the post at:

http://robertlambrecht.spaces.live.com/blog/cns!1738EAC7F6359C6D!2077.entry

Hui Shi said...

For the second query:

This second query has had the columns predicate moved to the LEFT OUTER JOIN clause. This is logically very different since the predicate scale<>0 is working on the join. In this query all objects with a name like '%sys%' will be returned. It’s just a matter of whether they are joined with columns.
Select o.* from sys.objects o
left outer join sys.columns c on o.object_id=c.object_id
and c.scale <> 0
where o.name like '%sys%'

When I remove and c.scale <> 0 from the ON-clause, it returns more data than using ON-clause predicate...
any ideas?

thanks

Hui

Hui Shi said...

richard, I think you change the second query into

Select distinct o.* from sys.objects o
left outer join sys.columns c on o.object_id=c.object_id
and c.scale <> 0
where o.name like '%sys%'

Richard Lees said...

You could do that. The queries aren't really trying to do anything productive, just show the difference between predicates on the JOIN versus WHERE clause.