Michael Steinberg's Blog
My Links
Blog Stats
  • Posts - 3
  • Stories - 1
  • Comments - 0
  • Trackbacks - 2
Article Categories
Archives

A NULL value in a database record can create some interesting problems for a variety of database operations, but people get especially confused about how to best deal with them for joins.  Let's say you have these records in your SalesPerson table:

LastName        FirstName        Phone
Humphrey         John                 214.555.1111
Steinberg          Michael             NULL
Boswell             Aaron                214.555.2222
Orrell                Brian                 214.733.3333

And you have an older version of the table with these records:

LastName        FirstName        Phone
Humphrey         John                 972.555.4444
Steinberg          Michael             NULL
Boswell             Aaron                214.555.2222

If you wanted to know which records from the old version were exact matches in the new version, you could join them and see what matches.  Since you don't have a unique primary key and because you want to return records that are complete matches, you have to use all fields in the join, which would look like this:

SELECT

      SalesPerson.LastName

    , SalesPerson.FirstName

    , SalesPerson.PhoneNumber

FROM

      SalesPerson

JOIN

    SalesPerson_OLD

    ON SalesPerson.LastName = SalesPerson_OLD.LastName

       AND SalesPerson.FirstName = SalesPerson_OLD.FirstName

       AND SalesPerson.PhoneNumber = SalesPerson_OLD.PhoneNumber

 

Only one record is returned from this query:  Aaron Boswell is a complete match in both tables.  John Humphrey was left out because his phone number changed and Brian Orrell only existed in the new version of the table.  But why didn’t the record for Michael Steinberg result in a match?

 

Of course, the reason is the NULL value for the phone number.  The important thing to keep in mind about a NULL value is that it does not just mean “empty” or “nothing.” It can also mean “unknown.” Since 2 unknowns can not be said to be equal, it follows that NULL <> NULL.

 

So, what’s the right solution?  One option is to use ISNULL to convert all NULL values to something that can be compared:

 

SELECT

      SalesPerson.LastName

    , SalesPerson.FirstName

    , SalesPerson.PhoneNumber

FROM

      SalesPerson

JOIN

    SalesPerson_OLD

    ON ISNULL(SalesPerson.LastName, '') = ISNULL(SalesPerson_OLD.LastName, '')

       AND ISNULL(SalesPerson.FirstName, '') = ISNULL(SalesPerson_OLD.FirstName, '')

       AND ISNULL(SalesPerson.PhoneNumber, '') = ISNULL(SalesPerson_OLD.PhoneNumber, '')

 

This was the solution I initially came up with in a real world problem, until Samir pointed me in the right direction.  The query gets the results we are expecting (both Aaron and Michael are matches), but it has a couple of major drawbacks:

 

1.       By using a function to convert the data, we are preventing the use of any indexes on these columns for the join.

2.       We would implicitly be making a business decision to treat NULLs as Empty Strings.  This could lead to false positives if one of the fields was a real empty string and the other started off as NULL.  False positives become even more problematic when dealing with numeric and other data types where you would have to treat NULLs as zeroes or some other arbitrary number.

 

The better option is to check for NULL values in the relevant columns without converting the values.  To do that, the query would look like this:

 

SELECT

      SalesPerson.LastName

    , SalesPerson.FirstName

    , SalesPerson.PhoneNumber

FROM

      SalesPerson

JOIN

    SalesPerson_OLD

    ON (

         SalesPerson.LastName = SalesPerson_OLD.LastName

         OR (SalesPerson.LastName IS NULL AND SalesPerson_OLD.LastName IS NULL)

       )

       AND (

             SalesPerson.FirstName = SalesPerson_OLD.FirstName

             OR (SalesPerson.FirstName IS NULL AND SalesPerson_OLD.FirstName IS NULL)

           )

       AND (

             SalesPerson.PhoneNumber = SalesPerson_OLD.PhoneNumber

             OR (SalesPerson.PhoneNumber IS NULL AND SalesPerson_OLD.PhoneNumber IS NULL)

           )

 

This is a lot of code, but only because it is explicit about how you want to use NULLs. For those of you that prefer EXISTS to JOINs for this type of query, you’ll run into the same problem.  You’ll have to modify the WHERE clause of the subquery in the same way we adjusted the JOIN criteria above. 

posted on Friday, October 27, 2006 6:26 PM
Feedback
No comments posted yet.
Title  
Name  
Url
Comments   
Protected by Clearscreen.SharpHIPIn order to prevent spam, please enter the code to post a comment.:
Michael Steinberg