[image]
[ /ScriptResource.axd?d=C4PBrmmo86HA_hfvdCAfVHq4Frs0vcLzhA7kJMOE_IiRhuApEBheb7nhkHkfAqz50 ]

Queries Based Upon The Absence of Data

Aaron Alton, March 20, 2008

It is a fairly common requirement to base the results of a query upon the absence of data in a specific object. For example:

I need to copy records from one table to another, but only if the rows do not already exist in the destination table.
I need to report on all customers who have not placed sales orders in the last six months.
I need to show all invoices which are missing service charge lines.
A modified version of the second example will be used for the code examples within this Wiki Page (modified to account for the transaction dates in AdventureWorks).

Whether the target rows are qualified by the absence of data in the same table, or in a different table, there are four commonly accepted query strategies:
EXCEPT Queries (SQL 2005 and Greater)
Correlated Subqueries
Left-Anti-Semi Joins
NOT IN Queries


Except
The Except Operator, available in SQL Server 2005 or newer versions, excludes rows from the "Left" query which also exist in the "Right" query. The general format of this type of query is:

--LEFT SIDE
SELECT <Column List>
FROM MyTable
EXCEPT
--RIGHT SIDE
SELECT <Column List>
FROM MyOtherTable

The EXCEPT operator demands that the number of columns in the "Left" query matches the number of colums in the "Right". It also (naturally) requires that the datatypes of columns with the same ordinal in both queries match (ex. Column1 in Query1 is compatable with Column1 in Query2).

The following example uses AdventureWorks to demonstrate the EXCEPT operator:

USE AdventureWorks
GO
--Left Query
SELECT  c.CustomerID 
FROM    Sales.Customer AS c
EXCEPT
--Right Query
SELECT  sh.CustomerID 
FROM    Sales.SalesOrderHeader AS sh
WHERE   OrderDate BETWEEN '2004-01-01' AND GETDATE()

Limitations: As the EXCEPT operator requires that the column list from both queries match, it is not always the simplest solution for the type of example cited above. Usually, multiple columns need to be included in the result set, which requires the use of the EXCEPT block as a filter condition for the "overall" query.


Correlated Subqueries
Correlated subqueries involve two queries which are mutually dependent. The values from the "outer" query feed into the "inner" query, which in turn determines the results of the "outer" query. Correlated subqueries have many applications outside of the scope of this page. For more information see the links at the bottom of this page. One (of many) general examples of this query is as follows:

SELECT <Column List>
FROM MyTable AS a
WHERE NOT EXISTS     (SELECT    Column1
                          FROM  MyOtherTable AS b
                          WHERE a.KeyColumn = b.KeyColumn)

The following example uses AdventureWorks to demonstrate a correlated subquery:

USE AdventureWorks
GO
SELECT  c.CustomerID 
FROM    Sales.Customer AS c
WHERE NOT EXISTS
                (       SELECT  sh.CustomerID 
                        FROM    Sales.SalesOrderHeader AS sh
                        WHERE   c.CustomerID = sh.CustomerID
                        AND             sh.OrderDate BETWEEN '2004-01-01' AND GETDATE())

Caution: Under some circumstances, a correlated subquery can result in poor query performance (the solution is less of a pure set solution than the other examples on this page. As with any query, test each option and analyze the query plans under realistic test circumstances prior to deciding on a strategy.


Left Anti Semi Joins

A Left Anti Semi Join is a LEFT OUTER join which is used to determine the absence of a record in the JOINed table. A general example follows:

SELECT <Column List>
FROM MyTable AS a
LEFT OUTER JOIN MyTable AS b ON a.KeyColumn = b.KeyColumn 
WHERE b.KeyColumn IS NULL

Because of the "AND b.KeyColumn IS NULL" provision, the LEFT OUTER JOIN effectively filters rows where a.KeyColumn matches b.KeyColumn (where a match exists).

The following example uses AdventureWorks to demonstrate a Left Anti Semi Join:

USE AdventureWorks
GO
SELECT  c.CustomerID 
FROM    Sales.Customer AS c
LEFT OUTER JOIN (SELECT sh.CustomerID
                        FROM    Sales.SalesOrderHeader AS sh 
                        WHERE   sh.OrderDate BETWEEN '2004-01-01' AND GETDATE()) AS b ON b.CustomerID = c.CustomerID 
WHERE b.CustomerID IS NULL



NOT IN Queries
The NOT IN query compares the value in a column from the primary query to a column list derived from another query. The latter query creates a list of the values which are to be eliminated in the former query, and the NOT IN clause performs the filtering. A general example follows:

SELECT <Column List>
FROM    MyTable AS a
WHERE a.KeyColumn NOT IN
(SELECT b.KeyColumn
FROM     MyOtherTable AS b
WHERE  <WHERE Clause>)

The following example uses AdventureWorks to demonstrate a NOT IN Query:

USE AdventureWorks
GO
SELECT  c.CustomerID 
FROM    Sales.Customer AS c
WHERE c.CustomerID NOT IN
(SELECT sh.CustomerID
                        FROM    Sales.SalesOrderHeader AS sh 
                        WHERE   sh.OrderDate BETWEEN '2004-01-01' AND GETDATE())


Which Method Do I Use?
There is no singular answer. Where the efficiency of the query is critical, write it in each format, and test, test, test! As data volumes change, the optimal query method may also change.
Last edited Apr 16 at 3:26 AM by AaronAlton, version 6
Updating...
Page view tracker


You are viewing a mobilized version of this site...
View original page here

Mobilized by Mowser Mowser