A technical blog about my projects, challenges, and discoveries in the world of data warehousing using SQL Server, Power BI Desktop, DevExpress, and more.

Tuesday, July 21, 2009

Existential Queries

Recently I was asked to come up with a query that would select names of people who have one kind of record in a given table but from that group omit people with a different kind of record in the same table.

For those familiar with the Shelby Systems v.5 software, I needed to pull people with a particular Profile code but of that group omit people with a different Profile code.

The only method I could find to meet both condition was to use the EXISTS( ) function. This function returns a boolean true/false value based on whether or not a specified subquery returns any results. If the subquery returns results, the value is true. If it does not, it is false. As with most functions like this, the value can be reversed by the key word NOT.

I had run across EXISTS( ) some time ago, and I have used it periodically to simplify some query conditions. I'm glad I had it in my arsenal, because it was the only way I found to solve the problem of "find everyone who has one Profile, but not if he has specific second Profile too."

Here is the basic approach I took to solving this problem with a query, simplified to a minimum number of column references:

select
NameCounter, FirstMiddle, LastName
from
Shelby.NANames
where
exists (select * from Shelby.NAProfiles where Shelby.NAProfiles.Profile = 'value1' and Shelby.NAProfiles.NameCounter = Shelby.NANames.NameCounter) and not exists (select * from Shelby.NAProfiles where Shelby.NAProfiles.Profile = 'value2' and Shelby.NAProfiles.NameCounter = Shelby.NANames.NameCounter)

In regular English, the query says to pull the names of those who have a Profile of 'value1' but who do not also have a Profile of 'value2.' There may be other ways to achieve this kind of result, but this one worked for me.

Notice that the subquery's SELECT clause uses the asterisk (*) to stand in for all columns. Because the subquery is only used as a test to see if even one row is returned, there is no need to be specific about which column to select; the asterisk works fine.

1 comment:

  1. Edward Eoff, a co-worker of mine at Shelby Systems, Inc., recently pointed out that the following statement will produce the same results as the query I gave in the blog post:

    select NameCounter, FirstMiddle, LastName
    from Shelby.NANames
    where NameCounter IN(select NameCounter from Shelby.NAProfiles where Shelby.NAProfiles.Profile = 'value1')
    and NameCounter NOT IN(select NameCounter from Shelby.NAProfiles where Shelby.NAProfiles.Profile = 'value2')

    This is indeed true; the two queries will return the same results. The difference is one of efficiency. The EXISTS() comparison subquery only has to run until it can identify one matching row of results, and then it stops. The IN() comparison subquery must run until it pulls back every row and only then makes the comparison. In small sets of data, the difference is negligible. In large sets of data, however, using EXISTS() can significantly speed up a query.

    ReplyDelete

Followers