A technical blog about my projects, challenges, and discoveries in the world of data warehousing using SQL Server, Power BI Desktop, DevExpress, and more.

Wednesday, August 19, 2009

Querying the Schema

Since I didn't get to post last week, I'll post twice this week to make up for it.

Today I was presented with a problem in which a customer wanted to know which tables had been updated by a particular person in a particular time frame.

Immediately I knew that he would need to find out the values of the WhenUpdated and WhoUpdated columns, which are found in many of the tables of the Shelby Systems v5 database schema. The problem was that there was no way of predicting exactly which tables had been affected in the given time frame. In addition, there was no way of know exactly which tables existed in the customer's database, at least not without doing some digging or getting a copy of a full backup of the customer's data. Both of these options would take more time than was really called for.

What I needed was a query that would let me get a list of the tables that contained one or both of the columns I was interested in finding. This is called a "meta-query" because it queries the database schema instead of the regular data storage tables. Of course, this being SQL Server, even the schema is stored in tables. Specifically, the information about the columns in the tables is stored in a table called information_schema.columns.

I whipped out the SQL Cookbook by Anthony Molinaro, which is a superb resource for pushing the ordinary limits of SQL code, and I quickly found the entry on querying the schema to list column names in a given table. I simply reversed the conditions of the query to list the tables that contain a given column. The result was this:

select distinct
table_name
from
information_schema.columns
where
table_schema = 'Shelby' and column_name in ('WhoUpdated', 'WhenUpdated') and table_name not like 'vw_%'
order by
table_name

Running this code gives a list of all the tables that have either WhoUpdated or WhenUpdated. Because all of the tables that have one also have the other, the column_name condition is a bit redundant, but it shows how you can search for matches on one or more column names.

When the customer runs this query, it will list for him only the tables that are in his database installation, so it saves me having to figure out which tables he has and which he doesn't. Once he has the list of tables, it is a matter of running a query on each table to find a match on the date or the name of the last update information.

What is also great about this query is that it can be run straight from the built-in ShelbyQUERY module. There is no need to use the SQL Server Management Studio or any other outside querying mechanism.

P.S. If you're wondering, yes, there is an information_schema.tables table that stores information about the tables per se.

No comments:

Post a Comment

Followers