A technical blog about my projects, challenges, and discoveries in the world of data warehousing using SQL Server, Power BI Desktop, DevExpress, and more.

Sunday, December 2, 2012

SQL Saturday Coming Soon!

In just six days I'll be presenting at SQL Saturday #173 in Washington, DC.  If you are in the area and can make it, I hope to see you in my session on avoiding SQL Server Reporting Services annoyances. SQL Saturdays are free events and are the best one-day SQL Server training events I know.

SQL Saturday #173 will have two tracks on Business Intelligence and Analysis Services, two on DBA topics, one for T-SQL developers, and one on personal development. The agenda is packed with great topics. For example, right after my session is over I'm heading over to hear about advanced charting techniques in SSRS presented by Jason Thomas.

Not only are the sessions rich in information, but so are the door prizes. Last year I won the book SQL Server Reporting Services Recipes, and I've used it many times since. Crossing my fingers for an equally good prize this year!

Even if you can't make it up to DC this coming Saturday, consider coming to SQL Saturday #187, which is right here in Richmond, VA! This will be our second SQL Saturday in the city, and the Richmond SQL Server User Group is psyched to bring this event to Richmond again. Jessica Moss and Kevin Kline have already signed up as speakers, and that's just the beginning of what should be a powerful slate of presenters.

Hope to see you Saturday!

Sunday, November 11, 2012

Top 10 Things I Learned While NOT at PASS 2012

Last week was the PASS Summit in Seattle, Washington. I was unable to attend this year, but I watched the keynote addresses on PASS TV, a live streaming video feed from the conference. Normally I post a "Top 10 Things I Learned" from each day of the Summit, but since I didn't go this year, there was no daily series. That is not to say I learned nothing last week, however. In fact, I spent the week developing my first functioning "data mart" for one key area of the Shelby v.5 application: GlobaFILE. GlobaFILE is the core repository of names and name-related data. It does not contain any true data warehouse style "facts," such as financial transactions, but it does serve as the basis of core name content for all other modules apart from the General Ledger. The biggest bonus of creating this data mart is the ability to capture changes over time, a big win for historical analysis.

What I learned last week is a combination of aspects of SSIS, T-SQL, and news from the PASS keynote addresses. Without further ado, here are my top 10, in no particular order:

1. SSIS: I learned that most of the expressions used in SSIS are based on C# syntax rather than on SQL or VB.NET. This was a frustration because it is yet another language syntax I will need to learn in order to work effectively in SSIS.

2. SSIS: Most of the data transformation I need to do from my OLTP database to the as-yet-in-design OLAP database cannot be done with the basic SSIS data tasks. I would need to use C# conditional statements and VB.NET scripts together to achieve the result. But by the same token, I can do most of the data transformation with T-SQL in queries. So right now I'm not certain if SSIS is really an essential part of the development of a Shelby v.5 data warehouse solution, especially in light of the next Top 10 item.

3. T-SQL: The MERGE command in T-SQL can populate slowly changing dimensions and facts in the data mart. I am glad that I already knew how to create Common Table Expressions prior to attempting to populate my data mart database using MERGE. The CTEs are where the data extraction and transformation take place, and the MERGE statement itself is where the data are loaded into the data mart tables.
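As a concrete sketch of that pattern: the CTE does the extract-and-transform work, and MERGE does the load. All table and column names here (Shelby.NANames, dbo.DimName, and so on) are hypothetical stand-ins, not the actual schema.

```sql
WITH SourceNames AS (
    -- Extraction and transformation happen in the CTE (hypothetical source table)
    SELECT NameCounter AS BusinessKey,
           FirstName + ' ' + LastName AS FullName
    FROM Shelby.NANames
)
MERGE dbo.DimName AS tgt
USING SourceNames AS src
    ON tgt.BusinessKey = src.BusinessKey
   AND tgt.RowEnd IS NULL              -- match only the current row
WHEN MATCHED AND tgt.FullName <> src.FullName THEN
    UPDATE SET tgt.RowEnd = GETDATE()  -- retire the changed row
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, FullName, RowStart)
    VALUES (src.BusinessKey, src.FullName, GETDATE());
```

Note this simplified version retires a changed row on one pass and inserts its replacement on the next; a single-pass Type 2 load usually routes the MERGE's OUTPUT clause into a follow-up INSERT.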

4. T-SQL: Define the surrogate key column of a fact or dimension table with the "identity" property so that it will auto-increment a new value for every new row. For a dimension table, specify 0 as the first value for the "identity" assignment, thus enabling a manual INSERT query to populate the "N/A" row.
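For example (hypothetical table and column names), seeding the IDENTITY at 0 means the very first insert, done manually, becomes the "N/A" row, and ordinary loads start at 1:

```sql
CREATE TABLE dbo.DimName (
    NameKey     int IDENTITY(0, 1) NOT NULL PRIMARY KEY, -- surrogate key, seeded at 0
    BusinessKey int NOT NULL,
    FullName    varchar(100) NOT NULL,
    RowStart    datetime NOT NULL DEFAULT GETDATE(),
    RowEnd      datetime NULL
);

-- The first insert receives surrogate key 0: the "N/A" row.
INSERT INTO dbo.DimName (BusinessKey, FullName)
VALUES (-1, 'N/A');
```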

5. T-SQL: Define a "trigger" on a dimension table in order to automatically assign a "RowEnd" date to any row that is supplanted with a new row with the identical business key.

6. T-SQL: Use calculated columns to make explicit some data that are stored implicitly in other columns but which will make the end-user's experience much easier to have at hand, such as "age" and "RowCurrent."
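For instance (assuming hypothetical BirthDate and RowEnd columns on the dimension table), computed columns can expose "Age" and "RowCurrent" without storing them:

```sql
ALTER TABLE dbo.DimName
    ADD Age        AS DATEDIFF(year, BirthDate, GETDATE()),       -- approximate age in years
        RowCurrent AS CASE WHEN RowEnd IS NULL THEN 1 ELSE 0 END; -- 1 while the row is current
```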

7. SSIS: Unless I'm missing something, deploying solutions from SSIS to a production server is much more difficult than deploying reports from SSRS to the Report Manager. This is another reason I currently prefer MERGE over SSIS.

8. KEYNOTE ANNOUNCEMENT: Microsoft is developing an enhancement to SQL Server 2012 called "Hekaton" that will optimize selected tables and procedures to run completely in memory, yielding performance improvements of one to two orders of magnitude.

9. KEYNOTE ANNOUNCEMENT: Excel 2013 includes a completely stand-alone instance of Power View visualization reporting. This is huge. It means that reporting options previously reserved for owners of both SQL Server Enterprise and SharePoint Enterprise are now available to any owner of Excel 2013. I found Office 2013 available on TechNet downloads, and I can't wait to get it installed and confirm that I understood the announcement correctly. If I did, this could be a game changer for reporting at all levels, including for the churches who use Shelby v.5 and Microsoft Office.

10. BONUS: It pays to tweet. As part of a Twitter contest sponsored by PASS, I tweeted what I liked best about PASS Summit 2012 (which was the PASS TV live feed of certain sessions of the Summit). And I was one of two winners of the contest! :-) I don't yet know what I won, but I'm hoping for the USB key with videos of all the sessions of the Summit on it. It's a long shot, since that's listed as a "grand prize," but I'm keeping my fingers crossed until I hear something, hopefully this week!

Next year PASS will be in Charlotte, North Carolina, and I'm definitely planning to attend. My next PASS event comes much sooner, though: I'm speaking at SQL Saturday in Washington, D.C. on December 8th. Hope to see you there!

Thursday, September 27, 2012

Bit and Sum() in Reporting Services

The bit data type is normally used to record Boolean true/false data, with a 0 for false and a 1 for true. It might also represent yes/no data, with a 0 for no and a 1 for yes. When it comes to reporting on this type of data, it is normally represented with a checkbox or a word. It is not normally an additive value.

But consider the circumstance of an attendance history table with a bit column for "attended" paired with a column for a "person_id" and a "date." In a report it would be logical to add up the "attended" column values to arrive at a "times attended" total. And that is exactly what I recently tried to do, only to discover what I am about to share.

In SQL Server Management Studio (SSMS), the bit data type is represented in the results display by a 0 or a 1. If you try to put a bit column into a SUM() aggregate function, you will receive an error message that lets you know that the bit type is invalid for the SUM() operator. But that is not a big problem because you can CAST the data type into tinyint, and then it adds up to yield a positive number that is the sum of all the 1s and 0s in the included set of rows.

By contrast, in Business Intelligence Development Studio (aka BIDS - Visual Studio optimized for Reporting Services), the bit data type is represented by the words True and False. Just like in SSMS, if you try to put a bit data type in a Sum() function, you will receive an error. Once again this seems like no big deal because you can use CInt() to convert the data type to an integer.

But here's the rub: BIDS considers a "true" bit to be a -1, not a 1. That means the Sum() of a converted bit data type will be a negative number that is the sum of -1s and 0s in the included set of rows.

I discovered this when my formula for "percent attended" was yielding negative values even though, when I checked the numbers in SSMS, all I saw were positive numbers. It took a little digging to figure out that SSMS and BIDS were interpreting "True" in equal but opposite ways (1 and -1 respectively).

So the conclusion is this: if you are going to use a bit data type as an additive value in BIDS and Reporting Services, convert it first in the SQL statement to a tinyint data type. That will keep everything positive.
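In other words, do the conversion in the dataset query rather than in a report expression (table and column names here are hypothetical):

```sql
-- CAST in the query so SSRS receives a tinyint (always 0 or 1),
-- which Sum() can total without the -1 surprise.
SELECT person_id,
       SUM(CAST(attended AS tinyint)) AS TimesAttended
FROM dbo.AttendanceHistory
GROUP BY person_id;
```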

Thursday, April 12, 2012

PowerPivot DAX Language Gotchas

Recently I was working on a handout for the International Shelby Conference coming up in June. The topic for the session is PowerPivot, a new way of accessing data to underlie PivotTables and PivotCharts. I have used PowerPivot occasionally since its release a few years ago, but I had never utilized the "Data Analysis Expressions" language, aka DAX, that is part of the tool. DAX is a formula language akin to the one built into a basic Excel worksheet, but it has many new functions designed to apply to entire tables and columns of tables rather than just to individual cells and groups of cells. I wanted to include a couple of examples of the DAX language in my presentation, so I dove in. Almost immediately I hit some snags and got errors and unexpected results. I eventually worked through the issues, and I learned some things about DAX in the process. So here is a summary of what I learned in my first foray into the world of DAX.

  • There are two places to create DAX-calculated output for the PivotTable: calculated columns added to the source tables in the PowerPivot window and measures added directly to the PivotTable in the Excel window. 
The first one I knew about already. Whenever you load a table into the PowerPivot window, there is an "Add New Column" spot to the right of the existing columns in the table. Just give the column a name and add the DAX formula, and off you go. The new column will act like a native column of data whenever you go to the PivotTable (or PivotChart) report. The "gotcha" was my lack of awareness of measures: I was frustrated for a while trying to use DAX formulas in calculated columns that only work properly as measures.

Some of the most powerful analysis solutions using DAX are in measures, not in calculated columns. This is because the formulas in measures can transcend the natural "filters" of the PivotTable to show data beside or beyond the context of the position in the PivotTable. As one example, the PREVIOUSYEAR() function in DAX can allow a cell in a PivotTable to show data from the year prior to the one active for a given cell. Although the same results can be achieved through the design of the PivotTable, DAX measures give the user more control over how the data appear in the results.
  • There are no explicit conversion functions in DAX.
If the system cannot make an implicit conversion of the data, you're stuck. This is a key bit of information to keep in mind as you think about calculated columns and measures. If you want to do some kind of manipulation of the data that requires a particular data type, make sure the underlying data in the source tables are of the correct type before you get to the DAX context. This probably means using a SQL query to "clean" the data in a view before you import it into PowerPivot.
  • The filter parameters used with some DAX functions can implicitly "ignore" the context defined by row and column position inside the PivotTable, but they cannot implicitly ignore the context defined by slicers or report filter selections. 
This was the biggest "gotcha" for me once I started trying to define measures on my own. For an example, suppose I have a PivotTable with no slicers or report filters. This table has rows grouped by CalendarYear. I then define a measure that uses the PREVIOUSYEAR() function without any special filter parameter. The regular SUM() for each row will show the aggregate for the CalendarYear value of that row and the measure will show the SUM() for the prior CalendarYear value. On the other hand, suppose I have a PivotTable with rows grouped by CalendarYear AND a slicer to allow the user to select one or more CalendarYear values. As soon as the user selects one particular CalendarYear value, the measure with PREVIOUSYEAR() will be blank. The slicer choice filters the data in such a way that the PREVIOUSYEAR() function has no data to work with in that context. The only way around this is to add an ALL() filter parameter to the PREVIOUSYEAR() function. By adding ALL(CalendarYear) to the function's filter parameters, the measure is able to explicitly ignore the CalendarYear slicer filter and locate the rows of data it needs for the measure to work.
  • There is no NULL in DAX. 
DAX treats NULL, empty, and 0 values all as "blank" values, represented in the DAX language as BLANK(). Instead of using ISNULL() to determine whether a value is null, as you would in Visual Basic, the DAX function is ISBLANK().
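For example, a measure that substitutes 0 for a blank result (table and column names hypothetical):

```
NonBlankSales :=
IF ( ISBLANK ( SUM ( Sales[Amount] ) ), 0, SUM ( Sales[Amount] ) )
```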

As I use PowerPivot and DAX more in the future, I'm sure I'll run across more gotchas and tips, and I'll share them in my blog. If you have a tip to share, please share it in a comment on this post. We'll learn together!