A technical blog about my projects, challenges, and discoveries in the world of data warehousing using SQL Server, Power BI Desktop, DevExpress, and more.

Thursday, April 12, 2012

PowerPivot DAX Language Gotchas

Recently I was working on a handout for the International Shelby Conference coming up in June. The topic for the session is PowerPivot, a new way of accessing data to underlay PivotTables and PivotCharts. I have used PowerPivot very occasionally since its release a few years ago, but I had never utilized the "Data Analysis Expression" language, aka DAX, that is part of the tool. DAX is a formula language akin to the formula language that is part of a basic Excel worksheet. But it has many new functions that are designed to apply to entire tables and to columns of tables, rather than just to individual cells and groups of cells. I wanted to include a couple of examples of the DAX language in my presentation, so I dove in. Almost immediately I hit some snags and got errors and unexpected results. I eventually worked through the issues I was having, and I learned some things about DAX in the process. So here is a summary of what I learned in my first foray into the world of DAX.

  • There are two places to create DAX-calculated output for the PivotTable: calculated columns added to the source tables in the PowerPivot window and measures added directly to the PivotTable in the Excel window. 
The first one I knew about already. Whenever you load a table into the PowerPivot window, there is an "Add New Column" spot to the right of the existing columns in the table. Just give the column a name and add the DAX formula, and off you go. The new column will act like a native column of data whenever you go to the PivotTable (or PivotChart) report. Where this was a "gotcha" was my lack of awareness of measures. I was frustrated for awhile when I was trying to use DAX formulas in calculated columns that only work properly as measures.

Some of the most powerful analysis solutions using DAX are in measures, not in calculated columns. This is because the formulas in measures can transcend the natural "filters" of the PivotTable to show data beside or beyond the context of the position in the PivotTable. As one example, the PREVIOUSYEAR() function in DAX can allow a cell in a PivotTable to show data from the year prior to the one active for a given cell. Although the same data can be achieved through the design of the PivotTable, with DAX measures the user has more control over how the data will appear in the results.
  • There are no explicit conversion functions in DAX.
If the system cannot make an implicit conversion of the data, you're stuck. This is a key bit of information to keep in mind as you think about calculated columns and measures. If you want to do some kind of manipulation of the data that requires a particular data type, make sure the underlying data in the source tables are of the correct type before you get to the DAX context. This probably means using a SQL query to "clean" the data in a view before you import it into PowerPivot.
  • The filter parameters used with some DAX functions can implicitly "ignore" the context defined by row and column position inside the PivotTable, but they cannot implicitly ignore the context defined by slicers or report filter selections. 
This was the biggest "gotcha" for me once I started trying to define measures on my own. For an example, suppose I have a PivotTable with no slicers or report filters. This table has rows grouped by CalendarYear. I then define a measure that uses the PREVIOUSYEAR() function without any special filter parameter. The regular SUM() for each row will show the aggregate for the CalendarYear value of that row and the measure will show the SUM() for the prior CalendarYear value. On the other hand, suppose I have a PivotTable with rows grouped by CalendarYear AND a slicer to allow the user to select one or more CalendarYear values. As soon as the user selects one particular CalendarYear value, the measure with PREVIOUSYEAR() will be blank. The slicer choice filters the data in such a way that the PREVIOUSYEAR() function has no data to work with in that context. The only way around this is to add an ALL() filter parameter to the PREVIOUSYEAR() function. By adding ALL(CalendarYear) to the function's filter parameters, the measure is able to explicitly ignore the CalendarYear slicer filter and locate the rows of data it needs for the measure to work.
  • There is no NULL in DAX. 
Instead, DAX identifies NULL, empty, and 0 values all as "blank" values, represented in the DAX language as BLANK(). Instead of using ISNULL() to determine if a value is null, as you would in VisualBasic, the command in DAX is ISBLANK().

As I use PowerPivot and DAX more in the future, I'm sure I'll run across more gotchas and tips, and I'll share them in my blog. If you have a tip to share, please share it in a comment on this post. We'll learn together!