For the pure design of the star schema data warehouse database, these resources never led me wrong. However, when it came to the point of implementing analysis on top of the database per se, I found that there was an omission that in retrospect I find quite surprising. It came to my attention as I tried to perform analysis of data from two separate dimensions linked via a so-called "factless" fact table using Power BI Desktop as the platform for the analysis. This problem may be unique to Power BI Desktop, but I doubt it. I believe it is a symptom of basing the output on the "tabular" model of analysis rather than on an analysis cube or on a single query.
The ProblemSimply put, the problem is this: unless a value from the fact table is included in the analysis in some way, the relationship between the dimensions via the intervening fact is ignored. I discovered the problem when I created a bar chart visualization based on the following three tables:
Individuals Key (surrogate key)
Individual ID (operational db key)
Dates Key (surrogate key)
Daily Summary Fact (associates each individual with each date, for point-in-time analysis)
After adding these three tables into Power BI and relating them together, I dropped [Dates].[Calendar Year] into the bar chart's "Axis" property, and I dropped [Individuals].[Individual ID] into the "Values" property and changed it to "Count Distinct" so that I could see how many members there were each year. I also added [Individuals].[Member Status] to the "Legend" property so that each bar would be subdivided by member type.
I was surprised that this yielded exactly the same numbers for the overall totals and for the shaded areas for every year in the output. That seemed odd even for my imaginary test data, so I started looking into the values in the operational database and in the data warehouse. After several ad hoc queries, I confirmed that the results in the graph were erroneous. And eventually I discovered that the numbers were the overall totals for all the data, not broken out by year. So the ultimate conclusion I came to was that the graph was ignoring the [Daily Summary] fact table that related individuals to dates and was, instead, showing all individuals for all dates.
Two Not-Quite Satisfactory Solutions
- Apply the COUNT() function to any of the surrogate key values in the fact table.
- Add a "counter" column with a value of 1 in each row.
All the examples from the Kimball book and the Adamson book are based on the assumption that the aggregations will be handled in a SQL query with an INNER JOIN between the dimension and "factless" fact tables. And in SQL of course the relationships are honored even if the SELECT clause does not have any columns from the fact table per se. But in Power BI and (probably) other analysis tools using the "tabular model" instead of single sets of query results, this is evidently not the case.
My SolutionThe solution that worked for me is to include the operational database identity key directly into the fact table. This is never suggested by Kimball or Adamson (that I found in print anyway), but it resolves the problem by putting the one "dimensional" value that I would want to aggregate with COUNT DISTINCT into the fact table as a column that can be dropped into the Values property of any Power BI visualization. My revised [Daily Summary] fact now looks like this:
Daily Summary Fact
Using [Daily Summary].[Individual ID] instead of [Individuals].[Individual ID] in my Power BI graph yields the expected and accurate results.
Thinking beyond the specific problem of "factless" fact tables, I can anticipate wanting to use COUNT DISTINCT on the [Individual ID] value for a variety of analysis contexts, even when the fact table might in fact contain more traditional facts such as monetary values or other additive quantities. So my current plan is to incorporate the operational key value in every relevant fact table, whether "factless" or not.