SELECT Statements

A technical blog about my projects, challenges, and discoveries in the world of data warehousing using SQL Server, Power BI Desktop, DevExpress, and more.

Wednesday, March 30, 2016

Geocoding on the Cheap (i.e. Free)

It has been a long time since I have been as excited about sharing a database technique on my blog as I am about this one. Geocoding enables a wide range of geographical reporting that is almost always the most whiz-bang part of any demonstration of reporting output. But the data behind the flashy maps has been hard to come by, and it is usually considered too pricey for most small-to-medium-sized database users, because in most situations geocoding individual addresses requires a fee-based license agreement with Google, Bing, or some other source of geocoding data.

But I recently learned that if you are willing to compromise the precision of the data just a little and use approximate locations for your addresses instead of exact ones, you can do geocoding at no cost whatsoever. That's right: free. The "trick" is that you have to be okay with using the geographic center of the ZIP code area as the location of every address in that ZIP code. According to http://www.mapszipcode.com, the average ZIP code area is 90 square miles, with some much smaller than that and a few larger. If my math is correct (a 90-square-mile ZIP code, treated as a square, is roughly 9.5 miles on a side), the largest difference between an actual address and the center of its ZIP code is, on average, about 6.7 miles. Most differences will be much smaller than that.

If that margin of error is within your tolerance for geographical reporting, read on.

Step One: Download a Free Data Source of ZIP Code Latitude and Longitude Information


Learning that there are free data sources of ZIP code lat/long information was the "aha" moment that opened my eyes to all the possibilities of geocoded reporting without expensive licensing agreements. I did some searching to find the best source and found the following links, which I provide for your review. There are good tips on these sites, and I won't waste space repeating what they say.

http://www.uszipcodes.com/zip-code-latitude-longitude-quality.htm
http://greatdata.com/resources
https://boutell.com/zipcodes/

The last one listed is the site from which I chose to download the data. It is in a simple comma-delimited file format, easy to work with.

Step Two: Prepare the Data Source for Import into SQL Server


What you have to do in this step depends on which data source file you downloaded in the previous step and on the import method you plan to use. The data from boutell.com is quote-and-comma delimited, which is not a format easily handled by the bulk import utility I used. It also contains two time-zone columns that are not relevant to the geocoding process. So I opened the source file in Excel, deleted the two columns of time-zone data, and then saved the file under a new name in a basic CSV format.

Be aware that a handful of cities have an apostrophe in their names. The bulk insert process I used handles that fine, but if you use a different method, you may need to locate those cities and either remove them or deal with them in a special way.

Step Three: Prepare the Database for the Import


This might be the trickiest step for some. To use the same bulk import utility that I did, you will need to create a brand new table to receive the data. If you use a different method for the import, you might not need to do that. If you make a table, it might be a temporary table (if you have another, more permanent table where the geocoding data will ultimately land) or it might be a permanent addition to your database. How you name and use the table will depend on whether it is temporary or permanent. In my production environment, this was just a temporary table.  Here is the script I used to make the table:

create table zcta (
 zip char(5) primary key,
 city varchar(64),
 state char(2),
 latitude float,
 longitude float );


Step Four: Import the Data


I chose to use a bulk insert routine executed from SQL Server Management Studio. Here is a blog post from another SQL Server user showing how simple the process is:

http://blog.sqlauthority.com/2012/06/20/sql-server-importing-csv-file-into-database-sql-in-sixty-seconds-018-video/.

Because of the data clean-up in Step Two, my process was pretty much identical to the one shown in the video, except for the file location and file name.
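
For reference, here is a minimal sketch of the kind of bulk insert statement involved; the file path and the assumption of a header row are mine, so adjust both to match your cleaned-up CSV file:

-- hypothetical path to the cleaned-up CSV from Step Two
bulk insert zcta
from 'C:\Data\zipcode.csv'
with (
 fieldterminator = ',',
 rowterminator = '\n',
 firstrow = 2  -- skip the header row, if your file has one
);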

Now that you have every meaningful ZIP code in your database along with their latitude and longitude values, you can connect that lat/long information with the addresses in your database that have a ZIP code. Getting to this point is the major hurdle to geocoding. Once you get to this point, the rest is easy.
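
As a sketch of that connection, assuming a hypothetical [Addresses] table with an [AddressID] key and a [ZipCode] column, the join is nothing fancier than this:

-- attach approximate coordinates to each address by ZIP code
select a.AddressID, a.ZipCode, z.latitude, z.longitude
from dbo.Addresses a
join dbo.zcta z on z.zip = left(a.ZipCode, 5);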

Step Five: Add a Geography Column


The lat/long information is essential, but it is not in a format that SQL Server can readily use for calculating distances. There is a special data type called "geography" that is specifically designed for this purpose. There is a lot you could learn about the geography data type, but here's all you need to know:

  • The "geography" data type was introduced in SQL Server 2008.
  • There are sub-types for "geography" values, but the only one you need is POINT.
  • A geography "point" is defined by longitude and latitude (in that order).
  • Because the geography data type is implemented as a SQL CLR type, its related methods and functions are case sensitive.
  • The easiest syntax for populating a geography point is this: geography::Parse('POINT (long lat)')
  • The geography data type can be declared for table columns, view columns, and SQL variables.
Note that in the syntax example, long and lat should be replaced by string versions of the longitude and latitude values, respectively.
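
If you want to see a point in action before touching any tables, a quick test in a query window looks like this (the coordinates are just sample values near Memphis):

-- build a geography point from sample longitude/latitude values
declare @pt geography = geography::Parse('POINT(-90.048981 35.149532)');
select @pt.Long as Longitude, @pt.Lat as Latitude;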

For my purposes, I wanted the geography data to be calculated for each ZIP code just once, so I added a computed column onto the permanent version of the ZIP code lookup table. Here is the code I used:

alter table [dbo].[ZIP Code Lookup] add [GeoLocation] as
 geography::Parse('POINT(' + convert(varchar(20), [Longitude]) + ' ' + convert(varchar(20), [Latitude]) + ')') persisted

The longitude and latitude values are stored in the float data type in the table, so they have to be converted to varchar in order to be concatenated into the geography function. The "persisted" keyword at the end means that the value is calculated just once and then stored semi-permanently. It will only be re-calculated if either the longitude or latitude value changes in the lookup table.

Step Six: Use the Geography Column to Calculate Distance


Here is the payoff of all the work above. With the geography column representing the lat/long position of the address ZIP codes, you can calculate approximate distances between any two addresses with the STDistance function, just like this:

[Geography Point 1].STDistance([Geography Point 2])

This will return the distance in meters between the points. If you want miles, divide the result by 1609.344. In my formula, I also round the result to the nearest mile to reflect the fact that this is an approximation and not based on the exact locations.
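
Here is a small example of the full formula, computing the approximate mileage between two ZIP code centroids; it assumes the lookup table keeps the ZIP code in a column named [Zip Code], so adjust the names to your own schema:

-- approximate distance in miles between two ZIP code centroids, rounded to the nearest mile
select round(a.[GeoLocation].STDistance(b.[GeoLocation]) / 1609.344, 0) as ApproxMiles
from [dbo].[ZIP Code Lookup] a
join [dbo].[ZIP Code Lookup] b on b.[Zip Code] = '90210'
where a.[Zip Code] = '38117';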

Using this function, it would be possible to create a query that returns all the addresses that fall within a specified distance of a given location, based on their ZIP codes. I have not yet created visualizations of this sort of thing, but that is certainly the next step. When I have done it, I'll post a follow-up with the steps for that process, picking up where this one leaves off.
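
As a preview of that follow-up, a radius search might look something like the sketch below; the [Addresses] table and the 25-mile radius are purely illustrative:

-- all addresses whose ZIP code centroid lies within 25 miles of a given ZIP code
declare @Center geography;
select @Center = [GeoLocation] from [dbo].[ZIP Code Lookup] where [Zip Code] = '38117';

select a.AddressID, a.ZipCode,
 round(z.[GeoLocation].STDistance(@Center) / 1609.344, 0) as ApproxMiles
from dbo.Addresses a
join [dbo].[ZIP Code Lookup] z on z.[Zip Code] = left(a.ZipCode, 5)
where z.[GeoLocation].STDistance(@Center) <= 25 * 1609.344
order by ApproxMiles;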

[Edit] If you want to know more about the geography data type and its related geometry data type, a good overview can be had here: http://www.jasonfollas.com/blog/archive/2008/03/14/sql-server-2008-spatial-data-part-1.aspx.

Monday, March 7, 2016

Using an Operational DB Key as a "Fact" in Factless Fact Tables

As most of the readers of this blog know from other communication sources, I recently moved from Shelby Systems, Inc. to Custom Data Systems, Inc. One reason for my change is that I now have the opportunity to cut my teeth on a data warehouse project from start to finish, something I have wanted to do ever since learning about data warehousing at my first PASS Summit event back in 2010. At that event I bought the book The Data Warehouse Toolkit by Ralph Kimball and Margy Ross, and I have used it as a resource ever since. Indeed, it has been a constant source of help to me during these last couple of months as I have put the theory of data warehousing into practice at my new employer. On the topic of helpful books, I should also mention Star Schema: The Complete Reference by Christopher Adamson. I highly recommend both books for those just getting started with data warehousing. For those with some experience in star schema modeling, Adamson's book could still be quite helpful, as he presents a large number of practical examples of various challenges in star schema design as well as discussion of approaches quite different from the Kimball method.

For the pure design of the star schema data warehouse database, these resources never led me wrong. However, when it came to implementing analysis on top of the database itself, I found an omission that in retrospect is quite surprising. It came to my attention as I tried to perform analysis of data from two separate dimensions linked via a so-called "factless" fact table, using Power BI Desktop as the platform for the analysis. This problem may be unique to Power BI Desktop, but I doubt it. I believe it is a symptom of basing the output on the "tabular" model of analysis rather than on an analysis cube or on a single query.

The Problem

Simply put, the problem is this: unless a value from the fact table is included in the analysis in some way, the relationship between the dimensions via the intervening fact is ignored. I discovered the problem when I created a bar chart visualization based on the following three tables:

Individuals Dimension
Individuals Key (surrogate key)
Individual ID (operational db key)
First Name
Last Name
Member Status
Billing Classification
etc.

Dates Dimension
Dates Key (surrogate key)
Date
Calendar Year
Calendar Month
etc.

Daily Summary Fact (associates each individual with each date, for point-in-time analysis)
Individuals Key
Dates Key

After adding these three tables into Power BI and relating them together, I dropped [Dates].[Calendar Year] into the bar chart's "Axis" property, and I dropped [Individuals].[Individual ID] into the "Values" property and changed it to "Count Distinct" so that I could see how many members there were each year. I also added [Individuals].[Member Status] to the "Legend" property so that each bar would be subdivided by member type.

I was surprised that this yielded exactly the same numbers for the overall totals and for the shaded areas for every year in the output. That seemed odd even for my imaginary test data, so I started looking into the values in the operational database and in the data warehouse. After several ad hoc queries, I confirmed that the results in the graph were erroneous. And eventually I discovered that the numbers were the overall totals for all the data, not broken out by year. So the ultimate conclusion I came to was that the graph was ignoring the [Daily Summary] fact table that related individuals to dates and was, instead, showing all individuals for all dates.

Two Not-Quite Satisfactory Solutions


The standard approaches for doing counts on factless fact tables such as the [Daily Summary] table are:

  • Apply the COUNT() function to any of the surrogate key values in the fact table.
  • Add a "counter" column with a value of 1 in each row.
While these are fine for a basic count of values, neither works for COUNT DISTINCT. Obviously a "counter" column would return a COUNT DISTINCT value of 1, since every row has the exact same value. And the surrogate key of an individual is not a reliable value for COUNT DISTINCT because the same individual can have multiple surrogate keys whenever the dimension captures changes over time (the so-called "slowly changing dimension"), which the [Individuals] dimension certainly does.

All the examples from the Kimball book and the Adamson book are based on the assumption that the aggregations will be handled in a SQL query with an INNER JOIN between the dimension and "factless" fact tables. And in SQL of course the relationships are honored even if the SELECT clause does not have any columns from the fact table per se. But in Power BI and (probably) other analysis tools using the "tabular model" instead of single sets of query results, this is evidently not the case.
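
To make the contrast concrete, here is roughly the SQL the books have in mind; the intervening fact table cannot be skipped, even though no column from it appears in the SELECT list (table and column names follow the listings above):

-- members per calendar year and status, counted through the factless fact table
select d.[Calendar Year], i.[Member Status],
 count(distinct i.[Individual ID]) as MemberCount
from [Daily Summary] f
inner join [Individuals] i on i.[Individuals Key] = f.[Individuals Key]
inner join [Dates] d on d.[Dates Key] = f.[Dates Key]
group by d.[Calendar Year], i.[Member Status];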

My Solution

The solution that worked for me is to include the operational database identity key directly into the fact table. This is never suggested by Kimball or Adamson (that I found in print anyway), but it resolves the problem by putting the one "dimensional" value that I would want to aggregate with COUNT DISTINCT into the fact table as a column that can be dropped into the Values property of any Power BI visualization. My revised [Daily Summary] fact now looks like this:

Daily Summary Fact
Individuals Key
Dates Key
Individual ID

Using [Daily Summary].[Individual ID] instead of [Individuals].[Individual ID] in my Power BI graph yields the expected and accurate results.
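
For concreteness, a minimal sketch of the revised fact table is below; the data types and the absence of constraints are assumptions on my part, not the production definition:

create table [Daily Summary] (
 [Individuals Key] int not null,  -- surrogate key to the Individuals dimension
 [Dates Key] int not null,        -- surrogate key to the Dates dimension
 [Individual ID] int not null     -- operational database key, safe for COUNT DISTINCT
);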

Thinking beyond the specific problem of "factless" fact tables, I can anticipate wanting to use COUNT DISTINCT on the [Individual ID] value for a variety of analysis contexts, even when the fact table might in fact contain more traditional facts such as monetary values or other additive quantities. So my current plan is to incorporate the operational key value in every relevant fact table, whether "factless" or not.

Conclusion

Even though literature on data warehousing with the star schema places the identity key of the operational database only in dimensions, I have discovered that it is useful and in some cases necessary to put it in fact tables too. From now on, that value goes in both places in my star schema designs. If you use Power BI Desktop, Excel Power Pivot, or any other self-service BI tool based on the tabular model, you might want to do the same and save yourself some headaches.

Wednesday, September 25, 2013

A First Glance at Prepared Queries or, as I think of them, "Temporary Stored Procedures"

Since my last post I have taken a new position as DBA / Business Intelligence Developer at Shelby Systems, Inc. In my new role I am still doing a lot of report development but also diving into the SQL used by our applications to pull data, looking for ways to improve their efficiency.

Needless to say, my posts here will reflect all the new things I'm learning about T-SQL and SQL Server in general beyond the usual reports-driven information I have acquired in my work. And this week I learned a brand new technique for handling queries: "preparing" them.

In the world of T-SQL the word "prepare" has a very specific definition, which is something like this: "load a query statement into memory and assign it a handle (ID) so that it can be executed repeatedly by means of the handle." Typically "prepared" queries are also "parameterized," meaning that the query requires one or more parameters to be defined with values before the query can return results.

There are four stored procedures to handle "prepared" queries, and the rest of this entry will explain each one.

sp_prepare is the command to use in order to "prepare" a query statement and define one or more parameters for it. It accepts three parameters itself.

The first parameter is a reference to a variable that has already been defined as an integer in an earlier DECLARE statement. This parameter must also be designated as an OUTPUT parameter, meaning that the stored procedure will assign the integer value to the variable.

The second parameter is an nvarchar value of one or more variable definitions. The string value for this parameter should be formatted just like a DECLARE statement but without the word DECLARE.

The third parameter is also an nvarchar value and contains the query to be prepared. Here is an example of using the sp_prepare stored procedure to prepare a query that pulls the members of a family from Shelby.NANames based on a provided FamNu value.

declare @FamilyMembers int;
exec sp_prepare @FamilyMembers output, N'@FamilyNumber int', N'select NameCounter, FirstMiddle, LastName, Birthdate from Shelby.NANames where FamNu = @FamilyNumber';


The @FamilyMembers variable will contain the integer assigned by the sp_prepare procedure as the "handle" for the prepared query. From this point in the SQL script, I can reference @FamilyMembers to execute the prepared query instead of typing it all out again. There is a special stored procedure for executing prepared queries, and that is discussed next.

sp_execute is the command to execute prepared queries. Its first parameter is a reference to the handle of the prepared query; normally this takes the form of the variable used to capture the handle from the sp_prepare procedure. The remaining parameter(s) supply the value(s) to be used for the parameter(s) defined in the prepared query. The following two lines of code will execute the prepared query from the example code above for two different families.

exec sp_execute @FamilyMembers, '2';
exec sp_execute @FamilyMembers, '5';

As you can see, after the query has been prepared, it is very easy to execute the query over and over with various parameter values. These two commands (sp_prepare and sp_execute) are so commonly found with the second immediately following the first that Microsoft added a stored procedure to do them both together.

sp_prepexec prepares a query and immediately executes it. It has the same three parameters as sp_prepare but adds a fourth parameter for the initial set of variable values for the execution of the prepared query. The following code shows how the same family member query can be prepared and executed all at once.

exec sp_prepexec @FamilyMembers output, N'@FamilyNumber int', N'select NameCounter, FirstMiddle, LastName, Birthdate from Shelby.NANames where FamNu = @FamilyNumber', '7';

With the advent of sp_prepexec, the sp_prepare stored procedure is hardly ever used. But sp_execute remains as the only way to re-execute a prepared query after the first time. And if you don't plan to execute the query more than once, "preparing" the query is not really worth the effort.

sp_unprepare is the final stored procedure for dealing with prepared queries. As its name states, this one removes a prepared query from memory. It has only one parameter, which is the handle of the prepared query to unprepare. As with sp_execute, this is normally the variable that was defined to capture the handle from the sp_prepare or sp_prepexec stored procedure. The final example is how to "clean up" the memory after the family members prepared query is no longer needed.

exec sp_unprepare @FamilyMembers;

And in case you want to run the whole script all together in SQL Server Management Studio, here is the complete script to prepare, execute, and remove the sample prepared query:

declare @FamilyMembers int;

exec sp_prepexec @FamilyMembers output, N'@FamilyNumber int', N'select NameCounter, FirstMiddle, LastName, Birthdate from Shelby.NANames where FamNu = @FamilyNumber', '7';

exec sp_execute @FamilyMembers, '2';

exec sp_execute @FamilyMembers, '5';

exec sp_unprepare @FamilyMembers;

Although prepared queries are not something I can imagine using for report designs, they would be helpful for applications that need to load the same columns of information over and over from the same table(s), with just one or two parameter values changing each time the application loads data. Of course, a permanently defined stored procedure would do just as well, if not slightly better. But if you can't define a permanent stored procedure, "preparing" a frequently-used query is the way to go.

Tuesday, January 29, 2013

Alternating Row Background Colors in a Tablix (aka the Green Bar effect)

UPDATE on 1/31/2013: Major overhaul of technique for formatting a matrix with the "green bar" effect. The new approach requires no pre-planning for the dataset query and uses scope designations that are agnostic to the specific names of the data region.

The "green bar" effect is a document is the alternating of one background color (usually white) with another background color (originally light green) on each printed line of a document in order to make it easier to read across each line of text. The term comes from the days of impact printers with continuous feed paper that was pre-printed with just such a scheme of alternating colors, and it remains a popular method of making long lists, particularly grids, easy to read. Here is an example:



The method of formatting SQL Server Reporting Services tables with a "green bar" effect is not difficult. Formatting a matrix is only a little more difficult.

Formatting a Table
  1. In design mode, highlight all the text boxes in the (Details) section. Use a click-and-drag method to select them all in one step or hold down the Ctrl key and click each one in turn.
  2. On the Properties pane click the down-arrow for the value of the BackgroundColor property, and then click Expression...
  3. Enter the following expression. If you want colors other than white and light green, substitute your own color choices.

    =IIf(RowNumber(Nothing) mod 2 = 0, "LightGreen", "White")
The Mod function applies "modulo division," which returns the remainder of the division. All even numbers have a remainder of zero when divided by 2, so the IIF() function will return "true" for even numbers and "false" for odd numbers, resulting in the two different colors being applied to the background of the text box. That's all there is to it for a table. A matrix is a little more involved.

Formatting a Matrix
  1. Create your matrix, and then right-click on the left-most column and select Insert Column > Inside Group - Left.
  2. On the bottom-left text box of the matrix, which was just created when you added the new column, right-click and select Text Box Properties...
  3. Change the Name property to RowNumber.
  4. Enter the following expression as the Value property:

     =RunningValue(Min(1), Sum, Nothing)
     
  5. Click OK to confirm the new settings for the text box.
  6. If you do not want the running count of rows to appear in your final matrix, click on the tablix column marker to select the entire column. Then use the Properties pane to change the Hidden property to True.
  7. Click on the text box that will show the aggregate data for your matrix (in an Excel PivotTable this would be called the "data region" cell).
  8. On the Properties pane click the down-arrow for the value of the BackgroundColor property, and then click Expression...
  9. Enter the following expression.

    =IIf(ReportItems!RowNumber.Value mod 2 = 0, "LightGreen", "White")
If you want the row labels to alternate with the same colors, merely repeat the same formula for the background color of the row label text box(es) as well.

Tuesday, January 22, 2013

Check Database Size against SQL Server Express Size Limit

NOTE: This is a re-post of the same topic that is soon to be published in our SSTips forum of the Shelby Community site. I want to share the information with a wider audience than just the Shelby Systems, Inc. customer base, so I am posting it here too, with only minor edits for form and content. Also I want to acknowledge the contribution of Thomas LaRock, whose blog post on autogrowth settings provided the inspiration for some of the code in the query solution I give below.
If you use the free Express edition of SQL Server (like the one that is bundled with the Shelby v.5 installation software), then there is an upper limit on how large your database files can be. (SQL Server Standard Edition has an upper limit of 524 Petabytes -- effectively no limit at all -- but it is not free.) If your database reaches the limit of your SQL Server Express version, you will begin to experience errors due to the inability of the database tables to accept new data. It can be expensive and time-consuming to correct this problem once it occurs. Here is a brief chart of the Express versions and their size limits:
  • 2000 Desktop - 2 GB
  • 2005 Express  - 4 GB
  • 2008 Express  - 4 GB
  • 2008 R2 Express - 10 GB
  • 2012 Express - 10 GB
Run the query below in ShelbyQUERY or in SQL Server Management Studio (SSMS) to identify your SQL Server version and edition and also to let you know if you are reaching the upper limit of its capacity.  An explanation of the columns returned by this query follows the query statement.

select
 SQL_Server_Version = @@version,
 SQL_Server_Edition = serverproperty('Edition'),
 SQL_Server_Name = serverproperty('ServerName'),
 Database_Name = a.Name,
 Maximum_Database_Size_MB =
  case when serverproperty('EngineEdition') in (1, 4) then
   case
    when cast(serverproperty('ProductVersion') as nvarchar(128)) like '8.%' then '2048 MB'
    when cast(serverproperty('ProductVersion') as nvarchar(128)) like '9.%' then '4096 MB'
    when cast(serverproperty('ProductVersion') as nvarchar(128)) like '10.0%' then '4096 MB'
    else '10240 MB'
   end
  else 'Effectively no limit' end,
 Data_Size_MB = convert(decimal(12,2), round(fileproperty(a.name, 'SpaceUsed') / 128.000, 2)),
 Available_Growth_MB =
  case when serverproperty('EngineEdition') in (1, 4) then
   case
    when cast(serverproperty('ProductVersion') as nvarchar(128)) like '8.%' then 2048
    when cast(serverproperty('ProductVersion') as nvarchar(128)) like '9.%' then 4096
    when cast(serverproperty('ProductVersion') as nvarchar(128)) like '10.0%' then 4096
    else 10240
   end
  end - convert(decimal(12,2), round(fileproperty(a.name, 'SpaceUsed') / 128.000, 2))
from
 sys.sysfiles a
where
 a.name not like '%_log';



  • SQL_Server_Version - a verbose description of the version, edition, and other properties of your SQL Server environment
  • SQL_Server_Edition - the edition of SQL Server
  • SQL_Server_Name - the name (including instance) of SQL Server
  • Database_Name - the name of your database
  • Maximum_Database_Size_MB - for Express editions, the upper limit of the size of the database, measured in Megabytes; for Standard editions, "Effectively no limit"
  • Data_Size_MB - the amount of data stored in the database, measured in Megabytes. This is the critical number.  If this number ever equals the value shown for the Maximum_Database_Size_MB, the Shelby v.5 software will start throwing errors whenever new values are entered because there will be no room to store them in the database.
  • Available_Growth_MB - the difference between the maximum database size and the current data size; when this reaches zero, trouble will begin.

If you are a Shelby v.5 customer and you discover that you are about to max out your database size, there are some things you can do about it. Read the SSTip post in the Shelby Community (to be posted February 1, 2013) for suggestions.

Thursday, January 17, 2013

SSRS 2008 R2 Gotcha: Default Multi-valued Parameters

Recently a customer contacted me with a mysterious symptom in SQL Server Reporting Services (SSRS): a report displayed different results in Visual Studio than it did in Report Manager. He had verified the data source was the same for both contexts. He had verified that the RDL file was identical in both contexts. All the user-selected parameters were identical. And yet in Visual Studio certain values were included in the output that were not appearing when the exact same report ran in Report Manager. He had redeployed this report several times while troubleshooting, with no change in the output of the report in Report Manager.

After I connected to his system with a remote assistance session, I started drilling down through the report management options in Report Manager (report > Manage). I discovered that there were a couple of hidden parameters in the report design, and shortly after that I discovered that one of the hidden parameters was a multi-select parameter for which the Visual Studio report had two values selected by default and the Report Manager had only one.

Apparently the Report Manager context had cached the default setting for the parameter with one value only and redeploying the report did not change the setting. The only method we found that could fix the hidden parameter was to manually update it in the Report Manager context. Only then did the online report output match the offline report output in Visual Studio.

Later I tested the process by deploying a report that had a multi-valued parameter that used a query to select all the possible values by default. This parameter was not hidden. This report deployed normally and showed the parameter as having a default that was generated by query, the correct setting. But then I altered the report in Visual Studio to have a manually-defined default value instead of a query-based value. When I deployed the report and checked it again in Report Manager, the parameter had no default value at all. Changing the default value to a different manual value and redeploying had no effect; the online report continued to have no default value defined.

The bottom line appears to be that in SSRS 2008 R2, setting manual default values for a multi-valued parameter in Visual Studio is not a guarantee that the deployed report will have the same values set as defaults. The only way to be sure you have the correct manual default values for multi-valued parameters is to set them using the Parameters property of the online report in Report Manager.

Monday, January 7, 2013

Repeating Headers in SSRS

Getting header rows to repeat in SQL Server Reporting Services (SSRS) is surprisingly tricky. This is due at least in part to the ambiguous use of the term "header" in the SSRS context. In some cases it refers to any row that is above the core data-driven section of a tablix item. In other cases it refers only to a group header row. Hopefully the explanations below will clarify things a bit.

NOTE: In all the examples to follow I will use a typical data source and a dataset, but I am only going to show screenshots of the tablix structure and options related to the topic of repeating headers. The actual data are immaterial to the process of getting the headers to repeat.

Before exploring the solutions for repeating headers, I will walk through the steps that highlight the cause of much frustration when it comes to trying to make headers repeat.

Step 1: Add a table to your report. Notice that the top row of the table is labeled "Header."

Step 2: Add whatever fields you want to the data row.
Step 3: Open the tablix properties of the table. Put a check mark next to "Repeat header rows on each page." Click OK.

Step 4: Run the report. Notice that the header rows do not repeat on each page!

This is usually the beginning point of frustration with making header rows repeat. It has caused me (and probably others too) to shout at the computer, "Why aren't you doing what I'm telling you to do?!?!"

[Spoiler: "Repeat header rows on each page" doesn't really mean what it says. It means, "Repeat the header region of a row group on each page whenever the content is too wide to fit on one page and generates one or more extra pages." I guess that was too much verbiage for the programmers to fit.]

The following steps explain how to get Reporting Services to do what you want it to do and repeat the information you want repeated.

Repeat Rows or Columns for a Table with No Groups

A static row is any row that is outside of any data-driven section of the tablix. This means it is not the (Details) row and it is not inside any bracketed "grouping" section either. The tablix marker that appears to the left of the row is completely empty, as shown below.

Normally static rows do not appear as items in the Row Groups section at the bottom of the design screen, but to make them repeat we need to see them. So use the drop-down choice to the right of the Column Groups label and select "Advanced Mode" to turn it on.


Click on the (Static) item under the Row Groups pane that corresponds to the row you want to repeat. Then switch to the Properties pane (press F4 to open the Properties pane if it is not already open).

Change or verify these property values:

  • FixedData to True (to "freeze" the row in a web browser environment; this is optional)
  • KeepWithGroup to After (this may be the default, but it can be changed inadvertently)
  • RepeatOnNewPage to True


Now the row will repeat on each new page of the report output.

If there are too many columns to print on one page, you can also choose to repeat (and freeze) one or more columns on each page. Simply make the same change to the column's (Static) item under Column Groups that is shown above for the static row.

Repeat Rows and Columns in a Table with Row Groups

When you add a row group to a tablix, a vertical double-dashed line divides the table and a bracket appears on the tablix frame to show the rows within the group. Within the group bracket, the region to the left of the double-dashed line is the "row header" region. To the right of the double-dashed line is the details region and, if you added a row group header, a static row (technically only a partial row) above the details region. The diagram below shows each region with arrows pointing to the corresponding item in the Row Groups pane. As mentioned above, to see the (Static) items, the pane must be in Advanced mode.


All static rows, including the partial header row inside the group, can be repeated on each page using the technique explained above. The row header region will reprint the field value at the top of each page by default, so normally there is nothing you need to do with it. However, if you have too many columns to fit the page, you can also repeat the row header region on the pages with the excess horizontal content. To do this, open the tablix properties and put a check beside "Repeat header rows on each page." (Of course it is not "rows" that are repeated but the row header column. This is part of what makes the option unclear.) If you want the row header region to stay on the screen when the report user scrolls horizontally, also check the box to "Keep header visible while scrolling."


Repeat Rows and Columns in a Table with Column Groups

Like a row group, a column group in a table (or matrix) also has a double-dashed line dividing the details region below the line from the column header region above the line. Making static rows and columns (or the partial "header" segments that are also static) repeat works just as described above for static items. Making the column header region repeat works like making the row header repeat: open the tablix properties and select "Repeat header columns on each page." (Again, it is not "columns" that repeat but the column header row. This just maintains the confusion.) When you are dealing with a matrix (which is really just a table with row groups and column groups together), this is normally the option you want to choose to make sure the columns are labelled on every page. But this is not intuitive, because the wording leads you to believe that columns will be repeating, not the header row.

Summary

Hopefully this post will help you take control of the headers you want to repeat across the pages of your report, both in print and on the screen. Here is a brief list of points to take from the discussion:
  • The tablix properties for repeating headers refer only to group-header regions and have no effect on other rows or columns in the tablix.
  • The phrases "header rows" and "header columns" in the tablix properties screen are misleading and basically mean the opposite of what they seem.
  • To make non-group-header rows or columns repeat, first turn on the "Advanced" mode of the group pane at the bottom of the design screen, then set the properties of the corresponding (Static) item so that RepeatOnNewPage and FixedData are both true.
