Poll Results: Foreign Key Constraints

A few weeks ago I did the following post asking people – if they used foreign key constraints in their star schemas.

The poll is still open if you are interested in adding to it, but here is what the chart looks like as of today. (at the bottom of the poll itself there is a link to the live results, unfortunately I cannot link the live results in here as the blogging platform blocks the required javascript)



Interestingly the results are fairly even. Of the 78 respondents, fractionally over half at least aim to start with referential integrity in their star schemas.

I did not want to influence the results by sharing my opinion, but my personal preference is to always aim to have foreign key constraints. But at the same time, I am pragmatic about it, I do have projects where for various reasons some constraints are not defined. And I also have other designs that I have inherited, where it would just be too much work to go back and add foreign key constraints. If you are going to implement foreign keys in your star schema, they really need to be there at the start.

In fact this poll was was the result of a feature request for BIDSHelper asking for a feature to check for null/missing foreign keys and I am entirely convinced that BIDS is the wrong place for this sort of functionality. BIDS is a design tool, your data needs to be constantly checked for consistency.

It's not that I think that it's impossible to get a design working without foreign key constraints, but I like the idea of failing as soon as possible if there is an error and enforcing foreign key constraints lets me "fail early" if there are constancy issues with my data.

By far the biggest concern with foreign keys is performance and I suppose I'm curious as to how often people actually measure and quantify this. I worked on a project a number of years ago that had very large data volumes and we did find that foreign key constraints did have a measurable impact, but what we did was to disable the constraints before loading the data, then enabled and checked them afterwards. This saved as time (although not as much as not having constraints at all), but still let us know early in the process if there were any consistency issues.

For the people that do not have consistent data, if you have ETL processes that you control that are building your star schema which you also control, then to be blunt you only have yourself to blame. It is the job of the ETL process to make the data consistent. There are techniques for handling situations like missing data as well as  early and late arriving data. Ralph Kimball's book – The Data Warehouse Toolkit goes through some design patterns for handling data consistency.

Having foreign key relationships can also help the relational engine to optimize queries as noted in this recent blog post by Boyan Penev

Print | posted on Sunday, May 23, 2010 9:44 PM