There was an interesting thread on the SQLDownUnder mailing list last week about the use of Key Value Pairs in database design. One of the Guru's on the list Jeremy Huppatz made a few posts which covered off pretty much all the issues I was thinking about.

 

One of the key points that Jeremy made was this one:

 

Remember that relational databases are designed with 3 key goals - reliable data entry, stable consistent storage and rapid retrieval.  By using KVP's you throw the rapid retrieval capability straight out of the window.

 

Working mainly in the BI field might mean that I can have a tendency to focus more on the retrieval side of things, but I think it is good practice with any system to keep both the inputs and outputs in mind while building it.

 

Rather than paraphrasing Jeremy's post, I contacted him and asked if he would mind if I reproduced it here:

  • KVP's look elegant for data capture but they make reporting a nightmare.
  • If you're going to use KVP's for data capture, strongly consider running triggers or a second-phase ETL process to transform KVP data into an entity-based schema.
  • Try to avoid reporting against big heaps of KVPs.
  • If you have no other option, try to ensure that your KVP rows have as many additional lookup keys as possible to allow queries against your heaps to be heavily filterable on the KVP categories, subtypes, etc.

Also consider the fact that with KVP's you might effectively be building some optimizable mathematical structures such as Directed Acyclic Graphs or Bidirectional Acyclic Graphs, and consider the extra data-field considerations that may be required.

 

Before doing anything else, however, I'd strongly recommend doing a full logical model and looking at what common properties on your sets of objects suggest standard ERD implementations and relationships.  KVP's are most often used as a short-circuit to the data analysis process and are rarely an easy solution to support and maintain in the long term - especially when you start connecting enterprise systems together.

 

Not only did Jeremy post the above points, he then went on to work through a simple example of what happens when you try to report off a KVP structure.

 

One of the big problems with reports is that you end up building lots of logical subqueries when building reports.

 

e.g. Consider a fairly typical KVP table defined as follows:

 

CREATE TABLE KVP_Table (record_id dec (17,0) identity (1,1) NOT NULL PRIMARY KEY
, key_id int NOT
NULL
, category_id int NOT
NULL
, sub_category_id int NOT
NULL
, key_name nvarchar(60) NOT
NULL
, key_value nvarchar(max) NOT NULL)

 

Let's also assume that key_id, category_id and sub_category_id are foreign keys to lookup tables with appropriate category names that can be used for filtering, and that record_id is used as a foreign key by other tables that refer to our KVP heap.

 

Let's say for the sake of argument that our application is a medication administration database and we want to look for records where a specific medication specification has been administered.

 

So, when looking for something general like medication name like Ibuprofen, we need:

 

SELECT record_id
FROM
KVP_TABLE
WHERE key_name =
'Medication Name'
AND key_value = 'Ibuprofen'

 

Should come back pretty quickly - this is the trivial case.

 

Let's say we wanted to find out something using more than one filter.

With the table as specified above, we have no way to correlate KVP's with a broader context.  So... lets add an event_id column to the table (make it int NOT NULL) to allow correlation of medication administration events with the corresponding KVPs.

 

Then we might want to find a list of event ID's for which our medication matches a compound KVP definition for medication name, dosage level and administration route.

 

Let's say:

 

SELECT event_id
FROM
KVP_TABLE
WHERE key_name = 'Route' AND key_value =
'Oral: Capsule'
AND event_id in
(
      SELECT
event_id
      FROM
KVP_TABLE
      WHERE key_name = 'Dosage' AND key_value =
'50mg'
      AND event_id in
(
          SELECT
event_id
          FROM
KVP_TABLE
         WHERE key_name = 'Medication Name' AND key_value =
'Ibuprofen'
         )
     )
)

 

You can see where this starts getting messy.  This could also be implemented as a join as below:

 

SELECT e1.event_id
FROM
KVP_TABLE e1
    INNER JOIN KVP_TABLE e2 ON e1.event_id = e2.
event_ID
        INNER JOIN KVP_TABLE e3 ON e2.event_id = e2.event_id WHERE e1.key_name = 'Route'

    AND e1.key_value =
'Oral: Capsule'
    AND e2.key_name = 'Dosage'

    AND e2.key_value =
'50mg'
    AND e3.key_name = 'Medication Name'

    AND e3.key_value = 'Ibuprofen'

 

However, when you want records that match a "NOT IN" query specification (i.e. you want to screen out records that actually MATCH the fields specified - this is more common than you'd think) you end up with the JOIN query being no longer viable - the subquery becomes the optimal approach. 

 

Consider the impact of a single "NOT IN" clause in a report that has 5 levels of filtering on a table with 20 million rows.  What sort of response times would you expect to see?  IMO, you're kidding yourself if you'd expect anything under 40-50 seconds.  More realistic would be 2-3 minutes - if you're lucky.

 

Remember that relational databases are designed with 3 key goals - reliable data entry, stable consistent storage and rapid retrieval.  By using KVP's you throw the rapid retrieval capability straight out of the window.

 

If you're looking to deploy this in SQL Server 2005, you might have some joy using the native XML datatype to encode your key value pairs, but searching for them is still going to be painful, even if you use the native XML indexing available in the latest DBMS release.

 

Anyway - you get the idea... I thought it might be better to illustrate the problem than just to say "Don't do it."  Hopefully this gives you an idea of why the red pill might be a bad idea.

 

The other part of the thread that I also had an interest in came about when it was mentioned that reporting should not be too much of an issue as it was planned to do it all through OLAP cubes. Below was Jeremy's response to this proposal,

 

However, you still have the issue that a given logical record spans multiple rows in the KVP values table, and this makes filtering a pain.

From a cube perspective, many of your facts are going to be built as KVPs - you need an additional view that does a pivot summary of each of the KVP values relevant to a specific "event" into a fact table with the appropriate correlations to dimensional members.  From a query processing perspective, this rapidly becomes a nightmare.  Expect much longer cube-builds going down this route than using relational tables where the fact table doesn't have to be computationally derived.

 

With respect to cube dimensions, you're going to run into issues with dimension definitions as you're going to have to build a view defining each set of key-names to correlate the members for each level of your dimension.  This is doable, but is also fairly hard to maintain.  Adding a new property KVP to a record means doing a full process of your dimension and cube to reflect it in your reporting environment.

Relational data can speed up dimension and cube builds by several orders of magnitude, as you don't need to build an abstraction layer that defines the dimension memberships and levels required by each individual KVP.  You simply define your dimensions directly from the relational schema.

 

I understand the attraction of KVPs from a web developers standpoint, and from the perspective of someone wanting to do a lot of SET/GET-based OOP, but once you start doing more complex method-based coding (e.g.

complex business rule validation which might update multiple instance/class properties in a single invokation), the benefits of KVPs are quickly eroded.

 

This of course should not be taken to mean that you should never use KVPs. This post is merely meant to highlight the issues involved with KVPs so that you can weigh up the issues involved. KVP tables are often used for storing things like settings (which I have done myself) or for storing extension attributes (I understand Microsoft CRM employs KVP tables for this purpose.


Feedback

# re: Key Value Pairs in Database design

I pretty much use KVPs exclusively for settings on the database side of things. I use them in most of my classes, however, as a way of showing values in a dropdown list with the key being some field, or combination of fields, which uniquely describes the object and it's record ID as the value.

Using a single lookup table for all pairs as described is a management nightmare. My motto is keep to the basics and try not to violate basic normalized database concepts! 3/16/2006 10:53 PM | Brian

# re: Key Value Pairs in Database design

Yes, KVPs are great for settings. I use them for this myself.

I think database design, like all design activities is a matter of balance. In the case of KVPs, it is a matter of balancing the ease of input with output requirements. In some cases, like settings tables, the balance definitely shifts in favour of KVP tables. In terms of general data storage, my opinion is that the balance shifts the other way. 3/18/2006 6:09 AM | Darren Gosbell

# re: Key Value Pairs in Database design

Dear All,

I happen to read this article... I need to store data in the form


attribute name->attribute value.


the attribute name can be anything and the attribute value can also be arbitary depending on the action.

so how do i design the table...?? The option i could think of was key value concept....but then i wont be able to validate the attribute values that are entered???

any thoughts , suggestion are welcome

10/18/2007 5:42 PM | anita

# re: Key Value Pairs in Database design

The only reason that I post here is because this is one of the first sites that come up when I do searches for key-value pair views. I was looking for a good way to write a view or a select for a create table ... as select or an insert into ... select, or as a simple create view statement. After some study, I found that a simpler but possibly slower solution for a uid-key-value set table worked like the following in postgresql:

SELECT uid,
max(ud.user_name) user_name,
max(ud.user_age) user_age,
max(ud.user_email) user_email
FROM (SELECT uid,
CASE WHEN name='user_name' THEN value ELSE NULL END as user_name
CASE WHEN name='user_age' THEN value ELSE NULL END as user_age
CASE WHEN name='user_email' THEN value ELSE NULL END as user_email
FROM user_kvp_table) as ud
group by ud.uid;

With this solution, you only need to add two lines whenever you add an item to the view. I haven't worked out a way to make it find all the keys automatically, but this would allow for the data comparisons that you describe above in a simpler (although possibly slower) structure. 3/19/2009 4:05 AM | Paul Salcido

# re: Key Value Pairs in Database design

Yeah, that's a pretty classic "pivot" style query. It makes it a lot easier to create reports, but it's hard to optimize such queries as they have to scan the whole table. 3/19/2009 7:50 AM | Darren Gosbell

# re: Key Value Pairs in Database design

The secret of this nice statment is the record_id dec (17,0):

CREATE TABLE KVP_Table (record_id dec (17,0) identity (1,1) NOT NULL PRIMARY KEY
, key_id int NOT NULL
, category_id int NOT NULL
, sub_category_id int NOT NULL
, key_name nvarchar(60) NOT NULL
, key_value nvarchar(max) NOT NULL)
, key_level int NOT NULL


It can be used as attribute or as a measure...or a combination of the two:-)

Adding a key level allow you to act like a magnifying glass over a population for example....
by pairing key_level_category pattern...you don't need to prebuild the sub...subsub...subsubsub...think in advance :-)


, 0AAA0AAA111AAADFA~key_idC int NOT NULL
, 1AAB0AAA0AAA111DFcategory_idC int NOT NULL
, 2A2223AA0AAA0AAA0DAAsub_category_idC int NOT NULL
, NAAB0AAA3BBBkey_name nvarchar(60) NOT NULL
, K3BBBAANANABkey_value nvarchar(max) NOT NULL)
, key_level dec (13,6) NOT NULL

patern can be easily modelised in a logical key multi dimensional level attribute or measure connection...or in simple word : key value pairs 7/5/2012 6:49 AM | bi_addict

# re: Key Value Pairs in Database design

Nice that you'd write this on the day I started researching it.

I've heard a lot of hype about the joys of KVPs in regards to NoSQL. Without having done the research or being an expert on hardware (i.e. without having a clue what I'm talking about I'd like to say something (like most of the internet))
I had a think about it in principle rather than practice and decided...

1) It's more straightforward to distribute one big table around servers than a whole bunch of them.

2) It could take advantage of the fact that all the data for, say, a facebook photo album (I don't know what structure they actually use) will be stored together physically if it was created at the same time rather than bits of it in different tables with pages all round the place. That would mean that in principle I could see the album being faster to retrieve by KVP than by any entity based approach. I don't know a great deal of math so that could have been what you were getting at with "Directed Acyclic Graphs".

3) Basically everything you said regarding the problems trying to get data out in anything other than the granular format it was entered.

So nice to see that sanity check from someone that actually knows things. 4/2/2013 11:53 PM | Adam Misrahi

# re: Key Value Pairs in Database design

Appreciate the thread of discussion. I use KVP a lot while designing repositories or sections of applications that is intended to store Metadata. I absolutely feel that KVP is the best way to go about when it comes to capturing metadata, of course there is always set of views on top of these KVP objects to expose them in form of relational structures. 4/9/2013 1:00 PM | InfoHyper

Post a comment





 

 

News

About Me
I am a consultant, based in Melbourne Australia. I primarily work in the Business Intelligence area with SQL Server, although I also dabble in Content Management Server and .Net coding

Contact Me

MVP


Legal
Any and all code, software, examples, suggestions and anything else on this web site is available for you to use at your own risk. No warranty is expressed or implied.
Views and Opinions
The views and opinions expressed on this web site are not necessarily the views or opinions of my employer.

Subscribe in Bloglines Subscribe in NewsGator Online

Locations of visitors to this page

Twitter












Tag Cloud


Article Categories

Archives

Post Categories

.Net Blogs

Aussie Bloggers

BI Blogs

CMS Blogs

Syndication: