The SQL Azure Federation had been publically launched several weeks ago and this is one of the most existing features I’m looking forward. This might be the first post of SQL Azure Federation, and hopefully not the last one.
Some Backgrounds
SQL Azure Federation was mentioned in about 2009. The Microsoft told that there will be a feature in SQL Azure allow users to split one database into many based on some sort of rules But from the client side perspective, user can interact their data as if in one database. This feature was called SQL Azure Federation. At the TechED China 2010, when I talked with Zach, a technical evangelist at Microsoft focus on Windows Azure platform when the SQL Azure Federation would be available, or even though CTP for MVPs or partners. But at that moment Zach said he still don’t have a chance to take a look at it. I remembered it was Dec, 2010.
But the good news came on June 2011, Microsoft had opened the nomination of the PE Program of SQL Azure Federation. And I was very lucky being approved to use this feature and provide some early feedback to Microsoft. During the PE program I had attended several online meetings and have the chance to play with it in some of my projects. Cihan Biyikoglu, the program manager in SQL Azure team and his group gave me a lot of information and suggestion on how the SQL Azure Federation works and how to use it in the best approach. During the PE program, the Microsoft said that SQL Azure Federation will be available at the end of 2011.
In 12th, Dec 2011 the SQL Azure Federation was launched with the SQL Azure Q4 2011 Service Release, with some other cool features, which you can have a reference here.
What’s (Data) Federation
Federation is not a new concept or technology in computer science. But it could be meaning differently in different context, such as WS-Federation, Active Directory Federation Services, etc. But in this, and the following blog posts about SQL Azure Federation, the word “Federation” I mentioned would be focus on the Data Federation.
Federation, as known as the data shard or partitioning, is a database design principle whereby rows of a database table are held separately. It also be known as horizontal partitioning. Assuming that we have a database and a table named Product which contains the product records, and now there’s 10,000 records there. After a while the number of the records raised to 10,000,000. If they are all in the same database it might cause performance problem. In this case, there are two solutions we can choose. One is to increase the hardware of the database server. For example, more CPU cores, more memory and higher network bandwidth. This approach we called Scale Up. But there will be a limitation in this way, we can not add cores, memory and much as we want.
Another one is to split the database across to multiple databases and servers. Let say we divide the database and Product table in 10 databases (servers), so in each database there will be only 1,000,000 records in Product table. We split the data volume across the multiple servers, as well as split the network load, CPU and memory usage. Furthermore, since we can have as many servers as we need, there will be no limitation to extend our system in this approach. This is called Scale Out.
SQL Azure Federation implemented this approach, which helps us to split one database into many that we called federation members, to increase the performance.
Horizontal Partitioning and Vertical Partitioning
Let’s deep into the tables in the databases to be federated. If a table was too big that introduced some performance issues, like the Product table I mentioned previously which has 10,000,000 records, we need to split them across to the databases. There are also two approaches we have, the horizontal partitioning and vertical partitioning.
Horizontal partitioning, likes you use a knife to cut the table horizontally, which means split the table by rows. The tables after the partitioning would be:
- Have the exactly same schema.
- One database command on a table would be the same on any other tables.
- Each record represent the full information.
- Can retrieve all information within one query.
- Need to touch all databases (partitioned tables) and aggressive process when fan-out query, like SUM, AVG, COUNT, etc..
Vertical partitioning means split the table by columns. The table after the partitioning would be:
- Each table would be in different schema.
- Query on each tables would be different. And may introduce some data redundant.
- Each record in a table just represent partial information.
- Easy to implement COD (Cost Oriented Design) by moving the columns in cheaper storage. For example moving the binary columns into Windows Azure Blob Storage.
- Need multiple queries when retrieve some information.
- Fan-out query normally can be finished within one query.
SQL Azure Federation utilize the horizontal partitioning to split the tables in multiple databases. But it’s not that simple as I mentioned above. When using horizontal partitioning, we need to firstly define the rule on how to divide the tables. In the picture above, it indicates that the table will be divided by ID, all records that ID less than 4 would into one database, and others (larger than 3) will be in another
But if we have some tables related, for example UserOrder table which have UserID as well, we need to split that table by the same rule, to make sure that all records in the tables that referred to the same UserID must be in the same partition. This will make the JOIN query quick and easy.
There are also some tables that doesn’t related with the ID in this example, for instance the countries, cities, etc.. When we partitioning the database, these tables should not be split and need to be copied to each databases.
The last thing is that, there might be some tables that represent the global information, like the system settings, metadata and schema data. They should not be split and should not be copied into the databases. So they will be remained in the original database we can call it root database.
Now we have a fully implementation on the horizontal partitioning. We have the rule on how the data should be split. We ensure that all related records will be stored in the same database node and the lookup tables will copied across them. We also have the root database with tables that have the global information stored. I can tell you that this is what SQL Azure Federation does for us, automatically and safely.
SQL Azure Federation Concepts
SQL Azure Federation introduces some new concepts around the data partitioning. They are federation, federation distribution, federation member, root database, federation column, federated table, reference table, center table and atomic unit.
Federation is the rule on how to partition our data. There can be more than one federations in one system. But on a particular table there has to be only one federation apply. This means, for example we have a table with columns UserID and ProductID. We can apply a federation that split the table by UserID, or by ProductID, but we cannot apply both of them on the same time.
A federation includes:
- Federation Name: The name of the federation which can be used when alter or connect.
- Federation Distribution Name: The identity name that to split the tables in this federation. For example if we want to split the tables based on the UserID then we can name the federation distribution name as “userId”, “uid” or whatever.
- Federation Distribution Data Type: The data type that the federation distribution name is. Currently the SQL Azure Federation only support int, bigint, uniqueidentifier and varbinary(n).
- Distribution Type: How SQL Azure Federation will split the data. There are many ways to split the data such as mod, consistent hashing but currently SQL Azure Federation only support “range”.
After we split database into many, based on the federation we specified, the small databases called Federation Member. The original database, may contains some metadata tables would be called Root Database or Federation Root. The tables that is being split into the federation members are Federated Table.
The tables that represent the lookup data can be copied to each federation members automatically by SQL Azure Federation, which is called Reference Table. The remaining tables in the federation root would be Center Tables.
As we discussed below, when horizontal partitioning the tables that related with the same split key (here is the federation distribution name) should be put into the same databases (federation members). For example if we move the record in Product table into federation member 1 if UserID = 3, then all records that UserID = 3 in the table ProductDetails should be moved in federation member 1 as well. In SQL Azure Federation, the group of the records that related to the same federation distribution value called Atomic Unit. This is very important and useful when using SQL Azure Federation which I will explain in the coming few posts.
Summary
In this post I covered some basic information about the data federation. I talked about the approaches that we can use to partitioning our data. I also described the different between horizontal partitioning and the the vertical partitioning, and the goal of horizontal partitioning. Finally I talked about the concept of SQL Azure Federation.
In the next post I will demonstrate how to create a database and use SQL Azure Federation, to split my original database into members.
Hope this helps,
Shaun