Ways To Load Data In SQL Azure

This blog provides links to a presentation and a sample application that shows how to load data in SQL Azure using different techniques and tools. The presentation compares the following techniques: INSERT, BCP, INSERT BULK, SQLBulkCopy and a few tools like SSIS and Enzo Data Copy.

The presentation contains load tests performed from a local machine with 4 CPUs, using 8 threads (leveraging the Task Parallel Library), to a SQL Azure database using the code provided below. The test loads 40,000 records. Note that however the test was not conducted from a highly controlled environment, so results may vary. However enough differences were found to show a trend and demonstrate the speed of the SQLBulkCopy API versus other programmatic techniques.

You can download the entire presentation deck here: presentation (PDF)

image

Enzo Data Copy and SSIS

The presentation deck also shows that the Enzo Data Copy wizard loads data efficiently with large tables. However it performs slower for very small databases. The reason the Enzo Data Copy is fast with larger databases is due to its internal chunking algorithm and highly tuned parallel insert operations tailored for SQL Azure. In addition, Enzo Data Copy is designed to retry failed operations that could be the result of network connection issues or throttling; this resilience to connection issues ensures that large databases are more likely to be transferred successfully the first time. The Enzo Data Copy tool can be found here: Enzo Data Copy

In this test, with SSIS left with its default configuration, SSIS was 25% slower than the Enzo Data Copy Wizard with 2,000,000 records to transfer.  The SSIS Package created was very basic; the UseBulkInsertWhenPossible property was set to true which controls the use of the INSERT BULK command. Note that a more advanced SSIS developer will probably achieve better results by tuning the SSIS load; the comparison is not meant to conclude that SSIS is slower than the Enzo Data Copy utility; rather it is meant to show that the utility compares with SSIS in load times with larger data sets. Also note that the utility is designed to be a SQL Server Migration tool; not a full ETL product.

image

 

Note about the source code

Note that the source code is provided as-is for learning purposes. In order to use the source code you will need to change the connection string to a SQL Azure database, create a SQL Azure database if you don’t already have one, and create the necessary database objects (the T-SQL commands to run can be found in the code). 

The code is designed to give you control over how many threads you want to use and the technique to load the data. For example, this command loads 40,000 records (1000 x 40) using 8 threads in batches of 1000: ExecuteWithSqlBulkCopy(1000, 40, 8); While this command loads 40,000 records (500 x 80) using 4 threads in batches of 500: ExecuteWithPROC(500, 80, 4);

Here is the link to the source code: source code

Here is a sample output of the code:

image

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

How To Copy A Schema Container To Another SQL Azure Database

This article is written to assist SQL Azure customers to copy a SCHEMA container from one SQL Azure database to another. Schema separation (or compress shard) is a technique used by applications that hold multiple customer “databases” inside the same physical database, but separated logically by a SCHEMA container. At times it may be necessary to copy a given SCHEMA container from one database to another. Copying a SCHEMA container from one database to another can be very difficult because you need to only extract and import the data contained in the tables found in that schema.

If your SQL Azure database has multiple schema containers in it, and you would like to copy the objects under that schema to another database, you can do so by following the steps below. The article uses the Enzo Backup tool (http://bluesyntax.net/backup.aspx), designed to help in achieving this complex task.

Example Overview

Let’s assume you have a database with multiple SCHEMA containers and you would like to copy one of the schema containers to another database. A SCHEMA container holds tables, foreign keys, default constraints, indexes and more. In the screenshot below the selected database has 5 schema contains: the DBO schema and 4 custom SCHEMA container. The tool allows you to copy the objects from any schema into another database.

image

Backing Up Your SCHEMA Container

The first step is to backup your schema. Enzo Backup gives you the options needed to backup a single SCHEMA container at a time.

1- If you have registered your databases previously with the backup tool, select the database from the list of databases on the left pane, right-click on the SCHEMA container (i.e. your logical database) and select Backup to Blob (or file). Otherwise click on the Backup –> To Blob Device from the menu and enter the server credentials, and specify the name of the SCHEMA to backup from the Advanced screen.

2- Type the name of your backup device

3- Optionally select a cloud agent if you are saving to a Blob on the Advanced tab (note: a cloud agent must be deployed separately)

4- Click Start

image

Restoring Your SCHEMA Container

Once the backup is complete, you can restore the backup device to the database server of your choice. Note that you could restore to a local SQL Server database or to another SQL Azure database server.

1- Click on Backups on the left pane to view your backup devices and find your device

2- Right-click on your backup device (file or blob)

3- Enter the credentials of the server you are restoring to and the name of the database

4- If the database does not exist, select the Create If… option

5- Optionally, if you are using a Blob device and restoring to a SQL Azure database, check the Use Cloud Agent (note: a cloud agent must be deployed separately)

6- Click Start

image

If for some reason the database you are restoring to is not empty, you may see a warning indicating that the database has existing objects. Click “Yes” to continue. When the operation is complete, you can inspect your database to verify the presence of your logical database.

Considerations

- You can restore additional SCHEMA containers on an existing database. You would simply need to repeat the above steps for each SCHEMA container.

- If your intent was to “move” the SCHEMA container, you will need to clean up the original database. Once your SCHEMA container has been copied, and you have verified all you data is present in the destination database, you will need to manually drop all the objects in the source database before dropping the SCHEMA container.

- This tool does not support backing up SQL Server database. However you can restore a backup device created with Enzo Backup on a local SQL Server database if desired.

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

Quick Review of Backup tools for SQL Azure

The landscape of SQL Azure backups is changing rapidly.  A few tools are becoming available at no charge and Microsoft is adding capabilities over time. Here a quick update.

Microsoft Tools

Microsoft offers two primary backup mechanisms so far:

  • Export/Import feature available on the Azure Management portal
  • The COPY operation as part of its T-SQL CREATE DATABASE statement

These mechanisms do not offer a scheduling component and do not work together. To obtain a transactionally consistent backup, you first need to perform the COPY operation manually, then run the Export function.

Free Tools

You have other tools on the market that are a bit more comprehensive at no charge. Here are a few:

  • Enzo Backup for SQL Azure (Standard Edition)
  • Red Gate's SQL Azure Backup (backup only; no restore; no cloud backup)

These are the only two third-party products available at no charge I am aware of right now. Note that Red Gate's product provides a simple copy operation to a local SQL Server database, not really a backup/restore solution. However it does the job well if you want to get a local copy of your data. Enzo Backup is more comprehensive and offers many more functions, such as a built-in scheduler, cloud backup devices (in Blobs) and a restore capability.

Paid-For Tools

  • Enzo Backup for SQL Azure (Advanced Edition)
  • SQLAzureBackup (basic command-line BCP wrapper, no cloud backup)

Regarding the tools you can purchase, SQLAzureBackup is a product you can purchase, although it seems somewhat limited in its capabilities. You can nonetheless use it to export your data locally and restore the database back into SQL Azure. Enzo Backup Advanced Edition is a more powerful flavor of the free version that leverages multithreading for faster operations.

Here is a link to an MSDN article that offers additional information on how to backup SQL Azure: http://blogs.msdn.com/b/davidmcg/archive/2011/09/29/data-backup-strategies-for-windows-and-sql-azure.aspx

For the time being Enzo Backup appears to have significantly more capabilities than the other products, and it can be used at no charge by visiting Blue Syntax's website (http://www.bluesyntax.net).

[Disclaimer: I am the author of Enzo Backup for SQL Azure; this blog intends to provide a quick overview of the current tools available on the market; please investigate the tools referenced here and visit the MSDN link provided to make an educated decision]

 

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

Azure Florida Association: New user group announcement

I am proud to announce the creation of a new virtual user group: the Azure Florida Association.

The missiong of this group is to bring national and internaional speakers to the forefront of the Florida Azure community. Speakers include Microsoft employees, MVPs and senior developers that use the Azure platform extensively.

How to learn about meetings and the group

Go to http://www.linkedin.com/groups?gid=4177626

First Meeting Announcement

Date: January 25 2012 @4PM ET

Topic: Demystifying SQL Azure

Description: What is SQL Azure, Value Proposition, Usage scenarios, Concepts and Architecture, What is there and what is not, Tips and Tricks

Bio: Vikas is a versatile technical consultant whose knowledge and experience ranges from products to projects, from .net to IBM Mainframe Assembler.  He has lead and mentored people on different technical platforms, and has focused on new technologies from Microsoft for the past few years.  He is also takes keen interest in Methodologies, Quality and Processes.
 

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

How to Detect If You Are Connected To A Federation Root

If you develop applications in SQL Azure, you may end up in a situation where your code needs to know if it is operating in the root database or in a Federation Member, or if it is connected to a regular user database. Two ways can be used to make this determination: using system tables, or using the USE FEDERATION command.
Using System Tables
This approach is the easiest to implement since it relies a SQL call only, without much logic in your application code. Assuming your code is currently connected to a SQL Azure database, you could use the sys.databases table to read the is_federation_member column.  
Here is the statement you would use:
SELECT is_federation_member FROM sys.databases WHERE name = db_name()
If you are currently connected to a federation member, is_federation_member will be 1; otherwise it will be 0.
Note that as of this writing, running the above statement in SQL Azure will throw an error because the is_federation_member column is not yet available in regular SQL Azure user databases. However, if you connect to a federation, the SQL code above will run.
Using the USE FEDERATION Command
Alternatively you can also use code in .NET to make that determination. Connect to the database as usual, using the SqlConnection class for example, then perform the following tasks:
-          Execute the USE FEDERATION ROOT WITH RESET command, which forces the connection back to the root database (in case you are not current in the root database)
-          Execute SELECT db_name() which fetches the name of the database of the root database
-          Compare the original database name you connected to with the one returned by the previous call; if the names are the same, you originally connected to the root database
Here is the sample code:
string sql = "USE FEDERATION ROOT WITH RESET";
string sql2 = "SELECT db_name()";
bool isFederationRoot = false;
 
try
{
 SqlConnection sqlConn = new...; // Your connection string goes here
 sqlConn.Open();
 (
new SqlCommand(sql, sqlConn)).ExecuteNonQuery(); // connect to the root db
 SqlDataReader dr = (new SqlCommand(sql2, sqlConn)).ExecuteReader(); // Get the db_name()
 
 if (dr.Read())
   isFederationRoot = (dr[0].ToString().ToLower() == dbName.ToLower());
 dr.Close();
}
catch { }
 
 

 

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

Solving Schema Separation Challenges

Introduction
To save on hosting costs and simplify maintenance, Software as a Service (SaaS) providers typically rely on schema separation to host multiple customers' records. This implementation relies on a specific SQL Server and SQL Azure feature called a schema object. A schema object behaves like a container, or a namespace in programmatic terms, allowing multiple tables (and other objects) to be stored with the same name in a single database.
However schema separation comes with its own set of challenges. The challenge in using schema separation is that few tools support it. For example, until recently, it was not possible to easily backup or restore a single schema.  Other challenges include security and account management. Let's explore further.
Basics of Schema Separation
Let's assume you have two customers, and you need to track historical information. So you need to create a table called tbl_history for each customer.  You could either create a database for each customer with the tbl_history table in each database (the linear shard pattern), or create a schema container for each customer in a single database, and create the tbl_history table in each schema container (the compressed shard pattern). For a description of sharding patterns review this white paper:  http://www.bluesyntax.net/files/EnzoFramework.pdf.
Here is an example of a schema-based storage for different customers. A database is created with 2 schema containers (cust1 and cust2). The tbl_history table is then added to each schema container. The following script assumes that the database has already been created.
CREATE SCHEMA cust1
GO
CREATE SCHEMA cust2
GO
 
CREATE TABLE cust1.tbl_history(id int identity(1,1) primary key, dateadded datetime default (getdate()), productid int, quantity int)
CREATE TABLE cust2.tbl_history(id int identity(1,1) primary key, dateadded datetime default (getdate()), productid int, quantity int)
 
INSERT INTO cust1.tbl_history(productid, quantity) values (1, 5)
INSERT INTO cust1.tbl_history(productid, quantity) values (2, 7)
INSERT INTO cust2.tbl_history(productid, quantity) values (107, 22)
 
SELECT * FROM cust1.tbl_history
SELECT * FROM cust2.tbl_history
At this point we have two tables with the same name, each in a different schema container, but within a single database. Here is more information about schema containers: http://msdn.microsoft.com/en-us/library/ms189462.aspx
Securing Access
An important feature of schema containers is their support for security. You could easily create a user (mapped to a login account) and grant that user SELECT rights to cust1.  The following statements should be executed against the master database in SQL Azure.
CREATE LOGIN logincust1 WITH PASSWORD = 'p@ssw0rd001'
CREATE LOGIN logincust2 WITH PASSWORD = 'p@ssw0rd002'
Then back in the user database, create two users.
CREATE USER user1 FOR LOGIN logincust1
CREATE USER user2 FOR LOGIN logincust2
 
Finally, you need to authorize each user to execute statements against the history table. We will authorize user1 to use all the objects in schema cust1, and user2 in cust2.
GRANT SELECT, EXECUTE ON SCHEMA::cust1 TO user1
GRANT SELECT, EXECUTE ON SCHEMA::cust2 TO user2
 
At this point, user1 can only select and execute stored procedures in the cust1 schema. user1 cannot access schema cust2.  Here is more information about the Create Login statement: http://msdn.microsoft.com/en-us/library/ee336268.aspx
Customer Accounts vs. Service Accounts
The previous section discusses the creation of logins and users that provide security to each schema container. It should be noted that each login/user account created in the user database should be treated as a service account, not an actual customer account. Indeed, if you create customer accounts in SQL Azure directly you could negatively impact connection pooling, and hence performance. You should authorize users first, using a specific application authentication component (such as ASP.NET Membership), then map the customer account to the service account to use.
You typically implement the mapping (from customer account to service account) using a separate customer account database. A column in your customer account table would store the database connection string, which would contain the service account to use. Here is an example:
CREATE TABLE custmapping(customerid int primary key, custconnection nvarchar(255) NOT NULL)
Note that in a production environment, you would likely encrypt the connection string. The above table does not implement encryption to illustrate a simple scenario.
You would then add a record for each customer:
INSERT INTO custmapping VALUES (1, 'server=....;UID=logincust1;PWD=p@ssw0rd001')
INSERT INTO custmapping VALUES (2, 'server=....;UID=logincust2;PWD=p@ssw0rd002')
 
When a customer logins with their account, your code would read the custmapping table to retrieve the connection string to use for the duration of the session.
Moving A Schema
If a customer grows significantly, or is abusing your SaaS application, you may be facing the need to move that customer's data to a different database so that the other customers (called tenants in SaaS terms) are not affected negatively by the increase in resources needed by that customer.
There are very few options available today to move a single schema container, and its associated objects, from one database to another. You could manually create the schema container and its objects first, then use BCP or SSIS to move the data. However this can be error prone and lengthy.
A tool recently released by Blue Syntax, called Enzo Backup for SQL Azure, provides the ability to backup a single schema. This tool will backup the associated users in addition to the related objects (tables, stored procedures and so forth). The restore process will recreate the schema in the chosen database server and all the objects (and data) in that schema. Here is more information about this tool: http://www.bluesyntax.net/backup.aspx
Multitenant Frameworks
Due to the level of complexity in building multitenant environments, certain companies are turning to specialized frameworks. Although these frameworks can require some learning curve, they provide certain capabilities that would be difficult to build, such as fan-out, caching, and other capabilities. Here are a few .NET frameworks:
·         The Enzo Sharding Library (open-source); also supports Data Federation: http://enzosqlshard.codeplex.com/releases/view/72791
·         The CloudNinja project (open-source): http://cloudninja.codeplex.com/
·         The Enzo Multitenant Framework: http://www.bluesyntax.net/scale.aspx
Schema Separation and Data Federation
Data Federation is an upcoming feature of SQL Azure that will provide a new mechanism to distribute data. In a way, Data Federation allows SaaS vendors to design a monolithic database, in which all customers are located, and distribute records of one or more tables across databases when the time comes. Data Federation is essentially a compressed shard, similarly to schema separation, with the added benefit of tooling support and easier repartitioning based on performance and storage needs. In addition, Data Federation can help you distribute data on almost any dimension; not just customer id.
Data Federation and Schema Separation are not competing solutions for SaaS vendors. Each have specific benefits and challenges. SaaS vendors will need to determine which one serves their needs best, or even use both.
Using both Schema Separation and Data Federation delivers even greater flexibility.  For example, if the history table of customer 2 becomes too large, you could leverage Data Federation to split that table, without affecting customer 1. A specific use of this technique could be to use Data Federation to split tables across databases every year for all customers (or only those that have a lot of records). This dual layering technique (using schema separation for customer records, and data federation for archiving) can deliver remarkable performance results and scalability.
Conclusion
Many vendors have successfully built a SaaS solution using schema separation. While tooling has been lacking, some vendors are adding schema separation support in their products. Understanding the options available at your finger tips, including the frameworks already developed, and how to combine sharding models can give you a significant advantage when building your SaaS solutions.
  
  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

Sharding Library for SQL Azure Data Federation

The ability to write scale out applications for SQL Azure will soon become much easier, thanks to the upcoming SQL Azure Data Federation capabilities. In an earlier post, I outlined specific steps developers can take to prepare for this key enhancement.  Some of the key capabilities of SQL Azure Data Federation is to distribute large data sets across multiple databases, hence providing a mechanism to avoid the current database size limitations of SQL Azure.

While it may be tempting to look at this new feature as a workaround to an existing size limitation, the real benefit of federating data is to give each data set its own set of independent processing power; indeed each federation member is assigned its own CPU, storage, memory, TempDB and network bandwidth. In other words, Data Federation provides the basic foundation for a scale out database model, typically needed by SaaS applications and historical data sets.

Distributed Needs

Since data stored in federation members are actually stored across multiple databases, certain capabilities taken for granted in typical development environments are not available just yet.  And with this type of new features, new needs are surfacing.  In a nutshell, some of the capabilities currently missing include:

-          The ability to fan out queries when running a query across federation members

-          The ability to perform basic joins across federation members to exclude data sets

-          The ability to execute simple aggregated functions such as MIN, MAX... across federated members

Since these capabilities are not yet supported by the Data Federation feature, it is necessary to leverage parallel processing on the client (or middle-tier) application in order to efficiently query across federation members (i.e. databases).

Open-Source Enzo Library

In order to provide support the above capabilities across federation members, I updated the Enzo Shard Library open-source project (http://enzosqlshard.codeplex.com/releases/view/72791), which is currently in Version 2.0 Beta. This updated library offers new important features, including support for Data Federation. A new class (called the Distributed Query) allows developers to execute a simple SELECT query across federation members. For example, a Distributed Query that fetches customer data across federation members (hence databases) to return the Maximum CustomerID would look like this:

SELECT MAX(CustomerId) USING (select customerid from customers) FEDERATED ON (customerfederation)

The above statement is interpreted by the Enzo library; the inner statement (select customerid from customers) is the T-SQL part that is actually executed across each federation member in parallel against SQL Azure. Then the library aggregates the results and returns the maximum customer id from all the databases involved in the federation (see the details on CodePlex for a complete specification of the Distributed Query).

There are many new features available in this library, which will be the subject of my talk at the PASS Conference 2011 in October (more information).

 

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

Spotlight (R) on Azure: Cool Tool

While testing the performance characteristics of the SQL Azure backup tool I am building (called Enzo Backup for SQL Azure), I decided to try Spotlight (R) on Azure in order to obtain specific performance metrics from a virtual machine (VM) running on Microsoft's data center.  Indeed, my backup solution comes with a cloud agent (running as a worker role in Azure) that performs backup and restore operations entirely in the cloud. Due to the nature of this application, I needed to have an understanding of possible memory and CPU pressures.
As of today, very few techniques are available to capture VM's performance metrics in Microsoft Azure. One thing you can do is to configure your application to send Diagnostics data to Azure Tables and Blobs; however this requires a lot of work when comes time to parse the information. I had previously tried the Azure Management Pack of System Center. While the Management Pack was indeed powerful, it was difficult to install and came with a very limited number of metrics by default, making it more difficult for a small organization like mine to customize and fully leverage. System Center's capabilities with Azure will grow over time and will most likely appeal to larger organizations.
When it comes to small companies with few operational capabilities, Spotlight (R) on Azure delivers (from Quest). The installation is extremely simple and the tool comes with many metrics out of box. In my tests, I was able to obtain a quick read on my virtual machine and review performance graphs showing CPU Utilization, CPU Queue Length, Memory Utilization and Network Traffic.  The tool shows by default the last 15 minutes or data, which is very handy. If you have a large farm of virtual machines, you can also view metrics rolled up across your virtual machines. Finally the tool allows you to declare Alerts if you cross performance thresholds (the thresholds values can be changed to your needs).   
All in all, this is a simple, yet powerful performance monitoring tool for your Azure virtual machines. Here is a link that shows the tool in action: http://www.youtube.com/watch?v=FDJjytW-VFI and here is a link where you can download the tool for a test drive: http://communities.quest.com/docs/DOC-9906 
 
  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

Preparing for Data Federation in SQL Azure

This blog will help you prepare for an upcoming release of SQL Azure that will offer support for Data Federation.  While no date has been provided for this feature, I was able to test an early preview and compiled a few lessons learned that can be shared publicly. Note however that certain items could not be shared in this blog because they are considered NDA material; as a result, you should expect additional guidance in future posts when the public Beta will be made available.

What is Data Federation?

First and foremost, allow me to explain what Data Federation is in SQL Azure. Many terms have been used, like partitioning, sharding and scale out. Other sharding pattern exist too, which could be described in similar terms; so it is difficult to paint a clear picture of what Data Federation really is without comparing with other partitioning models. (To view a complete description of sharding models, take a look at a white paper I wrote on the topic where with a link on this page: http://bluesyntax.net/scale.aspx)
The simplest model to explain is the linear shard: each database contains all the necessary tables for a given customer (in which case the customer ID would be the dimension). So in a linear shard, you will need 100 databases if you have 100 customers.
Another model is the compressed shard: a container (typically a schema) inside a database contains all the necessary tables for a given customer. So in a compressed shard, each database could contain one or more customers, all logically usually separated by a schema (providing a strong security data enforcement model). As you will see, Data Federation offers an alternative to schema separation.
The scale up model (which is not a shard technically) uses a single database in which customer records are identified by an identifier. Most applications fall in this category today. However the need to design flexible and scalable components in a cloud paradigm makes this design difficult to embrace.
Data Federation, as it turns out, is a mix of a compressed shard and a scale up model as described above, but instead of using a schema to separate records logically, it uses a federation (a new type of container) which is a set of physical databases (each database is called a federation member).  So within each database (a federation member) you can have multiple customers. You are in control of the number of customers you place inside each federation, member and you can either split customers (i.e. create new database containers) or merge them back at a later time (i.e. reducing the number of databases making up your shard).
As a result, Data Federation allows you to start with a single database (similar to a scale up model), and create federations over time (moving into the compressed shard territory). When you create a federation, you specify a range over the customer IDs you would like to split, and SQL Azure will automatically move the records in the appropriate database (federation member). You could then, in theory, design “as usual” using a scale up model and use Data Federation at a later time when performance requirements dictate.
I did say in theory.

Design Considerations

So at this point you are probably wondering… what’s the catch? It’s pretty simple, really. It has to do with the location of the data and performance implications of storing data across physical databases: referential integrity. [Read Cihan’s blog about the limitations on referential integrity here: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/07/19/data-consistency-models-referential-integrity-with-federations-in-sql-azure.aspx]
Data Federation is indeed very powerful. It provides a new way to spread data access across databases, hence offering a shared-nothing architecture across federation members hosting your records. In other words: the more members your federation has (i.e. databases) the more performance is at your fingertips.
But in order to harvest this new power, you will need to make certain tradeoffs, one of which being referential integrity.  While foreign key relationships are allowed within a federated member, they are not allowed between members. So if you are not careful, you could duplicate unique IDs in related tables. Also, since you have distributed records, you may need to replicate certain tables on all your federated members, hence creating a case of data proliferation which will not be synchronized across federation members for you.
If you are considering to use Data Federation in the future and would like to take an early start in your design, I will make the following recommendations:
-          The smaller the number of tables, the easier it will be for you to reconcile records and enforce referential integrity in your application
-          Do not assume all existing Referential Integrity (RI) constructs will still exist in a federated database; plan to move certain RI rules in your application
-          It is possible that the more you create federations members, the more expensive your implementation will be (no formal pricing information is available at the time of this writing)
-          Data Federation still uses the scale up model as its core, since each federated table needs to contain the customer ID in question; so record security and access control is similar to a scale up model
-          You will need to leverage parallel processing extensively to query federated members in parallel if you are searching for records across your members, so if you are not familiar with the Task Parallel Library (TPL), take a look. You will need it
In summary, Data Federation is for you if you need to scale your database to dozens of databases or more due to performance and scalability reasons. It goes without saying that the simpler the database, the easier the adoption of Data Federation will be.  Performance and scalability improvements often come with specific tradeoffs;  Data Federation is no exception and as expected the tradeoff is referential integrity (similar to no-sql database systems).
 
  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati

Backing up SQL Azure

That's it!!! After many days and nights... and an amazing set of challenges, I just released the Enzo Backup for SQL Azure BETA product (http://www.bluesyntax.net). Clearly, that was one of the most challenging projects I have done so far.

Why???

Because to create a highly redundant system, expecting failures at all times for an operation that could take anywhere from a couple of minutes to a couple of hours, and still making sure that the operation completes at some point was remarkably challenging. Some routines have more error trapping that actual code...

Here are a few things I had to take into account:

  • Exponential Backoff (explained in another post)
  • Dual dynamic determination of number of rows to backup 
  • Dynamic reduction of batch rows used to restore the data
  • Implementation of a flexible BULK Insert API that the tool could use
  • Implementation of a custom Storage REST API to handle automatic retries
  • Automatic data chunking based on blob sizes
  • Compression of data
  • Implementation of the Task Parallel Library at multiple levels including deserialization of Azure Table rows and backup/restore operations
  • Full or Partial Restore operations
  • Implementation of a Ghost class to serialize/deserialize data tables

And that's just a partial list... I will explain what some of those mean in future blob posts. A lot of the complexities had to do with implementing a form of retry logic, depending on the resource and the operation.

 

  • Share This Post:
  • Share on Twitter
  • Share on Facebook
  • Share on Technorati
Twitter