Faster, Simpler access to Azure Tables with Enzo Azure API

After developing the latest version of Enzo Cloud Backup I took the time to create an API that would simplify access to Azure Tables (the Enzo Azure API). At first, my goal was to make the code simpler compared to the Microsoft Azure SDK. But as it turns out it is also a little faster; and when using the specialized methods (the fetch strategies) it is much faster out of the box than the Microsoft SDK, unless you start creating complex parallel and resilient routines yourself. Last but not least, I decided to add a few extension methods that I think you will find attractive, such as the ability to transform a list of entities into a DataTable. So let’s review each area in more details.

Simpler Code

My first objective was to make the API much easier to use than the Azure SDK. I wanted to reduce the amount of code necessary to fetch entities, remove the code needed to add automatic retries and handle transient conditions, and give additional control, such as a way to cancel operations, obtain basic statistics on the calls, and control the maximum number of REST calls the API generates in an attempt to avoid throttling conditions in the first place (something you cannot do with the Azure SDK at this time).

Strongly Typed

Before diving into the code, the following examples rely on a strongly typed class called MyData. The way MyData is defined for the Azure SDK is similar to the Enzo Azure API, with the exception that they inherit from different classes. With the Azure SDK, classes that represent entities must inherit from TableServiceEntity, while classes with the Enzo Azure API must inherit from BaseAzureTable or implement a specific interface.

// With the SDK
public class MyData1 : TableServiceEntity
{
    public string Message { get; set; }
    public string Level { get; set; }
    public string Severity { get; set; }
}

//  With the Enzo Azure API
public class MyData2 : BaseAzureTable
{
    public string Message { get; set; }
    public string Level { get; set; }
    public string Severity { get; set; }
}

Simpler Code

Now that the classes representing an Azure Table entity are defined, let’s review the methods that the Azure SDK would look like when fetching all the entities from an Azure Table (note the use of a few variables: the _tableName variable stores the name of the Azure Table, and the ConnectionString property returns the connection string for the Storage Account containing the table):

// With the Azure SDK
public List<MyData1> FetchAllEntities()
{
     CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConnectionString);
     CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
     TableServiceContext serviceContext = tableClient.GetDataServiceContext();

     CloudTableQuery<MyData1> partitionQuery =
        (from e in serviceContext.CreateQuery<MyData1>(_tableName)
        select new MyData1()
        {
           PartitionKey = e.PartitionKey,
           RowKey = e.RowKey,
           Timestamp = e.Timestamp,
           Message = e.Message,
           Level = e.Level,
           Severity = e.Severity
           }).AsTableServiceQuery<MyData1>();

       return partitionQuery.ToList(); 
}

This code gives you automatic retries because the AsTableServiceQuery does that for you. Also, note that this method is strongly-typed because it is using LINQ. Although this doesn’t look like too much code at first glance, you are actually mapping the strongly-typed object manually. So for larger entities, with dozens of properties, your code will grow. And from a maintenance standpoint, when a new property is added, you may need to change the mapping code. You will also note that the mapping being performed is optional; it is desired when you want to retrieve specific properties of the entities (not all) to reduce the network traffic. If you do not specify the properties you want, all the properties will be returned; in this example we are returning the Message, Level and Severity properties (in addition to the required PartitionKey, RowKey and Timestamp).

The Enzo Azure API does the mapping automatically and also handles automatic reties when fetching entities. The equivalent code to fetch all the entities (with the same three properties) from the same Azure Table looks like this:

// With the Enzo Azure API
public List<MyData2> FetchAllEntities()
{
       AzureTable at = new AzureTable(_accountName, _accountKey, _ssl, _tableName);
       List<MyData2> res = at.Fetch<MyData2>("", "Message,Level,Severity");
       return res;
}

As you can see, the Enzo Azure API returns the entities already strongly typed, so there is no need to map the output. Also, the Enzo Azure API makes it easy to specify the list of properties to return, and to specify a filter as well (no filter was provided in this example; the filter is passed as the first parameter). 

Fetch Strategies

Both approaches discussed above fetch the data sequentially. In addition to the linear/sequential fetch methods, the Enzo Azure API provides specific fetch strategies. Fetch strategies are designed to prepare a set of REST calls, executed in parallel, in a way that performs faster that if you were to fetch the data sequentially. For example, if the PartitionKey is a GUID string, you could prepare multiple calls, providing appropriate filters ([‘a’, ‘b’[, [‘b’, ‘c’[, [‘c’, ‘d[, …), and send those calls in parallel. As you can imagine, the code necessary to create these requests would be fairly large. With the Enzo Azure API, two strategies are provided out of the box: the GUID and List strategies. If you are interested in how these strategies work, see the Enzo Azure API Online Help. Here is an example code that performs parallel requests using the GUID strategy (which executes more than 2 t o3 times faster than the sequential methods discussed previously):

public List<MyData2> FetchAllEntitiesGUID()
{
    AzureTable at = new AzureTable(_accountName, _accountKey, _ssl, _tableName);
    List<MyData2> res = at.FetchWithGuid<MyData2>("", "Message,Level,Severity");
    return res;
}

Faster Results

With Sequential Fetch Methods

Developing a faster API wasn’t a primary objective; but it appears that the performance tests performed with the Enzo Azure API deliver the data a little faster out of the box (5%-10% on average, and sometimes to up 50% faster) with the sequential fetch methods. Although the amount of data is the same regardless of the approach (and the REST calls are almost exactly identical), the object mapping approach is different. So it is likely that the slight performance increase is due to a lighter API. Using LINQ offers many advantages and tremendous flexibility; nevertheless when fetching data it seems that the Enzo Azure API delivers faster.  For example, the same code previously discussed delivered the following results when fetching 3,000 entities (about 1KB each). The average elapsed time shows that the Azure SDK returned the 3000 entities in about 5.9 seconds on average, while the Enzo Azure API took 4.2 seconds on average (39% improvement).

image

With Fetch Strategies

When using the fetch strategies we are no longer comparing apples to apples; the Azure SDK is not designed to implement fetch strategies out of the box, so you would need to code the strategies yourself. Nevertheless I wanted to provide out of the box capabilities, and as a result you see a test that returned about 10,000 entities (1KB each entity), and an average execution time over 5 runs. The Azure SDK implemented a sequential fetch while the Enzo Azure API implemented the List fetch strategy. The fetch strategy was 2.3 times faster. Note that the following test hit a limit on my network bandwidth quickly (3.56Mbps), so the results of the fetch strategy is significantly below what it could be with a higher bandwidth.

image

Additional Methods

The API wouldn’t be complete without support for a few important methods other than the fetch methods discussed previously. The Enzo Azure API offers these additional capabilities:

  • - Support for batch updates, deletes and inserts
  • - Conversion of entities to DataRow, and List<> to a DataTable
  • - Extension methods for Delete, Merge, Update, Insert
  • - Support for asynchronous calls and cancellation
  • - Support for fetch statistics (total bytes, total REST calls, retries…)

For more information, visit http://www.bluesyntax.net or go directly to the Enzo Azure API page (http://www.bluesyntax.net/EnzoAzureAPI.aspx).

About Herve Roggero

Herve Roggero, Windows Azure MVP, is the founder of Blue Syntax Consulting, a company specialized in cloud computing products and services. Herve's experience includes software development, architecture, database administration and senior management with both global corporations and startup companies. Herve holds multiple certifications, including an MCDBA, MCSE, MCSD. He also holds a Master's degree in Business Administration from Indiana University. Herve is the co-author of "PRO SQL Azure" from Apress and runs the Azure Florida Association (on LinkedIn: http://www.linkedin.com/groups?gid=4177626). For more information on Blue Syntax Consulting, visit www.bluesyntax.net.

Twitter