How To Create A PowerShell Script To Back Up Your Azure Blobs

In this post I will show you how easy it is to create a PowerShell script, built with Visual Studio, that can back up your Azure Blobs. The same concept can be applied to back up Azure Tables and Azure SQL Database. To follow this example successfully, you will need a few things. First, we are coding in C# with Visual Studio 2012. The actual backup operation is rather complex, so we will use the free API provided by Blue Syntax (http://www.bluesyntaxconsulting.com) to perform it; the PowerShell script simply calls the API.

Install And Configure Enzo Cloud Backup

As a prerequisite, you will need to install the Enzo Cloud Backup tool and the Enzo Cloud Backup API version 3. Both can be found on the Blue Syntax website (http://www.bluesyntaxconsulting.com/backup30.aspx). Click the Download button on the product description page and install both Enzo Cloud Backup and the Enzo Cloud Backup API.

Once installed, start Enzo Cloud Backup. You will see a Login page, where you need to specify an Azure Storage Account in which Enzo will store its configuration information. It is recommended to use a Storage Account dedicated to Enzo. Once you have created the Storage Account in your Azure subscription, enter the Account Name and Account Key on this screen to proceed. For detailed information on how to create an Azure Storage Account and access the Account Key, read this blog post: http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/.

[Screenshot: Enzo Cloud Backup login page]

Once logged in, you will need to register your product in order to obtain license keys; this is a very simple step that ensures you receive the keys needed to run this example (to register, start Enzo Cloud Backup and go to Help –> Request Permanent License Keys). There is no charge for the Community Edition of Enzo Cloud Backup, and the API comes at no charge as well. Once registered, you will receive an email with your license keys, so make sure you enter a valid email address in the registration form.

[Screenshot: license key registration]

Create a Class Library Project

Now that the Enzo Cloud Backup tool is installed along with the API, let’s create a new project in Visual Studio 2012. The project type is a class library. I named this project PSEnzoBackup. Make sure you select a Visual C# project; the code provided further down is written in C#.

[Screenshot: New Project dialog in Visual Studio 2012]

Configure Your Project

Once the project has been created, rename the Class1.cs file to BackupBlob.cs and make sure the class itself is also renamed to BackupBlob. The end result should look like this in your Solution Explorer.

[Screenshot: renamed class in Solution Explorer]

Once the class has been renamed, add the following references to your project:

  • System.Management.Automation
  • CloudBackupAPI
  • EnzoBackupAPI

You can find the Automation DLL on your development machine (usually under C:\Program Files (x86)\Reference Assemblies\Microsoft\WindowsPowerShell). The other two libraries can be found in the directory where you installed the Enzo Cloud Backup API (they are usually found in a subdirectory of C:\Program Files (x86)\BlueSyntax\).

Add A Backup Method

At this point, the project is ready for development. Paste the following code into your BackupBlob.cs file as-is.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Management.Automation;

namespace PSEnzoBackup
{
    // Exposes this class to PowerShell as the New-BlobBackup cmdlet.
    [Cmdlet(VerbsCommon.New, "BlobBackup")]
    public class BackupBlob : PSCmdlet
    {
        private string[] _blobNames;

        [Parameter(
            Mandatory = true,
            ValueFromPipelineByPropertyName = true,
            ValueFromPipeline = true,
            Position = 0,
            HelpMessage = "The list of blobs to back up (container/blob), separated by a comma. Can use * as a wildcard."
            )
        ]
        [Alias("ListBlob")]
        public string[] BlobNames
        {
            get { return _blobNames; }
            set { _blobNames = value; }
        }

        // Called by PowerShell when the cmdlet runs.
        protected override void ProcessRecord()
        {
            WriteObject("Starting the backup operation...");

            // Replace the placeholders below with the storage account used by
            // Enzo Cloud Backup and the API license key received during registration.
            BSC.Backup.Helper.AzureBlobBackupHelper backup = new BSC.Backup.Helper.AzureBlobBackupHelper(
                "ENZO_STORAGE_ACCOUNT",
                "ENZO_STORAGE_KEY",
                false,
                "YOUR_API_LICENSE_KEY");
            backup.Location = BSC.Backup.Helper.DeviceLocation.LocalFile;   // back up to a local file device
            backup.DeviceURI = @"c:\tmp\backupfile.bkp";                    // destination backup file
            backup.Override = true;                                         // overwrite the file if it already exists
            backup.SpecificBlobs = _blobNames.ToList();                     // blobs passed to the cmdlet
            backup.UseCloudAgent = false;                                   // run the backup locally, without the cloud agent

            // The operation id can be used by other helper classes to track progress.
            string operationId = backup.Backup();
        }
    }
}

The ProcessRecord() method is the protected override that PowerShell calls when the command runs. In this method, we simply create an instance of the AzureBlobBackupHelper class; the constructor requires the account name and key of the Azure Storage Account used by Enzo Cloud Backup, and the API license key. Additional properties are available on the backup object; here we specify that the backup device is a local file and that any existing file will be overwritten. The SpecificBlobs property is set to the list of blob names provided as parameters to the PowerShell command. The Backup() method returns an operation id which can be used by other helper classes to check on the progress of the backup operation. We are hard-coding the account credentials and the name of the backup file, but it would be easy to provide parameters instead, as we did with the BlobNames parameter (see the sketch below).
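
For illustration, here is a minimal sketch of that variation. The parameter names StorageAccount, StorageKey and DeviceUri are my own choices (they are not part of the Enzo API), and only the AzureBlobBackupHelper members already shown above are used; these members would replace the corresponding pieces of the class above.

// Illustrative only: the storage credentials and backup file path become cmdlet parameters.
[Parameter(Mandatory = false, HelpMessage = "Name of the storage account used by Enzo Cloud Backup.")]
public string StorageAccount { get; set; }

[Parameter(Mandatory = false, HelpMessage = "Key of the storage account used by Enzo Cloud Backup.")]
public string StorageKey { get; set; }

[Parameter(Mandatory = false, HelpMessage = "Full path of the local backup file to create.")]
public string DeviceUri { get; set; }

protected override void ProcessRecord()
{
    WriteObject("Starting the backup operation...");

    BSC.Backup.Helper.AzureBlobBackupHelper backup = new BSC.Backup.Helper.AzureBlobBackupHelper(
        StorageAccount,
        StorageKey,
        false,
        "YOUR_API_LICENSE_KEY");
    backup.Location = BSC.Backup.Helper.DeviceLocation.LocalFile;
    backup.DeviceURI = DeviceUri ?? @"c:\tmp\backupfile.bkp";   // fall back to a default path
    backup.Override = true;
    backup.SpecificBlobs = BlobNames.ToList();
    backup.UseCloudAgent = false;

    string operationId = backup.Backup();
    WriteObject("Backup operation started: " + operationId);
}

With this variation, a call could look like New-BlobBackup -BlobNames */* -StorageAccount myaccount -StorageKey mykey -DeviceUri c:\tmp\backupfile.bkp.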

Before compiling this code, you will need to replace a few things: ENZO_STORAGE_ACCOUNT and ENZO_STORAGE_KEY are the values used by the backup application when you log in (see the prerequisites above). As indicated earlier, you must run Enzo Cloud Backup at least once to create the necessary configuration tables that the API uses; the Account Name and Account Key you use to log in to Enzo Cloud Backup are the ones you need to specify here. YOUR_API_LICENSE_KEY is the API license key you received by email when registering your product with Blue Syntax. Once you have specified those values, you can compile the code. Note the path where the DLL is created when you compile, as you will need it shortly.

[Screenshot: build output path of the compiled DLL]

Test the PowerShell Script

Open a PowerShell prompt and type the following command to load the module we just compiled.

Import-Module "C:\YOUR_FULL_PATH_TO_PROJECT_DIRECTORY\PSEnzoBackup\bin\Debug\PSEnzoBackup.dll"

Finally, run the following command to back up all your blobs. The BlobNames parameter is a comma-separated list of blob names (specified as ContainerName/BlobName) and supports wildcards (*/* means all containers and all blobs). For more information about this parameter, and the other properties exposed by this API, visit the online help here: http://www.bluesyntaxconsulting.com/EnzoCloudBackup30/APIBackupAB.aspx.

New-BlobBackup -BlobNames */*

You can easily wrap the commands provided by the Enzo Cloud Backup API in a PowerShell module, including backing up Azure Tables and SQL Databases, and even restoring them. This example shows how to back up to disk, but you can also back up to blobs. To learn more about the API, visit http://www.bluesyntaxconsulting.com/Backup30API.aspx.

About Herve Roggero

Herve Roggero, Microsoft Azure MVP, @hroggero, is the founder of Blue Syntax Consulting (http://www.bluesyntaxconsulting.com). Herve's experience includes software development, architecture, database administration and senior management with both global corporations and startup companies. Herve holds multiple certifications, including an MCDBA, MCSE, MCSD. He also holds a Master's degree in Business Administration from Indiana University. Herve is the co-author of "PRO SQL Azure" and “PRO SQL Server 2012 Practices” from Apress, a PluralSight author, and runs the Azure Florida Association.

Backing up Azure Blobs

As customers adopt the Microsoft Azure platform, the need to backup Azure Blobs is becoming increasingly important. That’s because Azure Blobs are used by both the Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) components of Microsoft Azure. In fact, Azure Blobs are also becoming more popular with local software applications that need a central location to store configuration information and application logs. Since Blobs are a form of data, there is a need to back them up.

Until now, the way to back up Azure Blobs was to copy them to local hard drives as files, a capability many third-party tools currently provide. However, Azure Blobs carry additional information that files cannot, such as custom metadata, HTTP properties, and Access Control List (ACL) details that describe the security applied to those blobs. And Azure Blobs are not the only components that need to be backed up: Azure Blob Containers are also important. They are the equivalent of directory structures, and also carry metadata and ACL information. Existing third-party tools do not provide a mechanism to save this additional information.

I created a free utility called Enzo Cloud Backup, version 3, now available for download (http://www.bluesyntaxconsulting.com/). This tool allows you to perform backup and restore operations on Azure Blobs (in addition to Azure Tables and Azure SQL Database). It is designed to handle a very large number of blobs and containers. In addition, an API is also available at no charge so that you can programmatically initiate backup and restore operations.

The free edition provides all the features of the advanced edition, but is limited in the number of backups that can be performed monthly. Give it a try; you will be surprised by how easy it is to use. And if you have any feedback, please send it to me!

The Business Case for a Data Server

As a developer you are familiar with web servers and database servers. Both service data, in different ways, and this creates an interesting challenge. Let's face it: accessing data is hard. Data lives in databases, in XML documents, in PDF documents, in flat files, on FTP servers in proprietary formats, in cubes, in NoSQL stores… you name it. In addition to those storage formats, data is made available in a large variety of file formats, through a multitude of protocols, and serviced by an ever-increasing set of providers, each with its own authentication and authorization implementation. But the storage of data and the way to access it is not the only challenge. Who wants to access all this data? Developers, DBAs, report writers, systems integrators, consultants, managers, business users, office workers, applications…

What’s the real problem?

Accessing data is hard because of three forces: storage formats, protocols, and consumers. Different consumers need the same data delivered in different ways because they consume it differently. When the data is needed by an application (a mobile app, for example), it is best accessed using REST requests returning JSON documents, which is usually best accomplished by servicing data through a web server. But if the data is needed by a report (Excel, or a cube for analytical purposes), it is usually best accessed using SQL commands, which is best accomplished using a database server.

This creates an interesting challenge: who are the consumers for a given set of data, and through which protocol should it be accessed? This simple question is hard to answer because, for a given set of data, consumers change over time, but the data stores and protocols do not. The challenge is usually resolved by hiring consultants (or spending a lot of time with internal resources) to build bridges that move or copy data from one storage format to the next so it can be used by the consumers that need it in a specific format; that's why integration tools are so popular: let's copy the data from point A to point B, and C, so it can be used by consumers (business partners, reports, executives…). All this takes time and money. Copying NoSQL data into a relational database or a cube so it can be reported on takes effort. Extracting flat files from an FTP site daily and loading them into a database so they can be used by a web application requires building a complex set of programs to orchestrate that work.

As a result, the difficulty in making data access ubiquitous keeps certain companies from making timely decisions, because the necessary data is not immediately available in the right format at the right time. And as mentioned previously, the need for data in various formats is clear: how many deployments of SSIS (SQL Server Integration Services), Joomla, BizTalk, Informatica, scheduled jobs and ETL processes that move files around are you aware of? Some are needed because complex transformations are necessary, but a vast number of those implementations are in place simply because the source data is in the wrong format. [Note: I am not saying these tools are not necessary; I am naming them to outline the need for data movement in general.]

Introducing the Data Server

With this challenge growing with every new platform, API, consumer, and storage type, I suggest that a new kind of technology be built, which I will simply call a Data Server, so that data can be serviced in real time through virtually any protocol regardless of where it comes from. In other words, it shouldn't matter who needs the data or through which protocol: a mobile application needs data through REST/JSON? No problem. A report needs the exact same data through SQL? No problem. The same data, coming from the same source, should be accessible in real time regardless of consumer preferences. If data were available in such a ubiquitous manner, regardless of the protocol being used, a large number of integration routines would become obsolete or would be greatly simplified.

So what are the attributes of a Data Server? It should be able to hide the complexities of the underlying data sources and present data in a uniform way, through multiple protocols. For example, a Data Server would present tweets through a REST/JSON interface and through a SQL interface (the same data in real time, not a copy of the data). SharePoint lists could be made available through the same REST/JSON interface, or through SQL as well. An FTP server could be accessed through REST/JSON and SQL too, and so could WMI, NoSQL stores, CICS screens, flat files, SOAP endpoints… Regardless of the origin of the data, it would be serviced through a uniform response in the desired protocol (see the sketch below).
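
To make the idea a bit more concrete, here is a purely hypothetical sketch in C# (none of these types exist in any product) of the kind of adapter contract a Data Server could use internally: every source, whether Twitter, SharePoint, an FTP site or a NoSQL store, is normalized into the same tabular shape, which the server can then expose over REST/JSON or SQL without the consumer knowing, or caring, where the data came from.

using System.Collections.Generic;

// Hypothetical adapter contract for a Data Server; each data source would implement it.
public interface IDataSourceAdapter
{
    // Logical name consumers use to address the source ("tweets", "sharepoint.lists", ...).
    string SourceName { get; }

    // Executes a source-specific query and returns rows as column-name/value pairs,
    // the uniform shape that both the REST/JSON and SQL front ends would understand.
    IEnumerable<IDictionary<string, object>> Query(string query);
}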

The Data Server could also abstract security by shielding the authentication and authorization details of the underlying source of data. This makes a lot of sense too, because most sources of data use different security protocols, or variations of published standards. For example, authenticating to Microsoft Bing, Azure Blobs, Google Maps, FTP, SharePoint or Twilio is difficult because they all use different implementations. So abstracting authentication behind a single REST layer or a SQL interface, and adding a layer of authorization on top of these data endpoints, makes things much easier and more secure. It also becomes possible to monitor data consumption across private and public sources of data, which can be important in certain organizations.

Data Cross-Cutting Concerns

A Data Server would also help in implementing cross-cutting data concerns that are not usually easy to configure (or use), such as caching, asynchronous processing, scheduling, logging and more. For example, caching becomes much more interesting through a Data Server because it doesn't matter which interface is used to access the data; cached data sets can be made available to both the REST/JSON and SQL interfaces at the same time, which means that the data needs to be cached only once and remains consistent no matter which consumer reads it.

Asynchronous processing is also an interesting cross-cutting concern: consumers can start a request without waiting for its completion, through REST or SQL equally. For example, a REST command could initiate an asynchronous request, and a SQL command could check the completion of the request and fetch the results. Since the protocol used by consumers becomes an implementation choice, the data takes center stage in a Data Server. Accessing, managing, recycling and updating data across a vast array of data sources becomes protocol agnostic.

Conclusion

To accelerate data-intensive projects and to help organizations consume the data they need efficiently, data should be made available in a uniform way regardless of the client protocol, no matter where it comes from, so that it no longer matters who needs to consume that data. By creating this level of abstraction, the authentication and authorization mechanism can be streamlined too, so that there is only one way to secure the information. And since a Data Server channels requests to a multitude of data endpoints, it becomes a hub where common data-related concerns can be implemented, including caching, scheduling and asynchronous requests.

To SaaS or not to SaaS

Many companies servicing specific industries have technology assets that could be transformed into a Software as a Service (SaaS) offering. Your business may have a unique, proprietary technology that your industry could use, so packaging your intellectual property into a SaaS offering may make sense; you could essentially rent your 'savoir faire' to your industry, or even to multiple industries depending on how generic the technology is. For example, if you know how to process insurance claims effectively for your own business, could you offer an insurance claim SaaS platform to your industry? When done properly, offering a SaaS solution can open the door to new revenue and a more predictable revenue stream based on subscriptions, for example. And of course, if you are considering a SaaS offering, it may even be possible to transform your technology into a more developer-centric solution as a PaaS offering, so the following discussion applies equally to PaaS-enabling your technology stack.

What’s my low hanging fruit?

To begin answering the question of whether or not you should consider building a SaaS offering, start by making an inventory of the high-level technologies you have built over the years. They may be related to licensing, data transfer, security models, industry-specific calculations, a mathematical model and so forth. Or it could be something as simple, or generic, as business card transactions that you feel are unique to your industry because of compliance requirements or other constraints.

Once you have established this list, find the technology that is likely the easiest to package. If the technology is well understood by your staff, if you own all the rights to the technology, if the technology does not need too much rework to be used by multiple customers, and if making the technology available to the competition won’t put you out of business, then you may have a winner!

What else is involved?

If you identify a technology that makes sense to package as a SaaS offering, you need to evaluate the energy involved in creating that offering. By energy I mean the associated costs and the servicing readiness of your company. From a cost standpoint, you may need to invest in making your technology more generic so it can handle the greater variety of conditions your online customers will bring, or you may need to redesign a component of your technology to handle multiple customer requests at once.

From a servicing standpoint, will you need to build a management portal for your SaaS offering? Will you need to train your staff on the new kinds of issues your technology may face when exposed publicly? Keep in mind that building an online service may require your company to be prepared to function like a software company, with release management, higher quality control, patch management and so forth.

Example of a failed service

Although creating a SaaS offering can be lucrative, it is usually not easy to build and service over time. Some services seem to make sense, but the technology stack or the actual customer demand is such that they may not be a viable financial option in the long run. A technology that does not scale well to an increasing number of users could spell doom for your SaaS ideas.

Although many services have failed over the years, I am cherry-picking the Azure Reporting Service as the prime example because it was put together by a company that now builds cloud services as part of its mainstream business: Microsoft. Of course, Microsoft has built many successful online services (SaaS, PaaS and IaaS), and as such is developing a leading expertise in packaging and bringing to market remarkable solutions. And yet, the reporting service offered a couple of years ago failed from a business standpoint. The service itself was certainly in high demand; configuring an on-site Reporting Services environment is complex, and it is hard to maintain, so a SaaS/PaaS offering made sense because of the demand for simpler configuration. However, the technology is likely very hard to scale; running reports can use a lot of resources, which makes it difficult to predict (and contain) the amount of resources used by hundreds (or thousands) of customers. Ultimately, while the offering was awesome (and so was the service itself), it did not appear to be scalable enough, which means that Microsoft probably pulled the plug on this offering due to its high operating costs (high operating costs meant a high price point for the service itself, which in turn negatively impacted customer demand).

Conclusion

If your business services a specific industry, transforming an existing technology you built over the years into a SaaS (or PaaS) offering could provide a new and more scalable business model for your company, and create a more predictable revenue stream over time. One way to enter this business model is to leverage your own technology stack and repurpose what you already do well into a publicly available service. However, it can be challenging to transform your business into an online operation. If your company is considering entering this space, you should weigh the various business impacts of running a SaaS service, including whether or not your technology needs to be upgraded and how you may need to change your business to function like a software company.

Microsoft Azure and Threat Modeling Your Apps

In this blog post I will introduce you to the Microsoft Threat Modeling Tool (TMT) and how it can be used specifically with the Microsoft Azure environment for application development and application deployment purposes. In order to do this, I will take a specific example, using the Enzo Cloud Backup tool my company is building, and show you how to perform a high level threat model for a specific operation: a backup request.

Introduction

First things first… what is the Threat Modeling Tool (TMT)? It's a pretty cool piece of software that allows you to lay down your application architecture, with its various dependencies and protocols; it becomes a canvas for understanding and identifying potential threats. The tool itself may provide certain recommendations and a list of checkpoints, but even if you decide to use it simply as an architecture diagramming aid, it is very effective at that. You can use TMT to create simple, system-level diagrams, or to dive into specific operations to get into the data flow of your system. I refer to system-level diagrams as Level 0, and to a data flow as Level 1. The deeper you go, the deeper the analysis. Interestingly enough, I am starting to use TMT as a troubleshooting guide as well, since it does a good job of representing inter-system connections and data flows.

As you draw your diagram, you will need to choose from a list of processes, external components, data stores, data flows and trust boundaries. Selecting the correct item in this list is usually simple; you also have Generic items that allow you to specify every security property manually.

[Screenshot: TMT stencil selection]

Enzo Cloud Backup – Level 0 Diagram

Let's start with a Level 0 system diagram of the Enzo Cloud Backup system. As we can expect, this is a very high-level overview which provides context for other, more detailed diagrams. The Level 0 diagram shows two primary trust domains: the corporate network, where Enzo Cloud Backup is installed, and the Microsoft Azure trust domain, where the backup agent is installed. This diagram also introduces the fact that the agent and the client communicate both directly and through an Azure Storage account. As a result, two communication paths need to be secured, and both happen to take place over an HTTPS connection. The cloud agent also communicates with an Azure Storage account over HTTPS within the Microsoft Azure trust boundary, and with a data source (the one being backed up or restored). A more complex diagram would also show that multiple Virtual Machines could be deployed in the cloud, all communicating with the same storage account, but for simplicity I am only showing one Virtual Machine. Also note that in my Level 0 diagram I chose not to document the sequence of events; in fact my Level 0 diagrams are not specific enough to dive into specific events or data flows.

Although the diagram below shows boxes and circles, selecting the right stencil is important for future analysis. There are no specific visual cues on the diagram itself indicating whether you are dealing with a Managed or Unmanaged process, for example, but the tool shows this information in the Properties window (shown later) and uses it to provide a set of recommendations. In addition, each item in the diagram comes with a set of properties that you can set, such as whether a data flow is secured and whether it requires authentication and/or authorization.

[Diagram: Enzo Cloud Backup Level 0 system diagram]

Enzo Cloud Backup – Level 1 Remote Backup Operation

Now let's dive into a Level 1 diagram. To do so, let's explore the systems and components used by a specific process: a remote backup operation. Level 1 diagrams are usually more complex and, as a result, need to show as many data stores as possible and optionally identify the order of data flows. While TMT doesn't natively give you the option to document the sequence of your data flows, you can simply name each data flow by adding a number in front of it to achieve a similar result. By convention, I use 0 for initialization tasks and 1 for the starting request, and sometimes I even use decimals to document what happens within a specific step. But feel free to use any mechanism that works for you if you need to document the sequence of your data flows.

My Level 1 diagram below shows that a human starts the backup request through Enzo Cloud Backup, which then sends a request through a Queue. The backup service then picks up the work request and starts a backup thread that performs the actual work. The thread itself is the one creating entries in the history data store, which the client reads to provide a status to the user. As we can see in this diagram, the cross-boundary communications are as expected from our Level 0 diagram. Also worth noting is that the backup agent reads from a configuration file; from a security standpoint this could represent a point of failure and disrupt the backup process if it is tampered with. The client depends on the local machine registry, which as a result is also a store that should be properly secured. Last but not least, the numbering in front of each data flow allows you to walk through the steps taken during the backup request.

[Diagram: Level 1 data flow for the remote backup operation]

As mentioned previously, you can configure each item in the diagram with important properties which will ultimately help TMT provide recommendations. For example, I selected an HTTPS data flow for the Monitoring Status reading from the History Data store. Since I selected an HTTPS data flow, the tool automatically set some of the security attributes; I also provided information about the payload itself (it is a REST payload for example). Other properties, not shown below, are also available. And you can create your own properties if necessary, for documentation purposes.

[Screenshot: properties of the Monitoring Status data flow]

Going Further with TMT

TMT is used by large corporations to build much more detailed application diagrams, going as far as internal memory access. The tool also allows you to add notes and keep a checklist of important security verifications for your diagrams. TMT comes with a built-in security report that tells you which steps you should consider to make your system more secure. The following diagram is a short sample output of this report for the Registry Hive access. The more information you provide, including the properties on your data stores, data flows, processes and trust boundaries, the better.

[Report excerpt: TMT security report for the Registry Hive access]

Last but not least, the tool comes with an Analysis view of your diagram, which is the basis for generating the report shown above. For example TMT identified that the History Data store could become unavailable; while this sounds like a simple problem to solve through a retry mechanism, this could be used as a Denial of Service attack against the tool. Note that TMT will highlight the component in question to make it easy to understand the area at risk.

[Screenshot: TMT Analysis view]

Give it a try! TMT is easy to use and provides a solid framework for documenting and analyzing your systems; and since most corporations have a vested interest in securing their applications in the cloud, TMT offers the framework to do so. To learn more about TMT, check out this link: http://blogs.microsoft.com/cybertrust/2014/04/15/introducing-microsoft-threat-modeling-tool-2014/.

DocumentDB vs Azure SQL vs Azure Table

Microsoft Azure is now offering a third storage option for developers: DocumentDB. DocumentDB is a NoSQL storage service that stores JSON documents natively and provides indexing capabilities along with other interesting features. Microsoft Azure also offers Azure Tables, another NoSQL storage option, and Azure SQL, a relational database service. This blog post provides a very high-level comparison of the three storage options.

Storage

Purely from a storage perspective, the only relational offering is Azure SQL. This means that only Azure SQL provides a schema and a strongly typed data store, which gives it some interesting advantages related to search performance on complex indexes. Both Azure Table and DocumentDB are NoSQL storage types, which means they do not enforce a schema on their records (a record is called an Entity in Azure Table, and a Document in DocumentDB). However, DocumentDB stores its data as native JSON objects and allows attachments to a document, while Azure Table stores its data in XML (although you can retrieve Azure Table entities as JSON objects). The advantage of storing native JSON objects is that the data maps directly into JSON objects in the client application code. It is also worth noting that Azure SQL comes with storage capacity limitations; it can only store up to 500GB of data in a single database.

Indexing

All three options provide a form of indexing. Azure SQL offers the most advanced indexing capabilities, allowing you to define indexes with multiple columns and other advanced options (such as a WHERE clause as part of the index filter). DocumentDB offers automatically generated indexes that allow queries to retrieve information efficiently. Azure Tables offer only a single index on the PartitionKey and RowKey; this means that querying on other properties can take a significant amount of time if you have a lot of data.

Programmability

Azure SQL provides advanced server-side programming capabilities, through triggers, column-level integrity checks, views and functions (UDF). The programming language for Azure SQL is T-SQL. DocumentDB offers a similar programmability surface using JavaScript, which provides a more uniform experience for developers. Azure Tables do not offer server-side programmability.

Referential Integrity and Transactions

All three options provide some form of data consistency, although it is limited for Azure Tables. Azure SQL offers full-blown RDBMS referential integrity and transactional support; with T-SQL, developers can nest transactions server-side and perform commit or rollback operations. Azure SQL also offers strong referential integrity through foreign keys, unique constraints, NOT NULL constraints and more. DocumentDB does not offer strong referential integrity (there is no concept of foreign keys, for example) but supports transactions through JavaScript; as a result, developers can use JavaScript to enforce server-side referential integrity programmatically. Azure Tables offer basic transaction support through their BATCH operation, although there are a few stringent requirements: up to 100 entities can be batched together, and all must share the same PartitionKey (see the sketch below).
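
As an illustration of the Azure Table constraint, here is a minimal C# sketch using the Windows Azure Storage client library (the connection string, table name, PartitionKey and property below are made up): all entities added to a TableBatchOperation must share the same PartitionKey, at most 100 operations can be batched, and the batch is committed or rejected as a whole.

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class BatchSample
{
    static void Main()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("YOUR_STORAGE_CONNECTION_STRING");
        CloudTable table = account.CreateCloudTableClient().GetTableReference("Orders");
        table.CreateIfNotExists();

        // All entities in the batch share the PartitionKey "customer1";
        // a batch can hold at most 100 operations.
        TableBatchOperation batch = new TableBatchOperation();
        for (int i = 0; i < 100; i++)
        {
            DynamicTableEntity entity = new DynamicTableEntity("customer1", "order" + i);
            entity.Properties["Total"] = new EntityProperty(10.0 + i);
            batch.Insert(entity);
        }

        table.ExecuteBatch(batch);   // executed as a single entity group transaction
    }
}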

Service Cost

Azure SQL pricing recently changed and is now driven by the level of service you desire, which makes it difficult to compare with other storage options. Assuming a Basic level of service and a relatively small database (up to 2GB), you can expect to pay roughly $5.00 per month starting Nov 1st 2014. Pricing of Azure SQL goes up significantly with higher levels of service, and can reach over $3,000 per month for the more demanding databases (the price point impacts the maximum size of the database and the throughput available). DocumentDB offers a starting price point at $45.00 per month per Capacity Unit, which represents a storage quantity and specific throughput objectives (note: the price for DocumentDB is estimated since it is currently in preview). Azure Tables only have a storage price point which makes this storage option very affordable. A 1GB storage requirement with Azure Tables will cost about $0.12 per month with high availability and geo-redundancy.

All the storage options incur additional charges for transactions, which are not included in this analysis to simplify the comparison.

Summary

The three storage options provided by Microsoft Azure (Azure SQL, DocumentDB and Azure Tables) offer an array of implementation choices at various price points, depending on the needs of your application. Azure SQL provides higher data integrity but is limited in storage capacity; DocumentDB offers a NoSQL implementation with configurable consistency levels and JavaScript server-side logic; and Azure Table offers simple NoSQL storage capacity.

To summarize, the following is my interpretation of the Microsoft Azure storage options used for storing objects or tables.

 

Azure SQL
  • Storage: Database, up to 500GB
  • Indexing: Yes (multi-column)
  • Server-Side Programmability: Yes (T-SQL)
  • Ref. Integrity and Transaction Support: Yes for R.I.; yes for Tx support
  • Service Cost: Starts at $5 / DB / month (DB = Database)

DocumentDB
  • Storage: JSON, petabytes
  • Indexing: Yes (automatically created and maintained)
  • Server-Side Programmability: Yes (JavaScript)
  • Ref. Integrity and Transaction Support: Through JavaScript; supports Tx
  • Service Cost: Starts at $45 / CU / month (CU = Capacity Unit)

Azure Table
  • Storage: XML, 200TB
  • Indexing: Primary Key only (no secondary index)
  • Server-Side Programmability: No
  • Ref. Integrity and Transaction Support: Limited (uses the BATCH operation)
  • Service Cost: $0.12 per GB

NOTE 1: This blog post was updated on 9/11/14 at 9:48pm ET to correct an inaccuracy on DocumentDB indexing.
NOTE 2: This blog post was updated on 9/14/14 at 6:37pm ET to correct a typo.

About Herve Roggero

Herve Roggero, Microsoft Azure MVP, @hroggero, is the founder of Blue Syntax Consulting (http://www.bluesyntaxconsulting.com). Herve's experience includes software development, architecture, database administration and senior management with both global corporations and startup companies. Herve holds multiple certifications, including an MCDBA, MCSE, MCSD. He also holds a Master's degree in Business Administration from Indiana University. Herve is the co-author of "PRO SQL Azure" and “PRO SQL Server 2012 Practices” from Apress, a PluralSight author, and runs the Azure Florida Association.

About the new Microsoft Innovation Center (MIC) in Miami

Last night I attended a meeting at the new MIC in Miami, run by Blain Barton (@blainbar), Sr. IT Pro Evangelist at Microsoft. The meeting was well attended and is meant to be run in a user group format, in a casual setting. Many of the local Microsoft MVPs and group leaders were in attendance as well, which allows technical folks to connect with community leaders in the area.

If you live in South Florida, I highly recommend looking out for future meetings at the MIC; most meetings will be about the Microsoft Azure platform, covering either IT Pro or Dev topics.

For more information on the MIC, check out this announcement:  http://www.microsoft.com/en-us/news/press/2014/may14/05-02miamiinnovationpr.aspx.

Azure Table JSON vs. AtomPub

A few months ago Microsoft released an update to the Microsoft Azure platform allowing developers to access Azure Tables using JSON payloads. Until then, developers had no choice but to use the AtomPub model; since it was the only data representation available, its use was implicit. With the introduction of JSON payloads, developers can now choose one or the other. Although the choice of AtomPub or JSON is largely transparent (there are at this time a couple of minor exceptions), meaning that there are no functionality differences, developers should use the JSON payload going forward because it is significantly smaller. Take a quick look at the payload examples in the MSDN documentation.

To use the JSON payload, developers can set the PayloadFormat property as follows (tableClient is an instance of CloudTableClient):

  tableClient.DefaultRequestOptions.PayloadFormat = TablePayloadFormat.JsonNoMetadata;

In order to see how much data actually flows over the wire, Microsoft published a small test application that lets you send a few entities to an Azure Table. The source code provided by Microsoft can be found on that page, so it's pretty easy to try out. However, I wanted to see this in action and use Fiddler to inspect the actual payload leaving my machine, so I took the general idea of Microsoft's sample code and modified it to create my own test harness. To do this, I created a new Console application in Visual Studio 2012, added the Windows Azure Storage 3.0 library as a NuGet package, and wrote a simple application; you can download my test harness here. The application saves log data in a test table called LogTest. You can change the 'count' variable to add more records if you want. A trimmed-down sketch of the harness follows.
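
Here is what such a harness can look like (this is my own simplification, not Microsoft's sample; the connection string, table name and entity properties are arbitrary). Switching the PayloadFormat line between JsonNoMetadata and AtomPub is all it takes to compare the two wire formats in Fiddler.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class PayloadTest
{
    static void Main()
    {
        CloudStorageAccount account = CloudStorageAccount.Parse("YOUR_STORAGE_CONNECTION_STRING");
        CloudTableClient tableClient = account.CreateCloudTableClient();

        // Switch between JsonNoMetadata and AtomPub to compare payload sizes in Fiddler.
        tableClient.DefaultRequestOptions.PayloadFormat = TablePayloadFormat.JsonNoMetadata;

        CloudTable table = tableClient.GetTableReference("LogTest");
        table.CreateIfNotExists();

        int count = 100;   // number of entities to insert
        for (int i = 0; i < count; i++)
        {
            DynamicTableEntity entity = new DynamicTableEntity("log", Guid.NewGuid().ToString());
            entity.Properties["Message"] = new EntityProperty("Test log entry " + i);
            table.Execute(TableOperation.Insert(entity));
        }

        // Read the entities back to measure the size of the response payload.
        foreach (DynamicTableEntity e in table.ExecuteQuery(new TableQuery()))
        {
            Console.WriteLine(e.RowKey);
        }
    }
}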

I used Fiddler to capture the tests. The first test was to insert 100 entities in my Azure Table using JSON. The second test was to read those entities back using JSON. The third was to insert 100 entities using AtomPub and the fourth was to read them back using AtomPub. Here are the results of those tests:

  • Insert 100 Entities: JSON payload 97,700 bytes sent; AtomPub payload 155,870 bytes sent; payload savings -37%
  • Read 100 Entities: JSON payload 40,550 bytes received; AtomPub payload 122,578 bytes received; payload savings -67%

In my tests, the JSON payload reduced network traffic (bytes sent) by 37% when inserting data, and about 67% when reading data (bytes received). Note that there was virtually no performance improvement in terms of execution time; the advantage of using JSON is limited to smaller network payloads. Still, this is a significant difference; considering that extracting data out of the Microsoft data centers costs money, companies that read lots of data from Azure Tables could realize cost savings as a result of using JSON payloads.

Discover the Management API in Microsoft Azure

Microsoft recently published a Management API framework within Microsoft Azure. If you haven’t had a chance to review this new feature, still in preview at the time of this writing, you may be surprised… So what is this new feature? Who would use it, and why?

The Management API offers interesting capabilities in at least two scenarios: transforming existing web services into more modern REST APIs, and leveraging an advanced API management platform.

If you want to move your older public APIs, such as web services, into the era of mobile computing, complete with monitoring, servicing and security management, you are in luck… Microsoft Azure's Management API provides a remarkably easy way to build a service proxy that hides older public APIs and allows you to define, one by one, the REST methods you want to expose and how they map to your older web services. The Management API provides a user interface to define the API verbs and the backend web service endpoint that serves them.

In addition to providing a mechanism to upgrade existing web services to REST services without writing a single line of code, the Management API offers a strong management interface allowing you to configure caching options, access keys, access control and monitoring for your REST endpoints (see the sketch below).
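
From the consumer's perspective, a proxied endpoint is just another REST call. The C# sketch below is illustrative only: the URL is fictitious, and it assumes the access key is passed in the Ocp-Apim-Subscription-Key request header, which is my understanding of how the API Management preview identifies callers.

using System;
using System.Net.Http;

class ApiClient
{
    static void Main()
    {
        using (HttpClient client = new HttpClient())
        {
            // Assumed header name for the API Management subscription key.
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "YOUR_SUBSCRIPTION_KEY");

            // Fictitious endpoint exposed by the Management API proxy.
            string json = client.GetStringAsync("https://yourcompany.azure-api.net/orders/12345").Result;
            Console.WriteLine(json);
        }
    }
}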

The Management API is designed to let companies upgrade their traditional web services to REST services without changing their code base. It also gives small and large organizations a management platform with capabilities that a simple web service upgrade would not provide out of the box, such as caching and monitoring.

Although the Management API is still in a preview phase, I highly recommend you investigate the capabilities of this amazing new feature in Microsoft Azure. To learn more about it, please visit the Microsoft Azure website: http://azure.microsoft.com/en-us/documentation/articles/api-management-get-started/

Microsoft Azure News: Capturing VM Images

If you have a Virtual Machine (VM) in Microsoft Azure that has a specific configuration, it used to be difficult to clone that VM. You had to sysprep the VM, and clone the data disks. This was slow, prone to errors, and stopped you from being productive.

No more!

A new option, called Capture, allows you to easily capture a VM, whether it is running or not. The capture operation copies the OS disk and data disks and automatically creates a new image out of them. This means you can now easily clone an entire VM without affecting productivity. To capture a VM, simply browse to your Virtual Machines in the Microsoft Azure management website, select the VM you want to clone, and click the Capture button at the bottom. A window will come up asking you to name your image. It took less than 1 minute for me to build a clone of my server.

And because it is stored as an image, I can easily create a new VM with it. So that’s what I did… And that took about 5 minutes total.  That’s amazing…  To create a new VM from your image, click on the NEW icon (bottom left), select Compute/Virtual Machine/From Gallery, and select My Images from the left menu when selecting an Image. You will find your newly created image. Because this is a clone, you will not be prompted for a new login; the user id/password is the same.
