Shaun Xu

The Sheep-Pen of the Shaun



Shaun, the author of this blog, is a semi-geek, clumsy developer, passionate speaker and incapable architect with about 10 years' experience in .NET and JavaScript. He hopes to prove that software development is art rather than manufacturing. He's into cloud computing platforms and technologies (Windows Azure, Amazon and Aliyun), and right now Shaun is attracted by JavaScript (Angular.js and Node.js) and he likes it.

Shaun works at Worktile Inc. as the chief architect, responsible for the overall design and development of Worktile, a web-based collaboration and task management tool, and Lesschat, a real-time communication aggregation tool.


In Windows Azure, when we publish a cloud service or a virtual machine, it is given a public virtual IP (VIP) address and a DNS name. For example, when I create a new cloud service I need to provide a name, which becomes the prefix of its public URL: worldwide Azure provides [name].cloudapp.net while China Azure provides [name].chinacloudapp.cn. This URL never changes until we delete the cloud service, regardless of what we deploy in it, so it is stable enough to visit directly and to bind to external services, such as our DNS server through a CNAME record, or other web services. But the public IP address we get is not as stable as this URL. If we need to use this VIP in our application, we need to be very careful, since it might change unexpectedly.

I knew that the VIP might change in some cases, but I only dug into it when I got a requirement from a customer who needed to bind their SMS service to the VIP of their cloud service. The communication between the SMS service and the integrated application was over raw sockets, which means it only allows binding by IP. And since they have a very strict firewall rule and change policy, it would be hard and time-consuming to change the bound IP once it had been configured. So our target was to work out a plan that minimizes VIP changes.

 

VIP Grant, Keep and Release Policy

The MSDN page (here) says:

The VIP of a cloud service is allocated when you first deploy it to Windows Azure in a particular environment, such as the Production environment. The VIP doesn’t change unless you delete the deployment explicitly or it is implicitly deleted by the deployment update process.

Well, this is a little bit general, so I would like to explain in more detail when the VIP is granted, kept and released. I will use a cloud service as the example; a virtual machine behaves very similarly.

When we create a new cloud service, we specify the region and the name, and Windows Azure provides a public URL to us. But at this moment, since there is no deployment in the cloud service, no VIP is assigned to it.

image

Once we deploy a package, regardless of whether it's a web role or a worker role, and regardless of whether we specified a public endpoint or not, Windows Azure will assign an internal IP address and a public IP address. They are linked in the cloud service load balancer. The public address is the virtual IP (VIP) we are talking about right now.

For example, once we deploy a role with one instance, an internal IP is assigned to that virtual machine, in this case 192.168.10.2. Then the load balancer receives a VIP, in this case 137.116.164.23, and links it with the virtual machine.

image

If we increase the instance count of this role, Windows Azure will create another virtual machine to host our role and assign it a new internal IP, which is linked to the load balancer under the existing VIP. So in this case, an incoming request arrives at the VIP and is then routed to one of the instances.

Also, if you update the package (application) deployed in the cloud service, the VIP will NOT change, as long as the deployment still exists on the instances.

image

The VIP will NOT change if an instance crashes, is reallocated, or suffers a hardware failure. In those cases Windows Azure will create another instance, deploy the application and link the new instance to the load balancer, so users can still reach it through the VIP. Furthermore, the VIP will NOT change even if all instances crash at the same time.

That would only happen if all instances were placed in one fault domain.

image

But if you remove the deployment from the cloud service, Windows Azure will release all the instances you have and then release the VIP from the load balancer. This is the only way to lose your VIP.

image

 

How to Keep My VIP

Based on the description above, as long as we keep a deployment in our cloud service, we will never lose our VIP. This should be very easy if the application is running on Windows Azure: just make sure not to delete the deployment, and the VIP will remain stable.

But if we want to update our application, which means deploying a new version to the cloud service, what should we do to ensure the VIP does not change? Let's have a look at how to do it through Visual Studio.

First I created a cloud service and deployed a web role. Then we can see the VIP that was assigned in the portal.

01

Then in Visual Studio we trigger another deployment. Just make sure that in the publish wizard we have "Deployment update" checked and "Delete deployment on failure" unchecked. This means:

1. Deploy the package by updating the existing deployment, instead of deleting the existing deployment and redeploying a new one.

2. Do not delete the deployment if the new deployment fails.

02

Also, click the "Settings" link next to the "Deployment update" checkbox and make sure "If deployment can't be updated, do a full deployment" is unchecked. This ensures that Visual Studio never falls back to a full deployment (delete and redeploy), even if the in-place update fails or is not possible.

03

After confirming these settings, we can start the deployment, and the VIP is not changed, as you can see below.

04

In the Visual Studio Windows Azure Activity Log window, we can see that the package was uploaded but there is no step for virtual machine creation. This means our deployment was safe and the VIP was not changed.

05

If we uncheck the "Deployment update" setting and trigger a new deployment, the log window looks like this, and we will find steps for virtual machine creation, starting, etc.

06

This means Visual Studio deleted our existing deployment and the VIP was lost.

07

And the good news is, the VIP will not change even if:

- The instance count is changed (increased or decreased).

- The VM size is changed.

- A role is changed (added or removed).

- The guest OS is changed.

- An endpoint is changed (public or internal, added or removed).

 

More Stable VIP Solution

Even though we know that the VIP will not change unless the deployment is deleted, we might still need some architectural consideration to make the VIP more stable.

For example, in the case I mentioned at the beginning of this post, where we need to bind our application to an external service by IP and it is very difficult to change that binding, we need to do more to ensure the VIP will not change even if we delete a deployment.

The solution is to first identify the minimum component that interacts with the external service, and separate it into a dedicated cloud service.

image

With this architecture, we bind the VIP of the "stable" cloud service to the external service, while the main logic of our application lives in another, "dynamic" cloud service. In this way the deployment of the "stable" cloud service almost never changes, since it only contains the minimum logic to communicate with the external service. In my case, it only contains the logic to send SMS.

Then we can update our application, change the code, redeploy and even delete the deployment in the "dynamic" cloud service. The VIP of that cloud service might change, but it will not affect the "stable" one.

image

 

Summary

In this post I described the policy by which Windows Azure grants, keeps and releases the virtual IP (VIP) of a cloud service. I also explained how to make sure the VIP does not change when publishing through Visual Studio. Finally, I introduced an architectural solution that keeps the sensitive VIP stable even when the VIP of the main application cloud service changes.

 

Hope this helps.

Shaun

All documents and related graphics, codes are provided "AS IS" without warranty of any kind.
Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

 

Many people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with an easy-to-use API through the .NET SDK and HTTP REST. For example, we can store JavaScript files, images and documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure, or we can store our VHD files in blobs and mount them as hard drives in our cloud service.

If you are familiar with Windows Azure, you know that there are two kinds of blob: page blobs and block blobs. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunked read and write, which covers the more common usage. Since we can upload a block blob in blocks through BlockBlob.PutBlock and then commit them as a whole blob by invoking BlockBlob.PutBlockList, it is very powerful for uploading large files: we can upload the blocks in parallel and provide a pause-resume feature.
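
To make this two-phase pattern concrete before we get to the web scenario, below is a minimal server-side sketch of uploading a local file as blocks with PutBlock and committing them with PutBlockList. This is only a sketch: the "DataConnectionString" setting and the "test" container match the code later in this post, while the file path, blob name and the 4MB block size are illustrative assumptions.

// Two-phase block upload: stage each block with PutBlock, then commit the
// ordered block ID list with PutBlockList so the staged blocks become one blob.
var account = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("DataConnectionString"));
var container = account.CreateCloudBlobClient().GetContainerReference("test");
container.CreateIfNotExists();
var blob = container.GetBlockBlobReference("bigfile.dat");      // illustrative blob name

var blockIds = new List<string>();
var buffer = new byte[4 * 1024 * 1024];                         // 4MB per block (illustrative)
using (var stream = System.IO.File.OpenRead(@"C:\temp\bigfile.dat"))
{
    int read, index = 0;
    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // block IDs must be base64 strings and, within one blob, all of the same length
        var id = Convert.ToBase64String(BitConverter.GetBytes(index++));
        blob.PutBlock(id, new MemoryStream(buffer, 0, read), null);
        blockIds.Add(id);
    }
}
blob.PutBlockList(blockIds);    // the blob only becomes visible after this commit

Because the staged blocks are independent until PutBlockList is called, they can be uploaded in parallel or re-sent after a failure, which is exactly what makes parallel upload and pause-resume possible.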

There are many documents, articles and blog posts describing how to upload a block blob. Most of them focus on the server side: when you have received a big file, stream or binary on the server, how to upload it into blob storage in blocks through the .NET SDK. But the problem is, how can we upload these large files from the client side, for example from a browser?

This question came to me when I was working with a Chinese customer, helping them build a network disk product on top of Azure. The end users upload their files from the web portal, and the files are then stored in blob storage by the Web Role. My goal was to find the best way to transfer the file from the client (the end user's machine) to the server (the Web Role) through the browser. In this post I will demonstrate and describe what I did to upload large files in chunks at high speed, and save them as blocks in Windows Azure Blob Storage.

 

Traditional Upload Works, with Limitations

The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button.

@using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" }))
{
    <input type="file" name="file" />
    <input type="submit" value="upload" />
}

Then, in the backend controller, we retrieve the whole content of this file and upload it into blob storage through the .NET SDK. We can split the file into blocks, upload them in parallel and commit them. This code has been well covered in the community.

[HttpPost]
public ActionResult About(HttpPostedFileBase file)
{
    var container = _client.GetContainerReference("test");
    container.CreateIfNotExists();
    var blob = container.GetBlockBlobReference(file.FileName);
    var blockDataList = new Dictionary<string, byte[]>();
    using (var stream = file.InputStream)
    {
        var blockSizeInKB = 1024;
        var offset = 0;
        var index = 0;
        while (offset < stream.Length)
        {
            var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset);
            var blockData = new byte[readLength];
            offset += stream.Read(blockData, 0, readLength);
            blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData);

            index++;
        }
    }

    Parallel.ForEach(blockDataList, (bi) =>
    {
        blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null);
    });
    blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray());

    return RedirectToAction("About");
}

This works perfectly if we select an image, a song or a small video to upload. But if I select a large file, let's say a 6GB HD movie, after uploading for a few minutes the page shown below appears and the upload is terminated.

image

In ASP.NET there is a limit on request length: the maximum request length is defined in the web.config file (the maxRequestLength setting, together with the IIS maxAllowedContentLength limit), and it cannot be raised beyond roughly 4GB. So if we want to upload a really big file, we cannot simply implement it this way. Also, in Windows Azure, the cloud service network load balancer will terminate the connection if it exceeds the timeout period; from my tests the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements.

Besides the limitation mentioned above, a simple HTML file upload cannot provide a rich upload experience such as chunked upload, pause and resume. So we need to find a better way to upload large files from the client to the server.

 

Upload in Chunks through HTML5 and JavaScript

In order to break the limitations mentioned above, we will upload the large file in chunks. This gives us some benefits:

- No request size limitation: since we upload in chunks, we can define the request size for each chunk regardless of how big the entire file is.

- No timeout problem: the size of each chunk is controlled by us, so we can make sure the request for each chunk upload does not exceed the timeout period of either ASP.NET or the Windows Azure load balancer.

It was a big challenge to upload big files in chunks until HTML5 arrived. There are some new features and improvements introduced in HTML5, and we will use them to implement our solution.

 

In HTML5, the File interface has been improved with a new method called "slice". It can be used to read part of the file by specifying the start byte index and the end byte index. For example, if the entire file is 1024 bytes, file.slice(512, 768) reads the part of the file from byte 512 up to (but not including) byte 768, and returns a new object of an interface called "Blob", which you can treat as an array of bytes.

In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and extending it to support files on the user's system. For more information about Blob please refer here.

File and Blob are very useful for implementing the chunked upload. We will use the File interface to represent the file the user selected in the browser, and then use File.slice to read the file in chunks of the size we want. For example, if we wanted to upload a 10MB file with 512KB chunks, we could read it as twenty 512KB blobs by calling File.slice in a loop.

 

Assume we have a web page as below: a file input for the user to select a file, an input box to specify the block size in KB, and a button to start the upload.

<div>
    <input type="file" id="upload_files" name="files[]" /><br />
    Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br />
    <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" />
</div>

Then we can write the JavaScript that uploads the file in chunks when the user clicks the button.

<script type="text/javascript">
    $(function () {
        $("#upload_button_blob").click(function () {
        });
    });
</script>

First we need to ensure the client browser supports the interfaces we are going to use. Just try to access File, Blob and FormData on the "window" object. If any of them is "undefined", the condition will be "false", which means the browser doesn't support these features and it's time to get it updated.

FormData is another new feature we are going to use. It generates a temporary form for us, and we will use it to build a form containing the chunk and its associated metadata when we invoke the service through ajax.

   1: $("#upload_button_blob").click(function () {
   2:     // assert the browser support html5
   3:     if (window.File && window.Blob && window.FormData) {
   4:         alert("Your brwoser is awesome, let's rock!");
   5:     }
   6:     else {
   7:         alert("Oh man plz update to a modern browser before try is cool stuff out.");
   8:         return;
   9:     }
  10: });

Each browser supports these interfaces in its own implementation; currently Blob, File and File.slice are supported by Chrome 21, Firefox 13, IE 10, Opera 12 and Safari 5.1 or higher.

After that we work on the files the user selected one by one, since in HTML5 the user can select multiple files in one file input box.

var files = $("#upload_files")[0].files;
for (var i = 0; i < files.length; i++) {
    var file = files[i];
    var fileSize = file.size;
    var fileName = file.name;
}

Next, we calculate the start and end byte index for each chunk based on the size the user specified in the browser. We put them into an array along with the file name and the chunk index, which will be used when we upload the chunks into Windows Azure Blob Storage as blocks, since we need to specify the target blob name and the block index.

At the same time we store the list of all indexes in another variable, which will be used to commit the blocks into a blob in Azure Storage once all chunks have been uploaded successfully.

   1: $("#upload_button_blob").click(function () {
   2:     // assert the browser support html5
   3:     ... ...
   4:     // start to upload each files in chunks
   5:     var files = $("#upload_files")[0].files;
   6:     for (var i = 0; i < files.length; i++) {
   7:         var file = files[i];
   8:         var fileSize = file.size;
   9:         var fileName = file.name;
  10:  
  11:         // calculate the start and end byte index for each blocks(chunks)
  12:         // with the index, file name and index list for future using
  13:         var blockSizeInKB = $("#block_size").val();
  14:         var blockSize = blockSizeInKB * 1024;
  15:         var blocks = [];
  16:         var offset = 0;
  17:         var index = 0;
  18:         var list = "";
  19:         while (offset < fileSize) {
  20:             var start = offset;
  21:             var end = Math.min(offset + blockSize, fileSize);
  22:  
  23:             blocks.push({
  24:                 name: fileName,
  25:                 index: index,
  26:                 start: start,
  27:                 end: end
  28:             });
  29:             list += index + ",";
  30:  
  31:             offset = end;
  32:             index++;
  33:         }
  34:     }
  35: });

Now we have all the chunks' information ready. The next step is to upload them one by one to the server side; when the server receives a chunk it will upload it as a block into Blob Storage, and finally commit all blocks with the index list through BlockBlob.PutBlockList. But all of these invocations are ajax calls, i.e. not synchronous, so we need to introduce a JavaScript library to help us coordinate the asynchronous operations, named "async.js".

You can download this JavaScript library here, and you can find its documentation here.

I will not explain this library in depth in this post. We put all the procedures we want to execute into a function array and pass it to the appropriate function defined in async.js, letting it control the execution sequence, in series or in parallel. Hence we define an array and push the chunk upload functions into it.

   1: $("#upload_button_blob").click(function () {
   2:     // assert the browser support html5
   3:     ... ...
   4:  
   5:     // start to upload each files in chunks
   6:     var files = $("#upload_files")[0].files;
   7:     for (var i = 0; i < files.length; i++) {
   8:         var file = files[i];
   9:         var fileSize = file.size;
  10:         var fileName = file.name;
  11:         // calculate the start and end byte index for each blocks(chunks)
  12:         // with the index, file name and index list for future using
  13:         ... ...
  14:  
  15:         // define the function array and push all chunk upload operation into this array
  16:         blocks.forEach(function (block) {
  17:             putBlocks.push(function (callback) {
  18:             });
  19:         });
  20:     }
  21: });
  22:         });

As you can see, I used the File.slice method to read each chunk based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new HTML5 feature named FormData. Then I post this form to the backend server through jQuery.ajax. This is the key part of our solution.

   1: $("#upload_button_blob").click(function () {
   2:     // assert the browser support html5
   3:     ... ...
   4:     // start to upload each files in chunks
   5:     var files = $("#upload_files")[0].files;
   6:     for (var i = 0; i < files.length; i++) {
   7:         var file = files[i];
   8:         var fileSize = file.size;
   9:         var fileName = file.name;
  10:         // calculate the start and end byte index for each blocks(chunks)
  11:         // with the index, file name and index list for future using
  12:         ... ...
  13:         // define the function array and push all chunk upload operation into this array
  14:         blocks.forEach(function (block) {
  15:             putBlocks.push(function (callback) {
  16:                 // load blob based on the start and end index for each chunks
  17:                 var blob = file.slice(block.start, block.end);
  18:                 // put the file name, index and blob into a temporary from
  19:                 var fd = new FormData();
  20:                 fd.append("name", block.name);
  21:                 fd.append("index", block.index);
  22:                 fd.append("file", blob);
  23:                 // post the form to backend service (asp.net mvc controller action)
  24:                 $.ajax({
  25:                     url: "/Home/UploadInFormData",
  26:                     data: fd,
  27:                     processData: false,
  28:                     contentType: "multipart/form-data",
  29:                     type: "POST",
  30:                     success: function (result) {
  31:                         if (!result.success) {
  32:                             alert(result.error);
  33:                         }
  34:                         callback(null, block.index);
  35:                     }
  36:                 });
  37:             });
  38:         });
  39:     }
  40: });

Then we invoke these functions one by one using async.js. Once all of them have executed successfully, I make another ajax call to the backend service to commit all these chunks (blocks) as one blob in Windows Azure Storage.

   1: $("#upload_button_blob").click(function () {
   2:     // assert the browser support html5
   3:     ... ...
   4:     // start to upload each files in chunks
   5:     var files = $("#upload_files")[0].files;
   6:     for (var i = 0; i < files.length; i++) {
   7:         var file = files[i];
   8:         var fileSize = file.size;
   9:         var fileName = file.name;
  10:         // calculate the start and end byte index for each blocks(chunks)
  11:         // with the index, file name and index list for future using
  12:         ... ...
  13:         // define the function array and push all chunk upload operation into this array
  14:         ... ...
  15:         // invoke the functions one by one
  16:         // then invoke the commit ajax call to put blocks into blob in azure storage
  17:         async.series(putBlocks, function (error, result) {
  18:             var data = {
  19:                 name: fileName,
  20:                 list: list
  21:             };
  22:             $.post("/Home/Commit", data, function (result) {
  23:                 if (!result.success) {
  24:                     alert(result.error);
  25:                 }
  26:                 else {
  27:                     alert("done!");
  28:                 }
  29:             });
  30:         });
  31:     }
  32: });

That's all on the client side. The outline of our logic is:

- Calculate the start and end byte index of each chunk based on the block size.

- Define the functions that read each chunk from the file and upload its content to the backend service through ajax.

- Execute the functions defined in the previous step with "async.js".

- Finally, commit the chunks by invoking the backend service, which assembles the blocks into a blob in Windows Azure Storage.

 

Save Chunks as Blocks into Blob Storage

Above we finished the client-side JavaScript code, which uploads the file in chunks to the backend service that we are going to implement in this step. We will use ASP.NET MVC as our backend service; it will receive the chunks, upload them into Windows Azure Blob Storage as blocks, and finally commit them as one blob.

Since the client side uploads chunks by making an ajax call to the URL "/Home/UploadInFormData", I created a new action with that name under the Home controller, which only accepts HTTP POST requests.

[HttpPost]
public JsonResult UploadInFormData()
{
    var error = string.Empty;
    try
    {
    }
    catch (Exception e)
    {
        error = e.ToString();
    }

    return new JsonResult()
    {
        Data = new
        {
            success = string.IsNullOrWhiteSpace(error),
            error = error
        }
    };
}

Then I retrieve the file name, index and chunk content from the Request.Form and Request.Files objects, which were posted from the client side. Then I use the Windows Azure SDK to create a blob container (in this case we use a container named "test") and get a blob reference with the blob name (same as the file name). Then I upload the chunk as a block of this blob with that index, since in Blob Storage each block must have a base64-encoded block ID (and all IDs of one blob must be the same length), so that finally we can assemble all blocks into one blob by specifying the block ID list.

[HttpPost]
public JsonResult UploadInFormData()
{
    var error = string.Empty;
    try
    {
        var name = Request.Form["name"];
        var index = int.Parse(Request.Form["index"]);
        var file = Request.Files[0];
        var id = Convert.ToBase64String(BitConverter.GetBytes(index));

        var container = _client.GetContainerReference("test");
        container.CreateIfNotExists();
        var blob = container.GetBlockBlobReference(name);
        blob.PutBlock(id, file.InputStream, null);
    }
    catch (Exception e)
    {
        error = e.ToString();
    }

    return new JsonResult()
    {
        Data = new
        {
            success = string.IsNullOrWhiteSpace(error),
            error = error
        }
    };
}

Next, I created another action to commit the blocks into the blob once all chunks have been uploaded. Similarly, I retrieve the blob name from Request.Form. I also retrieve the chunk ID list, which is the block ID list, from Request.Form as a comma-separated string, split it into a list, and then invoke the BlockBlob.PutBlockList method. After that our blob appears in the container and is ready to be downloaded.

[HttpPost]
public JsonResult Commit()
{
    var error = string.Empty;
    try
    {
        var name = Request.Form["name"];
        var list = Request.Form["list"];
        var ids = list
            .Split(',')
            .Where(id => !string.IsNullOrWhiteSpace(id))
            .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id))))
            .ToArray();

        var container = _client.GetContainerReference("test");
        container.CreateIfNotExists();
        var blob = container.GetBlockBlobReference(name);
        blob.PutBlockList(ids);
    }
    catch (Exception e)
    {
        error = e.ToString();
    }

    return new JsonResult()
    {
        Data = new
        {
            success = string.IsNullOrWhiteSpace(error),
            error = error
        }
    };
}

Now we have finished all the code we need. The whole upload process looks like the figure below.

image

Below is the full client side JavaScript code.

   1: <script type="text/javascript" src="~/Scripts/async.js"></script>
   2: <script type="text/javascript">
   3:     $(function () {
   4:         $("#upload_button_blob").click(function () {
   5:             // assert the browser support html5
   6:             if (window.File && window.Blob && window.FormData) {
   7:                 alert("Your brwoser is awesome, let's rock!");
   8:             }
   9:             else {
  10:                 alert("Oh man plz update to a modern browser before try is cool stuff out.");
  11:                 return;
  12:             }
  13:  
  14:             // start to upload each files in chunks
  15:             var files = $("#upload_files")[0].files;
  16:             for (var i = 0; i < files.length; i++) {
  17:                 var file = files[i];
  18:                 var fileSize = file.size;
  19:                 var fileName = file.name;
  20:  
  21:                 // calculate the start and end byte index for each blocks(chunks)
  22:                 // with the index, file name and index list for future using
  23:                 var blockSizeInKB = $("#block_size").val();
  24:                 var blockSize = blockSizeInKB * 1024;
  25:                 var blocks = [];
  26:                 var offset = 0;
  27:                 var index = 0;
  28:                 var list = "";
  29:                 while (offset < fileSize) {
  30:                     var start = offset;
  31:                     var end = Math.min(offset + blockSize, fileSize);
  32:  
  33:                     blocks.push({
  34:                         name: fileName,
  35:                         index: index,
  36:                         start: start,
  37:                         end: end
  38:                     });
  39:                     list += index + ",";
  40:  
  41:                     offset = end;
  42:                     index++;
  43:                 }
  44:  
  45:                 // define the function array and push all chunk upload operation into this array
  46:                 var putBlocks = [];
  47:                 blocks.forEach(function (block) {
  48:                     putBlocks.push(function (callback) {
  49:                         // load blob based on the start and end index for each chunks
  50:                         var blob = file.slice(block.start, block.end);
  51:                         // put the file name, index and blob into a temporary from
  52:                         var fd = new FormData();
  53:                         fd.append("name", block.name);
  54:                         fd.append("index", block.index);
  55:                         fd.append("file", blob);
  56:                         // post the form to backend service (asp.net mvc controller action)
  57:                         $.ajax({
  58:                             url: "/Home/UploadInFormData",
  59:                             data: fd,
  60:                             processData: false,
  61:                             contentType: "multipart/form-data",
  62:                             type: "POST",
  63:                             success: function (result) {
  64:                                 if (!result.success) {
  65:                                     alert(result.error);
  66:                                 }
  67:                                 callback(null, block.index);
  68:                             }
  69:                         });
  70:                     });
  71:                 });
  72:  
  73:                 // invoke the functions one by one
  74:                 // then invoke the commit ajax call to put blocks into blob in azure storage
  75:                 async.series(putBlocks, function (error, result) {
  76:                     var data = {
  77:                         name: fileName,
  78:                         list: list
  79:                     };
  80:                     $.post("/Home/Commit", data, function (result) {
  81:                         if (!result.success) {
  82:                             alert(result.error);
  83:                         }
  84:                         else {
  85:                             alert("done!");
  86:                         }
  87:                     });
  88:                 });
  89:             }
  90:         });
  91:     });
  92: </script>

And below is the full ASP.NET MVC controller code.

public class HomeController : Controller
{
    private CloudStorageAccount _account;
    private CloudBlobClient _client;

    public HomeController()
        : base()
    {
        _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString"));
        _client = _account.CreateCloudBlobClient();
    }

    public ActionResult Index()
    {
        ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application.";

        return View();
    }

    [HttpPost]
    public JsonResult UploadInFormData()
    {
        var error = string.Empty;
        try
        {
            var name = Request.Form["name"];
            var index = int.Parse(Request.Form["index"]);
            var file = Request.Files[0];
            var id = Convert.ToBase64String(BitConverter.GetBytes(index));

            var container = _client.GetContainerReference("test");
            container.CreateIfNotExists();
            var blob = container.GetBlockBlobReference(name);
            blob.PutBlock(id, file.InputStream, null);
        }
        catch (Exception e)
        {
            error = e.ToString();
        }

        return new JsonResult()
        {
            Data = new
            {
                success = string.IsNullOrWhiteSpace(error),
                error = error
            }
        };
    }

    [HttpPost]
    public JsonResult Commit()
    {
        var error = string.Empty;
        try
        {
            var name = Request.Form["name"];
            var list = Request.Form["list"];
            var ids = list
                .Split(',')
                .Where(id => !string.IsNullOrWhiteSpace(id))
                .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id))))
                .ToArray();

            var container = _client.GetContainerReference("test");
            container.CreateIfNotExists();
            var blob = container.GetBlockBlobReference(name);
            blob.PutBlockList(ids);
        }
        catch (Exception e)
        {
            error = e.ToString();
        }

        return new JsonResult()
        {
            Data = new
            {
                success = string.IsNullOrWhiteSpace(error),
                error = error
            }
        };
    }
}

And if we select a file in the browser, we will see our application upload chunks of the specified size to the server through ajax calls in the background, and then commit all the chunks into one blob.

image

Then we can find the blob in our Windows Azure Blob Storage.

image
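
Besides checking the portal or a storage explorer, we can also verify the result in code. Below is a small sketch that lists the blobs committed into the "test" container with their sizes; it assumes the same storage client library and "DataConnectionString" setting used by the controller above, and the flat-listing flag and output format are just illustrative.

// enumerate the blobs in the "test" container and print their names and sizes
var account = CloudStorageAccount.Parse(
    CloudConfigurationManager.GetSetting("DataConnectionString"));
var container = account.CreateCloudBlobClient().GetContainerReference("test");
foreach (var blob in container.ListBlobs(null, true).OfType<CloudBlockBlob>())
{
    Console.WriteLine("{0} - {1:N0} bytes", blob.Name, blob.Properties.Length);
}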

 

Optimizing with Parallel Upload

In the previous example we just uploaded the file in chunks. This solves both the ASP.NET MVC request content size limitation and the Windows Azure load balancer timeout, but it might introduce a performance problem since we upload the chunks in sequence. In order to improve the upload performance, we can modify our client-side code a bit to make the upload operations run in parallel.

The good news is that the "async.js" library provides a parallel execution function. If you remember, the code that invokes the service to upload chunks used "async.series", which means all functions are executed in sequence. Now we change this to "async.parallel", which invokes all the functions in parallel.

   1: $("#upload_button_blob").click(function () {
   2:     // assert the browser support html5
   3:     ... ...
   4:     // start to upload each files in chunks
   5:     var files = $("#upload_files")[0].files;
   6:     for (var i = 0; i < files.length; i++) {
   7:         var file = files[i];
   8:         var fileSize = file.size;
   9:         var fileName = file.name;
  10:         // calculate the start and end byte index for each blocks(chunks)
  11:         // with the index, file name and index list for future using
  12:         ... ...
  13:         // define the function array and push all chunk upload operation into this array
  14:         ... ...
  15:         // invoke the functions one by one
  16:         // then invoke the commit ajax call to put blocks into blob in azure storage
  17:         async.parallel(putBlocks, function (error, result) {
  18:             var data = {
  19:                 name: fileName,
  20:                 list: list
  21:             };
  22:             $.post("/Home/Commit", data, function (result) {
  23:                 if (!result.success) {
  24:                     alert(result.error);
  25:                 }
  26:                 else {
  27:                     alert("done!");
  28:                 }
  29:             });
  30:         });
  31:     }
  32: });

In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage.

image

This works if the file is not very large and the chunk size is not very small. But for a large file it might introduce another problem: too many ajax calls are sent to the server at the same time. So the best solution is to upload the chunks in parallel with a maximum concurrency limit. The code below sets the concurrency limit to 4, which means at most 4 ajax calls can be in flight at the same time.

   1: $("#upload_button_blob").click(function () {
   2:     // assert the browser support html5
   3:     ... ...
   4:     // start to upload each files in chunks
   5:     var files = $("#upload_files")[0].files;
   6:     for (var i = 0; i < files.length; i++) {
   7:         var file = files[i];
   8:         var fileSize = file.size;
   9:         var fileName = file.name;
  10:         // calculate the start and end byte index for each blocks(chunks)
  11:         // with the index, file name and index list for future using
  12:         ... ...
  13:         // define the function array and push all chunk upload operation into this array
  14:         ... ...
  15:         // invoke the functions one by one
  16:         // then invoke the commit ajax call to put blocks into blob in azure storage
  17:         async.parallelLimit(putBlocks, 4, function (error, result) {
  18:             var data = {
  19:                 name: fileName,
  20:                 list: list
  21:             };
  22:             $.post("/Home/Commit", data, function (result) {
  23:                 if (!result.success) {
  24:                     alert(result.error);
  25:                 }
  26:                 else {
  27:                     alert("done!");
  28:                 }
  29:             });
  30:         });
  31:     }
  32: });

 

Summary

In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage as blocks. We focused on the frontend side and leveraged three new features introduced in HTML5:

- File.slice: reads part of the file by specifying the start and end byte index.

- Blob: a file-like interface which contains part of the file content.

- FormData: a temporary form element through which we can pass the chunk along with some metadata to the backend service.

Then we discussed the performance considerations of chunked uploading. Sequential upload cannot deliver the maximum upload speed, while unlimited parallel upload might overwhelm the browser and the server when there are too many chunks. So we finally came to the solution of uploading chunks in parallel with a concurrency limit.

We also demonstrated how to use the "async.js" JavaScript library to control the asynchronous calls and the parallelism limit.

 

Regarding the chunk size and the parallelism limit, there is no single "best" value. You need to test various combinations and find the best one for your particular scenario. It depends on the local bandwidth, the client machine's cores, and the server side (Windows Azure Cloud Service virtual machine) cores, memory and bandwidth.

Below is one of my performance test results. The client machine was a 4-core Windows 8 box running IE 10, on the Microsoft corporate network. The web site was hosted in the Windows Azure China North data center (in Beijing) on one small web role instance (1 CPU core, 1.75GB memory, 100Mbps bandwidth). The test cases were:

- Chunk size: 512KB, 1MB, 2MB, 4MB.

- Upload mode: sequential, parallel (unlimited), parallel with limit (4 and 8 concurrent requests).

- Chunk format: base64 string, binary.

- Target file: 100MB.

- Each case was tested 3 times.

Below is the test result chart.

image

Some thoughts, but not guidance or best practice:

- Parallel upload gets better performance than sequential upload.

- There is no significant performance difference between a parallel limit of 4 and of 8.

- Transferring chunks as binaries provides better performance than base64 strings.

- In all cases, a chunk size of 1MB - 2MB gives the best performance.

 

Hope this helps,

Shaun

All documents and related graphics, codes are provided "AS IS" without warranty of any kind.
Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.