Shaun Xu

The Sheep-Pen of the Shaun




Many people use Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with an easy-to-use API through the .NET SDK and HTTP REST. For example, we can store JavaScript files, images and documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob storage and mount them as hard drives in our cloud service.

If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which is the more common usage. Since we can upload a block blob in blocks through BlockBlob.PutBlock, and then commit them as a whole blob by invoking BlockBlob.PutBlockList, it is well suited to uploading large files, as we can upload blocks in parallel and provide a pause-resume feature.

There are many documents, articles and blog posts that describe how to upload a block blob. Most of them focus on the server side, that is, once you have received a big file, stream or byte array, how to upload it into blob storage in blocks through the .NET SDK. But the problem is, how can we upload these large files from the client side, for example, from a browser?

This question came up when I was working with a Chinese customer to help them build a network disk product on top of Azure. The end users upload their files from the web portal, and then the files are stored in blob storage from the Web Role. My goal was to find the best way to transfer the file from the client (the end user’s machine) to the server (Web Role) through the browser. In this post I will demonstrate and describe what I did to upload large files in chunks at high speed and save them as blocks into Windows Azure Blob Storage.

 

Traditional Upload, Works with Limitation

The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button.

    @using (Html.BeginForm("About", "Home", FormMethod.Post, new { enctype = "multipart/form-data" }))
    {
        <input type="file" name="file" />
        <input type="submit" value="upload" />
    }

Then in the backend controller, we retrieve the whole content of this file and upload it into blob storage through the .NET SDK. We can split the file into blocks, upload them in parallel and commit them. This code has been well covered by blog posts in the community.

    [HttpPost]
    public ActionResult About(HttpPostedFileBase file)
    {
        var container = _client.GetContainerReference("test");
        container.CreateIfNotExists();
        var blob = container.GetBlockBlobReference(file.FileName);
        // block ID -> block content, buffered in memory before upload
        var blockDataList = new Dictionary<string, byte[]>();
        using (var stream = file.InputStream)
        {
            var blockSizeInKB = 1024;
            long offset = 0; // long, so that files larger than 2GB do not overflow
            var index = 0;
            while (offset < stream.Length)
            {
                var readLength = (int)Math.Min(1024L * blockSizeInKB, stream.Length - offset);
                var blockData = new byte[readLength];
                offset += stream.Read(blockData, 0, readLength);
                // the block ID is the Base64 form of the block index
                blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData);

                index++;
            }
        }

        // upload the blocks in parallel, then commit them as one blob
        Parallel.ForEach(blockDataList, (bi) =>
        {
            blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null);
        });
        blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray());

        return RedirectToAction("About");
    }

This works perfectly if we select an image, a music file or a small video to upload. But if I select a large file, let’s say a 6GB HD movie, then after uploading for a few minutes the page will be shown as below and the upload will be terminated.

image

In ASP.NET there is a limit on the request length, and the maximum request length is defined in the web.config file. It is a number that must be less than about 4GB. So if we want to upload a really big file, we cannot simply implement it this way. Also, in Windows Azure, the cloud service network load balancer will terminate the connection if it exceeds the timeout period. From my tests the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements.

Besides the limitations mentioned above, the simple HTML file upload cannot provide a rich upload experience such as chunked upload, pause and resume. So we need to find a better way to upload large files from the client to the server.

 

Upload in Chunks through HTML5 and JavaScript

In order to break the limitations mentioned above we will try to upload the large file in chunks. This brings us some benefits, such as:

- No request size limitation: Since we upload in chunks, we can define the request size for each chunk regardless of how big the entire file is.

- No timeout problem: The size of the chunks is controlled by us, which means we should be able to make sure the request for each chunk upload will not exceed the timeout period of either ASP.NET or the Windows Azure load balancer.
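As a rough illustration of these two points, the sketch below estimates how many requests a chunked upload needs and how long each chunk takes to send. The helper names and the 1MB/s upload speed are assumptions made up for this example; they are not part of the solution described in this post.

```javascript
// Hypothetical helpers, for illustration only.

// Number of chunks (requests) needed for a file of fileSize bytes.
function chunkCount(fileSize, chunkSize) {
    return Math.ceil(fileSize / chunkSize);
}

// Approximate seconds needed to send one chunk at the given upload speed.
function secondsPerChunk(chunkSize, bytesPerSecond) {
    return chunkSize / bytesPerSecond;
}

var KB = 1024, MB = 1024 * KB, GB = 1024 * MB;

// A 6GB file in 512KB chunks needs 12288 requests,
// each taking about 0.5 seconds at an assumed upload speed of 1MB/s.
var requests = chunkCount(6 * GB, 512 * KB);
var seconds = secondsPerChunk(512 * KB, 1 * MB);
```

Under these assumptions every request stays far below both the request size limit and the 2 - 3 minute timeout, no matter how large the whole file is.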

It was a big challenge to upload big files in chunks until we had HTML5. HTML5 introduced some new features and improvements, and we will use them to implement our solution.

 

In HTML5, the File interface has been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example, if the entire file is 1024 bytes, file.slice(512, 768) will read the part of this file from byte index 512 up to, but not including, byte index 768 (256 bytes in total), and return a new object of the interface called “Blob”, which you can treat as an array of bytes.

In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting Blob functionality and expanding it to support files on the user's system. For more information about Blob please refer here.

File and Blob are very useful for implementing the chunk upload. We will use the File interface to represent the file the user selected in the browser and then use File.slice to read the file in chunks of the size we want. For example, if we want to upload a 10MB file with 512KB chunks, we can read it in 512KB blobs by using File.slice in a loop.
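A minimal sketch of such a loop is shown here. The helper name sliceIntoChunks is hypothetical, and a small in-memory Blob stands in for the file the user would select, so the numbers are scaled down from the 10MB example.

```javascript
// Blob is a global in browsers; in Node.js it is also available from the
// built-in "buffer" module, which lets this sketch run outside a browser.
var BlobType = typeof Blob !== "undefined" ? Blob : require("buffer").Blob;

// Hypothetical helper: split a Blob (or File) into chunks of chunkSize bytes.
function sliceIntoChunks(blob, chunkSize) {
    var chunks = [];
    for (var start = 0; start < blob.size; start += chunkSize) {
        // the end index is exclusive and must not run past the end of the data
        var end = Math.min(start + chunkSize, blob.size);
        chunks.push(blob.slice(start, end));
    }
    return chunks;
}

// Stand-in for a user-selected file: 10KB of zero bytes, read in 512-byte chunks.
var data = new BlobType([new Uint8Array(10 * 1024)]);
var chunks = sliceIntoChunks(data, 512);
```

Slicing does not copy the file into memory up front; each chunk is just a Blob describing a byte range, which is why this approach stays cheap even for very large files.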

 

Assume we have a web page as below. The user can select a file, specify the block size in KB in an input box, and click a button to start the upload.

    <div>
        <input type="file" id="upload_files" name="files[]" /><br />
        Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br />
        <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" />
    </div>

Then we can have the JavaScript function that uploads the file in chunks when the user clicks the button.

    <script type="text/javascript">
        $(function () {
            $("#upload_button_blob").click(function () {
            });
        });
    </script>

Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to access File, Blob and FormData on the “window” object. If any of them is “undefined” the condition will be “false”, which means your browser doesn’t support these features and it’s time for you to get your browser updated.

FormData is another new feature we are going to use later. It can generate a temporary form for us. We will use this interface to create a form with the chunk and associated metadata when invoking the service through ajax.

    $("#upload_button_blob").click(function () {
        // assert the browser supports html5
        if (window.File && window.Blob && window.FormData) {
            alert("Your browser is awesome, let's rock!");
        }
        else {
            alert("Oh man plz update to a modern browser before trying this cool stuff out.");
            return;
        }
    });

Each browser supports these interfaces through its own implementation, and currently Blob, File and File.slice are supported by Chrome 21, Firefox 13, IE 10, Opera 12 and Safari 5.1 or higher.
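Some builds of Chrome and Firefox older than the versions listed above exposed the method under the vendor-prefixed names webkitSlice and mozSlice. If you want to reach those browsers as well, a small wrapper can pick whichever method exists. This is only a sketch; the helper name sliceBlob is hypothetical and the code in the rest of this post calls file.slice directly.

```javascript
// Hypothetical helper: slice a File/Blob using whichever slice method the
// browser exposes (standard name first, then the vendor-prefixed names).
function sliceBlob(blob, start, end) {
    var slice = blob.slice || blob.webkitSlice || blob.mozSlice;
    if (typeof slice !== "function") {
        throw new Error("This browser cannot slice files.");
    }
    return slice.call(blob, start, end);
}
```

With this in place, a call such as sliceBlob(file, 0, 512) behaves like file.slice(0, 512) wherever the standard method is available.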

After that we work on the files the user selected one by one, since in HTML5 the user can select multiple files in one file input box.

    var files = $("#upload_files")[0].files;
    for (var i = 0; i < files.length; i++) {
        var file = files[i];
        var fileSize = file.size;
        var fileName = file.name;
    }

Next, we calculate the start index and end index for each chunk based on the size the user specified in the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks, since we need to specify the target blob name and the block index.

At the same time we store the list of all indexes in another variable, which will be used to commit the blocks into a blob in Azure Storage once all chunks have been uploaded successfully.

    $("#upload_button_blob").click(function () {
        // assert the browser supports html5
        ... ...
        // start to upload each file in chunks
        var files = $("#upload_files")[0].files;
        for (var i = 0; i < files.length; i++) {
            var file = files[i];
            var fileSize = file.size;
            var fileName = file.name;

            // calculate the start and end byte index for each block (chunk)
            // with the index, file name and index list for later use
            var blockSizeInKB = $("#block_size").val();
            var blockSize = blockSizeInKB * 1024;
            var blocks = [];
            var offset = 0;
            var index = 0;
            var list = "";
            while (offset < fileSize) {
                var start = offset;
                var end = Math.min(offset + blockSize, fileSize);

                blocks.push({
                    name: fileName,
                    index: index,
                    start: start,
                    end: end
                });
                list += index + ",";

                offset = end;
                index++;
            }
        }
    });

Now we have all the chunks’ information ready. The next step is to upload them one by one to the server side. When the server receives a chunk, it uploads it as a block into Blob Storage, and finally commits the blocks with the index list through BlockBlob.PutBlockList. But all these invocations are ajax calls, which means they are not synchronous. So we need to introduce a JavaScript library named “async.js” to help us coordinate the asynchronous operations.

You can download this JavaScript library here, and you can find the documentation here.

I will not explain this library too much in this post. We put all the procedures we want to execute into a function array, and pass it into the proper function defined in async.js to let it control the execution sequence, in series or in parallel. Hence we define an array and put the functions for chunk upload into this array.

    $("#upload_button_blob").click(function () {
        // assert the browser supports html5
        ... ...

        // start to upload each file in chunks
        var files = $("#upload_files")[0].files;
        for (var i = 0; i < files.length; i++) {
            var file = files[i];
            var fileSize = file.size;
            var fileName = file.name;
            // calculate the start and end byte index for each block (chunk)
            // with the index, file name and index list for later use
            ... ...

            // define the function array and push all chunk upload operations into this array
            var putBlocks = [];
            blocks.forEach(function (block) {
                putBlocks.push(function (callback) {
                });
            });
        }
    });
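To make the task-array convention more concrete, here is a simplified stand-in for running such an array in series. It is not the async.js implementation, and the name runSeries is hypothetical; it only illustrates the contract that each task receives a callback and reports back with callback(error, result).

```javascript
// A simplified stand-in for running tasks in series; this is not the async.js
// implementation. Each task receives a callback(error, result).
function runSeries(tasks, done) {
    var results = [];
    var next = function (i) {
        if (i === tasks.length) {
            return done(null, results);
        }
        tasks[i](function (error, result) {
            if (error) {
                // stop at the first failure and report it
                return done(error, results);
            }
            results.push(result);
            next(i + 1);
        });
    };
    next(0);
}

// Build the task array the same way the upload code does: one function per block.
var tasks = [0, 1, 2].map(function (index) {
    return function (callback) {
        // a real task would start an ajax call and invoke callback when it completes
        callback(null, index);
    };
});

var finished;
runSeries(tasks, function (error, results) {
    finished = { error: error, results: results };
});
```

The next task starts only after the previous one has called its callback, which is exactly the ordering we rely on when uploading chunks one by one.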

As you can see below, I use the File.slice method to read each chunk based on the start and end byte index we calculated previously, and construct a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then I post this form to the backend server through jQuery.ajax. This is the key part of our solution.

    $("#upload_button_blob").click(function () {
        // assert the browser supports html5
        ... ...
        // start to upload each file in chunks
        var files = $("#upload_files")[0].files;
        for (var i = 0; i < files.length; i++) {
            var file = files[i];
            var fileSize = file.size;
            var fileName = file.name;
            // calculate the start and end byte index for each block (chunk)
            // with the index, file name and index list for later use
            ... ...
            // define the function array and push all chunk upload operations into this array
            var putBlocks = [];
            blocks.forEach(function (block) {
                putBlocks.push(function (callback) {
                    // load blob based on the start and end index for each chunk
                    var blob = file.slice(block.start, block.end);
                    // put the file name, index and blob into a temporary form
                    var fd = new FormData();
                    fd.append("name", block.name);
                    fd.append("index", block.index);
                    fd.append("file", blob);
                    // post the form to backend service (asp.net mvc controller action)
                    $.ajax({
                        url: "/Home/UploadInFormData",
                        data: fd,
                        processData: false,
                        // false lets the browser set multipart/form-data with its boundary
                        contentType: false,
                        type: "POST",
                        success: function (result) {
                            if (!result.success) {
                                alert(result.error);
                            }
                            callback(null, block.index);
                        }
                    });
                });
            });
        }
    });

Then we invoke these functions one by one by using async.js. Once all functions have been executed successfully, I invoke another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage.

    $("#upload_button_blob").click(function () {
        // assert the browser supports html5
        ... ...
        // start to upload each file in chunks
        var files = $("#upload_files")[0].files;
        for (var i = 0; i < files.length; i++) {
            var file = files[i];
            var fileSize = file.size;
            var fileName = file.name;
            // calculate the start and end byte index for each block (chunk)
            // with the index, file name and index list for later use
            ... ...
            // define the function array and push all chunk upload operations into this array
            ... ...
            // invoke the functions one by one
            // then invoke the commit ajax call to put blocks into blob in azure storage
            async.series(putBlocks, function (error, result) {
                var data = {
                    name: fileName,
                    list: list
                };
                $.post("/Home/Commit", data, function (result) {
                    if (!result.success) {
                        alert(result.error);
                    }
                    else {
                        alert("done!");
                    }
                });
            });
        }
    });

That’s all on the client side. The outline of our logic is:

- Calculate the start and end byte index for each chunk based on the block size.

- Define the functions that read each chunk from the file and upload the content to the backend service through ajax.

- Execute the functions defined in the previous step with “async.js”.

- Finally, commit the chunks into Windows Azure Storage by invoking the backend service.

 

Save Chunks as Blocks into Blob Storage

Above we finished the client-side JavaScript code. It uploads the file in chunks to the backend service, which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload them into Windows Azure Blob Storage in blocks, then finally commit them as one blob.

Since on the client side we upload chunks by invoking an ajax call to the URL "/Home/UploadInFormData", I created a new action under the Home controller that only accepts HTTP POST requests.

    [HttpPost]
    public JsonResult UploadInFormData()
    {
        var error = string.Empty;
        try
        {
        }
        catch (Exception e)
        {
            error = e.ToString();
        }

        return new JsonResult()
        {
            Data = new
            {
                success = string.IsNullOrWhiteSpace(error),
                error = error
            }
        };
    }

Then I retrieve the file name and index from the Request.Form object and the chunk content from Request.Files, all of which were passed from our client side. Next I use the Windows Azure SDK to create a blob container (in this case the container named “test”) and create a blob reference with the blob name (same as the file name). Then I upload the chunk as a block of this blob with the index, since in Blob Storage each block must have an ID associated with it, so that finally we can put all blocks together as one blob by specifying their block ID list.

    [HttpPost]
    public JsonResult UploadInFormData()
    {
        var error = string.Empty;
        try
        {
            var name = Request.Form["name"];
            var index = int.Parse(Request.Form["index"]);
            var file = Request.Files[0];
            // the block ID is the Base64 form of the chunk index
            var id = Convert.ToBase64String(BitConverter.GetBytes(index));

            var container = _client.GetContainerReference("test");
            container.CreateIfNotExists();
            var blob = container.GetBlockBlobReference(name);
            blob.PutBlock(id, file.InputStream, null);
        }
        catch (Exception e)
        {
            error = e.ToString();
        }

        return new JsonResult()
        {
            Data = new
            {
                success = string.IsNullOrWhiteSpace(error),
                error = error
            }
        };
    }
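To see what these block IDs look like, the sketch below reproduces the conversion in JavaScript. The helper name toBlockId is hypothetical, and the byte order is an assumption: it matches BitConverter.GetBytes only when the server runs on a little-endian machine.

```javascript
// Hypothetical client-side equivalent of the server's
// Convert.ToBase64String(BitConverter.GetBytes(index)),
// assuming the server runs on a little-endian machine.
function toBlockId(index) {
    // 32-bit integer as 4 bytes, least significant byte first
    var bytes = String.fromCharCode(
        index & 0xff,
        (index >>> 8) & 0xff,
        (index >>> 16) & 0xff,
        (index >>> 24) & 0xff
    );
    // btoa is a browser global; Buffer is the Node.js fallback
    return typeof btoa === "function"
        ? btoa(bytes)
        : Buffer.from(bytes, "binary").toString("base64");
}

var ids = [0, 1, 256].map(toBlockId);
// ids: ["AAAAAA==", "AQAAAA==", "AAEAAA=="]
```

Every ID produced this way is 8 characters long. That matters because the blob service expects all block IDs within one blob to have the same length, which a fixed 4-byte index guarantees.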

Next, I created another action to commit the blocks into the blob once all chunks have been uploaded. Similarly, I retrieve the blob name from Request.Form. I also retrieve the chunk index list from Request.Form in string format, split it into a list, convert each index to its block ID, then invoke the BlockBlob.PutBlockList method. After that our blob will be shown in the container and be ready to download.

    [HttpPost]
    public JsonResult Commit()
    {
        var error = string.Empty;
        try
        {
            var name = Request.Form["name"];
            var list = Request.Form["list"];
            // "0,1,2," -> block IDs, skipping the empty entry after the trailing comma
            var ids = list
                .Split(',')
                .Where(id => !string.IsNullOrWhiteSpace(id))
                .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id))))
                .ToArray();

            var container = _client.GetContainerReference("test");
            container.CreateIfNotExists();
            var blob = container.GetBlockBlobReference(name);
            blob.PutBlockList(ids);
        }
        catch (Exception e)
        {
            error = e.ToString();
        }

        return new JsonResult()
        {
            Data = new
            {
                success = string.IsNullOrWhiteSpace(error),
                error = error
            }
        };
    }
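The parsing step in the Commit action can be mirrored in JavaScript to show what happens to the list string the client sends. The helper name parseIndexList is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper mirroring the Split/Where/Select chain in the Commit action:
// turn "0,1,2," into [0, 1, 2], skipping the empty entry after the trailing comma.
function parseIndexList(list) {
    return list
        .split(",")
        .filter(function (id) { return id.trim() !== ""; })
        .map(function (id) { return parseInt(id, 10); });
}

var indexes = parseIndexList("0,1,2,");
```

The order of the list is preserved, which is important because the blocks are assembled into the blob in the order given to PutBlockList, not in the order they were uploaded.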

Now we have finished all the code we need. The whole upload process looks like this.

image

Below is the full client side JavaScript code.

    <script type="text/javascript" src="~/Scripts/async.js"></script>
    <script type="text/javascript">
        $(function () {
            $("#upload_button_blob").click(function () {
                // assert the browser supports html5
                if (window.File && window.Blob && window.FormData) {
                    alert("Your browser is awesome, let's rock!");
                }
                else {
                    alert("Oh man plz update to a modern browser before trying this cool stuff out.");
                    return;
                }

                // start to upload each file in chunks
                var files = $("#upload_files")[0].files;
                for (var i = 0; i < files.length; i++) {
                    var file = files[i];
                    var fileSize = file.size;
                    var fileName = file.name;

                    // calculate the start and end byte index for each block (chunk)
                    // with the index, file name and index list for later use
                    var blockSizeInKB = $("#block_size").val();
                    var blockSize = blockSizeInKB * 1024;
                    var blocks = [];
                    var offset = 0;
                    var index = 0;
                    var list = "";
                    while (offset < fileSize) {
                        var start = offset;
                        var end = Math.min(offset + blockSize, fileSize);

                        blocks.push({
                            name: fileName,
                            index: index,
                            start: start,
                            end: end
                        });
                        list += index + ",";

                        offset = end;
                        index++;
                    }

                    // define the function array and push all chunk upload operations into this array
                    var putBlocks = [];
                    blocks.forEach(function (block) {
                        putBlocks.push(function (callback) {
                            // load blob based on the start and end index for each chunk
                            var blob = file.slice(block.start, block.end);
                            // put the file name, index and blob into a temporary form
                            var fd = new FormData();
                            fd.append("name", block.name);
                            fd.append("index", block.index);
                            fd.append("file", blob);
                            // post the form to backend service (asp.net mvc controller action)
                            $.ajax({
                                url: "/Home/UploadInFormData",
                                data: fd,
                                processData: false,
                                // false lets the browser set multipart/form-data with its boundary
                                contentType: false,
                                type: "POST",
                                success: function (result) {
                                    if (!result.success) {
                                        alert(result.error);
                                    }
                                    callback(null, block.index);
                                }
                            });
                        });
                    });

                    // invoke the functions one by one
                    // then invoke the commit ajax call to put blocks into blob in azure storage
                    async.series(putBlocks, function (error, result) {
                        var data = {
                            name: fileName,
                            list: list
                        };
                        $.post("/Home/Commit", data, function (result) {
                            if (!result.success) {
                                alert(result.error);
                            }
                            else {
                                alert("done!");
                            }
                        });
                    });
                }
            });
        });
    </script>

And below is the full ASP.NET MVC controller code.

    // namespaces below assume the Windows Azure Storage client library 2.x
    using System;
    using System.Linq;
    using System.Web.Mvc;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public class HomeController : Controller
    {
        private CloudStorageAccount _account;
        private CloudBlobClient _client;

        public HomeController()
            : base()
        {
            _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString"));
            _client = _account.CreateCloudBlobClient();
        }

        public ActionResult Index()
        {
            ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application.";

            return View();
        }

        // receives one chunk and stores it as an uncommitted block
        [HttpPost]
        public JsonResult UploadInFormData()
        {
            var error = string.Empty;
            try
            {
                var name = Request.Form["name"];
                var index = int.Parse(Request.Form["index"]);
                var file = Request.Files[0];
                var id = Convert.ToBase64String(BitConverter.GetBytes(index));

                var container = _client.GetContainerReference("test");
                container.CreateIfNotExists();
                var blob = container.GetBlockBlobReference(name);
                blob.PutBlock(id, file.InputStream, null);
            }
            catch (Exception e)
            {
                error = e.ToString();
            }

            return new JsonResult()
            {
                Data = new
                {
                    success = string.IsNullOrWhiteSpace(error),
                    error = error
                }
            };
        }

        // commits the uploaded blocks, in list order, as one blob
        [HttpPost]
        public JsonResult Commit()
        {
            var error = string.Empty;
            try
            {
                var name = Request.Form["name"];
                var list = Request.Form["list"];
                var ids = list
                    .Split(',')
                    .Where(id => !string.IsNullOrWhiteSpace(id))
                    .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id))))
                    .ToArray();

                var container = _client.GetContainerReference("test");
                container.CreateIfNotExists();
                var blob = container.GetBlockBlobReference(name);
                blob.PutBlockList(ids);
            }
            catch (Exception e)
            {
                error = e.ToString();
            }

            return new JsonResult()
            {
                Data = new
                {
                    success = string.IsNullOrWhiteSpace(error),
                    error = error
                }
            };
        }
    }

And if we select a file in the browser, we will see our application upload chunks of the size we specified to the server through ajax calls in the background, and then commit all chunks into one blob.

image

Then we can find the blob in our Windows Azure Blob Storage.

image

 

Optimized by Parallel Upload

In the previous example we just uploaded our file in chunks. This solved the problems of the ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce a performance problem since we upload the chunks in sequence. In order to improve the upload performance we can modify our client-side code a bit to make the upload operations run in parallel.

The good news is that the “async.js” library provides a parallel execution function. If you remember the code where we invoke the service to upload chunks, it uses “async.series”, which means all functions are executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel.

    $("#upload_button_blob").click(function () {
        // assert the browser supports html5
        ... ...
        // start to upload each file in chunks
        var files = $("#upload_files")[0].files;
        for (var i = 0; i < files.length; i++) {
            var file = files[i];
            var fileSize = file.size;
            var fileName = file.name;
            // calculate the start and end byte index for each block (chunk)
            // with the index, file name and index list for later use
            ... ...
            // define the function array and push all chunk upload operations into this array
            ... ...
            // invoke the functions in parallel
            // then invoke the commit ajax call to put blocks into blob in azure storage
            async.parallel(putBlocks, function (error, result) {
                var data = {
                    name: fileName,
                    list: list
                };
                $.post("/Home/Commit", data, function (result) {
                    if (!result.success) {
                        alert(result.error);
                    }
                    else {
                        alert("done!");
                    }
                });
            });
        }
    });

In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage.

image

This should work if the file is not very large and the chunk size is not very small. But for a large file this might introduce another problem: too many ajax calls are sent to the server at the same time. So the best solution is to upload the chunks in parallel with a maximum concurrency limit. The code below sets the concurrency limit to 4, which means at most 4 ajax calls can be in flight at the same time.

$("#upload_button_blob").click(function () {
    // assert the browser support html5
    ... ...
    // start to upload each files in chunks
    var files = $("#upload_files")[0].files;
    for (var i = 0; i < files.length; i++) {
        // give each file its own function scope, otherwise the async callbacks
        // below would all see the variables of the last file in the loop
        (function (file) {
            var fileSize = file.size;
            var fileName = file.name;
            // calculate the start and end byte index for each blocks(chunks)
            // with the index, file name and index list for future using
            ... ...
            // define the function array and push all chunk upload operation into this array
            ... ...
            // invoke the functions in parallel, at most 4 at the same time
            // then invoke the commit ajax call to put blocks into blob in azure storage
            async.parallelLimit(putBlocks, 4, function (error, result) {
                if (error) {
                    // a chunk failed to upload, so do not commit the block list
                    alert(error);
                    return;
                }
                var data = {
                    name: fileName,
                    list: list
                };
                $.post("/Home/Commit", data, function (result) {
                    if (!result.success) {
                        alert(result.error);
                    }
                    else {
                        alert("done!");
                    }
                });
            });
        })(files[i]);
    }
});
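
To make the idea of a concurrency limit more concrete, here is a minimal sketch of a limiter written by hand. It is not how async.js is implemented, and the name runWithLimit is made up for illustration. It only assumes that each task is a function taking a single callback(error, result), which is the same shape as the functions pushed into putBlocks.

```javascript
// Hypothetical helper, not part of async.js: run callback-style tasks with
// at most `limit` of them in flight at any time.
function runWithLimit(tasks, limit, done) {
    var results = new Array(tasks.length);
    var next = 0;       // index of the next task to start
    var running = 0;    // number of tasks currently in flight
    var finished = 0;   // number of tasks that completed successfully
    var failed = false; // set once any task reports an error

    if (tasks.length === 0) {
        done(null, results);
        return;
    }

    function startNext() {
        // keep starting tasks until the limit is reached or none are left
        while (!failed && running < limit && next < tasks.length) {
            (function (index) {
                running++;
                tasks[index](function (error, result) {
                    if (failed) { return; }
                    running--;
                    if (error) {
                        // stop scheduling new tasks and report the first error
                        failed = true;
                        done(error);
                        return;
                    }
                    results[index] = result;
                    finished++;
                    if (finished === tasks.length) {
                        done(null, results);
                    } else {
                        startNext();
                    }
                });
            })(next++);
        }
    }

    startNext();
}
```

With this sketch, runWithLimit(putBlocks, 4, callback) would behave roughly like the async.parallelLimit call above: a new chunk upload starts only when one of the 4 running uploads has finished.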

 

Summary

In this post we discussed how to upload files in chunks to the backend service, which then uploads them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leveraged three new features introduced in HTML5:

- File.slice: Read part of the file by specifying the start and end byte index.

- Blob: File-like interface which contains the part of the file content.

- FormData: A set of key/value pairs, built like a temporary form, through which we can pass the chunk along with some metadata to the backend service.
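
As a rough sketch of how these three pieces fit together, the snippet below computes the byte range of each chunk and wraps one chunk into a FormData object. The helper names and the form field names ("name", "index", "file") are assumptions for illustration only; the field names must match whatever the backend action actually reads.

```javascript
// Hypothetical helper: compute the [start, end) byte range of every chunk.
function computeChunkRanges(fileSize, chunkSize) {
    var ranges = [];
    for (var start = 0, index = 0; start < fileSize; start += chunkSize, index++) {
        ranges.push({
            index: index,
            start: start,
            // the last chunk may be shorter than chunkSize
            end: Math.min(start + chunkSize, fileSize)
        });
    }
    return ranges;
}

// Hypothetical helper: build the FormData payload for one chunk.
// The field names used here are assumptions, not taken from the post.
function buildChunkForm(file, range) {
    var chunk = file.slice(range.start, range.end); // slice returns a Blob
    var form = new FormData();
    form.append("name", file.name);
    form.append("index", range.index);
    form.append("file", chunk);
    return form;
}
```

For example, a 2.5MB file with a 1MB chunk size gives three ranges, and the last one covers only the remaining 512KB.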

Then we discussed the performance considerations of chunk uploading. Sequential upload cannot maximize the upload speed, but unlimited parallel upload might crash the browser and the server if there are too many chunks. So we finally came up with the solution of uploading chunks in parallel with a concurrency limit.

We also demonstrated how to use the “async.js” JavaScript library to control the asynchronous calls and the parallel limit.

Regarding the chunk size and the parallel limit, there is no “best” value. You need to test various combinations and find the best one for your particular scenario. It depends on the local bandwidth, the number of cores on the client machine, and the cores, memory and bandwidth on the server side (Windows Azure Cloud Service Virtual Machine).

Below is one of my performance test results. The client machine was running Windows 8 and IE 10 with 4 cores, on the Microsoft corporate network. The web site was hosted in the Windows Azure China North data center (in Beijing) on one small web role (1.6GHz single-core CPU, 1.75GB memory, 100Mbps bandwidth). The test cases were:

- Chunk size: 512KB, 1MB, 2MB, 4MB.

- Upload Mode: Sequential, parallel (unlimited), parallel with limit (4 threads, 8 threads).

- Chunk Format: base64 string, binaries.

- Target file: 100MB.

- Each case was tested 3 times.
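
To get a feeling for what these combinations mean in terms of requests and payload, the small sketch below works out the arithmetic. It is only an illustration, not part of the test harness: it assumes the 100MB target file and the four chunk sizes listed above, plus the fact that base64 turns every 3 bytes into 4 characters.

```javascript
var KB = 1024;
var MB = 1024 * KB;

// Number of chunk requests needed to upload a file of the given size.
function chunkCount(fileSize, chunkSize) {
    return Math.ceil(fileSize / chunkSize);
}

// Length of a payload after base64 encoding (with padding):
// every group of up to 3 bytes becomes 4 characters.
function base64Length(byteLength) {
    return 4 * Math.ceil(byteLength / 3);
}

var fileSize = 100 * MB;
[512 * KB, 1 * MB, 2 * MB, 4 * MB].forEach(function (chunkSize) {
    console.log((chunkSize / KB) + "KB chunks: " +
        chunkCount(fileSize, chunkSize) + " requests, " +
        base64Length(chunkSize) + " base64 characters per full chunk");
});
```

A 100MB file therefore needs 200 requests with 512KB chunks but only 25 requests with 4MB chunks, and sending a chunk as a base64 string makes it about one third larger than sending the same chunk as binary.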

Below is the test result chart.

[Image: performance test result chart]

Some thoughts, but not guidance or best practice:

- Parallel upload performs better than sequential upload.

- There is no significant performance difference between parallel with 4 threads and with 8 threads.

- Transferring chunks as binary performs better than as base64.

- In all cases, a chunk size of 1MB to 2MB performs better.

Hope this helps,

Shaun

All documents and related graphics, codes are provided "AS IS" without warranty of any kind.
Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Comments

# re: Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5
Posted by Thomas Mueller on 10/1/2013 11:51 PM
Do you have the complete code posted somewhere? I'm getting null values from Request.Form[] in UploadInFormData(), presumably because of the contentType: "multipart/form-data".

# re: Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5
Posted by Venkata Appana on 12/18/2013 12:58 AM
This is an awesome article depicting the uploading process step by step. I believe the pattern is same while downloading the file from the storage. Read by chunks from blob storage and appending file chunks in JavaScript.

# re: Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5
Posted by AxeOfMen on 3/1/2014 8:14 AM
If you are seeing an empty Request.Form[] structure then...
In the jquery ajax call, replace
contentType: "multipart/form-data",
with
contentType: false,

# re: Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5
Posted by Pieter-Jan on 4/25/2014 4:34 PM
When i upload multiple files, only the latest one will be committed to the backend. How can i fix this?