Bruce Ge


I was using the GZip encoder to compress WCF messages and was surprised to find that the compressed message was sometimes even bigger than the original. Looking at the code, I found that the method "CompressBuffer" of the GZipMessageEncoderFactory class in GZipMessageEncoderFactory.cs is not quite right. Originally it looked like this:

private static ArraySegment<byte> CompressBuffer(ArraySegment<byte> buffer, BufferManager bufferManager, int messageOffset)
{
   ....
  var byteArray = new ArraySegment<byte>(bufferedBytes, messageOffset,
                                                       bufferedBytes.Length - messageOffset);

  return byteArray;
}

With the method above, the count of the new ArraySegment<T> is based on the length of bufferedBytes. But BufferManager.TakeBuffer returns a buffer of at least the requested size, typically rounded up to the next power of two (2^N bytes); e.g. if you ask TakeBuffer for 522 bytes, you get a 1024-byte buffer. The third parameter of the ArraySegment constructor is the count, i.e. the number of bytes the segment actually covers. Passing bufferedBytes.Length - messageOffset therefore makes the segment include the unused tail of the buffer after the real compressed data (zero bytes, or stale data if the buffer was reused from the pool), and all of it is sent. That is why the "compressed" message is sometimes even bigger than the original.

It should be like this:

private static ArraySegment<byte> CompressBuffer(ArraySegment<byte> buffer, BufferManager bufferManager, int messageOffset)
{
   ....
  // Count = number of compressed bytes copied in, not the length of the pooled buffer.
  var byteArray = new ArraySegment<byte>(bufferedBytes, messageOffset,
                       compressedBytes.Length);

  return byteArray;
}
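To make the size difference concrete, here is a small self-contained C# sketch. It is an illustration under assumptions, not the encoder code: a plain array stands in for the pooled buffer, and the hypothetical RoundUpToPowerOfTwo helper imitates the TakeBuffer rounding described above (it is not part of the BufferManager API). With a 512-byte compressed payload and a 10-byte messageOffset, the buffer request is 522 bytes, so the buffer is 1024 bytes.

```csharp
using System;

static class SegmentCountDemo
{
    // Hypothetical stand-in for BufferManager.TakeBuffer's rounding:
    // returns the smallest power of two >= n.
    static int RoundUpToPowerOfTwo(int n)
    {
        int size = 1;
        while (size < n) size <<= 1;
        return size;
    }

    public static (ArraySegment<byte> Buggy, ArraySegment<byte> Fixed) Build(byte[] compressedBytes, int messageOffset)
    {
        // The "pooled" buffer may be larger than what was requested.
        byte[] bufferedBytes = new byte[RoundUpToPowerOfTwo(compressedBytes.Length + messageOffset)];
        Array.Copy(compressedBytes, 0, bufferedBytes, messageOffset, compressedBytes.Length);

        // Original count: covers the whole buffer, including the unused tail.
        var buggy = new ArraySegment<byte>(bufferedBytes, messageOffset, bufferedBytes.Length - messageOffset);

        // Fixed count: covers only the bytes actually written.
        var fixedSegment = new ArraySegment<byte>(bufferedBytes, messageOffset, compressedBytes.Length);

        return (buggy, fixedSegment);
    }

    static void Main()
    {
        var (buggy, fixedSegment) = Build(new byte[512], 10);
        Console.WriteLine($"buggy count = {buggy.Count}, fixed count = {fixedSegment.Count}");
    }
}
```

Running it prints "buggy count = 1014, fixed count = 512": with the original count, 502 extra bytes go over the wire.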

I also found that GZip compression does not give a very high compression ratio, so I decided to combine 7zip and GZip. I found a good article about 7zip, 7Zip (LZMA) In-Memory Compression with C#; after adding its code to my project, I could use it straightforwardly:

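// Requires: using System; using System.IO; using System.IO.Compression;
// using System.ServiceModel.Channels; plus the SevenZipHelper class from the article above.

//Helper method to compress an array of bytes (GZip, then LZMA)
private static ArraySegment<byte> CompressBuffer(ArraySegment<byte> buffer, BufferManager bufferManager, int messageOffset)
{
  byte[] gzippedBytes;
  using (var memoryStream = new MemoryStream())
  {
    // Stage 1: GZip. leaveOpen = true keeps memoryStream open after the
    // GZipStream is disposed (disposing it flushes the GZip footer).
    using (var zipperStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
      zipperStream.Write(buffer.Array, buffer.Offset, buffer.Count);

    gzippedBytes = memoryStream.ToArray();
  }

  // Stage 2: LZMA over the GZip output.
  byte[] compressedBytes = SevenZip.Compression.LZMA.SevenZipHelper.Compress(gzippedBytes);

  // Reserve messageOffset bytes in front for WCF; the pooled buffer may be larger than requested.
  byte[] bufferedBytes = bufferManager.TakeBuffer(compressedBytes.Length + messageOffset);
  Array.Copy(compressedBytes, 0, bufferedBytes, messageOffset, compressedBytes.Length);

  bufferManager.ReturnBuffer(buffer.Array);

  // Count = bytes actually written, not bufferedBytes.Length - messageOffset.
  return new ArraySegment<byte>(bufferedBytes, messageOffset, compressedBytes.Length);
}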

//Helper method to decompress an array of bytes (LZMA, then GZip)
private static ArraySegment<byte> DecompressBuffer(ArraySegment<byte> buffer, BufferManager bufferManager)
{
  // The payload is exactly buffer.Count bytes starting at buffer.Offset.
  // Count already excludes the offset, so it must not be subtracted again.
  byte[] lzmaBytes = new byte[buffer.Count];
  Array.Copy(buffer.Array, buffer.Offset, lzmaBytes, 0, buffer.Count);

  // Undo stage 2 (LZMA); the result is the GZip stream.
  var memoryStream = new MemoryStream(SevenZip.Compression.LZMA.SevenZipHelper.Decompress(lzmaBytes));

  // Undo stage 1 (GZip), reading in 1 KB blocks.
  var decompressedStream = new MemoryStream();
  int blockSize = 1024;
  byte[] tempBuffer = bufferManager.TakeBuffer(blockSize);
  using (var gzStream = new GZipStream(memoryStream, CompressionMode.Decompress))
  {
    int bytesRead;
    while ((bytesRead = gzStream.Read(tempBuffer, 0, blockSize)) > 0)
      decompressedStream.Write(tempBuffer, 0, bytesRead);
  }
  bufferManager.ReturnBuffer(tempBuffer);

  // Copy into a pooled buffer, preserving the bytes in front of the offset.
  byte[] decompressedBytes = decompressedStream.ToArray();
  byte[] bufferManagerBuffer = bufferManager.TakeBuffer(decompressedBytes.Length + buffer.Offset);
  Array.Copy(buffer.Array, 0, bufferManagerBuffer, 0, buffer.Offset);
  Array.Copy(decompressedBytes, 0, bufferManagerBuffer, buffer.Offset, decompressedBytes.Length);

  var byteArray = new ArraySegment<byte>(bufferManagerBuffer, buffer.Offset, decompressedBytes.Length);
  bufferManager.ReturnBuffer(buffer.Array);

  return byteArray;
}
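The offset/count bookkeeping in both helpers can be checked with a round trip. The sketch below is an illustration under stated assumptions, not the encoder itself: it uses GZip only (the LZMA stage needs the SevenZipHelper code from the article), plain arrays stand in for BufferManager, and the RoundTripSketch, Compress and Decompress names are hypothetical. The compressed data is placed in a deliberately oversized buffer, as a pooled one might be, and Decompress reads exactly buffer.Count bytes from buffer.Offset.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class RoundTripSketch
{
    // Compress the segment's payload and place it after a reserved messageOffset-byte prefix.
    public static ArraySegment<byte> Compress(ArraySegment<byte> buffer, int messageOffset)
    {
        byte[] compressedBytes;
        using (var memoryStream = new MemoryStream())
        {
            using (var gz = new GZipStream(memoryStream, CompressionMode.Compress, true))
                gz.Write(buffer.Array, buffer.Offset, buffer.Count);
            compressedBytes = memoryStream.ToArray();
        }

        // Deliberately oversized, standing in for a pooled buffer (assumption).
        var bufferedBytes = new byte[(compressedBytes.Length + messageOffset) * 2];
        Array.Copy(compressedBytes, 0, bufferedBytes, messageOffset, compressedBytes.Length);

        // Count is the compressed length, not bufferedBytes.Length - messageOffset.
        return new ArraySegment<byte>(bufferedBytes, messageOffset, compressedBytes.Length);
    }

    // Decompress exactly buffer.Count bytes starting at buffer.Offset.
    public static byte[] Decompress(ArraySegment<byte> buffer)
    {
        using (var input = new MemoryStream(buffer.Array, buffer.Offset, buffer.Count))
        using (var gz = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gz.CopyTo(output);
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] message = System.Text.Encoding.UTF8.GetBytes(new string('a', 1000));
        ArraySegment<byte> compressed = Compress(new ArraySegment<byte>(message), 10);
        byte[] restored = Decompress(compressed);
        Console.WriteLine($"{message.Length} -> {compressed.Count} -> {restored.Length}");
    }
}
```

If Decompress used buffer.Count - buffer.Offset as the length instead, it would silently drop the last messageOffset bytes of the compressed stream.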

 

 

posted on Tuesday, October 27, 2009 4:20 PM

Feedback

# re: WCF Message Compression - Gzip, 7zip 4/29/2010 6:20 AM Matt G
I haven't tried your LZMA compression - interesting! Do you have any figures on how much the additional compression improved things for your test case?

However, I did the *exact* same thing you did on the byteArray (used compressedBytes.Length instead of their original calculation) and it caused some intermittent errors on the server side:

System.ArgumentOutOfRangeException: The space needed for encoding (2 bytes) exceeds the message frame offset.

The number of bytes varied with packet size. Kind of a bummer, because your change improved the compression from 18.6% of the original size to 13.8% for us. We use the compressed TCP channel for Sync; our Sync case involves a lot of download and bidirectional tables and can pull ~280 Mb from server to client.

I suppose some of the extra space is used by WCF to encode the final information, but I'm not a WCF pro. Just a heads up.

# re: WCF Message Compression - Gzip, 7zip 5/20/2010 3:20 AM Matt G
I'll answer my own question on the compression size: it varies with your data.
I ran a few tests against a 226 Mb dataset of ours:
a) Gzip compression only - 28% of original size, 63.3 Mb, 4 min to compress.
b) 7Zip compression only - 15% of original size, 32.9 Mb, 27 minutes to compress.
c) Gzip, then 7Zip compression - 26% of original size, 59.7 Mb, 5 min to compress.

So in my scenario, Gzip is the fastest but biggest, and 7Zip is the slowest but compresses the best. But is Gzip then 7Zip worth it? It depends on your dataset and app needs. To check this, I broke the GZip+7Zip results down further by database table: 7Zip compressed most of the Gzip data further by only about 3-4%, but two of my six tables compressed further by 16.8% and 25% respectively. Both of those tables tend to have similar values across their rows, rather than the mishmash of data in the other four.

I was surprised to find that Gzip+7Zip is faster than 7Zip alone. I suspect that gzipping the data first makes it more difficult (or impossible) for 7Zip to compress it further, but I don't know for sure.

(Note that in my testing, I used the LZMA package linked above and whatever default settings were inside SevenZipHelper.)
