# Running with Code

Like scissors, only more dangerous



Thursday, January 6, 2011 #

My new blog is at http://robpaveza.net - it occurred to me a little late that I probably should post this.  Go check it out!

-Rob

Tuesday, January 19, 2010 #

One of the problems that a certain gaming community has wrestled with over the years is version checking.  A common, though now older, method of version checking in this community has been to execute a known algorithm over a seeded value; however, the algorithm changes based on a formula sent over the wire.  For instance, suppose that for every four bytes in a file, there are four state values: A, B, C, and S.  The S value is the current four bytes of the file.  The server might send the following formula as an initialization: A=A-S B=B-C C=C+A A=A+B.  In addition, it sends some startup values for A, B, and C.  This means that, for every four bytes of the file, we need to perform the math in the steps outlined in the initialization string above.

Now, one of the common ways to approach this problem has been to attack it by brute force.  We keep the state values in one array, the indices of the state values (keyed by their letters) in another array, and the operators in a third, and then perform double-dereferencing (dereferencing the index of the state value, then dereferencing the state value itself).  So you might have code that looks like this:

```
foreach (step)
{
    states[Transform('S')] = ReadNext();
    foreach (string formula in list)
    {
        states[Transform(formula[0])] = DoMath(states[Transform(formula[2])],
                                               states[Transform(formula[4])],
                                               GetOperator(formula));
    }
}
```

Here, the “Transform” function translates a character to its index into the state value index.  This is a pretty sub-optimal solution given all of the extra dereferencing, and this is really a pseudo-implementation of this activity.  What would be best is if we could somehow unroll that inner loop and access the values directly (or through a single dereference, as a pointer would do).  In other words, it could be rewritten better like so:

```
foreach (step)
{
    S = ReadNext();
    A = A - S;
    B = B - C;
    C = C + A;
    A = A + B;
}
```

The challenge is that the server provides the verification string, and it changes over time, so the client can’t reliably predict which combination of formulas will be used.  Although only a fixed set of combinations has ever been observed in the wild, many others could potentially be presented: there is no fixed number of formulas, there are three potential writable state values and four readable state values per formula, and there are eight binary operators (+, -, *, /, %, &, |, and ^).  So either we keep the inner loop, or we figure out some way to get all the benefits of compilation without having to know exactly what we’re programming before we program it.  Fortunately, the .NET Framework provides a way for us to do exactly that: dynamic methods.

To simplify the code that we need to generate, we’ll rewrite the inner code to look like this:

```
foreach (step)
{
    S = ReadNext();
    ExecuteStep(ref A, ref B, ref C, ref S);
}
```

Now, all we need to do is dynamically emit the ExecuteStep method.  To do so we’ll need to get into the System.Reflection.Emit namespace – kind of a scary place to be!  Fortunately, Reflector is going to make this easier for us – and we’ll be glad we’re doing this in IL.

In Part 2, we’ll look at how to actually emit the dynamic method by writing the equivalent code in C# and then looking at it in Reflector, then figuring out how to generate it at run-time.  Along the way, we’ll learn a little bit about the .NET evaluation stack.
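As a preview of where that’s headed, here is a minimal sketch of the idea, hard-coding the example formula A=A-S B=B-C C=C+A A=A+B; a real implementation would parse the server’s string and choose an opcode per formula.  The delegate and method names here are my own, not from the series.

```csharp
using System;
using System.Reflection.Emit;

static class DynamicStep
{
    // Matches the rewritten inner loop: ExecuteStep(ref A, ref B, ref C, ref S).
    public delegate void StepDelegate(ref int a, ref int b, ref int c, ref int s);

    public static StepDelegate Build()
    {
        var method = new DynamicMethod("ExecuteStep", null, new[]
        {
            typeof(int).MakeByRefType(), typeof(int).MakeByRefType(),
            typeof(int).MakeByRefType(), typeof(int).MakeByRefType()
        });
        ILGenerator il = method.GetILGenerator();

        // Each formula "X = Y op Z" becomes: push the address of X, push the
        // value of Y, push the value of Z, apply the operator, then store the
        // result through the address.
        EmitFormula(il, OpCodes.Ldarg_0, OpCodes.Ldarg_0, OpCodes.Ldarg_3, OpCodes.Sub); // A = A - S
        EmitFormula(il, OpCodes.Ldarg_1, OpCodes.Ldarg_1, OpCodes.Ldarg_2, OpCodes.Sub); // B = B - C
        EmitFormula(il, OpCodes.Ldarg_2, OpCodes.Ldarg_2, OpCodes.Ldarg_0, OpCodes.Add); // C = C + A
        EmitFormula(il, OpCodes.Ldarg_0, OpCodes.Ldarg_0, OpCodes.Ldarg_1, OpCodes.Add); // A = A + B
        il.Emit(OpCodes.Ret);

        return (StepDelegate)method.CreateDelegate(typeof(StepDelegate));
    }

    private static void EmitFormula(ILGenerator il, OpCode target, OpCode left, OpCode right, OpCode op)
    {
        il.Emit(target);            // address of the destination state value
        il.Emit(left);
        il.Emit(OpCodes.Ldind_I4);  // dereference the left operand
        il.Emit(right);
        il.Emit(OpCodes.Ldind_I4);  // dereference the right operand
        il.Emit(op);
        il.Emit(OpCodes.Stind_I4);  // store the result through the address
    }

    static void Main()
    {
        StepDelegate step = Build();
        int a = 1, b = 2, c = 3, s = 4;
        step(ref a, ref b, ref c, ref s);
        Console.WriteLine("A={0} B={1} C={2}", a, b, c); // A=-4 B=-1 C=0
    }
}
```

Once built, the delegate is invoked per four-byte block just like a statically compiled method, which is where the speedup comes from.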

Oh – one more thing – here’s why you should care about all of this.  A simple testing framework indicated a speed increase of a factor of four when changing this to use a dynamic method instead of the previous implementation.  Over 50 iterations, I observed the dynamic method versions taking a little less than 1/4 of the execution time of the original array-based implementation.

Now, if that’s not a marked improvement, I don’t know what is.  But remember, as with all performance optimizations, your mileage may vary.

Improving Performance with Dynamic Methods

• Part 1: The Problem Definition
• Part 2: Emit and Execute

Friday, August 7, 2009 #

For a couple reasons I love to look at programming interview questions around the internet.  Most of them revolve around data structures and algorithms, which is one of my favorite topics in computer science; and as a hiring manager, I find it valuable to try to see what the industry is asking people before bringing them on board.  I came across this site tonight, and while it had a lot of questions that I’ve seen before, this one – a variant of the programming exercise I was given for my first real (non-interning) job application – stood out:

Implement `Shuffle` given an array containing a deck of cards and the number of cards. Now make it O(n).

This caught my eye for a couple of reasons: first, why would the obvious algorithm ever not be O(n)?  Second, if I think the obvious implementation is O(n), do I have a misconception about what O(n) means?  The problem doesn’t give any other details about what “Shuffle” means, either.

Still, this is my naive algorithm:

```
void Shuffle(cards[], length)
    for i = 0 to length
        n = Random % length
        Swap(&cards[i], &cards[n])
```

Swap() is constant-time, so the only cost that scales is the single for loop from 0 to the length of the array.  We could improve the entropy of the resulting shuffle by using a cryptographically strong random number generator such as RNGCryptoServiceProvider, though it would require more memory.  Here’s an actual C# implementation:

Card.cs:

```
using System;

namespace LinearBigOShuffle
{
    public class Card
    {
        public Rank Rank { get; set; }
        public Suit Suit { get; set; }

        public override string ToString()
        {
            return string.Format("{0,-8}{1}", Suit, Rank);
        }
    }

    public enum Suit
    {
        Club = 1, Diamond = 2, Heart = 3, Spade = 4,
    }

    public enum Rank
    {
        Ace = 1, Two = 2, Three = 3, Four = 4, Five = 5, Six = 6, Seven = 7, Eight = 8, Nine = 9, Ten = 10,
        Jack = 11, Queen = 12, King = 13,
    }
}
```
Program.cs:
```
using System;

namespace LinearBigOShuffle
{
    class Program
    {
        static void Main()
        {
            // Initialize the cards array.
            Card[] cards = InitializeStandardDeck();

            // Verify that the cards were initialized correctly.
            PrintCards(cards);

            Console.WriteLine("Shuffling...");
            Shuffle(cards);
            PrintCards(cards);

            // Wait for the user to press Enter to exit.
            Console.ReadLine();
        }

        private static void Shuffle(Card[] cards)
        {
            Random r = new Random();
            int n = cards.Length;
            for (int i = 0; i < n; i++)
            {
                int nextSwapIndex = r.Next(n);
                Card temp = cards[nextSwapIndex];
                cards[nextSwapIndex] = cards[i];
                cards[i] = temp;
            }
        }

        private static Card[] InitializeStandardDeck()
        {
            Card[] cards = new Card[52];
            int index = 0;
            for (int suit = (int)Suit.Club; suit <= (int)Suit.Spade; suit++)
            {
                for (int rank = 1; rank <= (int)Rank.King; rank++)
                {
                    cards[index++] = new Card { Rank = (Rank)rank, Suit = (Suit)suit };
                }
            }
            return cards;
        }

        private static void PrintCards(Card[] cards)
        {
            foreach (Card c in cards)
            {
                Console.WriteLine(c);
            }
        }
    }
}
```

In this implementation, I’ve included a couple helper methods and assumed Ace-low, but I don’t think that really matters.  If you run the program, you’ll see the in-order deck and the shuffled deck.  There are a couple caveats with the Shuffle implementation I’ve provided:

• There is no guarantee that a card won’t end up in the same relative position in which it started, or the same absolute position.
• There is no guarantee that an individual card won’t move multiple times.

If you think about it, though, both of these things happen when you really shuffle cards, too!
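Incidentally, if the statistical quality of the shuffle matters for a given application, the standard alternative is the Fisher-Yates (Knuth) shuffle: still O(n), but each position is swapped only with a position at or before it, which produces every permutation with equal probability; the swap-with-anywhere version above does not quite do that.  A sketch (the method name is mine):

```csharp
using System;
using System.Linq;

static class FisherYatesDemo
{
    // Fisher-Yates: walk backward, swapping each position with a uniformly
    // chosen position at or before it (including itself).  Still one pass
    // with one constant-time swap per element, so it remains O(n).
    public static void Shuffle<T>(T[] items, Random r)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = r.Next(i + 1);  // 0 <= j <= i
            T temp = items[j];
            items[j] = items[i];
            items[i] = temp;
        }
    }

    static void Main()
    {
        int[] deck = Enumerable.Range(1, 10).ToArray();
        Shuffle(deck, new Random());
        Console.WriteLine(string.Join(" ", deck));
    }
}
```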

### What about Big-O?

The question I raised earlier was, just how big is Big-O anyway?  Well, Big-O is great for analyzing algorithmic complexity, but that’s not always what we want when measuring the performance of an implementation.  I’m depending on a library function in the Random class!  What if it was incredibly slow, because it produced truly random numbers instead of pseudorandom numbers?  In that scenario, I would not have a terribly performant implementation, even if my algorithm was “theoretically” correct.

Big-O is great!  But I guess what I was trying to say here is…

Don’t think it’s the only thing that will impact your code!

Sunday, May 17, 2009 #

Desert Code Camp – the Arizona conference by and for software developers – is right around the corner!

This year’s code camp is on June 13 at DeVry University in Phoenix.  It’s not too late to request a session, or to sign up to present one, and it’s certainly not too late just to show up!

This year I will be presenting Back to Basics: Object-Oriented Design Patterns in C#, as well as Optimization Patterns: Reducing Memory Footprint in .NET.  The latter is based on parts 2 and 4 of my “Speedy C#” blog series.

Please come, and please help spread the word by supporting our Facebook group.

I hope to see you there!

Wednesday, April 8, 2009 #

Today I’m going to talk about a feature of C# that has been around since 2.0 (with the introduction of anonymous delegates) but which gets almost no attention; most C# developers have probably used it without ever thinking about it.  This feature is called closure, and it refers to the ability of a nested function to reference the surrounding function’s variables.

This article will discuss at length how delegates are implemented in C#; a review may be worthwhile before diving in.  We’ll also make extensive use of The Tool Formerly Known as Lutz Roeder’s .NET Reflector, which is now owned by Red Gate Software.

### Anonymous Methods without Closure

Supposing that I had a reason to do so, I could assign an event handler as an anonymous method.  I think this is generally bad practice (there is no way to explicitly dissociate the event handler, because it doesn’t have a name), but you can:

```
public partial class SampleNoClosure : Form
{
    public SampleNoClosure()
    {
        InitializeComponent();

        button1.Click += delegate
        {
            MessageBox.Show("I was clicked!  See?");
        };
    }
}
```

This will work as expected; on click, a small alert dialog will appear.  Nothing terribly special about that, right?  We could have written that as a lambda expression as well, not that it buys us anything.  It looks like this in Reflector:
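(The Reflector screenshot hasn’t survived here; the decompiled result looks roughly like the following.  The member names are illustrative stand-ins for the compiler’s mangled names, which look like <>9__CachedAnonymousMethodDelegate1.)

```
public partial class SampleNoClosure : Form
{
    // Illustrative names; the real ones are compiler-mangled.
    private static EventHandler CachedAnonymousMethodDelegate1;

    public SampleNoClosure()
    {
        InitializeComponent();
        if (CachedAnonymousMethodDelegate1 == null)
        {
            CachedAnonymousMethodDelegate1 = new EventHandler(b__0);
        }
        button1.Click += CachedAnonymousMethodDelegate1;
    }

    private static void b__0(object sender, EventArgs e)
    {
        MessageBox.Show("I was clicked!  See?");
    }
}
```

Because no local variables are captured, the compiler can emit a static method and cache the delegate in a static field for reuse.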

We see that the compiler auto-generates a method that matches the appropriate signature.  Nothing here should be completely surprising.

### Simple Example of Closure

Here is a sample class that includes closure.  The enclosed variable is sum.  You’ll note that everything just makes sense internally, right?

```
public partial class SimpleClosureExample : Form
{
    public SimpleClosureExample()
    {
        InitializeComponent();

        int sum = 1;
        for (int i = 1; i <= 10; i++)
            sum += sum * i;

        button1.Click += delegate
        {
            MessageBox.Show("The sum was " + sum.ToString());
        };
    }
}
```

So, it only makes sense that sum can be part of that anonymous function, right?  But bear in mind that all C# code must be statically compiled; the compiler can’t simply bake the computed value of sum into the method.  Besides, what happens if the value is a parameter to the function, something that couldn’t be precomputed?  To handle these scenarios, we need to think about how this will work.

In order to keep that method state alive, we need to create another object.  That’s how the state can be maintained regardless of threads and regardless of calls to the function.  We can see it as a nested class here, and the anonymous method looks just like it does in code:
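(The screenshot is missing here, but the decompiled shape is roughly the following; the nested class is really named something like <>c__DisplayClass1, and the other names are illustrative as well.)

```
public partial class SimpleClosureExample : Form
{
    // Compiler-generated "display class" that keeps the captured
    // variable alive on the heap for as long as the delegate lives.
    private sealed class DisplayClass1
    {
        public int sum;

        public void b__0(object sender, EventArgs e)
        {
            MessageBox.Show("The sum was " + this.sum.ToString());
        }
    }

    public SimpleClosureExample()
    {
        InitializeComponent();

        DisplayClass1 locals = new DisplayClass1();
        locals.sum = 1;
        for (int i = 1; i <= 10; i++)
            locals.sum += locals.sum * i;

        button1.Click += new EventHandler(locals.b__0);
    }
}
```

Every reference to sum in the original constructor is rewritten to go through the display-class instance, and the anonymous method becomes an instance method on that class.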

### A More Advanced Example

Whenever you work with a LINQ expression, chances are you’re using closure and anonymous functions (and lambda expressions) and don’t realize it.  Consider this LINQ-to-SQL query:

```
int boardID = 811;
int perPage = 20;
int pageIndex = 0;

var topics = (from topic in dc.Topics
              orderby topic.IsSticky descending, topic.LastMessage.TimePosted descending
              where topic.BoardID == boardID
              select new
              {
                  topic.TopicID,
                  Subject = topic.FirstMessage.Subject,
                  LatestSubject = topic.LastMessage.Subject,
                  LatestChange = topic.LastMessage.ModifiedTime,
                  NameOfUpdater = topic.LastMessage.PosterName,
                  Updater = topic.LastMessage.User,
                  Starter = topic.FirstMessage.User,
                  NameOfStarter = topic.FirstMessage.PosterName,
                  topic.ReplyCount,
                  topic.ViewCount
              })
             .Skip(perPage * pageIndex)
             .Take(perPage);

foreach (var topic in topics)
{
    Console.WriteLine("{0} - {1} {2} {3} {4} by {5}", topic.Subject, topic.NameOfStarter,
        topic.ReplyCount, topic.ViewCount, topic.LatestChange, topic.NameOfUpdater);
}
```

The closure here is happening within the where clause; you may recall that the C# where clause compiles to a call to the Where extension method (for LINQ-to-SQL, the IQueryable<T> form that takes an Expression<Func<TSource, bool>> predicate).

Here, it’s very easy to imagine a case where we wanted to write actual parameters.  This query is used to generate and display a topic list for a message board; all “stickied” posts should be at the top and the rest should be sorted by last time posted.  If I’m making that into a web server control, I’m going to need to not hard-code the board ID, the number of topics per page to display, and which page I’m looking at.

Now, this is kind of a hard thing to conceptualize; when I was going through this project, I expected all three variables to be incorporated into the generated class.  It turns out that Skip() and Take() don’t evaluate a lambda expression; they take plain values, so those don’t ultimately have to be stored for later evaluation.  However, as expected, boardID does have to be stored.  You might ask why that is even the case, since LINQ-to-SQL translates this into SQL for us:

```SELECT TOP (20) [t0].[TopicID], [t2].[Subject], [t1].[Subject] AS [LatestSubject], [t1].[ModifiedTime] AS [LatestChange], [t1].[PosterName] AS [NameOfUpdater], [t4].[test], [t4].[UserID], [t4].[Username], [t4].[Email], [t4].[PasswordHash], [t6].[test] AS [test2], [t6].[UserID] AS [UserID2], [t6].[Username] AS [Username2], [t6].[Email] AS [Email2], [t6].[PasswordHash] AS [PasswordHash2], [t2].[PosterName] AS [NameOfStarter], [t0].[ReplyCount], [t0].[ViewCount]
FROM [dbo].[Topics] AS [t0]
LEFT OUTER JOIN [dbo].[Messages] AS [t1] ON [t1].[MessageID] = [t0].[LastMessageID]
LEFT OUTER JOIN [dbo].[Messages] AS [t2] ON [t2].[MessageID] = [t0].[FirstMessageID]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t3].[UserID], [t3].[Username], [t3].[Email], [t3].[PasswordHash]
FROM [dbo].[Users] AS [t3]
) AS [t4] ON [t4].[UserID] = [t1].[UserID]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t5].[UserID], [t5].[Username], [t5].[Email], [t5].[PasswordHash]
FROM [dbo].[Users] AS [t5]
) AS [t6] ON [t6].[UserID] = [t2].[UserID]
WHERE [t0].[BoardID] = @p0
ORDER BY [t0].[IsSticky] DESC, [t1].[TimePosted] DESC
```

So why, if we already have the SQL generated, do we need to bother with the generated class?  Well, you may recall that LINQ-to-SQL doesn’t support all possible operators.  If the query uses something LINQ-to-SQL can’t translate and we have to pull all of the relevant items back to evaluate in memory, we’ll need that class.  Until then, it goes unused.

### Review

A closure is when you take the variables of a function and use them within a function declared inside of it – in C#, this is through anonymous delegates and lambda expressions.  C# typically will accomplish the use of closures by creating an implicit child class to contain the required state of the function as it executes, handing off the actual method to the contained class.
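One consequence of this display-class machinery worth knowing: every capture of the same variable shares the same display-class instance.  The classic demonstration is capturing a for-loop variable (my own example, not from the original post):

```csharp
using System;
using System.Collections.Generic;

class CaptureDemo
{
    static void Main()
    {
        // All three lambdas capture the same variable i, so they share one
        // display-class instance and all observe its final value.
        var shared = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            shared.Add(() => i);
        foreach (Func<int> f in shared)
            Console.Write(f() + " ");   // 3 3 3
        Console.WriteLine();

        // Copying to a local declared inside the loop body gives each
        // iteration its own display-class instance.
        var separate = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            separate.Add(() => copy);
        }
        foreach (Func<int> f in separate)
            Console.Write(f() + " ");   // 0 1 2
        Console.WriteLine();
    }
}
```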


Thursday, April 2, 2009 #

I’m working on porting an existing forum-based community from SMF to a new .NET-based forum platform that I’m authoring.  I’m excited about it; I love SMF, but it doesn’t have what I want and frankly, it’s a scary beast to try to tackle.  I’d considered using some kind of bridge between it and my code, but I knew I wanted deep integration of the forums with the new community site, and I wanted the community site in .NET.  So I made the decision to write an importer to talk between MySQL and my SQL Server-based solution.  I chose LINQ-to-SQL as my O/R mapper because, quite frankly, I find it much easier and more elegant to work with; so far as I know, I’m not the only one who thinks so.

Because of the nature of the data that I’m importing, I needed to run several SubmitChanges() calls to get the data into the database.  But I wanted to make sure that these submissions only worked if they ALL worked.  So I needed a transaction external to the normal LINQ-to-SQL in-memory object mapper.  Unfortunately, when I began a transaction using the underlying Connection property of the DataContext, I was met with an error:

```System.InvalidOperationException: SqlConnection does not support parallel transactions.
at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName)
at System.Data.SqlClient.SqlInternalConnection.BeginTransaction(IsolationLevel iso)
at System.Data.SqlClient.SqlConnection.BeginDbTransaction(IsolationLevel isolationLevel)
at System.Data.Linq.DataContext.SubmitChanges(ConflictMode failureMode)```

The solution was simple: DataContext has a Transaction property!  By setting this to the transaction that I was beginning, I was able to run the complete import in a single transaction:

```
dc.Connection.Open();
using (DbTransaction transaction = dc.Connection.BeginTransaction(IsolationLevel.ReadCommitted))
{
    dc.Transaction = transaction;
    try
    {
        // do databasey things
        dc.SubmitChanges();
        transaction.Commit();
    }
    catch (Exception ex)
    {
        transaction.Rollback();
        Console.WriteLine("Exception caught; transaction rolled back.");
        Console.WriteLine(ex.ToString());
    }
}
```

It took about 2 minutes to import 37,000 or so messages, plus all users, categories, forums, private messages, and polls from SMF.  The app ends up taking something in the neighborhood of 120mb of memory (I need to keep objects around to reference them for their children, since I assign new IDs), but it’s still a small one-off price to pay.

Tuesday, March 31, 2009 #

For the last few months I’ve been working on a fairly large web application for work.  The application is mostly Flex-based, so I’m working on back-end components, which are accessed via FluorineFx (an ActionScript-.NET connector).  The site also needs to integrate with Ektron, a somewhat perilous enterprise-level CMS, because a lot of the company’s web-facing data is already stored there and they want to reuse as much as possible.  The site is a completely new version of a much less ambitious site designed for the same purpose, driven by a 400 MB Access database.

So, as part of our deployment setup, our projects consist of:

• 3 projects that encompass most of the site’s actual functionality.  One is for Ektron integration, one is for control development, and one is for the service implementations.
• 1 tool that imports data from the old site.
• 1 tool that imports localization data from Ektron.
• 1 tool that creates Ektron Smart Forms in order to create that localization data.
• 1 tool that encrypts member passwords (because until that’s done we can’t see the old members database).
• 1 tool that both backs up Ektron staging data and also restores that data to a clean environment.
• A team test project.
• A test harness for other parts of the system.
• Oh yeah, the web site.

I generally find myself taking this approach whenever I’m working on a large system.  Importing data from SMF?  Write a tool.  Need to transform something into an XML format your site understands?  Write a tool.

For me, the hardest part about getting tooling correct is making sure that I’m not wasting time when I’m designing one.  For instance, smart forms in Ektron are a cool idea, at least in concept.  I’ve found them somewhat difficult to use, but it’s still a cool idea for a CMS.  Our client wants to use Ektron to store localization data, but right now all I have are the dictionaries of English text in a Flex ActionScript file.  Here’s our process:

• Grep the text out of the ActionScript dictionaries into a .CSV file.
• Using Sebastien Lorien’s CSV reader, read each dictionary one at a time.
• Create a .xml file corresponding to the Ektron smart form exported file format.
• Manually import or update each corresponding Ektron smart form.

When this eventually goes to the client (or to a new staging database for ourselves), we’ll run a tool on our database in “backup mode,” which will back up all of the smart forms, as well as the corresponding content rows, into a binary file.  Then we’ll run the tool on the new database in “restore mode,” which will update all of the content to the latest versions.

What does this allow us to do?

• QA content on the test site and be confident that it’s going to be represented on the production site.  Because entering content into Ektron in this manner isn’t particularly easy, we don’t feel comfortable that we could do it with no errors.  By automating the process, we can be reasonably confident that we’ll eliminate typos.
• Avoid repeated input time.  It might have cost me the same amount of time to write the backup/restore tool as it would have taken to just do it by hand.  However, I was fairly confident (and correct) that I’d have to do it more than once.  We get the added bonus of being able to deliver the tool to the client, who will also have to run it in their production environment.

I love writing tools. It makes everyone’s lives easier, and that’s never, ever bad.

Tuesday, March 17, 2009 #

This morning I received an email that posed a question so interesting that I thought I would blog about the answer.  The question was, essentially, how can we invoke a partial update (using ASP.NET AJAX triggers), from an on(up) button handler in Flash?

There are a few different ways to approach this problem.  I believe the method I’m going to write out here is what I like to call the “path of least resistance” – it’ll get you there quickly, though it will create some interdependencies among your controls.  Using this technique as a baseline, you should be able to adapt it to other, better techniques, which I’ll describe later.

Flash lives in a sandbox relative to your web page; it doesn’t interact with the rest of your page’s code (by default), so you need to provide it with a way to do so.  In ActionScript 1 and 2, it’s relatively easy to invoke JavaScript using getURL().

I won’t get into the details of how to access script with ActionScript; I’ll simply leave it to the professionals (and use this as an example).  With this capability we should have everything we need to build an AJAX-enabled Flash button.  I’ve created a sample ASPX page that will host what we need:

```
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        <object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" width="150" height="60" id="blog" align="middle">
            <param name="allowScriptAccess" value="always" />
            <param name="allowFullScreen" value="false" />
            <param name="movie" value="blog.swf" />
            <param name="quality" value="high" />
            <param name="bgcolor" value="#ffffff" />
            <embed src="blog.swf" quality="high" bgcolor="#ffffff" width="150" height="60" name="blog" align="middle" allowScriptAccess="always" allowFullScreen="false" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer" />
        </object>
        <asp:ScriptManager ID="sm" runat="server" EnablePartialRendering="true">
        </asp:ScriptManager>
        <asp:UpdatePanel ID="update" runat="server">
            <Triggers>
            </Triggers>
            <ContentTemplate>
                <asp:Label ID="lblText" runat="server" Text="Click the Flash button to turn me blue." />
            </ContentTemplate>
        </asp:UpdatePanel>
    </div>
    </form>
</body>
</html>
```

We still need a way to invoke the partial update.  Unfortunately, what I termed the “path of least resistance” is going to involve a little voodoo: we’re going to create a Button, set its display to none (so that it’s on the page but hidden), and then treat it as a trigger for the UpdatePanel:

```
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" width="150" height="60" id="blog" align="middle">
    <param name="allowScriptAccess" value="always" />
    <param name="allowFullScreen" value="false" />
    <param name="movie" value="blog.swf" />
    <param name="quality" value="high" />
    <param name="bgcolor" value="#ffffff" />
    <embed src="blog.swf" quality="high" bgcolor="#ffffff" width="150" height="60" name="blog" align="middle" allowScriptAccess="always" allowFullScreen="false" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer" />
</object>
<asp:Button runat="server" style="display: none;" ID="btnFlashGo" />
<asp:ScriptManager ID="sm" runat="server" EnablePartialRendering="true">
</asp:ScriptManager>
<asp:UpdatePanel ID="update" runat="server">
    <Triggers>
        <asp:AsyncPostBackTrigger ControlID="btnFlashGo" EventName="Click" />
    </Triggers>
    <ContentTemplate>
        <asp:Label ID="lblText" runat="server" Text="Click the Flash button to turn me blue." />
    </ContentTemplate>
</asp:UpdatePanel>
```

We’re still missing one piece: we need to generate the function call to actually make a postback.  There’s a relatively convenient way to do that, using the ClientScriptManager; drop this Literal onto the page:

```        <asp:Button runat="server" style="display: none;" ID="btnFlashGo" OnClick="btnFlashGo_Click" />
<asp:Literal ID="flashScript" runat="server" />```

and wire it up in the backend:

```
using System;
using System.Drawing;   // for Color.Blue
using System.Web.UI;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        this.flashScript.Text = string.Format(@"<script type=""text/javascript"">
function flashClicked()
{{
    {0}
}}
</script>", ClientScript.GetPostBackEventReference(this.btnFlashGo, ""));
    }

    protected void btnFlashGo_Click(object sender, EventArgs e)
    {
        lblText.ForeColor = Color.Blue;
    }
}
```

This setup should give us everything we need to make the AJAX call, and sure enough:

## Better Approaches

I would be happier – much happier – with this solution if it didn’t depend on so much black magic.  There are a few ways to get it to work better, and while I don’t want to take the time to demonstrate them here, I can describe them a bit.

### A FlashButton Control

In my mind, a FlashButton control is the best solution; it can derive from Control or WebControl (although to be honest simply Control is preferable), and can automatically do the heavy lifting.  You could incorporate SWFObject as a script resource and provide it automatically.  FlashButton could expose its own events, which means that you could eliminate that Button (the hidden one) and instead create the script callback reference pointing to the FlashButton’s event itself (and the UpdatePanel’s trigger could point there as well).

### A FlashButtonManager Control

A FlashButtonManager could extend the support for FlashButton much like ScriptManager does for UpdatePanel.  While the approach with a single FlashButton works well when only a single FlashButton is on the page, incorporating multiple FlashButton objects becomes tricky when you factor in things like handling multiple callbacks (for instance, naming the flashClicked() function).  A FlashButtonManager could be designed such that it handles each flashButton on the page, perhaps setting up FlashButtons with a FlashVar to specify a parameter when calling flashClicked(), and then using that parameter to determine which one was clicked and firing the appropriate postback event.

## Final Thoughts

You need to enable your Flash app to talk to JavaScript in order to make AJAX partial updates work correctly.  Fortunately it’s not super-challenging to do so!  You should be careful though – there is some voodoo going on in the back-end of this kind of solution – but with creative architecture, you can avoid a lot of headache.

Friday, March 13, 2009 #

I’ve officially launched the Facebook UI for ASP.NET project at CodePlex at http://facebookui.codeplex.com/.  It’s kind of a preview edition of what’s to come – and there’s quite a bit.  Here’s a quick overview:

Now:

• Design-time and run-time server support for about 30 FBML controls.
• Visual Studio IntelliSense support for FBML controls
• Web Site project templates in Visual Studio 2008

Version 1:

• All FBML tags within: Users/Groups, Profile-Specific, Embedded Media, Visibility on Profile, Tools, Forms, Message Attachments, Additional Permissions, Notifications and Requests, Status Messages, Page Navigation, Dialog, and Wall sections.
• Complete API support.
• Extensibility points within the FBML controls framework and API framework to enable updates from Facebook to be rolled in without requiring an immediate release of a new library.

Version 1.5:

• LINQ-to-FQL (or as some folks at the user group mentioned, perhaps it’ll be called LINQ-to-Facebook).
• Higher-level controls will be included, such as validators and simple logic controls, like a birthday display.

Version 2:

• Facebook internationalization should be supported.
• Mobile platform support should be extended.

Beyond that, I’m not sure where we’re going to take the project.  We’ll have to see!

CLR Profiler is a free and incredibly useful tool offered by Microsoft.  I'm fairly certain its primary use (at least from Microsoft's perspective) is to illustrate use of the CLR Profiling COM APIs, which aren't exceptionally clear-cut (in my opinion), particularly from a .NET programmer's point of view.  The really difficult part of using CLR Profiler is becoming accustomed to its interface and the data it presents; however, once you do so, I'm certain you'll find it incredibly helpful in addressing difficulties with memory usage.  This article aims to introduce you to the "important parts" of CLR Profiler - specifically, which graphs you should view, how you should interpret them, and how to address the problems you find.  This article will not review some of the more complicated parts of injecting CLR Profiler into something such as your ASP.NET application; there are other resources for that purpose.

For the purposes of this article, I've re-introduced a wasteful error into BN# that I found by using CLR Profiler.  We'll work through finding it in this article.

## Getting Started

Once you have CLR Profiler "installed" - and I use the term loosely - you can start the application from the install path (don't look for a Start Menu item).  There are two sets of binaries, x86 and x64, and you should know which edition of your target application you'd like to profile.  If you're profiling a platform-neutral application (most .NET apps fall into this category) on an x64 system, use the x64 version.  If you're running 32-bit Windows, or profiling a program specifically targeted at x86, use the x86 version of CLR Profiler.

As an important note, for Windows Vista users, if you're running with UAC enabled, make sure to run CLR Profiler as an administrator.  CLR Profiler works by injecting a COM DLL into the target, but it can't do that if you're not running the process as an administrator.

When profiling memory, I turn off Calls tracking: it's located in the bottom-right of the UI window.

If your application requires access to the local application directory - for instance, by using the Application class in Windows Forms - you should go through the explicit Profile Application menu item within the File menu, and set the working directory option of that UI.  Otherwise, go ahead and click Start Application, browse to your application, and go.

## During Operation

Other than the fact that your application will be measurably slower, you should be able to run the application as you otherwise would.  Your mileage will vary, but you'll get better results with more memory in your system.  But all developers have at least 4 GB powering their boxes now, right?

While the application is running, you can click the Show Heap now button on the main CLR Profiler GUI, which displays a heap graph of the current application, showing the path to all currently allocated memory:

To be honest, I find the heap graph to be relatively confusing, but the good news is that you don't need to keep using it.  Once you've dumped that temporary log, you can view the current heap and other interesting information by closing that window, going to the View menu in the main CLR Profiler window, and choosing Summary, which displays a cool window:

This window helps you understand what's happening:

• Allocated bytes is really interesting – it reports the total amount of memory that you’ve allocated within managed code over the application’s lifetime.
• Final Heap Bytes is the amount of managed memory currently in use on the heap.  This doesn't necessarily reflect unmanaged allocations.
• Relocated Bytes is the amount of memory that has been moved by the garbage collector during compaction operations.
• Gen X collections shows the number of garbage collections that have occurred for each generation.
• Garbage Collector Generation Sizes shows the number of bytes in use by each generation's heap.

## What's Happening with BN#?

I had a suspicion based on memory usage (reported by Task Manager) that BN# wasn’t quite as efficient as I would have hoped.  I wanted to do some investigation, so I plugged in CLR Profiler.  After a 30-second (or so) connection to Battle.net, joining Clan Recruitment, this is what I saw:

That’s pretty heavy – roughly 31 MB of total allocated memory, but only about 3 MB ended up on the heap, and only 3.5 MB was relocated throughout the lifetime of the app – that told me that I was doing a lot of allocating and freeing very rapidly.  What’s the next step?

I clicked on the Allocation Graph button and took a look:

In this we can see that byte arrays are allocated frequently and account for about 35% of all memory allocations.  That’s a big problem – especially since I already pooled their creation!  CLR Profiler helps me track it down, though, as I follow the highlighted call chain back to its source:

This image indicates that I have a problem with a method called DataReader::get_m_data().  Now, as I mentioned, I had to recreate this problem, and the path of least resistance for me was to change the identifier m_data (used frequently in DataReader) to be a property instead of a field, so originally this said get_Data.  I thought that was odd until I saw its implementation:

```
protected virtual byte[] Data
{
    get
    {
        byte[] dataCopy = new byte[_m_data.Length];
        Buffer.BlockCopy(_m_data, 0, dataCopy, 0, dataCopy.Length);
        return dataCopy;
    }
}
```

So here, for every operation that accessed the Data property (in the original implementation, that was every operation, because the Data property was virtual), I was duplicating the entire array EVERY TIME.

I then changed the implementation so that operations defined within the base class wouldn’t needlessly go through a property, and derived classes had direct access to the buffer by reference (via the UnderlyingBuffer property).  What were my results?
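Concretely, the reshaped accessor looks something like this.  This is a minimal sketch of the shape of the change; only the UnderlyingBuffer name comes from the post, and the rest is illustrative rather than the actual BN# source:

```
using System;

public class DataReader
{
    private byte[] _m_data;

    public DataReader(byte[] data) { _m_data = data; }

    // Derived classes get the buffer itself, by reference: no per-access
    // allocation and no copy, so gen-0 stays quiet.
    protected internal byte[] UnderlyingBuffer
    {
        get { return _m_data; }
    }
}

static class Program
{
    static void Main()
    {
        DataReader reader = new DataReader(new byte[] { 1, 2, 3, 4 });
        // Same array object on every access, unlike the copying property above.
        Console.WriteLine(ReferenceEquals(reader.UnderlyingBuffer, reader.UnderlyingBuffer));
    }
}
```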

I think that fairly well speaks to the effectiveness of using tools like this. :)  A 27% decrease in allocations, 33% fewer gen-0 collections, and a 53% decrease in byte[] allocations:

### The "Speedy C#" Series:

Saturday, March 7, 2009 #

I enrolled in the software engineering program at Arizona State, primarily because that program was willing to take me without having completed an undergraduate degree in computer science (the M.S. Computer Science and M.C.S. programs were not).  But as I went through the process I decided that I wanted to complete a thesis instead of a project, and I found a topic that I wanted to pursue that was clearly out of the realm of what I would consider software “engineering.”  So I began working on undergraduate coursework to make up the deficiencies; the first course I took was Operating Systems.  I thought that we would actually be creating a primitive operating system in the class and was consequently very excited (I’ve read a lot about the topic and have had friends who have worked on them, but I never have myself).  It turned out to be somewhat less exciting, but it wasn't too bad: we talked about the way OSes allocate memory, manage resources, and provide synchronization services.

Now, I had just built up an app that made extensive use of mutexes to provide non-busy waiting while waiting for external hardware to do some work.  By the time class started, I had a solid understanding of the idea of synchronization primitives – you wait(), and you signal(); waiting allows a thread to pass through if the primitive has free slots and otherwise blocks the thread in a non-busy wait; signalling permits the primitive to allow one or more threads through.

One of our “lab” assignments was to implement a reader-writer lock using Java.  Except that it had mostly been completed already, and if that wasn’t adequate, there was “pseudo-code” to help us out.  Let’s take a look.

```
public final class Semaphore
{
    private int value;

    public Semaphore() {
        value = 0;
    }

    public Semaphore(int v) {
        value = v;
    }

    public synchronized void P() {
        while (value <= 0) {
            try {
                wait();
            }
            catch (InterruptedException e) { }
        }

        value--;
    }

    public synchronized void V() {
        ++value;

        notify();
    }
}
```

This class irritates me for one particular reason: its methods are named P and V.  Now, this class was almost a full year ago, and I wasn’t going to blog about it, except that I stumbled upon the Wikipedia article about semaphores today:

The canonical names P and V come from the initials of Dutch words. V stands for verhogen, or "increase". Several explanations have been given for P (including proberen for "to test", passeer for "pass", probeer "try", and pakken "grab"), but in fact Dijkstra wrote that he intended P to stand for the made-up portmanteau word prolaag, short for probeer te verlagen, or "try-and-decrease" (A less ambiguous, and more accurate, English translation would be "try-to-decrease".) This confusion stems from the unfortunate characteristic of the Dutch language that the words for increase and decrease both begin with the letter V, and the words spelled out in full would be impossibly confusing for non–Dutch-speakers.

I’m fully aware of the great things Edsger Dijkstra has given to computing.  But let’s also consider that this concept made it into the ALGOL 68 programming language with better names than P and V.  In fact, Java has an intrinsic Semaphore class already built in, with methods named acquire and release.  When I submitted the assignment I chose to rename them to wait() and signal(), but that’s really beside the point.
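The .NET Framework takes the same approach as Java's built-in class: System.Threading.Semaphore names its methods for the behavior the caller experiences (WaitOne and Release), not for the counter arithmetic inside.  A quick C# illustration:

```
using System;
using System.Threading;

static class Program
{
    static void Main()
    {
        // Two slots available; the names describe what happens to the
        // calling thread, not how the semaphore is implemented.
        Semaphore gate = new Semaphore(2, 2);
        gate.WaitOne();    // acquire: blocks only when no slot is free
        gate.WaitOne();
        gate.Release();    // signal: hand a slot back
        gate.Release();
        Console.WriteLine("both slots returned");
    }
}
```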

The point of this whole rant is this: how can we expect students to come out of the university and be effective software developers if we stick to, frankly, archaic naming conventions like this, particularly in assignments which are presented like this by the professor?

See, while I understand the idea of “increase” and “decrease” being legitimate names for a data structure like this when you’re inventing it, now that we’re in the world of object-oriented programming, how can a university course really help its students?  I would submit that a semaphore has basic behaviors: wait and signal (or acquire and release), and that semaphores predefine a fixed number of threads that can acquire them.  But once that number is defined, what do we care how the semaphore is implemented?

We should be helping students get into the habit of following the rules of encapsulation.  Naming the methods of a semaphore “Increase” and “Try to decrease” not only leaks information about the internal mechanics of a semaphore (which may be inaccurate – semaphores could be implemented in hardware, for example, and the internal mechanics of a software semaphore representation of that hardware component may be entirely different), but it also misleads the developer because the names don’t really describe the behavior of what’s happening.  Arguably, wait() doesn’t because the calling thread only waits sometimes (and so I prefer acquire() in this situation), but at least that’s relatively close.

When the wet-behind-the-ears college graduate gets out of school, we don’t expect years upon years of experience from which they may draw.  But I think that it wouldn’t hurt to get these habits in front of students as early and frequently as possible.

Wednesday, February 25, 2009 #

On March 10th, I’ll be presenting an ASP.NET control library for Facebook at the Phoenix ASP.NET Users Group.  Among other things, we’ll be talking about how to create a Facebook application using ASP.NET, debugging aids, and the Facebook API.  Along with that, Terralever will be releasing our library to the open-source community on Codeplex.  (This post will be updated once we’ve done so).

The library is currently in varying degrees of maturity.  We’ve got about half of the FBML controls supported in the library with full design-time support:

If you haven’t worked with Facebook before, aside from having design-time support, the best part about having Facebook controls with runat="server" on them is that you have full access to them in codebehind.  Because of the way Facebook controls are accessed (using the XML fb: prefix), we couldn’t do that before without causing exceptions because ASP.NET would try to access them.  That generally led us to using ugly ASP.NET databinding syntax:

<fb:name usereflexive="true" useyou="true" capitalize="true" uid='<%# Eval("FbUserID") %>' />

We still have the ability to use this type of syntax, but we also have the flexibility to do specific databinding (for instance, within a repeater control), accessing that control in a type-safe way.

There are a few big differences, though.  Facebook has this concept of a meta-control called fb:else; it’s used in conjunction with several other controls (such as the one shown in the screenshot above).  But that’s DEFINITELY not supported in the ASP.NET code editor, and it’s fairly irregular for an ASP.NET application.  Fortunately, we already have a way around that problem, templates:

By using designer templates we’re able to provide that metadata to the editor.  (More on that before too long).

### Release Roadmap

For the release in early March following the user group presentation, we’ll release a subset of the FBML user controls (the Terralever.Facebook.UI.FbmlControls namespace) to the community, as well as some base-level utility classes, such as base classes for Canvas- and IFrame-based pages.  Hopefully, there will also be a preview of LINQ-to-FQL, although that’s not necessarily going to happen.  Debugging aids, including specialized and extended tracing HTTP modules, will be part of the release.  We’ll also include a couple controls we’ve had to model, including a Pager control, a Birthday List control, and a control template for hosting on user profile boxes.

#### Version 1

Additional FBML controls will be added as time goes on, but the second biggest release is going to be adding support to the Terralever.Facebook.UI.CanvasControls namespace.  Just like the System.Web.UI.WebControls namespace is to the System.Web.UI.HtmlControls namespace, .CanvasControls is to .FbmlControls.  It provides support for some of the higher-level functionality that we’ve come to expect from ASP.NET.  If you’ve tried to use the normal validation controls on a Facebook Canvas page, you’ve known the bitter defeat of it.  I’m not so sure it’ll get to some of the more data-centric controls we’ve come to know and love, like DataGridView (because let’s be honest – it just doesn’t fit into a Facebook page).  But we’ll see some support for postbacks (on that note, Microsoft, ClientScriptManager shouldn’t be sealed and Page.ClientScript should be virtual), all the regular validator controls, and some Facebook-friendly client UI controls.

We’ll include much greater support for the Facebook REST-based API.  I haven’t decided yet whether we’ll consume them as WCF services or simply use an XML parsing strategy, such as LINQ-to-XML.  If it isn’t already included, we’ll definitely include a LINQ-to-FQL implementation.

#### Version 2

Long-term (version 2 and beyond) will most likely add support for the Facebook internationalization platform, the Editor control, and the Mobile platform.  We’ll see how it goes and what is requested by the community.

Sunday, October 26, 2008 #

I was exceptionally fortunate to get an opportunity to attend Microsoft's 2008 Professional Developers Conference in Los Angeles, starting tomorrow!  It's going to be a busy week, undoubtedly, but it's exciting to see what's up and coming.  Some of the things that I'm really geared up for are:

• Under the Hood: Advances in the .NET Type System - We'll get a chance to see the next version of the CLR and the looser coupling it will provide.  In particular I'm hoping they announce support for covariant and contravariant types, something I suspect has been on the whiteboard for a while now.
• The Future of C# - presented by none other than Anders Hejlsberg, the creator of C#.  Unfortunately I haven't found out whether any of my wishlist items made it into the mix, but we'll see.
• Coding4Fun: Windows Presentation Foundation Animation, YouTube, iTunes, Twitter, and Nintendo's Wiimote - Yes, that's all one title, one session.  I hope we're presented with Wiimote classes for use in our Windows 7 and Silverlight apps! :)
• Windows 7: Unlocking the GPU with Direct3D - I'm intrigued because D3D has historically been something that is relegated to the games/CAD developer.  Are we going to start seeing D3D on the desktop?
• Windows 7: Developing Multi-touch Applications - It's so impractical and yet I'm double-booked for it with a concurrency symposium.
• WPF: Extensible BitmapEffects, Pixel Shaders, and WPF Graphics Futures - Integrating shaders into WPF?  How can you not be intrigued?

I'll be blogging about the goings-on during the week and taking copious notes about what I get to see.  At some point I need to get free goodies and play with the MS Surface too.

See you at PDC!

Wednesday, August 20, 2008 #

I recently went to my cousin's birthday party (he turned 6), and since he's apparently become a huge Star Wars fan (read: played LEGO Star Wars and loves shooting off C-3PO's leg), my parents decided to get him a couple small Star Wars LEGO toys that were designed for that age group.  They're small - maybe 30 pieces or so - but they're still pretty neat, and I can't help but be amazed at how the pieces still come together to form the whole.  I was a LEGO and K'nex fanatic when I was younger, but I don't think I really played around with them since I was 14 or so - ever since I left Illinois.  So when I was asked to help my cousin put them together, my first reaction was, "Hey, I don't do that anymore."  But as I got started, I was drawn in - even on the little 30-piece landspeeder.

And so, I recently made one of the biggest entirely-vanity buys since I've moved into my new place:

The "Ultimate Collector's Edition" Imperial Star Destroyer.  I couldn't bring myself to invest the full $500 in the Millennium Falcon, but I thought that the price wasn't too insane, a paltry 3104 pieces wasn't too daunting, and that I would have enough fun putting together the monstrous 37" set.  The instruction book is about the thickness of my high school yearbooks, and I'm disappointed but not entirely surprised that they didn't include the hyperdrive units as part of the set.  I guess I'm on my own for making it actually fly.

I took the pieces out and set them on my dining room table, and as I analyzed the rather daunting task ahead, I realized that, really, it's like my everyday job.  And the first page of the book seemed to confirm it:

Everything is done in components.  We have a frame, then a wing, then another wing.  The bridge.  All of these go together with little regard for how the rest of the ship is built.  Just like we build software.  We try to solve a problem, but we try to make sure that our components can be used or adapted to solve another problem as well.

So maybe all of my formative years building with those LEGO sets weren't for nought.  Maybe they were setting the stage for software development even then, tuning my problem-solving skills, making me think of things as tasks to be decomposed until the smallest task can be approached.  Who's to say?

Sadly, my software development experience doesn't have a way for me to estimate time to build this bad boy.  But at least my motivation's there!

Update: Check out the building marathon!

Wednesday, August 13, 2008 #

So often in the managed world we're able to get away with not worrying about memory management.  "But the GC takes care of cleaning my objects for me!"  That's true; but if you want your application to be performant, you should at least understand what's going on in all of those circuits and silicon.

In Part 2, I talked a bit about how creating object pools can help you to avoid garbage collections by keeping memory allocated for a long time.  Here, I'm going to talk a bit more extensively about how objects are stored in memory, what a "pinned object" is, and how pointers can be used quickly in C#.

NOTE: This article assumes you are familiar with pointers and pointer arithmetic.  If not, you may wish to brush up.

### Objects in Memory - A Closer Look at the Heap

When you create an instance of a class (not a struct or an enum), your object is stored on the "heap" - a large contiguous area of memory that is just there.  (For more information on the heap, read up on Part 2.)  This includes, interestingly enough, any Array objects you create (such as a byte[]) - they're reference objects, not value objects.  (The one exception is if you use the stackalloc operator in C#.)  So, suppose I make the following class:

```
class Sample
{
    public int A;
    public long B;
    public short C;
    public short D;
}
```

Here's how it would conceptually look in a memory block:

As you can see, the class is laid out contiguously (although the CLR does not guarantee this behavior unless it is decorated with [StructLayout(LayoutKind.Sequential)]).  Still, you get the idea.

However, when we create an object and get a reference to it, we don't actually get a pointer to the object - we get a "reference".  This isn't a reference like you might expect in C or C++, either; rather, it's similar to a handle.  We can use it just like it's conceptually in memory like I laid out.  However, the CLR hides implementation details; for example, every object on the heap has at least a reference to its RuntimeTypeHandle so that casting can be checked at runtime.  To demonstrate, let's take a byte[].  When it's stored on the heap, it's pretty clear what we're looking at.  Arrays of any type are an interesting edge case in .NET; normally, C# does not allow you to obtain a pointer of a managed type (and in fact you can't do what I'm about to demonstrate with a reference type), but arrays themselves ARE managed types (don't worry about the last two lines of output just yet).

```
static unsafe void Main(string[] args)
{
    byte[] bytes = new byte[100];
    bytes[0] = 1;
    bytes[1] = 2;
    bytes[2] = 3;
    bytes[3] = 4;

    Type arrayType = null;
    fixed (byte* pMem = &bytes[0])
    {
        Console.WriteLine("{0:x16}", (long)pMem);
        int* pArrayBase = (int*) pMem;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        pArrayBase--;
        Console.WriteLine("{0:x8}", *pArrayBase);
        long rtth = *(long*) pArrayBase;
        RuntimeTypeHandle handle;
        // RTTH is a value-type whose only member is an IntPtr; can be set as a long on x64
        RuntimeTypeHandle* pH = &handle;
        *((long*) pH) = rtth;
        arrayType = Type.GetTypeFromHandle(handle);
    }

    if (arrayType != null)
    {
        Console.WriteLine(arrayType.Name);
    }

    Console.WriteLine("byte[] RTTH: {0:x16}", typeof (byte[]).TypeHandle.Value.ToInt64());
    int a = 1;
    int b = 2;
    int* pA = &a;
    int* pB = &b;
    Console.WriteLine(*pB);
    Console.WriteLine(*(pB - 1));

    Console.ReadLine();
}
```

Now, just to clarify: I run on x64.  The above code will not function as expected on x86.  There are a few items that will also produce slightly varying results for you; for instance, pMem shouldn't be cast to a long on x86, and to get to the instance's stored RTTH, you only need to decrement the pointer 3 times on x86 (whereas the RTTH on x64 is 8 bytes long).  Here's the output on my machine:

```0000000002a31748                Console.WriteLine("{0:x16}", (long)pMem);
04030201                        Console.WriteLine("{0:x8}", *(pMem));
00000000                        Console.WriteLine("{0:x8}", *(pMem - 1));
00000064                        Console.WriteLine("{0:x8}", *(pMem - 2));
00000642                        Console.WriteLine("{0:x8}", *(pMem - 3));
7890a4a8                        Console.WriteLine("{0:x8}", *(pMem - 4));
Byte[]                          Console.WriteLine(arrayType.Name);
byte[] RTTH: 00000642789562c2   Console.WriteLine("{0:x16}", typeof(byte[]).TypeHandle.Value.ToInt64());
2                               Console.WriteLine(*pB);
1                               Console.WriteLine(*(pB - 1));```

So, here we see that the runtime type identifier is stored as part of the object reference on the heap; so is the array length (that's the hex value 00000064 that you see on the fourth line of output - it's 100 in decimal).  That's how arrays are stored on the heap, and it's pretty much how objects are stored; when we have an object reference, we can treat it as if it's a pointer into memory.  But it's more than that; below our "pointer" exists additional information about the object.  We don't get to see that additional information because the CLR hides it from us.

What are reference variables then?  Ultimately, they're stack variables that contain our "pointer" that isn't really a pointer.  I said not to worry too much about the last two lines before, but they are intended to show you one thing: stack variables are allocated sequentially on the stack.  I declared a, then b; by obtaining a pointer to b, I was also able to obtain a pointer to a by decrementing the pointer by the size of the variable (in this case, 32 bits).  To show you that my handle is in fact legitimately pointing to a stack variable, take a look at the following code:

```
static unsafe void Main(string[] args)
{
    Sample s = new Sample {A = 0x01020304, B = 0x0f0e0d0c0b0a0908, C = 0x0706, D = 0x0504};
    long a = 1;
    long b = 2;
    long* pA = &a;
    long* pB = &b;
    Console.WriteLine("{0:x16}", *pB);
    Console.WriteLine("{0:x16}", *(pB - 1));
    Console.WriteLine("{0:x16}", *(pB - 2));

    long prS = (long)(pB - 2); // the location of s on the stack
    long* pS = *(long**)prS;
    Console.WriteLine("{0:x16}", *pS);
    Console.WriteLine("{0:x16}", *(pS + 1));
    Console.WriteLine("{0:x16}", *(pS + 2));

    Console.ReadLine();
}
```

Again, the above code will not function as expected on x86 (to make it do so, replace all long references with int).  The output of this code is fascinating:

```0000000000000002      b
0000000000000001      a
0000000002be16c8      s
00000642801a4400      *pS
0f0e0d0c0b0a0908      *(pS + 1)
0504070601020304      *(pS + 2)```

You might notice that s is a pointer to the heap, and that dereferencing it gives us a number that looks suspiciously similar to a RuntimeTypeHandle just like in the last example, and you'd be correct.  The other interesting thing is the variable order: the B variable in the Sample class was aligned so that it would be first (8-byte alignment on x64 appears to be the default).  Applying [StructLayout] to it as noted before makes it look right (although to the untrained eye it will look entirely backwards due to endianness).

In Part 2, I talked about how garbage collection allows us to not worry so much about external fragmentation of the heap, because the GC performs a process called "compaction," by which objects are moved around in memory so that there aren't small areas of free space.  The interesting question is: what happens if a GC compaction happens and we have a pointer to an object?

### Accessing Memory Locations with Pinned Objects

The CLR allows us to "pin" an object so that it is not moved during garbage collection.  This can potentially have some big consequences for garbage collection, though; the heap remains fragmented if an object is pinned during a pass.  What's more, if the object becomes eligible for compaction after the pass, it's still considered a gen-0 object even though it should have moved to gen-1.  C# enables us to pin an object via the fixed statement.

In truth, the only objects worth pinning are arrays.  You can't pin a regular reference object to get a pointer for the reason shown above (it's not guaranteed to follow any particular pattern), and single value-type objects can be accessed directly on the stack without pinning.  Pinning arrays has some good performance benefits (which I'll get to a bit later), but like I said, not without a cost.

The neatest part about pointers in C# is that a pointer can be cast to a pointer of any other value-type; this is exceptionally common in C code (reading a file into memory by reading the length of a struct, and then treating the memory as a pointer to that struct, for example).  Sometimes it's simply easier for us to do that in C# than it is to use a stream.  Consider the case of reading a PE file header; it's a nightmare!  So many lines of code when you could simply read in a buffer and call it a PE file header.  Strong typing imposes that limitation, but thankfully even on edge cases like this, we can work around it.
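As a sketch of that cast-the-buffer trick, here's what it looks like with a made-up 8-byte header (MiniHeader and its fields are illustrative, not a real PE structure; the byte values assume a little-endian machine):

```
using System;
using System.Runtime.InteropServices;

// Sequential layout guarantees the fields line up with the raw bytes.
[StructLayout(LayoutKind.Sequential)]
struct MiniHeader
{
    public ushort Magic;
    public ushort Version;
    public uint Length;
}

static class Program
{
    static unsafe void Main()
    {
        // Little-endian bytes for Magic=0x5A4D, Version=1, Length=16.
        byte[] buffer = { 0x4D, 0x5A, 0x01, 0x00, 0x10, 0x00, 0x00, 0x00 };
        fixed (byte* pBuf = buffer)
        {
            // One pointer cast instead of parsing field-by-field from a stream.
            MiniHeader* pHeader = (MiniHeader*)pBuf;
            Console.WriteLine("{0:x4} v{1}, {2} bytes",
                pHeader->Magic, pHeader->Version, pHeader->Length);
        }
    }
}
```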

I'm not going to discuss the performance characteristics of pinned objects during a garbage collection; for one, they're hard to measure, but more importantly, it's been well-documented to hurt the performance of the garbage collector.

### Getting Pointers without the Pinning

There are other means by which to obtain, create, and manage pointers aside from the standard fixed statement.  As mentioned earlier, you can use the stackalloc statement to allocate a block of memory on the stack; it gives you a pointer to the base of an array that lives on the stack.  Alternatively, if you don't care about portability, you can use native Windows functions to allocate memory for you.  These functions might include LocalAlloc, HeapAlloc, VirtualAlloc, or VirtualAllocEx, depending on what your needs are.
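A minimal stackalloc sketch, for reference:

```
using System;

static class Program
{
    static unsafe void Main()
    {
        // 16 ints carved out of the current stack frame: no heap object,
        // nothing for the GC to track, and no pinning needed for the pointer.
        int* squares = stackalloc int[16];
        for (int i = 0; i < 16; i++)
            squares[i] = i * i;
        Console.WriteLine(squares[5]); // 25
    }
}
```

The memory disappears when the method returns, so the pointer must never outlive the frame.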

An interesting prospect might be to allocate multiple heaps using the HeapCreate API; this would allow you to manage your memory per area of responsibility.  Noel Llopis suggests such a strategy in his book C++ for Game Programmers.  Although all of this memory management might seem like overkill, if you're really hunting for the next tweak to speed up your code, this might help you get over the line.

### Performance Characteristics of Unsafe vs. Safe Code

Let's not kid ourselves; unsafe code is inherently unsafe because the runtime doesn't manage the code for us.  So before using code like this in your applications, be absolutely certain that you need it.

The CLR provides the means to access heap memory via the Marshal.AllocHGlobal method.  The documentation notes that it uses LocalAlloc, probably because LocalAlloc doesn't require a pointer to a heap.  Despite the admonition that you'll get better performance and more features out of the other functions, the use of LocalAlloc does not seem to be a hindrance in speed relative to using HeapCreate/HeapAlloc/HeapDestroy.  The execution times are shown here:

|  | Debug Mode - 5 Iterations | Release Mode - 5 Iterations | Debug Mode - 25 Iterations | Release Mode - 25 Iterations |
| --- | --- | --- | --- | --- |
| Normal .NET Array [] notation | x86: 17ms; x64: 45ms | x86: 15ms; x64: 65ms | x86: 109ms; x64: 252ms | x86: 95ms; x64: 333ms |
| Marshal.AllocHGlobal with pointers | x86: 15ms; x64: 36ms | x86: 14ms; x64: 30ms | x86: 95ms; x64: 193ms | x86: 80ms; x64: 148ms |
| LocalAlloc P/Invoke with pointers | x86: 16ms; x64: 37ms | x86: 14ms; x64: 31ms | x86: 96ms; x64: 193ms | x86: 78ms; x64: 161ms |
| HeapAlloc P/Invoke with pointers | x86: 16ms; x64: 42ms | x86: 14ms; x64: 32ms | x86: 102ms; x64: 197ms | x86: 88ms; x64: 166ms |

Surprisingly, the normal array bracket notation performed significantly worse in release builds than in debug builds on x64; I don't really have an answer for why that would be.  I did not perform extensive statistical regression or even provide averages; I ran each set three times, and if they all looked mostly the same, I used the data.  These data are from x64 machines; the x86 results were from setting compilation target to x86 and running the program in WOW64.  I was surprised how much slower x64 was, though it might have been because we were using machine words on x86, and half-words on x64.  Perhaps memory access would be faster if we were using longs on x64.  (Prelim tests seem to confirm this; I will post a follow-up soon.)
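To give a feel for the pattern being timed, here's a sketch of the Marshal.AllocHGlobal approach (not the exact benchmark body; SIZE is illustrative):

```
using System;
using System.Runtime.InteropServices;

static class Program
{
    const int SIZE = 1000000;

    static unsafe void Main()
    {
        // Allocate one unmanaged block, touch every element through a raw
        // pointer, then free it; the GC never sees this memory.
        IntPtr block = Marshal.AllocHGlobal(SIZE * sizeof(int));
        try
        {
            int* p = (int*)block.ToPointer();
            for (int i = 0; i < SIZE; i++)
                p[i] = i;
            Console.WriteLine(p[SIZE - 1]);
        }
        finally
        {
            // Unmanaged memory must be released explicitly.
            Marshal.FreeHGlobal(block);
        }
    }
}
```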

Here are the P/Invoke declarations:

```csharp
using System;
using System.Runtime.InteropServices;

public enum LocalAllocFlags
{
    Fixed = 0,
    Moveable = 2,
    ZeroInit = 0x40,
}

public enum HeapCreateFlags
{
    None = 0,
    EnableExecute = 0x40000,
    GenerateExceptions = 4,
    NoSerialize = 1,
}

public enum HeapAllocFlags
{
    None = 0,
    GenerateExceptions = 4,
    NoSerialize = 1,
    ZeroMemory = 8,
}

static class UnsafeNativeMethods
{
    [DllImport("kernel32.dll")]
    public static extern IntPtr LocalAlloc(LocalAllocFlags flags, UIntPtr uBytes);

    [DllImport("kernel32.dll")]
    public static extern IntPtr LocalFree(IntPtr hMem);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapCreate(HeapCreateFlags flOptions, UIntPtr dwInitialSize, UIntPtr dwMaxSize);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapAlloc(IntPtr hHeap, HeapAllocFlags dwFlags, UIntPtr dwBytes);

    [DllImport("kernel32.dll")]
    public static extern IntPtr HeapFree(IntPtr hHeap, HeapAllocFlags dwFlags, IntPtr lpMem);

    [DllImport("kernel32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool HeapDestroy(IntPtr hHeap);
}
```

And finally, here's the benchmarking code:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

class Program
{
    private const int ITERATIONS = 25;

    static unsafe void Main(string[] args)
    {
        Console.WriteLine("Press <enter> to start.");
        Console.ReadLine();

        Stopwatch arrayClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunArrayTest();
        }
        arrayClock.Stop();
        Console.WriteLine("{0}ms elapsed for Array test, {1} iterations.  Press <enter> to continue.", arrayClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        Stopwatch marshalClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunMarshalAllocHGlobalTest();
        }
        marshalClock.Stop();
        Console.WriteLine("{0}ms elapsed for Marshal test, {1} iterations.  Press <enter> to continue.", marshalClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        Stopwatch localClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunLocalAllocTest();
        }
        localClock.Stop();
        Console.WriteLine("{0}ms elapsed for LocalAlloc P/Invoke test, {1} iterations.  Press <enter> to continue.", localClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();

        Stopwatch heapClock = Stopwatch.StartNew();
        for (int iter = 0; iter < ITERATIONS; iter++)
        {
            RunHeapAllocTest();
        }
        heapClock.Stop();
        Console.WriteLine("{0}ms elapsed for HeapAlloc P/Invoke test, {1} iterations.  Press <enter> to continue.", heapClock.ElapsedMilliseconds, ITERATIONS);
        Console.ReadLine();
    }

    private unsafe static void RunHeapAllocTest()
    {
        UIntPtr pSize = new UIntPtr((uint)(1048576 * sizeof(int)));
        IntPtr pHeap = UnsafeNativeMethods.HeapCreate(HeapCreateFlags.None, pSize, UIntPtr.Zero);
        if (pHeap == IntPtr.Zero)
        {
            Console.WriteLine("Could not create heap.");
            return;
        }
        IntPtr pMem = UnsafeNativeMethods.HeapAlloc(pHeap, HeapAllocFlags.ZeroMemory, pSize);
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate from heap.");
            UnsafeNativeMethods.HeapDestroy(pHeap); // don't leak the heap on failure
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        UnsafeNativeMethods.HeapFree(pHeap, HeapAllocFlags.None, pMem);
        UnsafeNativeMethods.HeapDestroy(pHeap);
    }

    private unsafe static void RunLocalAllocTest()
    {
        UIntPtr pSize = new UIntPtr((uint)(1048576 * sizeof(int)));
        IntPtr pMem = UnsafeNativeMethods.LocalAlloc(LocalAllocFlags.ZeroInit, pSize);
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate heap memory.");
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        UnsafeNativeMethods.LocalFree(pMem);
    }

    private unsafe static void RunMarshalAllocHGlobalTest()
    {
        IntPtr pMem = Marshal.AllocHGlobal(1048576 * sizeof(int));
        if (pMem == IntPtr.Zero)
        {
            Console.WriteLine("Could not allocate memory.");
            return;
        }

        int* pNumbers = (int*)pMem.ToPointer();
        for (int i = 0; i < 1048576; i++)
        {
            pNumbers[i] = i;
        }
        Marshal.FreeHGlobal(pMem);
    }

    private static void RunArrayTest()
    {
        int[] array = new int[1048576]; // 4 MB array
        for (int i = 0; i < 1048576; i++)
        {
            array[i] = i;
        }
    }
}
```

There isn't anything too complicated here: a 4MB buffer is allocated using the selected method, and then each 32-bit element is populated with its array index.  Unsafe code outperforms safe code in each x64 test, though the difference is much more marginal on x86.  The explanation is simple: safe code checks the array index against the array's bounds on every access.
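It's worth noting that safe code doesn't always pay that price. This is a sketch of my own, not from the original benchmark: when the loop condition compares the index directly against `array.Length`, the JIT can generally prove the index is always in range and elide the per-access bounds check, narrowing the gap with the pointer versions.

```csharp
using System;

class BoundsCheckSketch
{
    static void Main()
    {
        int[] array = new int[1048576]; // same 4 MB buffer as the array test

        // Comparing i against array.Length (rather than a separate constant)
        // is the pattern the JIT recognizes for bounds-check elimination.
        for (int i = 0; i < array.Length; i++)
        {
            array[i] = i;
        }

        Console.WriteLine(array[array.Length - 1]); // 1048575
    }
}
```

Whether the check is actually removed depends on the JIT version, so measure before relying on it.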

Summary

Using pointers and unsafe code can be a boost to your application's performance, but you should consider where, when, and how you do it.  Since you don't have control over when the GC is invoked, pinning objects like arrays can be costly.  You might instead consider using Windows API functions or direct memory access through the Marshal class to organize your memory if you absolutely need to squeeze that last bit of speed out of your code, but be warned - it's not safe out there.
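For comparison, here is what the pinning alternative mentioned above looks like (my illustration, not part of the benchmark): the `fixed` statement pins a managed array in place so the GC can't relocate it while a raw pointer into it is live, which is why holding pointers into managed arrays for long stretches can hurt GC performance.

```csharp
using System;

class FixedSketch
{
    static unsafe void Main()
    {
        int[] array = new int[1048576];

        // fixed pins the array only for the duration of this block; keep the
        // block short so the GC can move the array again as soon as possible.
        fixed (int* pNumbers = array)
        {
            for (int i = 0; i < array.Length; i++)
            {
                pNumbers[i] = i; // unchecked pointer write into the pinned array
            }
        }

        Console.WriteLine(array[1048575]); // 1048575
    }
}
```

Compile with `/unsafe`. Unlike the `AllocHGlobal` and P/Invoke approaches, no manual free is needed here; the array is still an ordinary managed object once the `fixed` block ends.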

The "Speedy C#" Series: