Sunday, April 28, 2013
Extending sp_WhoIsActive With context_info

If you work with SQL Server you should be familiar with Adam Machanic’s fantastic sp_WhoIsActive stored procedure. This procedure provides a high level of detail about all commands that are currently being executed on a SQL Server instance. I’ve found it immensely helpful for troubleshooting long-running queries and locking/contention issues in live production SQL Server instances. That said, I’ve always wanted to include one extra bit of information in the dataset that sp_WhoIsActive returns: the id of the user responsible for issuing the database command. Obviously sp_WhoIsActive can report on the SQL or Windows account that was used to initiate the connection to the database and issue the command, but when you use connection pooling to let your application talk to SQL Server, all of the connections share the same connection string, which means that they all use the same SQL/Windows account to connect to the database. I want a way to see the user login from my application (what I’ll call the “application user” for the rest of this post) that is responsible for each of the queries being run. Let’s start by looking at the first problem: making SQL Server aware of the application user that initiated the connection.

Problem 1: Providing the Application User Information To The SQL Connection

In order for sp_WhoIsActive to have a prayer of figuring out what application user owns each database command being executed, SQL Server needs to somehow tie that information to each spid that the application uses for running commands. The only reasonable way I’ve found to make this work is to leverage the SET CONTEXT_INFO statement. This statement lets you associate up to 128 bytes of data with the current database session. Fortunately 128 bytes is more than adequate to store a 32-bit integer representing the ID of the application user that is responsible for initiating each database connection from your application. To do this, you’ll need to issue SET CONTEXT_INFO on each connection that your application uses before you start using it to run commands. If you use a factory or provider pattern for doling out all of the database connections that your application uses (and if you’re not, you should definitely consider doing so) then implementing this SET CONTEXT_INFO call on each connection is pretty easy. Here’s a snippet of a function that can do this:

    // Requires System.Data and System.Data.SqlClient.
    // The connection must already be open when this is called.
    private void SetContextInfo(SqlConnection connection, int userId)
    {
        using (var command = connection.CreateCommand())
        {
            command.CommandType = CommandType.Text;
            command.CommandText = "SET CONTEXT_INFO @currentUserId";

            // Pass the application user id as a 4-byte integer parameter.
            var currentUserIdParameter = command.CreateParameter();
            currentUserIdParameter.ParameterName = "@currentUserId";
            currentUserIdParameter.DbType = DbType.Int32;
            currentUserIdParameter.Size = 4;
            currentUserIdParameter.Value = userId;
            command.Parameters.Add(currentUserIdParameter);

            command.ExecuteNonQuery();
        }
    }

This method takes a SqlConnection and user Id integer, then invokes a simple SQL command to call the SET CONTEXT_INFO command passing in the user id value that was provided.  Once this is working, the next step is to surface this user id to the sp_WhoIsActive stored procedure.

Problem 2: Exposing the CONTEXT_INFO data in sp_WhoIsActive Results

As of this writing, the latest version of sp_WhoIsActive (11.11) returns one row per session (SPID) that is actively executing a command. Since context_info is associated with each session, we should be able to see the context_info value for each row that sp_WhoIsActive returns. Unfortunately this info isn’t surfaced by sp_WhoIsActive out-of-the-box. That said there is a way that we can take the output of sp_WhoIsActive and add the context_info for each SPID that is returned.

The sp_WhoIsActive stored procedure exposes a number of optional parameters that affect its behavior. One of these parameters, ‘@destination_table’, lets you specify the name of a table into which you want to insert the dataset that the stored procedure gathers. We can use this to insert the results into a temp table and then join from that temp table to get the context_info for each session. We’ll use the sys.dm_exec_sessions dynamic management view to get the context_info for each session. Here’s how the SQL to do this shapes up:

    --only including two columns here for brevity
    CREATE TABLE #tempWhoIsActive(
      [dd hh:mm:ss.mss] [varchar](15) NULL,
      [session_id] [smallint] NOT NULL
    )

    EXEC sp_WhoIsActive
      @destination_table = '#tempWhoIsActive',
      --define the output columns to match the temp table definition. See the definition of
      --the sp_WhoIsActive stored procedure for available column names
      @output_column_list = '[dd%][session_id]'

    SELECT
      ISNULL(CONVERT(INT, SUBSTRING(s.context_info,1,4)),-1) as [ta_user_id],
      t.[dd hh:mm:ss.mss],
      t.[session_id]
    FROM #tempWhoIsActive t
    JOIN sys.dm_exec_sessions s ON t.session_id = s.session_id

    DROP TABLE #tempWhoIsActive

The code above is creating a temp table, invoking the sp_WhoIsActive stored procedure with a pre-defined set of output columns to load that temp table, then joining the temp table to sys.dm_exec_sessions view on the session_id. By joining to the sys.dm_exec_sessions view we can access the context_info value for each session.  Note the use of the SUBSTRING and CONVERT functions to extract the first 4 bytes of the 128 byte context_info value and convert it to an integer. If you were putting some other type of data into context_info you’d need to extract and convert it accordingly. Also note the use of the ‘ISNULL’ function, as any SPIDs being run directly via SSMS, SQL Agents, or other types of applications talking to the database server might not have any context_info value set.
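If the application ever reads the raw context_info value back on the client side (for example by selecting the column through ADO.NET, where binary data arrives as a byte array), the same extraction can be sketched in C#. This is a minimal sketch under assumptions: it relies on SQL Server storing a converted integer with the most significant byte first, which is what the CONVERT(INT, SUBSTRING(...)) expression above depends on, and the ContextInfoDecoder class and its method names are hypothetical, not part of sp_WhoIsActive or ADO.NET.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ContextInfoDecoder
{
    // Roughly mirrors ISNULL(CONVERT(INT, SUBSTRING(context_info, 1, 4)), -1):
    // reads the first 4 bytes as a big-endian 32-bit integer.
    // A null or shorter-than-4-byte value is treated as "no user" (-1).
    public static int ToUserId(byte[] contextInfo)
    {
        if (contextInfo == null || contextInfo.Length < 4)
        {
            return -1;
        }

        return (contextInfo[0] << 24)
             | (contextInfo[1] << 16)
             | (contextInfo[2] << 8)
             | contextInfo[3];
    }

    // Builds the 4 leading bytes for a user id, most significant byte first.
    public static byte[] FromUserId(int userId)
    {
        return new[]
        {
            (byte)(userId >> 24),
            (byte)(userId >> 16),
            (byte)(userId >> 8),
            (byte)userId
        };
    }
}
```

Anything stored after the first four bytes is ignored here, just as it is in the SQL above.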

Now that I have the application user id for each SPID, I can join to the appropriate application tables to look up the user login and any other pertinent information I might need about the user. I can also now take this code and create a separate stored procedure to wrap the sp_WhoIsActive call and include the application user information from context_info to make this a bit easier to invoke in a pinch to see what’s going on at the database tier and correlate that activity to actual users of the application.

Posted On Sunday, April 28, 2013 2:16 PM | Comments (0)
Sunday, March 31, 2013
Memory Leak in CFClientBase<T> Service Proxy for Compact Framework .NET

Version 3.5 of the .NET Compact Framework didn’t initially ship with any direct support for calling remote WCF services. Technically all of the network-related components necessary were there, but getting a working WCF client stood up without something like the full .NET framework’s ServiceModel Metadata Utility Tool (svcutil.exe) would be a lot of very tedious work. Thankfully Microsoft eventually released a PowerToys package that included a Compact Framework flavor of the ServiceModel Metadata Tool (netcfsvcutil.exe) that could inspect a remote WCF service and generate client proxy classes for consuming it from a CF .NET application.

I work on a CF .NET application that makes heavy use of WCF services to support workflows that take place within the confines of a warehouse-type building by various workers with rugged handheld devices. The devices need to “phone home” to a server throughout the day to both provide and receive updates on the work that is currently in progress and remains to be done. As I mentioned in an earlier post I recently went through a round of memory profiling of this CF .NET application to find and eliminate memory leaks.  One of the more surprising sources of leaks that was discovered during this exercise stemmed from the code generated by the netcfsvcutil.exe metadata tool.

When used to generate code, the netcfsvcutil.exe tool creates two files. The first file contains classes representing the service and data contracts that map to the running service endpoints or metadata that were passed in to the tool. This file contains the classes that your application will use to communicate with the remote services and will differ depending on what services are provided. These class names are suffixed with the word ‘Client’ so I’ll refer to them as the client classes for the rest of this post. The second file that gets generated contains a base class (CFClientBase<TChannel>) that the client classes from the first file inherit from. This base class handles the actual communication with the remote service endpoint including opening/closing of connections and serialization/deserialization of the data going back and forth across the wire. The code in this base class is common to all services and is always generated the same regardless of the services that the netcfsvcutil.exe tool is pointed at.

During the course of memory profiling the CF .NET application I noticed that object reference counts seemed to be growing pretty steadily with each new call to a particular remote WCF service. After some digging around in the code a couple of things became clear:

  1. The object references being “leaked” did not seem to be any that were part of our application code. That is to say that the leak appeared to be originating from the code generated by the netcfsvcutil.exe metadata tool.
  2. The symptoms of the memory leak only occurred when talking to a remote service for which we instantiated a single instance of the generated proxy client class and re-used that instance for the life of the application. Some of the remote services that the app used were invoked infrequently and in those cases we would instantiate and dispose a client class instance as-needed. The service that was causing issues was used very frequently and we decided it was better to instantiate a single instance of the client once and make it available as a singleton for the life of the application.

Some research on the topic of memory leaks in netcfsvcutil.exe generated classes led me to a post on the MSDN Social site titled, “Memory leak in Windows Mobile WCF proxy client?” in which user Kendall J Bennett described a memory issue similar to the one I had observed. User Kimberly Z posted the answer, which I’ll summarize here.

The CFClientBase<TChannel> class contains a protected struct called CFContractSerializerInfo. The purpose of this struct is to keep track of information about the various XmlObjectSerializer instances that the client classes need to use when serializing and deserializing data that is sent and received over the wire. The CFClientBase class attempts to cache the XmlObjectSerializer instances that it creates in a generic Dictionary. The dictionary is keyed off of the CFContractSerializerInfo struct. The declaration looks like this:

 

private System.Collections.Generic.Dictionary<CFContractSerializerInfo, System.Runtime.Serialization.XmlObjectSerializer> serializers = 
   new System.Collections.Generic.Dictionary<CFContractSerializerInfo, System.Runtime.Serialization.XmlObjectSerializer>(2);

When the client class needs to obtain a serializer for a request or response it calls a protected method called GetContractSerializer that attempts to lazily initialize and cache the serializers in the generic Dictionary:

                if (serializers.ContainsKey(info))
                {
                    serializer = this.serializers[info];
                }
                else
                {
                    serializer = new CFContractSerializer(info);
                    serializers[info] = serializer;
                }

At a quick glance this code might look OK. If the serializers dictionary already contains a serializer matching the serializer info that we’re looking for, then just use that one. If not, create a new CFContractSerializer (which inherits the abstract XmlObjectSerializer) and assign it to a dictionary entry that will be keyed off of the serializer info that was requested so that it can be used by the next request. The idea here is that we’ll only ever need to create one XmlObjectSerializer for each type of request and response that the client class needs to deal with. Unfortunately this code doesn’t work the way it should. The ‘serializers.ContainsKey(info)’ call always returns false because the CFContractSerializerInfo struct that is being used as the key type for the dictionary does not override the ‘Equals’ and ‘GetHashCode’ methods necessary to do the equality comparisons that the dictionary relies on. Every time a request is sent or a response is received a new instance of an XmlObjectSerializer is created and placed into the ‘serializers’ collection in the client class that is being used to talk to the remote service. If a single instance of a client class remains in scope and active for long enough, the memory used by the application will continue to grow unbounded.

At a high level there are probably at least 3 ways to fix this issue:

  1. Don’t let a single instance of a client class remain in-scope for too long. If you create a new instance of the client class each time you need to invoke a new service, the old client class instances and XmlObjectSerializers that they contain will eventually get garbage collected like any other object that is no longer needed.
  2. Change the CFClientBase<TChannel> to key the ‘serializers’ dictionary off of something other than the CFContractSerializerInfo struct. Since the struct is being used to describe serializers of a certain type of message, you could key the dictionary off of the type itself or maybe even a string representing the fully qualified name of the type.
  3. Update the CFContractSerializerInfo struct to implement the Equals and GetHashCode methods.

Depending on the situation and specific needs of an application I think any of these three options could be viable. In my case I opted to go with option 3. The biggest drawback to this approach is that it requires modifying the generated CFClientBase code file, but since that file doesn’t change depending on the services being invoked this change is pretty safe to make. I ended up leaving the generated code file mostly intact and instead just modified the struct definition to be partial and then added another code file that defines the Equals and GetHashCode methods. For my purposes simply using the MessageContractType property of the struct was sufficient for use in the Equals and GetHashCode methods since we need to use one XmlObjectSerializer per type that needs to be serialized. I’ve put the code here in a Gist: CFClientBase.CFContractSerializerInfo.cs.
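The shape of option 3 can be sketched with a simplified stand-in. The SerializerInfo struct below is hypothetical: it is not the generated CFContractSerializerInfo and it is not the code from the Gist. It carries only a MessageContractType field and just shows how value-based Equals and GetHashCode let a freshly created key find an entry that was cached under an equal key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the generated struct; only the type matters here.
public struct SerializerInfo : IEquatable<SerializerInfo>
{
    public Type MessageContractType;

    public bool Equals(SerializerInfo other)
    {
        return MessageContractType == other.MessageContractType;
    }

    public override bool Equals(object obj)
    {
        return obj is SerializerInfo && Equals((SerializerInfo)obj);
    }

    public override int GetHashCode()
    {
        return MessageContractType == null ? 0 : MessageContractType.GetHashCode();
    }
}

public static class SerializerCacheDemo
{
    // Returns how many cache entries exist after 'calls' lookups for one type.
    public static int CountEntriesAfter(int calls)
    {
        var cache = new Dictionary<SerializerInfo, object>();
        for (int i = 0; i < calls; i++)
        {
            // A fresh key instance per call, as happens on each request.
            var info = new SerializerInfo { MessageContractType = typeof(string) };
            if (!cache.ContainsKey(info))
            {
                cache[info] = new object(); // stands in for a new serializer
            }
        }
        return cache.Count;
    }
}
```

With the overrides in place, a hundred lookups for the same message type leave a single entry in the dictionary instead of a hundred.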

Posted On Sunday, March 31, 2013 10:22 PM | Comments (0)
Monday, February 25, 2013
SQL Server Identity Quirks

If you’ve worked with SQL Server for any length of time odds are that you’ve used the IDENTITY property to create a surrogate key for a table. These auto-incrementing primary keys can be very handy, but they have a couple of quirks that have bitten me enough in the past (and recently) that I thought they warranted a blog post.

Disclaimer: None of what follows is revelatory; anyone with an even cursory knowledge of SQL Server will already know everything I’m writing here. That said, I’m writing this post anyway in the hope that this knowledge might reach one or two folks that wouldn’t otherwise have been exposed to it.

Quirk #1: A Rolled Back Transaction Still Consumes An Identity Value

A column that makes use of the IDENTITY property uses two bits of metadata: a seed and an increment. The seed is the value from which the identity sequence will start. The increment is the value that is added to the identity as new rows are inserted. Most IDENTITY columns start their lives with a seed value of 1 and an increment value of 1. The first row inserted into a table with an IDENTITY column configured in this way will get the value ‘1’ in the IDENTITY column. The second row inserted will get the value ‘2’. The third row will get ‘3’, and so on. In this way you might expect that a table with an IDENTITY column that never has data updated or deleted will have a perfect sequence of identity values from 1 to N where N is the total number of rows in the table. This is technically possible, but tends to fall apart when transactions come into play.

Consider a simple database table named ‘CustomerQueue’ with the following columns:

  • Id – int IDENTITY(1,1), an identity column with a seed and increment of ‘1’
  • FirstName – nvarchar(50)
  • LastName – nvarchar(50)
  • …and so on, you get the idea

Now consider the following set of SQL statements:

BEGIN TRAN
INSERT CustomerQueue (FirstName, LastName) VALUES ('John', 'Knibb')
COMMIT
GO
 
BEGIN TRAN
INSERT CustomerQueue (FirstName, LastName) VALUES ('Susan', 'Tibbs')
COMMIT
GO
 
BEGIN TRAN
INSERT CustomerQueue (FirstName, LastName) VALUES ('Jesse', 'Taber')
ROLLBACK
GO
 
BEGIN TRAN
INSERT CustomerQueue (FirstName, LastName) VALUES ('Steven', 'Johnson')
COMMIT
GO

What would you expect the contents of the CustomerQueue table to be after these four SQL statements are executed? You might think it would look something like this:

Id  FirstName  LastName
1   John       Knibb
2   Susan      Tibbs
3   Steven     Johnson

Note how the row for ‘Jesse Taber’ is missing because that transaction was rolled back instead of committed.

That set of data is pretty close, but in reality you’d end up with data that looks like this:

Id  FirstName  LastName
1   John       Knibb
2   Susan      Tibbs
4   Steven     Johnson

Note how the Id value of 3 is missing. That value would have been assigned to the ‘Jesse Taber’ row had that transaction been committed, but it wasn’t and now it’s simply missing from the sequence. You might think that this is odd behavior (I certainly did the first time I observed it), but if you think about it a bit you’ll realize that it must work this way or else it wouldn’t work very well at all.
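The behavior can be modeled in a few lines. The sketch below is a toy simulation, not a description of how SQL Server is implemented: the IdentityTable class and its Insert method are hypothetical, and a “rollback” is reduced to discarding the row while leaving the counter where it is.

```csharp
using System.Collections.Generic;

// Toy model of a table with an IDENTITY(1,1) column (hypothetical).
public class IdentityTable
{
    private int nextId = 1;
    private readonly List<KeyValuePair<int, string>> rows =
        new List<KeyValuePair<int, string>>();

    // Hands out the next identity value, then keeps the row only on commit.
    public int Insert(string name, bool commit)
    {
        int id = nextId++; // the value is consumed immediately
        if (commit)
        {
            rows.Add(new KeyValuePair<int, string>(id, name));
        }
        return id; // on rollback the value is never handed out again
    }

    // Identity values of the rows that were actually committed.
    public List<int> CommittedIds()
    {
        var ids = new List<int>();
        foreach (var row in rows)
        {
            ids.Add(row.Key);
        }
        return ids;
    }
}
```

Inserting John and Susan with commit, Jesse without, and then Steven with commit leaves committed ids 1, 2 and 4, matching the table above.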

Imagine a busy deli counter at a grocery store with one of those “please take a number” ticket dispensers that is designed to ensure that customers are served in the order in which they arrived at the counter. An IDENTITY column is a bit like the roll of tickets in the dispenser: it contains a sequence of non-repeating numbers where each number is 1 higher than the previous one. When customers pull a ticket out of the dispenser they have expressed intent to request some service at the deli counter. This is just like an INSERT statement that has been made in a yet-to-be-committed database transaction: it just represents an intent to insert some data into that row.

When ‘Jesse Taber’ first walked up to the deli counter and took a ticket out of the dispenser he had every intention of ordering something. After waiting for a few minutes, however, his cell phone beeped reminding him of an appointment he had completely forgotten about that was supposed to start in 10 minutes. He had to hurry out of the store before being able to place an order. When he left he effectively rolled back his intention to order and the ticket that he took with him left a gap in the deli counter queue sequence. Much like you wouldn’t expect the ticket dispenser to hunt down the person(s) who took the ticket after Jesse and re-issue him or her Jesse’s number, you shouldn’t expect a SQL Server IDENTITY column to re-issue the identity value that was doled out to a transaction that ultimately got rolled back. It might be technically possible to build a system that could re-issue numbers on the fly to eliminate gaps in the sequence, but the value that would be derived from it probably wouldn’t be worth the effort to build it.

If it absolutely drives you mad to see gaps in a sequence like this I’d suggest that you take the same attitude toward it that the workers behind the counter take when they call a number for a person who had to duck out of the queue: just move on. Much like the deli counter ordering system is tolerant of gaps in the queue sequence, your system should be tolerant of gaps in any IDENTITY sequence.

Quirk #2: DBCC CHECKIDENT Might Reseed Unless You Tell It Not To

While it’s best to take a “set it and forget it” attitude toward IDENTITY columns you may, from time to time, feel the need to muck with the identity sequence directly. Luckily for you SQL Server comes with a nice IDENTITY-related shotgun with which you can shoot yourself in the foot: the DBCC CHECKIDENT command. This command lets you both view and change the current seed value for an IDENTITY column. In its simplest form this command takes a single parameter representing the name of the table for which you’d like to check the current identity value:

DBCC CHECKIDENT ("dbo.CustomerQueue")
 

Judging solely by the name of the command (CHECKIDENT) and the parameter that’s been provided you might think that this command would only check the current identity value for the table and not modify it. After all, there is an additional ‘RESEED’ argument that can be specified if you wish to force the current seed value of the identity to change. In most cases, just calling CHECKIDENT without the RESEED argument will not modify the current seed value. If, however, the current identity value for the table is less than the maximum value in the column, then the CHECKIDENT command will automatically reseed the identity with the maximum value from the column. I’ve always thought this behavior was odd, and I’ve never been able to come up with a good explanation for why this command was designed this way. Granted, having the seed of the IDENTITY column be lower than the current maximum value in the column carries a lot of potential risks, but I think I’d still rather see the default behavior not modify anything unless the RESEED argument is specified.

The moral of the story: if you really just want to check the current value of the identity in a table without modifying it you should use the NORESEED argument:

DBCC CHECKIDENT ("dbo.CustomerQueue", NORESEED)
Posted On Monday, February 25, 2013 9:23 PM | Comments (0)
Sunday, January 27, 2013
Beware Singletons that Raise Events

It’s pretty well documented that one of the most common ways in which a .NET application can “leak” memory stems from failing to unsubscribe from events when the subscription is no longer needed. This answer to a Stack Overflow question on memory leaks in C# describes the issue very succinctly:

Event Handlers are a very common source of non-obvious memory leaks. If you subscribe to an event on object1 from object2, then do object2.Dispose() and pretend it doesn't exist (and drop out all references from your code), there is an implicit reference in object1's event that will prevent object2 from being garbage collected.

MyType object2 = new MyType();

// ...
object1.SomeEvent += object2.myEventHandler;
// ...

// Should call this
// object1.SomeEvent -= object2.myEventHandler;

object2.Dispose();
This is a common case of a leak - forgetting to easily unsubscribe from events. 
Of course, if object1 gets collected, object2 will get collected as well, but not until then.
Original question by user Joan Venge. Answer by user Reed Copsey.

Note: It’s probably debatable as to whether or not the event handler scenario illustrated above represents a true memory leak or not. Some would argue that the garbage collector takes care of nearly all memory leaks that might have otherwise occurred in unmanaged code and that the event handler issue isn’t really a memory leak in the strictest sense of the term. In my opinion, having event handlers hold references to objects and preventing them from being garbage collected is going to cause the memory used by the application to grow unnecessarily and that is, for lack of a better term, a “memory leak”.

How often do you make sure to “-=” an event handler that you subscribed to with the “+=” operator? I for one know that un-hooking an event handler is rarely something that I give a lot of thought to when writing an application. Have I been inadvertently causing memory leaks by failing to do so?

Probably not. The indomitable Jon Skeet points out why these types of memory issues are rarely cause for concern in an answer to another Stack Overflow question related to event handler memory leaks asked by user gillyb:

… while an event handler is subscribed, the publisher of the event holds a reference to the subscriber via the event handler delegate (assuming the delegate is an instance method).

If the publisher lives longer than the subscriber, then it will keep the subscriber alive even when there are no other references to the subscriber.

If you unsubscribe from the event with an equal handler, then yes, that will remove the handler and the possible leak. However, in my experience this is rarely actually a problem - because typically I find that the publisher and subscriber have roughly equal lifetimes anyway.

It is a possible cause... but in my experience it's rather over-hyped. Your mileage may vary, of course... you just need to be careful.

I agree with Jon’s sentiment that this event handler “leak” is not typically a cause for concern and that you can usually get away without explicitly unsubscribing from events without causing memory issues at runtime. Recently, however, I came across a situation where an event subscription was causing a certain type of object (with a fairly large number of other objects in its graph) to remain in memory indefinitely, causing pretty severe memory bloat in the application. In this case there was an object that raised events implemented as a singleton. The event raised by the singleton had many instances of shorter-lived objects subscribed to it. When the shorter-lived objects fell out of scope the garbage collector was unable to clean them up because the singleton’s event still held references to them.
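The mechanism can be seen without a profiler by looking at the event’s invocation list. The sketch below is a stripped-down illustration, not the application discussed in this post; the Publisher and ShortLivedSubscriber classes and the SubscriberCount property are hypothetical names made up for the example.

```csharp
using System;

// Long-lived singleton that raises an event (hypothetical).
public class Publisher
{
    public static readonly Publisher Instance = new Publisher();
    private Publisher() { }

    public event EventHandler SomethingHappened;

    // Number of handlers currently held by the event's delegate.
    public int SubscriberCount
    {
        get
        {
            return SomethingHappened == null
                ? 0
                : SomethingHappened.GetInvocationList().Length;
        }
    }
}

// Short-lived object that subscribes in its constructor (hypothetical).
public class ShortLivedSubscriber
{
    public ShortLivedSubscriber()
    {
        // The delegate created here stores a reference to 'this'.
        Publisher.Instance.SomethingHappened += OnSomethingHappened;
    }

    // Removes the delegate, and with it the publisher's reference to this object.
    public void Unsubscribe()
    {
        Publisher.Instance.SomethingHappened -= OnSomethingHappened;
    }

    private void OnSomethingHappened(object sender, EventArgs e) { }
}
```

Every ShortLivedSubscriber that is created and then dropped without calling Unsubscribe adds one entry to the invocation list, and each entry’s Target property still points at that instance, which is why the garbage collector treats it as reachable for as long as the singleton lives.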

Example: Gym Membership System

Whenever I go to my gym to work out I have to check in by scanning my membership ID at the main reception desk. The computer connected to the barcode scanner beeps, and my membership profile briefly comes up on the screen so that the staff can see my picture and verify that I’m the owner of the membership ID that was scanned. If I have to make a change to my account (like updating the expiration date on the credit card that they charge my membership fees to), they have to exit the “check-in” screen and pull up my account in another area of the application. I have no idea how this piece of software was built, but given what I know about how it’s typically used I can speculate about how it might be susceptible to the type of memory leak that I’m talking about in this post.

To keep things simple for illustrative purposes, let’s say that this gym membership system consists of four total classes:

  1. A main menu class (form for launching the other forms)
  2. An account lookup class (form for finding and changing account details)
  3. A member check-in class (form that accepts input of membership ID from the scanner, looks up the membership profile and shows it on the screen)
  4. A scanner class (wrapper class for interfacing with the drivers for the laser barcode scanner that sits on the reception desk)

For the purposes of this example let’s say that the main menu, account lookup, and member check-in classes are all very simply implemented as Windows Forms.  The particulars of the implementation aren’t important, so let’s just call these classes the views of our application. Let’s also say that, for the purposes of this example, these views are only a very small subset of the views contained by the application and that users of the application are likely to have to switch between these different views many times throughout the course of the day. In any given day the application might have to display several dozen different views to the user so the application creates views on-the-fly as needed and disposes of them as soon as they are no longer needed. Put another way, only one view should need to be actively in-use at one time, and therefore any view that is no longer needed should immediately fall out of scope and be a candidate for garbage collection. The one exception to this design is the main menu, as users will continually return to the menu in order to bring up the next view that they need so the main menu view will serve as the main entry point for the application and have a single instance that will remain in memory for as long as the application is running.

The scanner class, on the other hand, is implemented as a singleton. The application only needs to support taking input from one scanner that is connected to the PC at the member services desk via USB. Initializing the scanner drivers and hardware can take a few seconds so in order to save time the single instance of the scanner gets initialized at application startup and lives for the duration of the app. The scanner class and supporting types might look something like this:

    // Deriving from EventArgs keeps EventHandler<ScanReceivedArgs> valid on
    // framework versions that constrain the type argument to EventArgs.
    public class ScanReceivedArgs : EventArgs
    {
        public string ScanString { get; set; }
    }

    public interface IScanner
    {
        event EventHandler<ScanReceivedArgs> ScanReceived;
    }

    public class DesktopScanner : IScanner
    {
        // The sole instance, created once when the type is first used.
        private static readonly DesktopScanner instance = new DesktopScanner();
        private DesktopScanner() { }

        public static DesktopScanner Instance { get { return instance; } }
        public event EventHandler<ScanReceivedArgs> ScanReceived;
    }

Obviously this code doesn’t really do anything and is missing all of the “guts” that would make it actually be able to talk to a piece of hardware, but we have enough here for the purposes of this example. There will be a single instance of the DesktopScanner class as enforced by the private constructor with a public “Instance” property to return the sole instance that the class spins up of itself. The IScanner interface that it implements exposes an event that will be fired when the scanner recognizes that a barcode has been placed in front of it. The arguments passed to the event contain the string representing the value that was scanned.

Now let’s see how the member check-in form might be implemented to make use of the scanner:

    public partial class MemberCheckinForm : Form
    {
        public MemberCheckinForm()
        {
            InitializeComponent();
        }

        private void MemberCheckinForm_Load(object sender, EventArgs e)
        {
            DesktopScanner.Instance.ScanReceived += ScanReceived;
        }

        private void ScanReceived(object sender, ScanReceivedArgs args)
        {
            //look up member details to display on screen...
        }
    }

When the form loads, we subscribe to the scanner’s ScanReceived event and point it toward a method that will take the arguments and look up the member details associated with the barcode that was scanned. As stated previously, there are a large number of other views used by this application, including an account lookup view. I won’t bother showing any implementation details for those views but suffice it to say that they do not use the scanner’s ScanReceived event.

After running the application for a little while and switching between the various views it becomes evident that the memory used by the application continues to grow steadily until the application is shut down. The gym is open for very long hours during the day and the staff can’t be bothered to constantly shut down and re-start the application when things start to slow down. There appears to be a memory leak of some kind that needs to be fixed.

By using the excellent ANTS Memory Profiler tool from Red Gate Software we can attach to the running application and take a snapshot to see what objects are being held in memory. Here’s what the profiler tells us:

image

Here we can see all of the types being used by our application along with their size in bytes and how many live instances there are of each. As expected, we see that the DesktopScanner and MainMenu classes have a single live instance. What’s unexpected, however, is that there are 11 instances of the MemberCheckinForm. If we are properly creating and disposing all of our views as they are needed, we shouldn’t have these instances sticking around in memory. It’s also important to note here that none of the other view classes are showing up in this snapshot.

We can drill down into the details a bit further to see what is keeping these instances of the MemberCheckinForm from being garbage collected:

image

Here in the Instance List view we can see that each instance of the MemberCheckinForm is not a GC root object (the “root” object that, for the purposes of garbage collection, is keeping the class from being cleaned up) and is 5 references away from the GC root object that is keeping it from being garbage collected. By clicking on the Instance Retention Graph we can see a graphical representation of the object that is serving as the GC root for these instances:

image

Here we can see plain as day that the MemberCheckinForm is being held by our DesktopScanner’s ScanReceived event.

Now that we know what the problem is, how can we go about fixing it? There are a few different approaches we could take:

  1. Implement the MemberCheckinForm as a singleton: This would probably work in this very simple example, but it goes against the general design of the application that’s trying to create and dispose views on an as-needed basis. In my opinion this is not the right way to go here.
  2. Use a “weak event handler” pattern: There have been some attempts to create “weak event handlers” in .NET. The idea here is that you create event handlers that will not keep an object that has subscribed to an event from being garbage collected. This is an interesting idea and is something that I might explore further in another post. If you’re interested, a Google search for “weak events in .NET” should yield plenty of good information and example implementations of varying quality and completeness.
  3. Update the MemberCheckinForm to unsubscribe from the event: The simplest approach to dealing with this issue is to simply update the MemberCheckinForm so that it will unsubscribe from the event when it gets disposed. This way, as long as each instance of MemberCheckinForm is getting disposed properly (which it should be anyway) it will also break the reference it has to the long-lived scanner class.

In order to have the MemberCheckinForm unsubscribe from the scanner event we can just move the Dispose method out of the partial “designer.cs” file for the form and modify it to unsubscribe from the event. The full body of the modified class looks like this:

    public partial class MemberCheckinForm : Form
    {
        public MemberCheckinForm()
        {
            InitializeComponent();
        }

        private void MemberCheckinForm_Load(object sender, EventArgs e)
        {
            DesktopScanner.Instance.ScanReceived += ScanReceived;
        }

        private void ScanReceived(object sender, ScanReceivedArgs args)
        {
            //look up member details to display on screen...
        }

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        /// <param name="disposing">true if managed resources should be disposed; otherwise, false.</param>
        protected override void Dispose(bool disposing)
        {
            //break the reference that the long-lived scanner holds to this form
            DesktopScanner.Instance.ScanReceived -= ScanReceived;
            if (disposing && (components != null))
            {
                components.Dispose();
            }
            base.Dispose(disposing);
        }
    }

Adding the “DesktopScanner.Instance.ScanReceived -= ScanReceived” line makes all the difference and ensures that the MemberCheckinForm can be garbage collected properly.
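To see the effect without WinForms or a profiler, here is a minimal sketch that counts the handlers a singleton's event is holding on to. DemoScanner, DemoView, and SubscriberCount are hypothetical names made up for this illustration; they are not part of the application described above.

```csharp
using System;

// Hypothetical stand-in for the long-lived DesktopScanner singleton.
public sealed class DemoScanner
{
    public static readonly DemoScanner Instance = new DemoScanner();

    private DemoScanner() { }

    public event EventHandler<EventArgs> ScanReceived;

    // Each entry in the invocation list holds a reference to its subscriber,
    // which is what keeps a subscribed view reachable from the singleton.
    public int SubscriberCount
    {
        get { return ScanReceived == null ? 0 : ScanReceived.GetInvocationList().Length; }
    }
}

// Hypothetical stand-in for a view that is created and disposed on demand.
public sealed class DemoView : IDisposable
{
    public DemoView()
    {
        DemoScanner.Instance.ScanReceived += OnScanReceived;
    }

    private void OnScanReceived(object sender, EventArgs e)
    {
        // look up member details to display on screen...
    }

    public void Dispose()
    {
        // Removing the handler drops the scanner's reference to this view.
        DemoScanner.Instance.ScanReceived -= OnScanReceived;
    }
}
```

Creating two DemoView instances raises DemoScanner.Instance.SubscriberCount to 2, and disposing both brings it back to 0. If Dispose did not remove the handler, the count would only ever grow, which is the same pattern the profiler exposed.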

The moral of the story here is that a singleton that raises an event can work well, but care should be taken to ensure that subscribers to that event are cleanly unsubscribing when they fall out of scope.

Posted On Sunday, January 27, 2013 9:32 PM | Comments (2)
Monday, December 31, 2012
YAGNI and Professional Code

I’ve heard (and used) YAGNI (You Ain’t Gonna Need It) quite often in my software development career. It’s a battle cry for shipping a minimum viable product and letting the real-world usage dictate what new features and improvements are really needed. Generally speaking I think that this ruthless minimalism is a good thing. I think we’ve all fallen into the “pie in the sky” thinking about adding lots of bells and whistles to whatever feature we’re working on. I for one also know the feeling of spending a lot of time on one aspect of a new feature only to later discover that no one really uses it. I like to think that, over time, I’ve begun to develop some sense of when a given feature is likely to be useful and when I should YAGNI it out of my task list, but then again I also feel like the more I know the less I know. Lately I’m finding that when I’m in doubt it’s best to err on the side of doing less and keeping things as simple as possible.

That said, there is a certain classification of feature that I sometimes regret omitting in the name of YAGNI. I recently read a fantastic post on Oren Eini’s blog titled, “On Professional Code”. I think this quote from that post sums it up quite nicely:

…a professional system is one that can be supported in production easily. About the most unprofessional thing you can say is: “I have no idea what is going on”.

This post really hit home for me because I've recently transitioned into a support role (or devops, if you like) at work. One of the things that I’m now responsible for is assisting our support team with troubleshooting difficult issues and figuring out what’s going on when the phone is ringing off the hook with end-users complaining that the “system is slow” and they can’t get their work done. While it’s a relatively rare occurrence, I absolutely hate having to say, “I don’t know what’s going on”. When there’s an odd error being thrown at a user who is trying to do something in our system it’s terribly frustrating to have to crack open the source code of the system to figure out why it’s being thrown. The error messages that are logged are often loaded with developer-speak or things like, “this shouldn’t happen”. I’m one of the more tenured developers on our team so I know that if I have difficulty understanding why a given error is being raised our support staff has almost zero chance of being able to figure it out on their own. When they can’t figure it out, they have to ask for help and when they have to ask for help one of my co-workers or I have to stop working on something else (usually some improvement to our infrastructure, an internal tool that will make our lives easier, or some performance profiling/tuning) to help them and make sure our customers can get their work done. I feel very comfortable in saying that these kinds of poorly documented and understood error conditions can be tremendously costly to any company. If you have to read the source code of your application to fix an issue that doesn’t require changing the source code (i.e. it’s not a bug in the system), then you’re failing to write “professional code” as Oren defines it in that post.

As a developer I know that I’m as guilty as anyone of writing sub-par error handling code and leaving cryptic error messages in the log. I can’t speak for anyone but myself, but I think that this happens for one of two different reasons:

  1. Mistake: Sometimes I simply didn’t think that a particular piece of input would ever end up being passed into that routine that I wrote. I think I’m usually a pretty good practitioner of defensive programming, but sometimes I make a mistake.
  2. YAGNI: Other times I might have made a conscious decision to check for invalid input or assert that data was in the state that I expect it to be in, but I figure that the terse off-the-cuff error message will be adequate for troubleshooting this issue that “shouldn’t really ever happen anyway”. I still have more important user-facing aspects of this feature to complete, so I can surely apply YAGNI to this error message and move on.

As long as software is written by human beings we’ll always make mistakes. I’m not interested in exploring the mitigation of damage caused by human error in this post because that’s a topic that’s been explored in-depth by folks much smarter and more experienced than I am. Instead I’d like to focus on issue #2: the application of YAGNI to error-handling features. I’ve come to the realization lately that there is no such thing as ‘YAGNI’ when it comes to exposing information about what’s going on in a live production application. In fact, I’ve started using another five-letter acronym to describe the development of features like this: YCNHE or “You Can Never Have Enough”.

In my opinion, the cost-benefit ratio for adding useful information to error handling code is such that you can almost never have enough. Any time I add a message to some code that will end up being logged I ask myself the following question: “Who will read this message and what action will they need to take?”. Almost no one can see a message like, “Unexpected value for parameter InputType” and know what needs to happen to fix it. This is just an informative statement with no imperative command or pointer to additional information. Granted there’s probably a ton of contextual information that was captured along with the message like the timestamp from the server, a stack trace showing what routine or line of code was being executed, and the username of the person that was running the application when the error occurred. That type of contextual information can be very useful to a developer trying to reproduce or fix a defect but it’s nearly useless to a non-developer trying to resolve an issue.

So what can you do to help alleviate this issue? Here are a few thoughts I’ve been formulating and will be experimenting with in the coming months:

  1. Encourage all developers to ask themselves, “Who will read this message and what action will they need to take?” whenever they are logging a message. Asking yourself this question leads to better error messages. Some quick examples of easily added information that can be very useful to non-developer staff:
    1. Does this message represent something that needs to be addressed immediately or is it just some information being captured for future reference?
    2. What is the primary key / identity of the records from the database that were involved in the error?
    3. What were the specific values that were input that caused the overflow/out of range/divide by zero/whatever run time error?
  2. Consider using numeric error codes as a point of reference in all logged errors. Often a logged error message can’t (and shouldn't) contain all of the information that might be needed in order to correct the issue. By using error codes you can easily set up external documentation about each unique error code that can serve as a place to capture additional troubleshooting steps, support staff notes, and all of the other various bits of information that can be useful when troubleshooting an issue in production. Using error codes can also help you establish relative severity in your errors. For example, you could say that codes in the 50000+ range represent potentially fatal errors that need to be sent out to e-mail/pager notifications immediately while stuff in the 10000 range might just need to be logged for future reference if needed.
  3. Create “dry run” and/or “debug” modes for complicated procedures/algorithms. If you can allow support staff to do a “dry run” of a complicated procedure that a customer is trying to do that will create a detailed debug-level log of everything that happened you can let them see how the problem gets broken down into steps by the code and on which step things fell apart. This is the type of detail that a developer might need to use an interactive debugger for, but if a particular process in your system is complicated enough that you need to step through it a lot why not let all members of the team (developer or otherwise) get that same step-through experience?
  4. Make it easy for all members of the team to build and access support tools. The people that support your application in production always need new tools. They need a way to un-lock that bit of data for an end user who mistakenly locked it too soon. They need to be able to execute a query to see how many customers make use of a configurable feature. They need to be able to insert new rows into lookup tables that don’t get changed frequently enough to warrant the construction of a user-facing interface. Sometimes it’s faster and easier to simply do these one-off tasks manually on-the-spot than to build a tool but doing things manually simply doesn’t scale as your team and customer base grow. If you can lower the overhead involved with building these kinds of tools then you’re far more likely to actually build them. Command line applications are great for this because you don’t have to get bogged down in the creation of UIs for these tools that will only be used by internal staff.
  5. Have a resource/team dedicated to improving the situation. Our support/devops team is still new and getting ramped up, but I’ve already found it immensely helpful to be able to improve our error messages and internal documentation on the spot whenever I need to answer a question for a member of our support team. The developers that build customer-facing features won’t always have the foresight to build features in a way that will make them easy to troubleshoot in production so having a team dedicated to that job will help ensure that it gets done.
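As a rough illustration of the first two ideas, here is a minimal sketch of a log message helper that carries a numeric error code, a severity derived from that code, the record involved, and the action the reader should take. SupportLog, SupportSeverity, and the exact 50000 threshold are assumptions made up for this example; a real system would choose its own ranges and message layout.

```csharp
using System;
using System.Globalization;

// Hypothetical severity levels for this sketch.
public enum SupportSeverity
{
    Informational,
    Urgent
}

public static class SupportLog
{
    // Assumption: codes of 50000 and above need immediate attention,
    // everything below is only logged for future reference.
    public static SupportSeverity SeverityFor(int errorCode)
    {
        return errorCode >= 50000 ? SupportSeverity.Urgent : SupportSeverity.Informational;
    }

    // Builds a message that answers "who will read this and what do they need to do?"
    public static string Format(int errorCode, string whatHappened, string recordId, string nextStep)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "[{0}] E{1}: {2} (record: {3}). Next step: {4}",
            SeverityFor(errorCode), errorCode, whatHappened, recordId, nextStep);
    }
}
```

Calling SupportLog.Format(10004, "Import skipped a row", "Row 17", "No action needed") produces "[Informational] E10004: Import skipped a row (record: Row 17). Next step: No action needed". The error code doubles as a lookup key for whatever external troubleshooting notes the team keeps.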

I think it’s pretty commonly accepted in the software development community that “good” code is easily readable and maintainable by other developers, but I think that this notion needs to be expanded. Good code should also be easily supportable by people who aren’t developers. Next time you write a new feature take some time to think about how that feature might need to be supported in production. Are there things that would be relatively easy to add that might pay big dividends down the road in terms of improving the supportability of the system in production?

Posted On Monday, December 31, 2012 12:28 PM | Comments (0)
Thursday, December 27, 2012
Finding Memory Leaks in .NET Compact Framework Applications

This post is a collection of my findings from a recent effort to eliminate memory leaks in a .NET Compact Framework (CF .NET) application. All of the information in this post is available elsewhere on the web but after spending a lot of time pulling it all together for my own purposes I thought I might write up my findings and maybe save someone else a few hours of time down the road. The majority of this post will focus on the initial setup needed to even get to the point that you can start to profile memory usage and find leaks in a running CF .NET application as I found that to be the most confusing and frustrating part of the entire process.

I’ve found that the .NET Compact Framework Remote Performance Monitor tool is the most useful memory profiling tool available for the purposes of finding memory leaks. Here are a couple of very informative blog posts by Steven Pratschner describing how this tool can be used:

Analyzing Device Application Performance With The .NET Compact Framework Remote Performance Monitor

Finding Memory Leaks using the .NET CF Remote Performance Monitor

1. Setup A Windows Mobile Emulator Image

If you already use a Windows Mobile emulator image for doing CF .NET development then you should be able to just use that same image to profile memory usage and find leaks. If you don’t currently use an emulator image you’ll need to set one up and get to the point that you can deploy and run your app on the emulator. You might at this point ask why you can’t do this on a physical device. You can do it on a physical device but I’ve opted to use an emulator for a few reasons:

  1. Most physical devices that I have access to run Windows CE and I have not yet been able to get the memory profiling tool I prefer to use to work with Windows CE.
  2. I wanted to make this easy to set up for other members of the development team that I work on so that others will be able to work on memory leak issues in the future.
  3. Generally speaking I’ve found it’s easier to fire up an emulator image and start working than to mess with cradling a physical device.

If you need to set up an emulator image to work with, start out by grabbing some from here: Windows Mobile 6.1.4 Emulator Images. Download the ‘Professional Images’ package from that site and install it. Once installed you should be able to start, cradle, and deploy your application to a running emulator image.

2. Install Hotfix KB 2436709 for .NET Compact Framework 3.5

Once you have an emulator that can run the application you want to profile, you’ll need to install a hotfix to the Compact Framework in order to let the Remote Performance Monitor take snapshots of the garbage collected heap while the application is running. These snapshots are what ultimately let you see what objects are leaking as the application runs. You can request and download this hotfix here: http://support.microsoft.com/kb/2436709

Once downloaded, cradle the emulator and copy the .cab file over to the emulator and run it on the device to install over the existing .NET Compact Framework on the device.

3. Install The .NET Compact Framework Remote Performance Monitor

The Remote Performance Monitor tool comes with the Power Toys for .NET Compact Framework 3.5 package. You can download this package here: http://www.microsoft.com/en-us/download/details.aspx?id=13442

It’s worth noting here that I also took a look at the CLR Profiler tool which also ships with the Power Toys package. This tool was easier to get set up than the Remote Performance Monitor, but I found it really hard to parse the resulting output for an app as large and complicated as the one I was profiling. Also, taking snapshots with this tool was painfully slow to the point that it wasn’t really practical for me to use. As always, your mileage may vary.

4. Deploy And Profile Your Application

Once you have an emulator image that can run your application and has the properly hot-fixed version of the .NET Compact Framework installed you’re ready to start profiling your application with the remote performance monitor. The blog posts that I linked to at the top of this article have a lot of detailed and helpful information about this, but I’ll summarize it here as well.

First, make sure that your emulator is cradled and that the application you want to profile is available on the emulator. Also be sure that no applications are currently running on the emulator.

Next, start the Remote Performance Monitor and click the green ‘Play’ icon to start and profile an application. This brings up the ‘Launch Application’ dialog:

image

From this dialog you’ll select the device that you want to profile the application on. If you’ve set up an emulator for this you should have an option like “Pocket_PC (ActiveSync)” in this list. You’ll also need to specify the path to the application that you want to profile on the device. If your app lives under the “Program Files” path in a sub folder called “MyApp” you would enter: “\Program Files\MyApp\MyApp.exe”. You can optionally specify a path on the desktop from which to copy the application, but I’ve always found it easier to just have the application already deployed and available on the emulator. Finally, enter any startup arguments that your application might need and then click ‘OK’ to launch and start profiling the app.

5. Take and Compare GC Heap Snapshots

Once your app is running and being profiled the main Performance Monitor window will start being constantly updated with a wide variety of metrics. There are a lot of interesting metrics here and most of them line up with counters that you can set up and monitor in PerfMon for Windows applications on a desktop or server. For the purposes of finding memory leaks, however, we’re primarily concerned with the ‘GC Heap’ button that will become available in the main toolbar of the Performance Monitor.

Clicking the ‘GC Heap’ button will force a garbage collection on the running application and then take a snapshot of the garbage collected heap. This snapshot lets you see all objects that the garbage collector wasn’t able to clean up. As you use your application and keep taking snapshots you can start to see if certain types of objects keep getting higher and higher instance counts which is a pretty good indication that you have a leak somewhere. The Finding Memory Leaks using the .NET CF Remote Performance Monitor blog post that I linked to at the start of this post has a lot more information and a good example of how to do this.

Posted On Thursday, December 27, 2012 6:26 AM | Comments (0)
Wednesday, December 19, 2012
C# to VB .NET Conversion Issues

In my last post I discussed the approach we used at my job to convert an ASP .NET web forms application written in VB .NET to C#. In that post I alluded to some of the issues that we encountered during the conversion but didn’t delve into any of the technical details. I decided to take the technical specifics of some of the more interesting conversion issues and make a separate post of them. If you ever embark on a conversion project like ours these might be of use to you.

Option Strict and Option Explicit

This isn’t a conversion issue per se, but if you want to convert a VB .NET project to C# you definitely want to have these compiler settings turned on in your VB .NET project prior to doing so. Turning these compiler flags on forces you to both explicitly declare all variables before using them (Option Explicit) and disallows some of the implicit data type conversions that VB .NET will do for you (Option Strict). Having both of these flags turned on will make the VB .NET code behave more like its C# equivalent with respect to variable declaration and type conversion. That said, if you’ve been developing in VB .NET without these settings on you might not be the type of person that would really want to convert that code to C# anyway.

Implicit Data Type Conversions

Even with Option Strict turned on VB .NET is more permissive with implicit data type conversions than C#. For example, in VB .NET the following code will compile with Option Strict turned on:

   1:   
   2:  'Enumeration for defining columns in a grid
   3:  Public Enum OrderGridColumn
   4:        OrderDate
   5:        OrderedBy
   6:        ShipDate
   7:  End Enum
   8:   
   9:  'When binding data to the grid:
  10:  gridRow.Columns(OrderGridColumn.OrderDate).Text = data.OrderDate.ToString("g", CultureInfo.CurrentCulture)

Here we’re using an enumeration to name the indexes of the columns in a grid for displaying order data. At runtime we’re using these values to access the grid’s columns and do special formatting for the OrderDate field. This is a somewhat trivial example, but we had lots of code like this scattered throughout our VB .NET project. The C# conversion tool converted line 10 of the above snippet to:

    gridRow.Columns[OrderGridColumn.OrderDate].Text = data.OrderDate.ToString("g", CultureInfo.CurrentCulture);

Since the indexer on the Columns field of the gridRow takes an integer as a parameter, the C# compiler complains about not being able to implicitly cast the OrderGridColumn value to an int. VB .NET, on the other hand, had no issue with that implicit conversion. Immediately after using the automated conversion tool on our VB .NET project we had tons of C# compiler errors related to this implicit data conversion. Thankfully this was pretty easily fixed with a regular expression and Visual Studio’s find/replace feature to shim in the explicit cast to an int:

    gridRow.Columns[(int)OrderGridColumn.OrderDate].Text = data.OrderDate.ToString("g", CultureInfo.CurrentCulture);

For the record, there isn’t much that I like about VB .NET, but being able to implicitly cast enum values to integers is one thing that I miss.

Query-style LINQ expressions and Lambdas

VB .NET supports query-style LINQ expressions just like C# but with slightly different syntax. The automated conversion tool completely whiffed on converting these queries from VB .NET to C#. I ended up re-writing them to use the LINQ extension methods and the converter did better with those but still got choked up on some of the more complicated lambdas. If I had it to do over again I would probably have hunted down these bits of VB .NET code with query expressions/heavy lambda use and re-written them to use simple loops prior to the conversion. After the conversion I could have then just let ReSharper do the LINQ refactoring for me.
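For reference, here is a small sketch of the three shapes being discussed, written in C#: a query-style expression, the same query with the LINQ extension methods, and a plain loop. The LinqForms class and its filter (names longer than three characters, upper-cased) are invented for this illustration and are not taken from our code base.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqForms
{
    // Query-style expression: the form the converter had the most trouble with.
    public static List<string> QueryStyle(IEnumerable<string> names)
    {
        return (from n in names
                where n.Length > 3
                select n.ToUpperInvariant()).ToList();
    }

    // The same query written with the LINQ extension methods and lambdas.
    public static List<string> MethodStyle(IEnumerable<string> names)
    {
        return names.Where(n => n.Length > 3)
                    .Select(n => n.ToUpperInvariant())
                    .ToList();
    }

    // A plain loop: the least clever form and the easiest one to convert mechanically.
    public static List<string> LoopStyle(IEnumerable<string> names)
    {
        var result = new List<string>();
        foreach (string n in names)
        {
            if (n.Length > 3)
            {
                result.Add(n.ToUpperInvariant());
            }
        }
        return result;
    }
}
```

All three methods return the same list for the same input. Given "Ann", "Robert", and "Maria", each one returns "ROBERT" followed by "MARIA".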

Explicit Return Statements

In VB .NET you can return a value from a function or property getter by assigning the return value to the name of the property or method, like this:

    Public Function Add(ByVal firstNumber As Integer, ByVal secondNumber As Integer) As Integer
      Dim result As Integer = firstNumber + secondNumber
      Add = result
      Exit Function 'You don't technically need the explicit Exit Function in this case
    End Function

Obviously C# doesn’t use this convention and requires an explicit return statement for all possible execution paths of a method or property getter. The automated conversion tool we used didn’t always properly convert to explicit return statements so I ended up changing places in VB .NET where we were using this convention to use explicit return statements instead.
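For comparison, a C# version of the same function might look like the sketch below. The Calculator class name is just a container invented for the example.

```csharp
public static class Calculator
{
    public static int Add(int firstNumber, int secondNumber)
    {
        int result = firstNumber + secondNumber;

        // C# has no "assign to the function name" form, so the value
        // has to be handed back with an explicit return statement.
        return result;
    }
}
```

Leaving out the return statement here is a compile-time error in C#, which is why every converted method needs one on each execution path.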

When Clauses In Catch Blocks

VB .NET allows the use of ‘When’ clauses when catching exceptions to define multiple catch blocks that are used only when the condition in the When clause is true. C# doesn’t really have an equivalent way of doing this, so you have to pay some special attention to any Catch…When blocks when converting VB .NET to C#. In most cases the same run-time behavior can be achieved in C# by catching the desired exception types and then using switch or if/else statements to branch the handling logic.

For example, if you had the following VB .NET code:

    Try
        'Some code that might throw an exception
    Catch ex As SqlException When ex.ErrorCode = SomeErrorCode
        'handle this error code
    Catch ex As SqlException When ex.ErrorCode = SomeOtherCode
        'handle the other error code
    End Try

The equivalent C# would be:

    try
    {
        //some code that might throw an exception
    }
    catch (SqlException ex)
    {
        if (ex.ErrorCode == SomeErrorCode)
        {
            //handle this error code
        }
        else if (ex.ErrorCode == SomeOtherCode)
        {
            //handle the other error code
        }
        else
        {
            throw;
        }
    }

Note that we need to re-throw the exception in the ‘else’ condition to completely mirror the VB .NET catch blocks, as the exception would not have been caught at all if it didn’t meet one of the error code conditions.
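That was the situation with the C# compilers available when we did this conversion. C# 6.0 later added exception filters (a "when" clause on a catch block), which map onto Catch…When much more directly. Here is a minimal sketch, assuming a C# 6.0 or later compiler. CodedException and the two error code values are hypothetical stand-ins so that the example does not depend on SqlException or a database.

```csharp
using System;

// Hypothetical exception type standing in for SqlException.
public class CodedException : Exception
{
    public int ErrorCode { get; private set; }

    public CodedException(int errorCode)
        : base("Error code " + errorCode)
    {
        ErrorCode = errorCode;
    }
}

public static class FilterDemo
{
    // Hypothetical error code values, chosen only for this example.
    public const int SomeErrorCode = 1205;
    public const int SomeOtherCode = 2627;

    public static string Handle(int codeToThrow)
    {
        try
        {
            throw new CodedException(codeToThrow);
        }
        catch (CodedException ex) when (ex.ErrorCode == SomeErrorCode)
        {
            return "handled first code";
        }
        catch (CodedException ex) when (ex.ErrorCode == SomeOtherCode)
        {
            return "handled other code";
        }
        // Any other error code matches neither filter, so the exception is
        // never caught here and propagates to the caller with no re-throw needed.
    }
}
```

FilterDemo.Handle(1205) returns "handled first code", FilterDemo.Handle(2627) returns "handled other code", and any other value lets the CodedException escape to the caller, just like the VB .NET version.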

Divide By Zero Errors

The last issue I want to discuss in this post was by far the strangest and least expected of all of the issues that we encountered while reviewing and testing the converted code.

Consider the following snippet of VB .NET code:

    Dim dividend As Integer = 1
    Dim divisor As Integer = 0
    Dim quotient As Double = dividend / divisor
    Console.WriteLine(quotient)

This code will print ‘Infinity’ to the console. Now consider the following equivalent C# code:

    int dividend = 1;
    int divisor = 0;
    double quotient = dividend / divisor;
    Console.WriteLine(quotient);

This code will throw a System.DivideByZeroException. This happens because VB .NET and C# interpret the “/” mathematical operator differently. In C# the “/” operator performs division that is consistent with the types of the dividend and divisor. In the C# snippet above the dividend and divisor are both typed as integers so the “/” operator is performing integer division and therefore throwing an exception. In VB .NET, on the other hand, the “/” operator always does division that will return the full quotient including any remainder value in the fractional portion (see http://msdn.microsoft.com/en-us/library/25bswc76(v=vs.80).aspx). When the division operation returns a floating point number it has the flexibility to indicate that the value approaches infinity (even if that’s not technically what the result of that division is).

To help deal with this VB .NET also exposes a “\” operator that will always return an integer result. If we were to re-write the VB .NET snippet above to use the “\” operator for division instead it would also throw a DivideByZeroException. Similarly if we were to tweak the C# code to type the dividend and/or divisor as a ‘double’ it would write ‘Infinity’ to the console. Note that the ‘Infinity’ value is actually the PositiveInfinity constant exposed by the double data type.
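Here is a small sketch of both behaviors side by side in C#. The DivisionDemo class and its method names are invented for this illustration.

```csharp
using System;

public static class DivisionDemo
{
    // Both operands are ints, so "/" performs integer division,
    // which throws when the divisor is zero.
    public static bool IntegerDivisionThrows(int dividend, int divisor)
    {
        try
        {
            int unused = dividend / divisor;
            return false;
        }
        catch (DivideByZeroException)
        {
            return true;
        }
    }

    // Casting one operand to double makes "/" perform floating point division,
    // which matches what the VB .NET "/" operator does with Integer operands.
    public static double FloatingDivision(int dividend, int divisor)
    {
        return (double)dividend / divisor;
    }
}
```

DivisionDemo.IntegerDivisionThrows(1, 0) returns true, while DivisionDemo.FloatingDivision(1, 0) returns double.PositiveInfinity. When reviewing converted code, any “/” between two integer expressions is worth a second look, since the cast to double is what preserves the original VB .NET behavior.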

If you’re interested there are a couple of questions on Stackoverflow and the Programmers Stack Exchange site that explore different behaviors for dividing by zero in programming languages:

http://programmers.stackexchange.com/questions/119987/should-integer-divide-by-zero-halt-execution

http://stackoverflow.com/questions/6563788/divide-by-zero-infinite-nan-or-zero-division-error

Posted On Wednesday, December 19, 2012 9:42 PM | Comments (0)
Thursday, December 13, 2012
Converting VB .NET to C#: A Post Mortem

For the past few years I’ve been working with a fairly large ASP .NET web application. The app began its life as Classic ASP and was later ported to ASP .NET web forms written in VB .NET. Over the years the language preference of the development team shifted and we started finding ways to leverage C# alongside the existing VB .NET web forms project. This shift led to a collection of class libraries, Windows services, and assorted utilities written in C# that were built up around the aging VB .NET code base. We found ways to push a lot of code out of the VB .NET web forms project, but still found ourselves having to add or edit code in the VB .NET project often enough to make it painful. In addition, building our main solution containing the VB .NET web forms project and our various C# class libraries took forever, apparently because Visual Studio didn’t like our mixed-language projects. During our first annual DevCon (a yearly week-long meeting of our distributed team) in early 2011 we identified the presence of VB .NET as our primary source of pain on the development team and committed ourselves to eliminating it.

Why?

While the majority of this post will be focused on how the conversion was done I think it’s important to briefly talk about why we did it. Taking a large and stable codebase and changing the language it’s written in might seem foolhardy. At a past job I had explicitly decided against converting a much newer and smaller VB .NET application to C# based solely on the fact that it didn’t add any value for our users. I was working as a contractor at the time and couldn’t tell our client that we wanted to spend 2-3 weeks doing a “re-write” that wasn’t going to add any of the new features that were needed. At my current job, however, we strive to devote a portion of our development time each sprint toward eliminating developer pain. We understand that alleviating this pain may not immediately add value to our product, but it will increase productivity and reduce attrition in the future. Some teams use the term “technical debt” when referring to the painful areas of their codebase and recognize that, just like any debt, it’s most cost effective to pay it off quickly.


You could probably make the argument that programming language choice is not really technical debt, but technical debt or not we knew one thing: everyone hated working with VB .NET and that alone was reason enough to get rid of it.

Our Approach

Our VB .NET web forms project consisted of roughly 300,000 lines of code spread over about 600 different pages (.aspx files). Due to the sheer size of the project we opted to use an automated conversion tool to do the brunt of the work for us. There are a number of VB .NET –> C# converters out there, but we ultimately decided to use the VBConversions VB .NET to C# Converter because it offered the fastest, easiest, and highest quality conversion of any of the tools that we evaluated. That said, no automated conversion tool will produce 100% perfect results for all conversions and no one on the team was comfortable blindly trusting an automated tool, so we also had to manually review every converted file to clean up compiler errors and other incorrectly converted code that we found. The code review was followed by a regression test of the application before finally releasing it to the wild.

Conversion Preparation

While we identified the elimination of VB .NET as a top technical debt priority at our DevCon in early 2011, we didn’t actually complete the C# conversion project until our DevCon in early 2012. While the project spanned roughly 1 year, it certainly didn’t take a year’s worth of developer hours to complete. Any time spent on the project in the months between the 2011 and 2012 DevCon meetings was spent in small increments (i.e. a few hours here and there) doing preparation work.


The preparation work for this project consisted mostly of identifying potential issues with the VB .NET code that would trip up the converter. The VBConversions tool will generate a list of suggested changes for the source VB .NET code that help it perform a more accurate conversion. In this way it encourages you to run a conversion, fix potential issues, and run the conversion again to see the updated results. Our initial pass at converting the VB .NET code yielded a C# project with well over 5,000 compiler errors. By the time we were ready to make the final conversion and begin our manual review process we were down to several hundred compiler errors. If we had taken more time with this step we likely would have ended up with an even cleaner conversion, but I don’t think that we would have ever had a 100% perfect result.


Prior to conversion we froze all other development efforts because we knew that we needed to focus solely on the conversion until it was completely finished before trying to pick up any new work. Trying to balance new development work with the conversion and the potential bug fixes that would be needed following the conversion would be too difficult, so we had to get buy-in from our product management to let us tackle this conversion effort while putting other work on hold for a while. Once the conversion process began the only changes going into source control were related to the conversion itself.

Conversion

Because the entire development team would be in the same conference room at the same time for a week during our 2012 DevCon meeting we decided to finally pull the trigger and get the conversion done during that week. Over the weekend prior to the meeting I ran the final conversion and committed the converted C# project to source control alongside the VB .NET project. At this point we had code in source control that was known to be broken. This is normally a big no-no, but having everything in source control was going to make collaboration much easier. Also, we left all of the existing VB .NET code and build scripts untouched meaning that all of our CI builds and QA deployment scripts continued to work even though we had a big pile of broken code in source control. We started the week with two Visual Studio solutions that were under the same top level folder in source control:

  • Original Website Solution
    • Core Class Libraries and Services (already written in C#, left untouched during the conversion)
    • Original VB .NET Web Forms project (also left untouched during the conversion)
  • C# Website Solution
    • Core Class Libraries and Services (same projects referenced in the original solution)
    • Converted C# Web Forms project (the converted project that needed to be reviewed)

Having both the original VB .NET and converted C# exist side-by-side in source control allowed us to open both projects for side-by-side comparison during the review process. We were also able to make commits of the cleaned up C# code after each file was reviewed.
We used a shared Google Docs spreadsheet during the conversion to help keep track of our progress. This spreadsheet had the following columns:

  • File Path – Relative path and file name of the VB .NET code file (since the conversion produced a one-to-one mapping of code files we ended up with one row per file to be reviewed)
  • Lines (approx) – Rough count of the lines of code in the VB .NET file (We came up with this count using a simple Powershell script that you can find here: https://gist.github.com/1674457)
  • Reviewer – As developers picked up files to review they would put their names in this column to ensure that we didn’t duplicate our efforts.
  • Status – After items were reviewed we’d put ‘Reviewed’ in this column. We’d also use color highlighting in this column to indicate items that might need a second review or that were particularly troublesome to clean up.
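
If you’d rather build the “File Path” and “Lines (approx)” columns from C# than from PowerShell, a rough equivalent might look like the sketch below. This is not the script from the gist linked above; the LineCounter class and CountVbLines method are names I made up for illustration, and it assumes a runtime where Path.GetRelativePath is available (.NET Core 2.0 or later).

```csharp
using System;
using System.IO;
using System.Linq;

public static class LineCounter
{
    // Returns one (relative path, line count) pair per .vb file under rootFolder,
    // which maps to one spreadsheet row per file to be reviewed.
    public static (string RelativePath, int Lines)[] CountVbLines(string rootFolder)
    {
        return Directory
            .EnumerateFiles(rootFolder, "*.vb", SearchOption.AllDirectories)
            .Select(path => (
                RelativePath: Path.GetRelativePath(rootFolder, path),
                // Counts every physical line, including blanks and comments,
                // so treat the number as a rough size estimate only.
                Lines: File.ReadLines(path).Count()))
            .OrderBy(entry => entry.RelativePath, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

Writing each pair out as a comma-separated line gives you something that can be pasted straight into a spreadsheet.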


We used the following general approach while reviewing each converted file:

  • Check the ‘Conversion Issues’ output from the conversion tool for any issues reported with that file (this output was dumped into another shared Google Docs spreadsheet after the initial conversion was complete).
  • Fix any compiler errors (ReSharper was very helpful as it was easy to see how many compiler errors were present in any given file).
  • Force a regeneration of the .designer.cs files by making a minor no-op tweak (e.g. add a whitespace and then remove it) to the .aspx markup and saving it again.
  • Use ReSharper to clean up any ‘using’ statements that were importing unneeded namespaces (the conversion pulled a ton of unneeded using statements into each code-behind file).
  • If any bit of code seems odd, compare it with the original VB .NET code.

Our DevCon meetings take a Monday-Friday schedule and we began the conversion review and cleanup on Monday morning. By Wednesday night we had the project compiling successfully and were able to update our build scripts to start pushing the newly converted code out to our QA machines for regression testing.

Regression Testing

The regression testing effort turned into an all-company affair as we ended up enlisting members of the support and product management team in addition to developers and QA analysts to get it all done. Even with the added resources, however, it wasn’t feasible to fully regression test all 600 pages in the application. We did some quick brainstorming and came up with a prioritized list of the pages and services exposed by the web application that would need the most rigorous regression testing. We used two pretty simple criteria to determine what areas of the application needed the most attention:

  • Does the page/service alter data or just display it? (i.e. is there a potential for data corruption?)
  • How often does the page get used? (Google Analytics, IIS logs, and some home-grown analytics tools helped us answer this question.)

Areas of the application that both alter data and are used a lot got the most attention from our QA team, while those that only read data but are used frequently were the second highest priority. Areas of the site that were not used very frequently and had low risk for data corruption were only “smoke-tested” to ensure that they executed without throwing any errors. We added a couple of columns to the shared Google Docs spreadsheet that we used to review each file to keep track of files that had been tested and whether or not any issues had been found. One nice side effect of this exercise was the identification of a few pages in our website that were not being used anymore and could be deleted.
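
We did this triage by hand in the spreadsheet, but the two criteria are simple enough to express in a few lines of code. The sketch below is only an illustration: the TestPriority enum, the RegressionTriage class, and the 1,000 hits per month cutoff for “used a lot” are all hypothetical, and the handling of rarely used pages that alter data is my own assumption since we never wrote that rule down.

```csharp
using System;

public enum TestPriority { Rigorous, Standard, SmokeTest }

public static class RegressionTriage
{
    // Hypothetical cutoff for "used a lot"; pick one that fits your own analytics.
    public const int FrequentMonthlyHits = 1000;

    public static TestPriority Classify(bool altersData, int monthlyHits)
    {
        bool frequentlyUsed = monthlyHits >= FrequentMonthlyHits;

        // Alters data and is used a lot: most rigorous testing.
        if (altersData && frequentlyUsed) return TestPriority.Rigorous;

        // Read-only but busy pages were the second priority.
        // Assumption: rarely used pages that alter data also land here,
        // because they still carry some risk of data corruption.
        if (altersData || frequentlyUsed) return TestPriority.Standard;

        // Rarely used and read-only: smoke test only.
        return TestPriority.SmokeTest;
    }
}
```

Feeding each page’s hit count and a simple “alters data” flag through something like Classify would give you a sortable priority column for the test spreadsheet.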


One of the biggest challenges we had during the test effort was figuring out how to test certain areas of the code that are not easily accessible. For example, code-behind files for shared user controls (.ascx) had to be converted and reviewed, but our QA team didn’t necessarily know what pages those controls were used on or how to exercise their code. In some cases a user control might only appear when a certain configuration is enabled and that required that a developer work with them to figure out whether or not each bit of shared code had been exercised adequately.


The regression testing effort ended up spilling over into the week following the DevCon meeting, but we were ready to deploy the converted code by the middle of the week following.

Rollout

We knew that we hadn’t been able to fully regression test every bit of the converted code and were certain that issues would crop up once the code was out and being used in our production environment. We maintain a production environment that many of our customers use in a SaaS model and several of our customers self-host our application on their own hardware. Our production environment has a load-balanced web farm which we decided to use to our advantage for the initial rollout of the converted code.


Once our regression testing was finished we created a release branch of the converted code just like we would for any other normal deployment. We then took some of the servers in the farm out of the load balancer rotation and modified our deployment scripts to push the converted code from the release branch to those now offline web servers. Once the deployment was done, we swapped them back into the load-balancer rotation and removed the ones with the older VB .NET code.


When issues were found and reported, we were able to quickly swap out the servers and get the older reliable VB .NET code running in production again while we patched the code in the release branch, re-deployed to the offline servers and swapped them back in. I don’t recall the exact numbers, but I don’t think we had to do this more than 5-6 times. Because we had created a release branch we were able to let most of the development team get back to new development work against trunk while a portion of the team focused on fixing any issues that cropped up with the conversion in the release branch. Any fixes that were made in the release branch were merged back to trunk. By the end of the week following our DevCon the conversion project was officially finished and the entire team returned to our normal flow of development work. I think we still fielded a few minor bugs related to the conversion here and there over the next couple of weeks, but most of the issues shook out within the first week.

Lessons Learned

I have a few key takeaways from this effort:

  • We had no idea how long this would take us before we started and we didn’t bother trying to do much in the way of estimating. Instead, we just committed ourselves to getting it finished and powered through for as long as it took. We had to make some adjustments along the way (e.g. pulling in the support team to help with testing and only keeping 2-3 developers to fix the conversion issues after the initial rollout), but I think it all came together as well as it could schedule-wise.
  • I wish I had done more to track how many hours we spent on the conversion project. I could estimate based on my (likely faulty) recollection of the project, but it would be nice now to have a feel for how much the conversion really cost us in terms of hours.
  • The process of identifying every endpoint in our web application and, at a minimum, smoke-testing each one is an excellent housekeeping exercise. We identified a few areas of our application that are not used anymore and identified a few long-standing bugs that were not related to the conversion effort at all. I’d like to have us go through this process again, maybe once every 2 years or so. Also, by enlisting the majority of the company in this effort we ended up exposing some folks to areas of the app that they normally never see and ended up sharing a lot of valuable application-specific knowledge with each other.
  • I don’t think we could have done this as quickly as we did if we didn’t have the entire team in the same room for a week. I might even go a step further and say that we might not have even been able to pull it off at all.
  • I think that the collective confidence of the team in a successful outcome for this project ebbed and flowed quite a bit throughout the project. There were a few occasions when we encountered a very odd issue that gave us pause and made us wonder if we were in for a disaster. Having a boss who really understood the value of the project and encouraged us to press on and complete it despite the bumps in the road was crucial to the success of the project.
  • Because I did the initial conversion and committed it to source control many of the files in our website project now only have a single commit in their history (with my name on it). We moved the original VB .NET project to an “archive” location in our source control tree so we can still go digging and find earlier revision history for a file if we need to, but until then all questions of, “who last worked on this file” usually end up being answered with my name. If I were to do this over again I might investigate a method to migrate the commit history of each of the original .vb files to their .cs counterparts, but at this point it’s not worth the effort.
Within a few months of finishing the conversion project, it was hard to imagine that we used to have so much VB .NET in our application. By doing this conversion we were able to remove the, “VB .NET (but we’re phasing it out)” bullet point from our developer job postings and reduced the various VB .NET related grumblings in our team chat logs by 100%. It was a big job, but I think it was worth it in the end.
Posted On Thursday, December 13, 2012 10:08 AM | Comments (0)
Saturday, November 10, 2012
API Message Localization

In my post, “Keep Localizable Strings Close To Your Users” I talked about the internationalization and localization difficulties that can arise when you sprinkle static localizable strings throughout the different logical layers of an application. The main point of that post is that you should have your localizable strings reside as close to the user-facing modules of your application as possible. For example, if you’re developing an ASP .NET web forms application all of the localizable strings should be kept in .resx files that are associated with the .aspx views of the application. In this post I want to talk about how this same concept can be applied when designing and developing APIs.

An API Facilitates Machine-to-Machine Interaction

You can typically think about a web, desktop, or mobile application as a collection of “views” or “screens” through which users interact with the underlying logic and data. The application can be designed based on the assumption that there will be a human being on the other end of the screen working the controls. You are designing a machine-to-person interaction and the application should be built in a way that facilitates the user’s clear understanding of what is going on. Dates should be formatted in a way that the user will be familiar with, messages should be presented in the user’s preferred language, etc.

When building an API, however, there are no screens and you can’t make assumptions about who or what is on the other end of each call. An API is, by definition, a machine-to-machine interaction. A machine-to-machine interaction should be built in a way that facilitates a clear and unambiguous understanding of what is going on. Dates and numbers should be formatted in predictable and standard ways (e.g. ISO 8601 dates) and messages should be presented in machine-parseable formats.

For example, consider an API for a time tracking system that exposes a resource for creating a new time entry. The JSON for creating a new time entry for a user might look like:

    {
      "userId": 4532,
      "startDateUtc": "2012-10-22T14:01:54.98432Z",
      "endDateUtc": "2012-10-22T11:34:45.29321Z"
    }

Note how the parameters for start and end date are both expressed as ISO 8601 compliant dates in UTC. Using a date format like this in our API leaves little room for ambiguity. It’s also important to note that using ISO 8601 dates is a much, much saner thing than the \/Date(<milliseconds since epoch>)\/ nonsense that is sometimes used in JSON serialization.

Probably the most important thing to note about the JSON snippet above is the fact that the end date comes before the start date! The API should recognize that and disallow the time entry from being created, returning an error to the caller. You might be inclined to send a response that looks something like this:

    {
        "errors": [ {"message" : "The end date must come after the start date"}]
    }

While this may seem like an appropriate thing to do there are a few problems with this approach:

  1. What if there is a user somewhere on the other end of the API call that doesn’t speak English? 
  2. What if the message provided here won’t fit properly within the UI of the application that made the API call?
  3. What if the verbiage of the message isn’t consistent with the rest of the application that made the API call?
  4. What if there is no user directly on the other end of the API call (e.g. this is a batch job uploading time entries once per night unattended)?

The API knows nothing about the context from which the call was made. There are steps you could take to give the API some context (e.g. allow the caller to send along a language code indicating the language that the end user speaks), but that will only get you so far. As the designer of the API you could make some assumptions about how the API will be called, but if we start making assumptions we could very easily make the wrong assumptions. In this situation it’s best to make no assumptions and simply design the API in such a way that the caller has the responsibility to convey error messages in a manner that is appropriate for the context in which the error was raised. You could work around some of these problems by allowing callers to add metadata to each request describing the context from which the call is being made (e.g. accepting a ‘locale’ parameter denoting the desired language), but that will add needless clutter and complexity. It’s better to keep the API simple and push those context-specific concerns down to the caller whenever possible. For our very simple time entry example, this can be done by simply changing our error message response to look like this:

    {
        "errors": [ {"code": 100}]
    }

By changing our error from exposing a string to exposing a numeric code that is easily parseable by another application, we’ve placed all of the responsibility for conveying the actual meaning of the error message on the caller. It’s best to have the caller be responsible for conveying this meaning because the caller understands the context much better than the API does. Now the caller can see error code 100, know that it means that the end date submitted falls before the start date and take appropriate action. Now all of the problems listed out above are non-issues because the caller can simply translate the error code of ‘100’ into the proper action and message for the current context. The numeric code representation of the error is a much better way to facilitate the machine-to-machine interaction that the API is meant to facilitate.
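
To make the caller’s side of this concrete, here’s a minimal sketch of how a client application might turn error code 100 into a message in the end user’s language. The ApiErrorMessages class, its Describe method, and the in-memory catalog are all hypothetical; in a real ASP .NET client you would more likely keep these strings in .resx files next to the views, and the Spanish text is just my own example translation.

```csharp
using System;
using System.Collections.Generic;

public static class ApiErrorMessages
{
    // Hypothetical catalog: language code -> (API error code -> user-facing message).
    private static readonly Dictionary<string, Dictionary<int, string>> Catalog =
        new Dictionary<string, Dictionary<int, string>>(StringComparer.OrdinalIgnoreCase)
        {
            ["en"] = new Dictionary<int, string>
            {
                [100] = "The end date must come after the start date."
            },
            ["es"] = new Dictionary<int, string>
            {
                [100] = "La fecha de finalización debe ser posterior a la fecha de inicio."
            }
        };

    // Translates an API error code into a message suited to the caller's own context.
    public static string Describe(int errorCode, string language)
    {
        if (Catalog.TryGetValue(language, out var messages) &&
            messages.TryGetValue(errorCode, out var message))
        {
            return message;
        }

        // Unknown language or code: fall back to something generic rather than failing.
        return $"Unexpected error (code {errorCode}).";
    }
}
```

An unattended batch job could skip the message lookup entirely and just branch on the numeric code, which is exactly the flexibility the code-only response is meant to give the caller.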

An API Does Have Human Users

While APIs should be built for machine-to-machine interaction, people still need to wire these interactions together. As a programmer building a client application that will consume the time entry API I would find it frustrating to have to go dig through the API documentation every time I encounter a new error code (assuming the documentation exists and is accurate). The numeric error code approach hurts the discoverability of the API and makes it painful to integrate with. We can help ease this pain by merging our two approaches:

    {
        "errors": [ {"code": 100, "message" : "The end date must come after the start date"}]
    }

Now we have an easily parseable numeric error code for the machine-to-machine interaction that the API is meant to facilitate and a human-readable message for programmers working with the API. The human-readable message here is not intended to be viewed by end-users of the API and as such is not really a “localizable string” in my opinion. We could opt to expose a locale parameter for all API methods and store translations for all error messages, but that’s a lot of extra effort and overhead that doesn’t add a lot of real value to the API. I might be a bit of an “ugly American”, but I think it’s probably fine to have the API return English messages when the target for those messages is a programmer. When resources are limited (which they always are), I’d argue that you’re better off hard-coding these messages in English and putting more effort into building more useful features, improving security, tweaking performance, etc.
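
On the API side, the validation behind that response could be sketched roughly as follows. The ApiError and TimeEntryValidator types are hypothetical names for illustration, and I’m assuming that an end date equal to the start date should also be rejected. DateTimeOffset.Parse with the invariant culture handles the ISO 8601 strings from the example; it throws a FormatException on malformed input, which a real API would need to catch and report as its own error.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class ApiError
{
    public ApiError(int code, string message) { Code = code; Message = message; }

    public int Code { get; }        // for the calling application
    public string Message { get; }  // hard-coded English, for the programmer
}

public static class TimeEntryValidator
{
    public const int EndNotAfterStart = 100; // error code used in the example above

    public static IReadOnlyList<ApiError> Validate(string startDateUtc, string endDateUtc)
    {
        var errors = new List<ApiError>();
        DateTimeOffset start = ParseIso8601(startDateUtc);
        DateTimeOffset end = ParseIso8601(endDateUtc);

        // Assumption: a zero-length entry is rejected along with a negative one.
        if (end <= start)
        {
            errors.Add(new ApiError(EndNotAfterStart,
                "The end date must come after the start date"));
        }

        return errors;
    }

    // Parses culture-independently; values without an offset are treated as UTC.
    private static DateTimeOffset ParseIso8601(string value) =>
        DateTimeOffset.Parse(value, CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal);
}
```

Passing the two dates from the time entry JSON earlier in the post to Validate yields a single error with code 100, which can then be serialized into the “errors” array.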

Posted On Saturday, November 10, 2012 8:56 PM | Comments (0)
Sunday, October 21, 2012
On StringComparison Values

When you use the .NET Framework’s String.Equals and String.Compare methods do you use an overload that accepts a StringComparison enumeration value? If not, you should be, because the value provided for that StringComparison argument can have a big impact on the results of your string comparison. The StringComparison enumeration defines values that fall into three different major categories:

  • Culture-sensitive comparison using a specific culture, defaulted to the Thread.CurrentThread.CurrentCulture value (StringComparison.CurrentCulture and StringComparison.CurrentCultureIgnoreCase)
  • Invariant culture comparison (StringComparison.InvariantCulture and StringComparison.InvariantCultureIgnoreCase)
  • Ordinal (byte-by-byte) comparison (StringComparison.Ordinal and StringComparison.OrdinalIgnoreCase)
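
A quick way to get a feel for these categories is to run the same pair of strings through every StringComparison value and look at the results side by side. The ComparisonDemo class and EqualsUnderEach method below are just names I picked for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class ComparisonDemo
{
    // Runs string.Equals once per StringComparison value and collects the results.
    public static IDictionary<StringComparison, bool> EqualsUnderEach(string a, string b)
    {
        var results = new Dictionary<StringComparison, bool>();
        foreach (StringComparison comparison in Enum.GetValues(typeof(StringComparison)))
        {
            results[comparison] = string.Equals(a, b, comparison);
        }
        return results;
    }
}
```

For “file” and “FILE”, the Ordinal entry comes back false while OrdinalIgnoreCase and InvariantCultureIgnoreCase come back true. The CurrentCultureIgnoreCase entry depends on the culture of the thread doing the comparison, which is the whole point: under a culture with different casing rules for the letter “i” (Turkish is the usual example) it may not match.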

There is a lot of great material available that details the technical ins and outs of these different string comparison approaches. If you’re at all interested in the topic these two MSDN articles are worth a read:

Those articles cover the technical details of string comparison well enough that I’m not going to reiterate them here other than to say that the upshot is that you typically want to use the culture-sensitive comparison whenever you’re comparing strings that were entered by or will be displayed to users and the ordinal comparison in nearly all other cases. So where does that leave the invariant culture comparisons? The “Best Practices For Using Strings in the .NET Framework” article has the following to say:

“On balance, the invariant culture has very few properties that make it useful for comparison. It does comparison in a linguistically relevant manner, which prevents it from guaranteeing full symbolic equivalence, but it is not the choice for display in any culture. One of the few reasons to use StringComparison.InvariantCulture for comparison is to persist ordered data for a cross-culturally identical display. For example, if a large data file that contains a list of sorted identifiers for display accompanies an application, adding to this list would require an insertion with invariant-style sorting.”

I don’t know about you, but I feel like that paragraph is a bit lacking. Are there really any “real world” reasons to use the invariant culture comparison? I think the answer to this question is, “yes”, but in order to understand why we should first think about what the invariant culture comparison really does.

The invariant culture comparison is really just a culture-sensitive comparison using a special invariant culture (Michael Kaplan has a great post on the history of the invariant culture on his blog: http://blogs.msdn.com/b/michkap/archive/2004/12/29/344136.aspx). This means that the invariant culture comparison will apply the linguistic customs defined by the invariant culture which are guaranteed not to differ between different machines or execution contexts. This sort of consistency does prove useful if you need to maintain a list of strings that are sorted in a meaningful and consistent way regardless of the user viewing them or the machine on which they are being viewed.

Example: Prototype Names

Let’s say that you work for a large multi-national toy company with branch offices in 10 different countries. Each year the company would work on 15-25 new toy prototypes each of which is assigned a “code name” while it is under development. Coming up with fun new code names is a big part of the company culture that everyone really enjoys, so to be fair the CEO of the company spent a lot of time coming up with a prototype naming scheme that would be fun for everyone to participate in, fair to all of the different branch locations, and accessible to all members of the organization regardless of the country they were from and the language that they spoke.

  • Each new prototype will get a code name that begins with a letter following the previously created name using the alphabetical order of the Latin/Roman alphabet. Each new year prototype names would start back at “A”.
  • The country that leads the prototype development effort gets to choose the name in their native language. (An appropriate Romanization system will be used for countries where the primary language is not written in the Latin/Roman alphabet. For example, the Pinyin system could be used for Chinese).
  • To avoid repeating names, a list of all current and past prototype names will be maintained on each branch location’s company intranet site.

Assuming that maintaining a single pre-sorted list is not feasible among all of the highly distributed intranet implementations, what string comparison method would you use to sort each year’s list of prototype names so that the list is both meaningful and consistent regardless of the country within which the list is being viewed?

Sorting the list with a culture-sensitive comparison using the default configured culture on each country’s intranet server would probably work most of the time, but subtle differences between cultures could mean that two different people would see a list that was sorted slightly differently. The CEO wants the prototype names to be a unifying aspect of company culture and is adamant that everyone see the same list sorted in the same order, and there’s no way to guarantee a consistent sort across different cultures using the culture-sensitive string comparison rules. The culture-sensitive sort would produce a meaningful list for the specific user viewing it, but it wouldn’t always be consistent between different users.

Sorting with the ordinal comparison would certainly be consistent regardless of the user viewing it, but would it be meaningful? Let’s say that the current year’s prototype name list looks like this:

  • Antílope (Spanish)
  • Babouin (French)
  • Čahoun (Czech)
  • Diamond (English)
  • Flosse (German)

If you were to sort this list using ordinal rules you’d end up with:

  • Antílope
  • Babouin
  • Diamond
  • Flosse
  • Čahoun

This sort is no good because the entry for “C” appears at the bottom of the list after “F”. This is because the Czech entry for the letter “C” makes use of a diacritic (accent mark). The ordinal string comparison does a byte-by-byte comparison of the code points that make up each character in the string and the code point for the “C” with the diacritic mark is higher than any letter without a diacritic mark, which pushes that entry to the bottom of the sorted list. The CEO wants each country to be able to create prototype names in their native language, which means we need to allow for names that might begin with letters that have diacritics, so ordinal sorting kills the meaningfulness of the list.

As it turns out, this situation is actually well-suited for the invariant culture comparison. The invariant culture accounts for linguistically relevant factors like the use of diacritics but will provide a consistent sort across all machines that perform the sort. Now that we’ve walked through this example, the following line from the “Best Practices For Using Strings in the .NET Framework” article makes a lot more sense:

One of the few reasons to use StringComparison.InvariantCulture for comparison is to persist ordered data for a cross-culturally identical display

That line describes the prototype name example perfectly: we need a way to persist ordered data for a cross-culturally identical display. While this example is 100% made-up, I think it illustrates that there are indeed real-world situations where the invariant culture comparison is useful.
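
If you want to try the prototype name example yourself, the sketch below sorts the same five names with both comparers. The PrototypeNames class is a made-up name, and the “Č” is written as the escape \u010C so that the single precomposed character is used no matter how the source file is encoded.

```csharp
using System;
using System.Linq;

public static class PrototypeNames
{
    // The example list, deliberately out of order. "\u010Cahoun" is "Čahoun".
    public static readonly string[] Names =
        { "Flosse", "\u010Cahoun", "Diamond", "Ant\u00EDlope", "Babouin" };

    // Ordinal sort: compares raw character values, so "Č" (U+010C) lands after
    // every unaccented Latin letter.
    // Result: Antílope, Babouin, Diamond, Flosse, Čahoun
    public static string[] SortOrdinal() =>
        Names.OrderBy(name => name, StringComparer.Ordinal).ToArray();

    // Invariant culture sort: applies linguistic rules that do not vary by machine
    // or user. With normal culture data available this should place Čahoun
    // between Babouin and Diamond, as described in the post.
    public static string[] SortInvariant() =>
        Names.OrderBy(name => name, StringComparer.InvariantCulture).ToArray();
}
```

One caveat: newer .NET runtimes can be configured to run in a globalization-invariant mode (common in slimmed-down container images), and in that mode culture-aware comparisons may behave like ordinal ones. It’s worth checking the output of SortInvariant in the environment you actually deploy to.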

Posted On Sunday, October 21, 2012 12:44 PM | Comments (0)