Monday, May 21, 2012
Much has been written about the technical ins and outs of localizing .NET applications with ResX files, but I think that most treatments of the topic understate the difficulty involved in localizing a project of substantial size and complexity. You probably already know how to define and consume global and local resources in a “Hello World” sized ASP .NET application and how to use the CurrentUICulture to ensure that the right version of a string is pulled from a ResX file to be shown to a user. Even if you don’t, those technical details are easy to look up and understand. In my opinion, the difficult part of making software available in other languages is keeping your localizable strings (text that your application presents to human users) organized.
Example Application
For the sake of this post let’s talk about a “real world” web application that’s composed of more than just a few pages. Consider a project with the following structure:
- A UI layer consisting of some ASP .NET views
- A layer for “presentation logic” containing presenters/controllers, view models, input validation, etc.
- A layer for “backend” stuff like business rules, workflow orchestration, data access, etc.
How would we ensure that a newly added feature to this application could be translated and used by people in other countries in the future if needed?
(Note that for the purposes of this post I’m only discussing string localization and translation, which is only one aspect of preparing software for use in other countries and cultures.)
Application Feature: Sprocket Order Entry
You work for a software company building and maintaining the web application described above. Your software isn’t currently used anywhere but the United States, but everyone has always known that an international user base was a possibility in the future. This application has been around for a while and currently contains dozens of views and many shared code libraries that make up its “back end”. You are tasked with creating a new page for entering sprocket orders (we’ll refer to this as the Sprocket Order Entry feature). You might build this feature as follows:
- Create a new view and start laying out how you want things to look. You might use a mockup tool, or you might be spiking some HTML and CSS to see how things will come together. At this point you’re probably just using hard-coded English text for input prompts and other user-facing messages.
- You might then decide to try to wire up the view to the presentation logic. At this point you might be thinking about how to structure data that will be passed back and forth between the view and the server.
- Next you might tackle some of the back end stuff like enforcing business rules, coming up with a data access strategy, data normalization, raising domain events for other parts of the system to respond to, etc.
Keeping in mind that this page might be used in other countries some day, you know that you should prepare this new feature for localization. To that end you start looking for localizable strings that should be moved into ResX files. Here’s what you might find:
- Labels, prompts, column headers, etc. needed by the view, which now reside in a ResX file that is specific to the view in which they are used (e.g. SprocketOrderEntry.aspx.resx).
- Messages that can be returned by the presenter’s input validation logic. When you were writing the presenters you quickly hard-coded these messages wherever validation could fail, along with some way to pass the strings back to the view. Now that you’re setting up ResX files you decide to just create a “SprocketOrderEntryPresenterResources.resx” file that can live in the presentation library.
- A warning message for a condition that can be raised from the “back end” classes when no subscribers are listening for the domain event that you’re raising. This isn’t an error condition per se, but you want to somehow inform your salespeople that the fulfillment system might not have received the order that they just entered, but that it should find it as soon as it comes back online. You don’t have a lot of user-facing messages in the “back end” library, so you’ve opted to have one “MiscResources.resx” file there that can collect the one-off strings that sometimes need to be returned from the back end processing.
The feature is done and the localizable strings are all in ResX files so you’re done now, right? Technically yes, but you’ve spread your localizable strings for this feature over three different ResX files spanning several logical layers of your application. Putting aside the fact that user-facing messages are primarily a UI concern and probably belong exclusively in the UI layer, having your localizable strings spread out across different logical layers of your application can cause a lot of pain when the translation process begins. This pain can stem from a variety of scenarios. Let’s begin by looking at what might happen when the Sprocket Order Entry feature needs to be translated into a new language.
Translating A Single Application Feature
The sprocket order entry form you built has been in use internally at your company for a few months (or years) when you learn of the plan to open a branch office in Brazil. The office staff there really needs the order entry form in Brazilian Portuguese, so you start figuring out what it will take to get the page translated. It’s been a while since you’ve looked at or thought about the sprocket order entry feature (or maybe you aren’t even the developer who originally built it), but you know that the view for this page lives in a file called ‘SprocketOrderEntry.aspx’ and you see that there’s a SprocketOrderEntry.aspx.resx file in the project that appears to have the same text that you see when you bring up the page in your local development environment. There are, of course, many other .resx files in the codebase, but there’s not enough time or money to get everything translated right now, so you just gather up the strings from the ResX file that belongs to this one page and send them out to get translated. When the translations come back you create a new ‘SprocketOrderEntry.aspx.pt-BR.resx’ file containing the translated messages and add it to your project. You take a quick peek at the page running under the pt-BR culture and see that it appears to be in Portuguese. It all looks good, so you ship it off to Brazil to be used.
After a couple of days you get a report back from Brazil that there’s some English text showing up on the order entry screen sometimes and no one knows what it means. When it shows up the orders don’t appear to be getting saved and people are getting frustrated. The report includes a screenshot of the message: “You checked the box for ‘Include Delivery Notes’ but did not enter anything into the ‘Delivery Notes’ field.” The orders aren’t being saved because a basic validation rule is failing, but the validation error message can’t be understood by the users. You fire off a reply to the manager in Brazil to explain the message’s meaning and behavior and set out to fix the errant English message.
Your initial thought is that the translators might have missed a string or two in the SprocketOrderEntry.aspx.resx file so you crack it open to have a look. To your surprise you don’t see that English message defined there so you move into the code for the view to see if it’s been hard-coded. After a few minutes with no luck you decide to fire off a “Find in files…” command in your IDE to try and track down where that message is coming from. The exact message text turns up in a file called ‘SprocketOrderEntryPresenterResources.resx’ in the presentation class library. You open that file to discover a handful of messages related to the sprocket order entry form that have not been translated. You assemble those strings and send them along to the translators. The strings that need to be translated this time around only represent a fraction of the strings that were translated in the first round, but the translators have a minimum fee so the overall effort ends up being more expensive than it had to be.
At this point you have translated the strings for the Sprocket Order Entry feature from two out of the three logical layers in your application. Of course it’s only a matter of time before one of the strings defined in the “backend” layer makes its way onto the page resulting in another round-trip to the translators.
How could this have been avoided? In this over-simplified example it might not have been that difficult to manually walk through and review all of the code in each possible execution path that the order entry form can initiate, but that could become a very time-consuming activity in an application of even moderate size and complexity. These reviews would also likely have to be repeated from time to time as the application changes and grows with new requirements and features. If you started calculating how much it would cost in developer resources to do these reviews, you might conclude that it would be cheaper and easier to just gather up all of the ResX files in your application and send them to the translators. Deciding to translate all of the localizable strings at once can certainly help avoid the translation churn described above (provided that developers haven’t left any hard-coded localizable strings behind), but it’s not a silver bullet.
Understanding Context
After a while the translation churn on the Sprocket Order Entry feature has died down and the company starts selling sprockets like mad in the Brazilian market. Soon enough there’s talk of establishing a new sprocket fulfillment center in Brazil to serve all of the orders coming in, which means that the fulfillment center staff will need to be able to use the Sprocket Fulfillment module in Portuguese. Having learned your lesson with the Sprocket Order Entry page, you start to carefully walk through all of the execution paths of the web pages that make up the Sprocket Fulfillment module to ensure that all of the localizable strings for the feature get translated in one pass. After only a few minutes it becomes clear to you that the Sprocket Fulfillment module relies on many other shared modules in your application, several of which contain localizable strings that may or may not end up on the Sprocket Fulfillment UI. Rather than spend the time and resources to track down each and every possible string, you decide it’s easier to just send all of the ResX files in the application out to the Portuguese translators. While translating everything will take longer and cost more, it seems like a better alternative than manually tracking down all of the strings that might get used, risking missing some and requiring multiple round trips to the translators.
After a few days you get an e-mail from the translators asking for clarification about some of the strings. They have screenshots of all of the screens in the application to help provide context for the strings that they are translating, but they don’t have any easy way to determine the context in which the strings from some of the “shared” libraries are used. Of course, you don’t know off the top of your head where each of the strings in the shared libraries is used either, so you’re left to grep through code in your IDE to trace the usage of the more context-sensitive English messages so that the Portuguese translators can come up with good translations. This is exactly the kind of exercise that you were hoping to avoid by opting to translate all of the ResX files instead of just the ones for the specific feature set that was needed. The Sprocket Fulfillment module needs to be available in Portuguese ASAP, so you have no choice but to spend the hours (or days?) needed to provide the translators with the context for the shared strings.
This exercise is tedious, but you take some solace in the fact that it’s not something that you’ll need to do all the time; once you’ve determined where all of these strings are used you can save that information for the next time context-related questions come up. You send off answers to the translators’ questions and happily return to whatever it was that you were working on when the questions originally came in, hoping not to have to spend any more time on translation-related work for a while. Unfortunately, the next day you get some unexpected bad news from the translators.
Message Reuse
When finding all of the places in the application where the shared localizable strings are used, you found that many of them can surface on several different screens. This didn’t surprise or concern you at the time, as you figured that the translators could just translate these strings once and the translations would appear on all of the screens in which the strings are used. It turns out, however, that the Portuguese translation for some of these shared strings needs to differ slightly between the various places that the string is used. You now have to sink even more developer time into translations to split these shared messages into different UI-specific versions so that the proper Portuguese translation can be shown on each screen. Having localizable strings defined in shared code libraries has encouraged message reuse, which can cause all kinds of headaches during the translation process.
From the MediaWiki documentation on localization:
“Although two concepts can be expressed with the same word in English, this doesn't mean they can be expressed with the same word in every language. "OK" is a good example: in English this is used for a generic button label, but in some languages they prefer to use a button label related to the operation which will be performed by the button. If you are adding multiple identical messages, please add message documentation to describe the differences in their contexts. Don't worry too much about the extra work for translators. Translation memory helps a lot in these while keeping the flexibility to have different translations if needed.”
The “Right Way”
Most, if not all, of the issues described in this post could be alleviated by ensuring that all localizable strings were stored in the UI layer of the application. If each view/screen in the application defined all of the localizable strings that it needed, it would be very easy to translate individual pages without translating the entire application, easily provide translators with the context in which each string is used, and ensure that each user-facing message is defined on its own and can be translated on its own.
Anytime I need to add a localizable string to an application, I ask myself this question:
Where will the message represented by this string originate and where will it ultimately be consumed?
Note the difference between the terms “message” and “string” here: a message is a piece of information that the application needs to convey to the user, while a string is just a bit of text that can be used to convey that message. The Sprocket Order Entry application might convey the message that the ‘PO Authorized By’ field was left empty using the string “This customer requires that all POs be authorized prior to placing an order. Please enter a value for the ‘PO Authorized By’ field.” It might make perfect sense to have this message originate from a shared library in the business logic layer of the application, but the actual string won’t be consumed until control returns from that shared library all the way back to the view that originated the request.
It’s best to represent the message using an invariant number or code (e.g. an integer or enumeration value) for as long as possible, until it needs to be interpreted by a human user on the other end of a display. By using invariant data to represent the message you’ll help ensure that all potential callers into the shared module will be able to convert that message into a string that is appropriate for the environment in which it will be displayed. For example, if the message needs to be displayed to a user in a web browser, it might make sense to show users a full sentence or two explaining what the message means and what action they might need to take. If the message needs to be displayed to a user on a handheld device, a much shorter and/or specialized message might be more appropriate. Put another way, the closer the localizable strings are to the users of the application, the better.
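Here is a minimal sketch of that idea. The back end reports an invariant OrderMessage code, and each UI decides how to phrase it. OrderMessage, OrderRules, WebOrderEntryText, and HandheldOrderEntryText are hypothetical names used only for illustration. In a real ASP .NET view, each switch arm would more likely be a lookup in that view's own ResX file (for example through ResourceManager.GetString), so that CurrentUICulture selects the translation. Literal English strings stand in for those lookups here.

```csharp
using System;

// Hypothetical invariant codes the back end can report. No user-facing text lives here.
public enum OrderMessage
{
    DeliveryNotesMissing,
    PoAuthorizationRequired,
    FulfillmentSystemOffline
}

// Hypothetical back-end rule: it reports *what* went wrong as a code, not as a sentence.
public static class OrderRules
{
    public static OrderMessage? Validate(bool includeDeliveryNotes, string deliveryNotes,
                                         bool customerRequiresPoAuthorization, string poAuthorizedBy)
    {
        if (includeDeliveryNotes && string.IsNullOrWhiteSpace(deliveryNotes))
            return OrderMessage.DeliveryNotesMissing;
        if (customerRequiresPoAuthorization && string.IsNullOrWhiteSpace(poAuthorizedBy))
            return OrderMessage.PoAuthorizationRequired;
        return null; // nothing to report
    }
}

// Web UI: full sentences. Each arm stands in for a lookup in this view's ResX file.
public static class WebOrderEntryText
{
    public static string For(OrderMessage message) => message switch
    {
        OrderMessage.DeliveryNotesMissing =>
            "You checked the box for 'Include Delivery Notes' but did not enter anything into the 'Delivery Notes' field.",
        OrderMessage.PoAuthorizationRequired =>
            "This customer requires that all POs be authorized prior to placing an order. Please enter a value for the 'PO Authorized By' field.",
        OrderMessage.FulfillmentSystemOffline =>
            "The order was saved, but the fulfillment system may not have received it yet. It will pick the order up when it comes back online.",
        _ => throw new ArgumentOutOfRangeException(nameof(message), message, null)
    };
}

// Handheld UI: the same codes, worded for a small screen.
public static class HandheldOrderEntryText
{
    public static string For(OrderMessage message) => message switch
    {
        OrderMessage.DeliveryNotesMissing => "Delivery notes required.",
        OrderMessage.PoAuthorizationRequired => "PO authorizer required.",
        OrderMessage.FulfillmentSystemOffline => "Saved; fulfillment pending.",
        _ => throw new ArgumentOutOfRangeException(nameof(message), message, null)
    };
}
```

In this sketch only the two ...Text classes hold user-facing strings. Translating one screen therefore means translating only the entries that screen's class references, and each screen is free to word the same code differently.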
Tuesday, January 24, 2012
Anyone who has worked with Visual Basic for any length of time should be familiar with the VB Type Conversion Functions like CBool, CInt, and CStr. These functions can provide a short and easy way to coerce the value of one type into another. I’ve been working on a ~300,000-line VB .NET project for the past couple of years that is littered with calls to these functions. We never had any issues with them until we sat down to convert the project to C#.
Converting a 300,000-line project from VB .NET to C# is a pretty big undertaking and I plan on putting together a post-mortem at some point in the future, but I wanted to briefly outline one subtle issue we hit with the CStr function and enumerations.
Consider the following enumeration (declared in C#):
public enum ReviewStatus
{
    NotStarted = 0,
    InProgress,
    Complete
}
Let’s say you want to use this enumeration in the query string of a web page that displays a report. In VB .NET you could easily use the CStr function to take an instance of the ReviewStatus enumeration and convert it to a string:
Dim selectedStatus As ReviewStatus = GetSelectedReviewStatus()
Request.QueryString.Add("reviewStatus", CStr(selectedStatus))
This results in a URL that looks like:
/somePage.aspx?reviewStatus=1
When provided with a numeric data type, the CStr function returns a string representing the number. The use of CStr here makes me a bit nervous because it doesn’t provide any means of specifying an IFormatProvider, but if globalization isn’t a concern, using CStr here is probably “good enough”.
When we set out to convert our project to C# we found dozens of places where we were using CStr to convert an enumeration into a string. Because there is no direct equivalent of CStr in C#, we opted to use ToString instead. The VB .NET code above turned into:
var selectedStatus = GetSelectedReviewStatus();
Request.QueryString.Add("reviewStatus", selectedStatus.ToString());
We didn’t see any issues with this conversion until we started testing the application, when we discovered that the URL in the converted C# project looked like this:
/somePage.aspx?reviewStatus=InProgress
The code that consumes the ‘reviewStatus’ query string variable was expecting an integer that could be cast to the appropriate enumeration member, and it was throwing exceptions trying to parse the ‘InProgress’ value.
To resolve this issue, we ended up having to first cast the enumeration value to an integer and then call ToString on the newly cast integer.
But What About Convert.ToString ?
If you search Google for the phrase “CStr equivalent in C#”, you’re likely to find forum posts indicating that Convert.ToString should be used in place of CStr. Convert.ToString might be an appropriate substitute for CStr in many instances, but calling Convert.ToString on an enumeration in C# yields the same result as calling ToString on the enumeration value (the member name, not its number), so it didn’t help with our problem.
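The sketch below shows the cast-then-ToString fix described above, together with a matching parse for the consuming page. ReviewStatusQueryValue is a hypothetical helper, and the enum is repeated so the block compiles on its own. Passing CultureInfo.InvariantCulture also addresses the format-provider concern raised about CStr.

```csharp
using System;
using System.Globalization;

public enum ReviewStatus
{
    NotStarted = 0,
    InProgress,
    Complete
}

// Hypothetical helper that keeps the numeric query string format the VB .NET code produced.
public static class ReviewStatusQueryValue
{
    // Cast to the underlying int first; ReviewStatus.ToString() would return the member name.
    public static string ToQueryValue(ReviewStatus status) =>
        ((int)status).ToString(CultureInfo.InvariantCulture);

    // Parse the integer form back, rejecting numbers that don't match a defined member.
    public static ReviewStatus FromQueryValue(string value)
    {
        int raw = int.Parse(value, NumberStyles.Integer, CultureInfo.InvariantCulture);
        if (!Enum.IsDefined(typeof(ReviewStatus), raw))
            throw new ArgumentOutOfRangeException(nameof(value), value, "Unknown ReviewStatus value.");
        return (ReviewStatus)raw;
    }
}
```

Calling ToString("D") on the enumeration value is another built-in way to get the numeric form.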
Monday, January 02, 2012
About a week ago we received a support call from one of our customers asking why she couldn’t open PDF files on our website anymore. She was using Chrome and the browser tab would just hang with the ‘Loading’ animation and a gray background after requesting an invoice PDF. After a little digging we figured out that this hang-up was occurring while Chrome was opening the PDF document. We were easily able to reproduce this issue ourselves in the production environment, but only while using Chrome. We were not, however, able to reproduce it in any of our QA / test environments. All environments are running the same version of ASP .NET (3.5) and IIS (7.5).
After trying dozens of different combinations of browser settings, PDF file sizes, and IIS settings, I finally noticed that the content-length header that we set explicitly before writing the PDF file bytes out to the response stream was being replaced with a ‘transfer-encoding: chunked’ header in our production environment (but not in any of our QA environments). In some circumstances Chrome chokes on file downloads using the transfer-encoding: chunked header. Initial research into the issue yielded a post reporting that ASP .NET will set the transfer-encoding: chunked header for you when calling ‘Flush’ manually without explicitly setting a content-length header. Unfortunately for me we were explicitly setting the content-length header, so it was back to the drawing board.
After banging my head against the wall for a few hours comparing the raw IIS configuration XML between our QA and production servers, I finally remembered that we had seen some very odd response-related issues that stemmed from an HTTP Response Filter we were using on some pages. Sure enough, the filter was the cause of our issues in this instance as well.
The reason we were only seeing this issue in the production environment was that we use an HTTP Module to profile page requests. One of the things we measure is the size of the ‘viewstate’ field that is sent down for each request to detect pages that might be viewstate abusers. In order to track the size of the viewstate data sent down we attach a stream to the ‘Filter’ property of the HttpResponse that intercepts and records the value of the viewstate hidden field. Disabling this viewstate tracking filter for the page that generates and sends PDF files down to the browser resolved the issue. This module was enabled in our production environment but nowhere else.
It seems unlikely that anyone else will ever stumble across this same issue, but if I can save someone else a few hours of troubleshooting time with this blog post then it’ll be well worth the short time it took me to write this up. In summary, the following circumstances led to the replacement of the content-length header with a transfer-encoding: chunked header:
- Site is hosted on IIS 7.5 (I never tested this on IIS 7, though I do know that the Visual Studio development web server does NOT exhibit the same issue)
- A stream is set on the ‘Filter’ property of the HttpResponse. What the filter actually does seems immaterial. I was able to reproduce the problem with a filter that simply passed the raw bytes through without modifying them at all.
- A content-length header is explicitly set.
- ‘Flush’ is called on the HttpResponse directly after writing some or all of the file contents to the response. (In our application we were calling Flush multiple times, sending down about 1 KB of data at a time, because our files can sometimes be quite large.)
- (Optional) ‘End’ is called on the HttpResponse after sending all of the file contents. (This is not required to see the transfer-encoding: chunked header, but it does seem to be a prerequisite for Chrome choking on opening the PDF.)
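To make the filter condition concrete, here is a sketch of the kind of pass-through filter described above. It forwards every byte unchanged and only counts them. PassThroughFilter is a hypothetical name, not the class from our profiling module, and this exact code has not been tested against IIS. The comments show how it would be attached in a Web Forms page.

```csharp
using System;
using System.IO;

// Hypothetical usage inside a Web Forms page (System.Web), matching the conditions above:
//   Response.Filter = new PassThroughFilter(Response.Filter);
//   Response.AddHeader("Content-Length", pdfBytes.Length.ToString());
//   Response.OutputStream.Write(pdfBytes, 0, pdfBytes.Length);
//   Response.Flush();
// The fix described in this post was simply not attaching the filter on the PDF page.
public sealed class PassThroughFilter : Stream
{
    private readonly Stream _inner;

    // Total bytes forwarded, e.g. for the kind of size profiling described above.
    public long BytesWritten { get; private set; }

    public PassThroughFilter(Stream inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Forward the bytes untouched; the filter never modifies the response body.
    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        BytesWritten += count;
    }

    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```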
I created a quick and dirty web forms app to reproduce the issue that you can grab on GitHub. I was also able to deploy and reproduce the same issue using this sample app on AppHarbor:
Tuesday, November 15, 2011
It’s no secret that daylight savings time can wreak havoc on systems that rely heavily on dates. The system I work on is centered around recording dates and times, so naturally my co-workers and I have seen our fair share of date-related bugs. From time to time, however, we come across something that we haven’t seen before. A few weeks ago the following error message started showing up in our logs:
“The supplied DateTime represents an invalid time. For example, when the clock is adjusted forward, any time in the period that is skipped is invalid.”
This seemed very cryptic, especially since it was coming from areas of our application that are typically only concerned with capturing date-only values (no explicit time component) from the user, like reports that take “start date” and “end date” parameters. For these types of parameters we just leave off the time component when capturing the date values, so midnight is used as a “placeholder” time. How is midnight an “invalid time”?
Globalization Is Hard
Over the last couple of years our software has been rolled out to users in several countries outside of the United States, including Brazil. Brazil begins and ends daylight savings time at midnight on pre-determined days of the year. On October 16, 2011 at midnight many areas in Brazil began observing daylight savings time at which time their clocks were set forward one hour. This means that at the instant it became midnight on October 16, it actually became 1:00 AM, so any time between 12:00 AM and 12:59:59 AM never actually happened.
Because we store all date values in the database in UTC, we always adjust any “local” dates provided by a user to UTC before using them as filters in a query. The error we saw was thrown by .NET when trying to convert the Brazilian local time of 2011-10-16 12:00 AM to UTC, since that local time never actually existed. We hadn’t experienced this same issue with any of our US customers because the daylight savings time changes in the US occur at 2:00 AM, which doesn’t conflict with our “placeholder” time of midnight.
Detecting Invalid Times
In .NET you might use code similar to the following for converting a local time to UTC:
var localDate = new DateTime(2011, 10, 16); // 2011-10-16 @ midnight
const string timeZoneId = "E. South America Standard Time"; // Windows system time zone id for the "Brasilia" time zone
var localTimeZone = TimeZoneInfo.FindSystemTimeZoneById(timeZoneId);
var convertedDate = TimeZoneInfo.ConvertTimeToUtc(localDate, localTimeZone);
The code above throws the “invalid time” exception referenced above. We could try to detect whether or not the local time is invalid with something like this:
var localDate = new DateTime(2011, 10, 16); // 2011-10-16 @ midnight
const string timeZoneId = "E. South America Standard Time"; // Windows system time zone id for the "Brasilia" time zone
var localTimeZone = TimeZoneInfo.FindSystemTimeZoneById(timeZoneId);
if (localTimeZone.IsInvalidTime(localDate))
    localDate = localDate.AddHours(1); // step past the skipped hour
var convertedDate = TimeZoneInfo.ConvertTimeToUtc(localDate, localTimeZone);
This code works in this particular scenario, but it hardly seems robust: it assumes the skipped period is exactly one hour long. It also does nothing to address the ambiguous times that fall around the end of daylight savings. When clocks are rolled back an hour, the same hour on the same day is recorded twice in a row. To continue with our Brazil example, on February 19, 2012 at 12:00 AM it will immediately become February 18, 2012 at 11:00 PM all over again. In this scenario, how should we interpret February 18, 2012 11:30 PM?
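For comparison, here is a sketch of a more defensive conversion that uses only TimeZoneInfo. It applies the same policy as the Noda Time sample below: ambiguous times resolve to the earlier instant, and skipped times move forward to the first local time that actually exists. LenientUtc and CreateSampleBrazilZone are hypothetical names. The sample zone hard-codes simplified rules that match the dates in this post (UTC-3, with one hour of DST from midnight on the third Sunday of October until midnight on the third Sunday of February), so the example does not depend on the operating system's time zone data. In real code you would pass the zone returned by FindSystemTimeZoneById instead.

```csharp
using System;
using System.Linq;

public static class LenientUtc
{
    // Converts a wall-clock time in `zone` to UTC without throwing on skipped or repeated times.
    public static DateTime ToUtc(DateTime local, TimeZoneInfo zone)
    {
        // Unspecified tells TimeZoneInfo to interpret the value as a time in `zone`.
        local = DateTime.SpecifyKind(local, DateTimeKind.Unspecified);

        if (zone.IsAmbiguousTime(local))
        {
            // The local time occurs twice; the larger UTC offset gives the earlier instant.
            TimeSpan offset = zone.GetAmbiguousTimeOffsets(local).Max();
            return DateTime.SpecifyKind(local - offset, DateTimeKind.Utc);
        }

        if (zone.IsInvalidTime(local))
        {
            // The local time was skipped. Walk forward one minute at a time (assumes the
            // transition falls on a whole minute) until reaching a local time that exists.
            DateTime candidate = new DateTime(local.Year, local.Month, local.Day,
                                              local.Hour, local.Minute, 0, DateTimeKind.Unspecified);
            for (int i = 0; i < 24 * 60 && zone.IsInvalidTime(candidate); i++)
                candidate = candidate.AddMinutes(1);
            local = candidate;
        }

        return TimeZoneInfo.ConvertTimeToUtc(local, zone);
    }

    // Illustrative custom zone with the simplified rules described above (not real tz data).
    public static TimeZoneInfo CreateSampleBrazilZone()
    {
        var midnight = new DateTime(1, 1, 1, 0, 0, 0);
        var start = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(midnight, 10, 3, DayOfWeek.Sunday);
        var end = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(midnight, 2, 3, DayOfWeek.Sunday);
        var rule = TimeZoneInfo.AdjustmentRule.CreateAdjustmentRule(
            DateTime.MinValue.Date, DateTime.MaxValue.Date, TimeSpan.FromHours(1), start, end);
        return TimeZoneInfo.CreateCustomTimeZone(
            "Sample/Brazil", TimeSpan.FromHours(-3), "Sample Brazil",
            "Sample Brazil Standard", "Sample Brazil Daylight", new[] { rule });
    }
}
```

With this zone, midnight on 2011-10-16 maps to the start of DST (01:00 local, 03:00 UTC), and 11:30 PM on 2012-02-18 maps to its first occurrence (01:30 UTC). The minute-by-minute walk is a simplification rather than a general-purpose gap calculation.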
Enter Noda Time
I heard about Noda Time, the .NET port of the Java library Joda Time, a little while back and filed it away in the back of my mind under the “sounds-like-it-might-be-useful-someday” category. Let’s see how we might deal with the issue of invalid and ambiguous local times using Noda Time (note that as of this writing the samples below will only work using the latest code available from the Noda Time repo on Google Code; the NuGet package version 0.1.0, published 2011-08-19, will incorrectly report unambiguous times as being ambiguous):
var localDateTime = new LocalDateTime(2011, 10, 16, 0, 0);
const string timeZoneId = "Brazil/East";
var timezone = DateTimeZone.ForId(timeZoneId);
var localDateTimeMapping = timezone.MapLocalDateTime(localDateTime);

ZonedDateTime unambiguousLocalDateTime;
switch (localDateTimeMapping.Type)
{
    case ZoneLocalMapping.ResultType.Unambiguous:
        unambiguousLocalDateTime = localDateTimeMapping.UnambiguousMapping;
        break;
    case ZoneLocalMapping.ResultType.Ambiguous:
        unambiguousLocalDateTime = localDateTimeMapping.EarlierMapping;
        break;
    case ZoneLocalMapping.ResultType.Skipped:
        unambiguousLocalDateTime = new ZonedDateTime(
            localDateTimeMapping.ZoneIntervalAfterTransition.Start, timezone);
        break;
    default:
        throw new InvalidOperationException(
            string.Format("Unexpected mapping result type: {0}",
                localDateTimeMapping.Type));
}
DateTime convertedDateTime = unambiguousLocalDateTime.ToInstant().ToDateTimeUtc();
Let’s break this sample down:
I’m using the Noda Time ‘LocalDateTime’ object to represent the local date and time. I’ve provided the year, month, day, hour, and minute (zeros for the hour and minute here represent midnight). You can think of a ‘LocalDateTime’ as an “unvalidated” date and time; there is no information available about the time zone that this date and time belong to, so Noda Time can’t make any guarantees about its ambiguity.
The ‘timeZoneId’ in this sample is different from the ones above. In order to use the .NET TimeZoneInfo class we need to provide Windows time zone ids. Noda Time expects an Olson (tz / zoneinfo) time zone identifier and does not currently offer any means of mapping the Windows time zones to their Olson counterparts, though project owner Jon Skeet has said that some sort of mapping will be publicly accessible at some point in the future.
I’m making use of the Noda Time ‘DateTimeZone.MapLocalDateTime’ method to disambiguate the original local date time value. This method returns an instance of the Noda Time object ‘ZoneLocalMapping’ containing information about how the provided local date time maps to the provided time zone.
The disambiguated local date and time value will be stored in the ‘unambiguousLocalDateTime’ variable as an instance of the Noda Time ‘ZonedDateTime’ object. An instance of this object represents a completely unambiguous point in time and consists of a local date and time, a time zone, and an offset from UTC. Instances of ZonedDateTime can only be created from within the Noda Time assembly (the constructor is ‘internal’) to ensure to callers that each instance represents an unambiguous point in time.
The value of the ‘unambiguousLocalDateTime’ might vary depending upon the ‘ResultType’ returned by the ‘MapLocalDateTime’ method. There are three possible outcomes:
- If the provided local date time is unambiguous in the provided time zone, I can immediately set the ‘unambiguousLocalDateTime’ variable from the ‘UnambiguousMapping’ property of the mapping returned by the ‘MapLocalDateTime’ method.
- If the provided local date time is ambiguous in the provided time zone (i.e. it falls in an hour that was repeated when moving clocks backward from Daylight Savings to Standard Time), I can use the ‘EarlierMapping’ property to get the earlier of the two possible local dates to define the unambiguous local date and time that I need. I could have also opted to use the ‘LaterMapping’ property in this case, or even returned an error and asked the user to specify the proper choice. The important thing to note here is that as the programmer I’ve been forced to deal with what appears to be an ambiguous date and time.
- If the provided local date time represents a skipped time (i.e. it falls in an hour that was skipped when moving clocks forward from Standard Time to Daylight Savings Time), I have access to the time intervals that fell immediately before and immediately after the point in time that caused my date to be skipped. In this case I have opted to disambiguate my local date and time by moving it forward to the beginning of the interval immediately following the skipped period. Again, I could opt to use the end of the interval immediately preceding the skipped period, or raise an error depending on the needs of the application.
The point of this code is to convert a local date and time to a UTC date and time for use in a SQL Server database, so the final ‘convertedDate’ variable (typed as a plain old .NET DateTime) has its value set from a Noda Time ‘Instant’. An ‘Instant’ represents a number of ticks since midnight UTC on 1970-01-01 (the Unix epoch) and can easily be converted to a .NET DateTime in UTC using the ‘ToDateTimeUtc()’ method.
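The tick arithmetic behind that conversion is small enough to show directly. This sketch uses plain .NET rather than Noda Time and assumes, as described above, that an instant is a count of 100-nanosecond ticks since the Unix epoch; the class and method names are hypothetical, and DateTime.UnixEpoch requires .NET Core 2.1 or later.

```csharp
using System;

// Hypothetical helper mirroring the Instant-to-DateTime conversion described above.
public static class EpochTicks
{
    // Converts 100-nanosecond ticks since 1970-01-01T00:00:00Z to a UTC DateTime.
    public static DateTime ToDateTimeUtc(long ticksSinceUnixEpoch)
    {
        // DateTime.UnixEpoch has DateTimeKind.Utc, and AddTicks preserves the Kind.
        return DateTime.UnixEpoch.AddTicks(ticksSinceUnixEpoch);
    }

    // The reverse: DateTime ticks count from 0001-01-01, so subtract the epoch's own tick value.
    public static long FromDateTimeUtc(DateTime utc)
    {
        if (utc.Kind != DateTimeKind.Utc)
            throw new ArgumentException("Expected a DateTime with DateTimeKind.Utc.", nameof(utc));
        return utc.Ticks - DateTime.UnixEpoch.Ticks;
    }
}
```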
This sample is admittedly contrived and could certainly use some refactoring, but I think it captures the general approach needed to take a local date and time and convert it to UTC with Noda Time. At first glance it might seem that Noda Time makes this “simple” code more complicated and verbose because it forces you to explicitly deal with the local date disambiguation, but I feel that the length and complexity of the Noda Time sample is proportionate to the complexity of the problem. Using TimeZoneInfo leaves you susceptible to overlooking ambiguous and skipped times that could result in run-time errors or (even worse) run-time data corruption in the form of a local date and time being adjusted to UTC incorrectly.
I should point out that this research is my first look at Noda Time and I know that I’ve only scratched the surface of its full capabilities. I also think it’s safe to say that it’s still beta software for the time being, so I’m not rushing out to use it in production systems just yet, but I will definitely be tinkering with it more and keeping an eye on it as it progresses.
Sunday, October 02, 2011
In a previous post I explored some of the issues that can arise when parsing date and time values provided by users from different cultures and regions of the world. Correctly parsing and formatting different date formats is a very common internationalization concern in most software projects, but in this post I want to explore another area where parsing and formatting can cause some headaches.
Numbers Are The International Language, Right?
If your application deals with quantities of any kind then you probably have to both accept numeric input from users and later display that numeric input back to the user. Consider the following numbers: 2.782 and 7,482.
What do these numbers represent to you? If you’re from the United States you’d probably interpret these numbers as:
- 2.782 = a decimal representing two whole units plus 782/1000 partial units
- 7,482 = an integer (whole number) representing seven thousand four hundred and eighty-two whole units
.NET software running on my servers in the United States would interpret those numbers the same way:
using System.Globalization;
using System.Threading;
// Default the current culture to en-US
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
var firstNumber = decimal.Parse("2.782");
// Must explicitly allow for the thousands separator
var secondNumber = int.Parse("7,482", NumberStyles.AllowThousands);
The code above runs successfully and parses both numeric values without any loss of precision. So what happens if this same code is running under a culture other than en-US?
Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE");
var firstNumber = decimal.Parse("2.782");
var secondNumber = int.Parse("7,482", NumberStyles.AllowThousands);
Now that the culture has been changed to de-DE (German), something interesting happens. The number formatting rules for the de-DE culture in .NET specify that the decimal symbol is a comma and the thousands separator is a period (basically the opposite of the en-US rules). When I was first experimenting with number parsing in an internationalized application I would have expected both of these parse calls to throw an exception due to improper formatting. As I stated in my last post about parsing, a thrown exception would be a good thing in this case, as it would call attention to parsing code that is possibly not ready to deal with values from different cultures. Unfortunately, only the second call throws. The first call, which uses the default NumberStyles.Number (and therefore allows thousands separators), parses the value “2.782” and interprets the “.” as a thousands separator. The resulting .NET decimal value is two thousand seven hundred eighty-two, meaning that the originally intended value has been silently corrupted and lost.
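To reproduce the silent misparse without depending on the culture data installed on a particular machine, this sketch swaps CultureInfo("de-DE") for a NumberFormatInfo configured with the same separators (comma for decimals, period for thousands). That substitution and the helper names are assumptions for illustration. The decimal.Parse(string) and decimal.Parse(string, IFormatProvider) overloads use NumberStyles.Number, which includes AllowThousands, so the period is accepted as a group separator.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers that reproduce the de-DE behavior described above.
public static class GermanStyleParsing
{
    // Stand-in for de-DE number formatting: ',' for decimals, '.' for thousands.
    public static NumberFormatInfo GermanLikeFormat()
    {
        return new NumberFormatInfo
        {
            NumberDecimalSeparator = ",",
            NumberGroupSeparator = "."
        };
    }

    // Same overload shape as the example: no NumberStyles argument, so NumberStyles.Number applies
    // and "2.782" is read as 2782 instead of being rejected.
    public static decimal ParseLikeTheExample(string input)
    {
        return decimal.Parse(input, GermanLikeFormat());
    }
}
```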
How Do You Deal With This?
The moral of my last post about parsing and formatting boiled down to “Know Your Audience”, and this post is no different. It’s critical to know where numeric values are coming from and which culture their source belongs to. It also helps to use the most precise parsing logic you can so that there is little room for ambiguity. In the example above, if we use the decimal.Parse overload that accepts a NumberStyles value and pass NumberStyles.Float, the parse call running under the German culture will fail, because NumberStyles.Float does not allow thousands separators and “.” is not the German decimal separator. It’s easy to ignore all of those “extraneous” parameters to the Parse and Format methods in .NET, but they really can have a big impact on how your code behaves. When in doubt, look it up or fire up a quick LINQPad session and test out some different variations. If there’s a silver-bullet solution to problems like this I certainly haven’t found it, and every application or scenario likely calls for a slightly different approach. The point of this post was to call attention to an oddity of decimal parsing that can cause issues in multi-cultural applications.
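Here’s a minimal sketch of that stricter approach: parse with NumberStyles.Float, and pass a format provider describing the culture the text actually came from rather than whatever culture the server is running under. The German-style NumberFormatInfo stands in for de-DE so the example doesn’t depend on installed culture data; it and the helper names are assumptions.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for strict, source-culture-aware decimal parsing.
public static class StrictNumberParsing
{
    // Read-only stand-in for de-DE: ',' as the decimal separator, '.' as the thousands separator.
    public static readonly NumberFormatInfo GermanStyle = NumberFormatInfo.ReadOnly(new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    });

    // NumberStyles.Float permits surrounding whitespace, a leading sign, a decimal point and an
    // exponent, but not thousands separators, so "2.782" cannot be silently read as 2782.
    public static bool TryParseStrict(string input, IFormatProvider sourceFormat, out decimal value)
    {
        return decimal.TryParse(input, NumberStyles.Float, sourceFormat, out value);
    }
}
```

Under the German-style format, “2.782” is rejected while “2,782” parses as 2.782; under the invariant culture, “2.782” parses as 2.782.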
What’s Next?
While being more restrictive in parsing logic is good for eliminating ambiguity and mitigating the risk of data being improperly parsed, it can lead to a poor user experience. Disallowing the use of thousands separators or using a consistent decimal separator across all cultures is a bit user-hostile (and I’ve certainly been guilty of it). In a future post I want to explore some strategies for allowing users to enter the same data in a variety of ways that make sense to them while still maintaining strict parsing and validation.
Tuesday, June 07, 2011

If your application accepts input from users you’re likely going to want to perform some validation on that input. If you allow users to create data within your system that is displayed to other users (i.e. it should be “human-readable”) you might consider implementing a validation routine that ensures that all input for this data consists only of letters and whitespace. While I generally tend to shy away from overly aggressive validation of user input, I think that this kind of validation can make sense in some scenarios where you wouldn’t want someone’s cat running across their keyboard to create a document titled “((@(8#jk(@(‘’299” in your system.
The Simple Solution
But how should you go about this validation? A simple regular expression (is there such a thing?) should do the trick, right? A quick Google search for “regex for letters and whitespace” might yield something like this: ^[a-zA-Z ]+$ (loosely based on the question and accepted answer of a Stack Overflow question). In short, this regular expression matches any string consisting only of the letters A-Z (upper or lower case) and the space character.
This regex would allow the user to enter “My Document”, but not “ZOMG The. Best. Document. EVERRRR!!!!!!11!!1one :-)”. This is great and does what you want until your application needs to support someone who doesn’t speak English. (Side note: I normally wouldn’t care if someone wanted to name a document something silly. This is an admittedly contrived example that I put together to illustrate a point. Be nice to your users and always validate responsibly.)
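A minimal sketch of that validation in C#, using the same pattern; the class and method names are hypothetical. It accepts “My Document” and rejects the cat-on-the-keyboard title, but it also rejects a title like “Můj dokument”.

```csharp
using System.Text.RegularExpressions;

// Hypothetical validator using the ASCII-only pattern from the post.
public static class AsciiTitleValidator
{
    // Only A-Z, a-z and the space character are allowed.
    private static readonly Regex LettersAndSpaces = new Regex("^[a-zA-Z ]+$");

    public static bool IsValid(string title) => LettersAndSpaces.IsMatch(title);
}
```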
Your Alphabet Is Not The Only Alphabet
The alphabets used in other languages (if they even have an alphabet) often make use of diacritical marks. The Czech alphabet, for example, uses some familiar-looking letters like ‘A’, ‘E’, and ‘J’, but it also has letters like ‘Č’ and ‘Ů’. It’s important to note here that the diacritical marks above the ‘Č’ and ‘Ů’ are not just “accent marks”. That is to say that they aren’t just modifiers that are applied to the “normal” ‘C’ and ‘U’ letters; they are completely different letters.
So when a user goes to create a document called ‘Můj dokument’ (which is likely an atrocious translation of ‘My Document’… my apologies to the Czechs), your regular expression will reject it. You could be an arrogant American and tell them to just make it ‘Muj dokument’, but that’s not very nice. The point of your validation routine was to help ensure that input remained human-readable, and using characters in foreign alphabets is a perfectly human thing to do.
In this particular scenario you’d likely be better served by the regular expression shorthand for “word character”, which is “\w”. In the .NET flavor of regex, this matches Unicode letters, meaning letters with diacritics are fair game. It’s important to note that it also matches digits (any Unicode decimal digit, not just 0-9) and connector punctuation such as the underscore, but not whitespace, so you still need to allow spaces explicitly. In my opinion, digits and the underscore are perfectly acceptable things to allow in a human-readable string, but you could always modify your validation routine to expressly check for digits and underscores if there’s some really good reason to disallow them.
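A sketch of the Unicode-friendly version, again with hypothetical names. The first pattern is the \w approach plus an explicit space; the second is one possible stricter alternative (Unicode letters and combining marks only) for the case where digits and underscores really should be rejected. I’ve used \z instead of $ because in .NET $ also matches just before a trailing newline.

```csharp
using System.Text.RegularExpressions;

// Hypothetical validators for human-readable titles in any alphabet.
public static class UnicodeTitleValidator
{
    // \w matches Unicode letters (including those with diacritics), nonspacing marks, decimal digits
    // and connector punctuation such as '_', but not whitespace, so a space is listed explicitly.
    private static readonly Regex WordCharactersAndSpaces = new Regex(@"^[\w ]+\z");

    // Stricter variant: Unicode letters (\p{L}), combining marks (\p{M}) and spaces only.
    private static readonly Regex LettersAndSpaces = new Regex(@"^[\p{L}\p{M} ]+\z");

    public static bool IsValid(string title) => WordCharactersAndSpaces.IsMatch(title);

    public static bool IsValidLettersOnly(string title) => LettersAndSpaces.IsMatch(title);
}
```

Both patterns accept “Můj dokument”; only the first also accepts a title like “Document_2”.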
Why Should I Care?
If your application is and will always be for a target audience of English-speaking users, then maybe you don’t need to care. If you think there’s any remote chance that your application might someday be used by folks that don’t speak English, do yourself a favor and take a second to think about how best to validate input from your users.
Monday, May 30, 2011
TL;DR: The WebImage.GetImageFromRequest() method is case-sensitive with regard to image file name extensions. More info here: http://forums.asp.net/t/1682158.aspx/1
Lately I’ve been working on a little side project in my spare time that requires that I allow users to upload an image from their computer. I’ve been using ASP .NET MVC 3 with Razor views and have been having a great time with it so far.
I was excited to find the microsoft-web-helpers package on NuGet, as it contains what appeared to be a really useful helper class called ‘WebImage’ for easily accepting uploaded images and performing simple manipulation like resizing and cropping. I was happily using this little library for a while before discovering that it has a few pretty significant issues. First of all, the resize functionality completely breaks under certain circumstances, but that issue is beyond the scope of this post. The issue that really put the brakes on my momentum was much more fundamental: the ‘GetImageFromRequest’ method, which is used for grabbing an image from the POSTed form data, was returning ‘null’ for seemingly valid images.
I had the basic functionality of this little application more or less done and had been testing it locally on my own computer without issue, but had also recruited some friends and family to try it out as well. I was getting really odd reports that it wasn’t working for some pictures that were taken on a digital camera. This was really puzzling and I spent hours poring over the properties of the JPEG files that were failing, but couldn’t find any smoking gun.
Today I finally did what I should have done days ago and did a little Googling to see if this was a known issue. Within a couple of minutes I had found a post on the ASP .NET forums that explained the issue: the ‘GetImageFromRequest’ method has problems if the extension of the image file name is capitalized (e.g. ‘GIF’ instead of ‘gif’). When the images were being saved to the computer from the digital camera they were coming across with .JPG extensions, which was causing the issue. Thankfully, that forum thread suggests a couple of good workarounds. I ended up creating a substitute ‘GetImageFromRequest’ method as advocated in one of the responses, which seems to be working great so far.
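The problem boils down to an extension check that is sensitive to case. As a hedged illustration of the fix (not the WebImage source and not the exact workaround from the forum thread), a hypothetical helper can compare extensions with StringComparison.OrdinalIgnoreCase so that “.JPG” and “.jpg” are treated alike. The list of allowed extensions is an assumption.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: decides whether an uploaded file name looks like an image, ignoring case.
public static class ImageUploadCheck
{
    private static readonly string[] AllowedExtensions = { ".jpg", ".jpeg", ".png", ".gif", ".bmp" };

    public static bool HasImageExtension(string fileName)
    {
        // Path.GetExtension keeps the original casing (e.g. ".JPG"), so compare case-insensitively.
        string extension = Path.GetExtension(fileName);
        return AllowedExtensions.Any(allowed =>
            string.Equals(allowed, extension, StringComparison.OrdinalIgnoreCase));
    }
}
```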
I’ve been a bit disappointed in the WebImage helper class. These issues I’ve encountered really prevent me from using it in the way I wanted to and, in my opinion, make it unusable for any “real” application. I really hope that these issues get addressed in a future version.
Tuesday, April 05, 2011
If your typical workweek is anything like mine you might have connections to multiple SQL Server instances open in SQL Server Management Studio at any given time. For me, some of these instances are on my local machine (i.e. I use them for development work), and some of them are on remote (read: test or production) machines. In a perfect world I’d never need to jump into a production database to investigate an issue, but the world I live in is far from perfect so I end up looking at “live” databases from time to time.
Because I can sometimes have multiple connections going at once I always have a little voice in the back of my head asking, “Am I about to run this query against the wrong instance?” I usually like to keep connections to remote machines open only for as long as I need them to help mitigate that risk, but in my imperfect world sometimes I get distracted by a phone call or IM. I’ve done a pretty good job of conditioning myself to take a quick peek at the status bar at the bottom of my SSMS query window to make sure that the server says what I think it should before kicking off any query that will alter data or cause locking.

I was doing some screen-sharing with a co-worker the other day and noticed that he had a little colored bar at the top of his query window. Turns out that he was using the Window Connection Coloring feature in the SSMS Toolpack (which is a great add-in for SQL Server Management Studio). This feature lets you have a color-coded bar present in every query window that changes based on what server that window is pointed at.
If you have the SSMS Toolpack installed just go to the “SSMS Tools –> Window Connection Coloring –> Options” menu to bring up the dialog for configuring your connection highlight settings. I set up mine to look something like this:

Rather than setting up a different entry for every server I connect to, I set up two entries:
- One for ‘localhost’ that uses a green connection color.
- One using a regex for any server name that doesn’t begin with ‘localhost’, with a red connection color. The regex I used is: ^((?!localhost).*)$ (also be sure to check the ‘IsRegex’ column).
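To see how that pattern behaves, here’s a quick check with .NET’s Regex class. I’m assuming the add-in’s regex engine treats this pattern the way .NET does, and the class name is hypothetical. The negative lookahead (?!localhost) only looks at the start of the name and is case-sensitive, so ‘localhost\SQLEXPRESS’ stays green while names like ‘LOCALHOST’ or ‘(local)’ would get the red bar.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the SSMS Tools Pack pattern from the list above.
public static class ServerNamePattern
{
    // ^ anchors at the start; (?!localhost) fails the match only if the name begins with "localhost";
    // otherwise .* consumes the whole name and $ ends the match.
    private static readonly Regex NotLocalhost = new Regex("^((?!localhost).*)$");

    public static bool IsRemote(string serverName) => NotLocalhost.IsMatch(serverName);
}
```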
This dialog UI is a bit wonky, so make sure that you click the ‘Save’ button once you get everything the way that you want it before clicking ‘OK’.
With these settings I now have a bright green bar at the top of every query window that is pointed at ‘localhost’ and a bright red one that is pointed at every other server I connect to. Now I know that I don’t want those red windows open for very long!
Happy Green Window (localhost):

Scary Red Window (remote machine):

Wednesday, January 26, 2011
As mentioned in some earlier posts, I’ve been hacking on Shopify for the past six months for an e-commerce project. My work with Shopify introduced me to Heroku, which is a simple “cloud-based” service for hosting and running Ruby applications. It’s dead simple to get started and, for the most part, seems to “just work”.
While I enjoy playing with new languages and platforms, I’m by no means a Ruby developer. Heroku is a great service, but seemed somewhat inaccessible to me given that I’d have to work with an unfamiliar language in order to use it. It seems that Microsoft’s Azure platform is trying to fill this same niche for .NET developers, but my (admittedly limited) experiences with Azure have been far from smooth.
When I heard AppHarbor being referred to as a “Heroku For .NET / Azure Done Right” I headed over to their site and signed up for an account right away. I’ve been playing around with it for the past several days to see if it lives up to the hype. In short: I’m pretty impressed with what I’ve seen so far. This post outlines some of my initial findings.
Setup
The AppHarbor site is fairly minimalist in its design, which makes it very easy to get started. Upon landing on their home page visitors are immediately greeted with the message: “.NET Deployment Done Right … git push appharbor master”. Awesome.
If you’ve used a service like GitHub before then getting up and running on AppHarbor will feel very natural. If you’ve never used GitHub or git before it’ll take a minute to grok, but is completely worth it. For the rest of this post I’ll assume at least a basic familiarity with using git and the concept of remote repositories. There are tons of excellent resources out there for learning about git (I recommend the TekPub git series), so I won’t bother repeating that information here.
After signing up for a free account, head over to the msysgit project page to grab the latest and greatest version. I use git for some side projects and had a few issues trying to push to AppHarbor initially that were cleared up after upgrading to the latest version.
Creating And Deploying a ‘Hello World’ Application
Since I was digging into a new hosting platform I decided to go ahead and dig into the latest MVC 3 bits at the same time. The Web Platform Installer got me everything that I needed and I had a new boilerplate MVC 3 application up and running locally within a few minutes.
From here, initialize a new git repo in the solution folder where this solution and web project live. AppHarbor requires that the code you push to them contains a single solution file with a single web project (note that you can have other class library projects within that same solution for encapsulating code). I do most of my development this way so this approach is fine with me, but I did notice at least one thread in their support forum asking for multi-solution support. If I were them, this wouldn’t be at the top of my priority list for new features, so I have a feeling the single solution approach is probably going to stick for a while.
Once you’ve initialized the git repo, added all of your files, and committed them to your local master repo, you’re ready to set things up on the AppHarbor side. Log into your AppHarbor account and go to the ‘Applications’ screen. Creating a new application is as easy as providing an application name. After creating the application you’re provided with a few quick steps for getting your code deployed and running in the AppHarbor environment.
Jump back into git and run the following command against your local master repo to create AppHarbor as a remote repository for your application (note that the git endpoint for your application can be found on the application details screen in AppHarbor):
git remote add appharbor <your application endpoint>
Now, deploying your application is as easy as:
git push appharbor master
You’ll be prompted for a password (which is the same as your AppHarbor account password) and provided with a status message informing you that your application has been queued for building. If you check the application page in AppHarbor you’ll be able to check on the status of the build:

When the build completes successfully, the code is automatically deployed and available to be run at your application’s URL (http://<app_name>.apphb.com).
You can view past builds and deployments and even choose to deploy an earlier build at any time, though if you had a database attached to your application that quick rollback could have some adverse side effects.
You can also view the details of each build and even get detailed console output to troubleshoot issues if they arise.
Environment Specifics
If you’re like me, you prefer having your entire development stack run locally when doing development work. This means that you like having your IDE, light-weight web server, and database all play nicely together on your development machine when you’re cranking out new features. This is great, but the configuration you use when developing locally is almost never the same configuration you need when the application is being used or tested by others.
AppHarbor has a built-in mechanism for allowing your code to determine which environment it’s running in and change configuration settings accordingly. All you have to do is add the following line to the ‘appSettings’ section of your web.config file:
<add key="Environment" value="Debug"/>
After you deploy your application, this value is automatically set to the value “Release”. Similarly, you can add the same line to the app.config file in your unit test project and it will have its value set to “Test” while AppHarbor runs your unit tests (more on that in the next section).
In the past I’ve maintained separate configuration files for each environment I work with and had my build process choose the right file for the environment being targeted. I don’t currently see any way to make that work with AppHarbor, so in a future post I’ll be digging into some potential alternatives. I’m thinking about defining a separate assembly in my solution that will be responsible for managing all of the various environment specific configuration values (e.g. connection strings) and return the proper value to the calling code based on the ‘environment’ value found in the main configuration file.
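As a rough sketch of that idea, the hypothetical class below maps the value AppHarbor writes into the ‘Environment’ setting (“Debug”, “Test” or “Release”) to an environment-specific connection string. The class name and the connection strings are placeholders I made up; the caller is assumed to read the setting itself (for example via ConfigurationManager.AppSettings["Environment"] in a .NET Framework web app) and pass it in, which keeps this class free of any configuration dependency.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical home for environment-specific configuration values.
public static class EnvironmentSettings
{
    // Placeholder connection strings keyed by the "Environment" appSetting value.
    private static readonly Dictionary<string, string> ConnectionStrings =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["Debug"] = "Server=localhost;Database=MyApp;Integrated Security=true",
            ["Test"] = "Server=localhost;Database=MyApp_Test;Integrated Security=true",
            ["Release"] = "Server=example-production-host;Database=MyApp;Integrated Security=true"
        };

    public static string ConnectionStringFor(string environment)
    {
        if (environment == null) throw new ArgumentNullException(nameof(environment));
        if (ConnectionStrings.TryGetValue(environment, out string value)) return value;

        // Fail loudly rather than silently falling back to another environment's settings.
        throw new ArgumentException($"No settings defined for environment '{environment}'.", nameof(environment));
    }
}
```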
Unit Tests
Outside of being able to use git to deploy my applications, the other thing that really piqued my interest in AppHarbor is their ability to automatically run unit tests immediately following each build prior to deploying the application.
To make use of this feature, all you need to do is add a unit test project to your application and reference one of the supported unit test framework libraries. As of this writing, recent versions of NUnit and xUnit were supported.
AppHarbor finds these references and automatically runs all of the tests contained in those assemblies prior to deploying your code. If all of the tests pass, the code is automatically deployed. If one or more tests fail, the application remains on the last successful build.

Potential
While I’ve really only scratched the surface, I see a lot of potential for AppHarbor as a hosting platform. I love their convention-based approach to hosting and deploying applications: one solution with a web project, automatically running tests for any assembly that references a test framework, automatically grabbing and changing the ‘Environment’ application setting, etc. These conventions make it very easy to get up and running. Furthermore, these are all conventions that, for the most part, make a lot of sense. I haven’t seen anything about AppHarbor that would force you to write your application in a way that would prevent it from being run on a dedicated, semi-dedicated, virtual, or shared hosting environment with no changes. To me, that makes choosing this platform a lot more attractive than some of the alternatives out there. I plan on digging in a bit deeper in a future post to explore some more advanced concerns like file system storage (if that’s even an option), working with databases, and session state.
Sunday, January 09, 2011
One year ago I came to work for a company where the entire development team is 100% “remote”; we’re spread over 3 time zones and each of us works from home. This seems to be an increasingly popular way for people to work and there are many articles and blog posts out there enumerating the advantages and disadvantages of working this way. I had read a lot about telecommuting before accepting this job and felt as if I had a pretty decent idea of what I was getting into, but I’ve encountered a few things over the past year that I did not expect. Among the most surprising by-products of working from home for me has been a dramatic reduction in the amount of paper that I use on a weekly basis.
Hoarding In The Workplace
Prior to my current telecommute job I worked in what most would consider pretty traditional office environments. I sat in cubicles furnished with enormous plastic(ish) modular desks, had a mediocre (at best) PC workstation, and had ready access to a seemingly endless supply of legal pads, pens, staplers and paper clips. The ready access to paper, countless conference room meetings, and abundance of available surface area on my desk and in drawers created a perfect storm for wasting paper. I brought a pad of paper with me to every meeting I ever attended, scrawled some brief notes, and then tore that sheet off to keep next to my keyboard to follow up on any needed action items. Once my immediate need for the notes was fulfilled, that sheet would get shuffled off into a corner of my desk or filed away in a drawer “just in case”. I would guess that of all the notes I ever filed away, I actually had to dig up and refer to maybe 2% of them (and that’s probably being very generous). That said, on those rare occasions that I did have to dig something up from old notes, it was usually pretty important and I ended up being very glad that I saved them. It was only when I would leave a job or move desks that I would finally gather all those notes together and take them to the shredding bin to be disposed of. When I left my last job the amount of paper I had accumulated over my three years there was absurd, and I knew coworkers who had substance-abuse-caliber paper-wasting addictions that made my bad habit look like nail-biting in comparison.
A Product Of My Environment
I always hated using all of this paper, but simply couldn’t bring myself to stop. It would look bad if I showed up to an important conference room meeting without a pad of paper. What if someone said something profound! Plus, everyone else always brought paper with them. If you saw someone walking down the hallway with a pad of paper in hand you knew they must be on their way to a conference room meeting. Some people even had fancy looking portfolio notebook sheaths that gave their legal pads all the prestige of a briefcase. No one ever worried about running out of fresh paper because there was an endless supply, and there certainly was no shortage of places to store and file used paper. In short, the traditional office was set up for using tons and tons of paper; it’s baked into the culture there. For that reason, it didn’t take long for me to kick the paper habit once I started working from home.
In my home office, desk and drawer space are at a premium. I don’t have the budget (or the tolerance) for huge modular office furniture in my spare bedroom. I also no longer have access to a bottomless pit of office supplies stockpiled in cabinets and closets. If I want to use some paper, I have to go out and buy it. Finally (and most importantly), all of the meetings that I have to attend these days are “virtual”. We use instant messaging, VOIP, video conferencing, and e-mail to communicate with each other. All I need to take notes during a meeting is my computer, which I happen to be sitting right in front of all day. I don’t have any hard numbers for this, but my gut feeling is that I actually take a lot more notes now than I ever did when I worked in an office. The big difference is I don’t have to use any paper to do so. This makes it far easier to keep important information safe and organized.
The Right Tool For The Job
When I first started working from home I tried to find a single application that would fill the gap left by the pen and paper that I always had at my desk when I worked in an office. Well, there are no silver bullets and I’ve evolved my approach over time to try and find the best tool for the job at hand. Here’s a quick summary of how I take notes and keep everything organized.
- Notepad++ – This is the first application I turn to when I feel like there’s some bit of information that I need to write down and save. I use Launchy, so opening Notepad++ and creating a new file only takes a few keystrokes. If I find that the information I’m trying to get down requires a more sophisticated application I escalate as needed.
- The Desktop – By default, I save every file or other bit of information to the desktop. Anyone who has ever had to fix their parents’ computer knows that this is a dangerous game (any file my mother has ever worked on is saved directly to the desktop and rarely moves anywhere else). I agree that storing things on the desktop isn’t a great long-term approach to keeping organized, which is why I treat my desktop a bit like my e-mail inbox. I strive to keep both empty (or as close to empty as I possibly can). If something is on my desktop, it means that it’s something relevant to a task or project that I’m currently working on. About once a week I take things that I’m no longer working on and put them into my ‘Notes’ folder.
- The ‘Notes’ Folder – As I work on a task, I tend to accumulate multiple files associated with that task. For example, I might have a bit of SQL that I’m working on to gather data for a new report, a quick C# method that I came up with but am not yet ready to commit to source control, a bulleted list of to-do items in a .txt file, etc. If the desktop starts to get too cluttered, I create a new sub-folder in my ‘Notes’ folder. Each sub-folder’s name is the current date followed by a brief description of the task or project. Then all files related to that task or project go into that sub-folder. As long as the date is written year first (e.g. 2011-01-09), sorting the folders by name also sorts them chronologically, and sorting in descending order puts the things I worked on recently near the top of the list. Using the built-in Windows search functionality I now have a pretty quick and easy way to try and find something that I worked on a week ago or six months ago.
- Dropbox – Dropbox is a free service that lets you store up to 2GB of files “in the cloud” and have those files synced to all of the different computers that you use. My ‘Notes’ folder lives in Dropbox, meaning that its contents are constantly backed up and are always available to me regardless of which computer I’m using. They also have a pretty decent iPhone application that lets you browse and view all of the files that you have stored there. The free 2GB edition is probably enough for just storing notes, but I also pay $99/year for the 50GB storage upgrade and keep all of my music, e-books, pictures, and documents in Dropbox. It’s a fantastic service and I highly recommend it.
- Evernote – I use Evernote mostly to organize information that I access on a fairly regular basis. For example, my Evernote account has a running grocery shopping list, recipes that my wife and I use a lot, and contact information for people I contact infrequently enough that I don’t want to keep them in my phone. I know some people that keep nearly everything in Evernote, but there’s something about it that I find a bit clunky, so I tend to use it sparingly.
- Google Tasks – One of my biggest paper wasting habits was keeping a running task-list next to my computer at work. Every morning I would sit down, look at my task list, cross off what was done and add new tasks that I thought of during my morning commute. This usually resulted in having to re-copy the task list onto a fresh sheet of paper when I was done. I still keep a running task list at my desk, but I’ve started using Google Tasks instead. This is a dead-simple web-based application for quickly adding, deleting, and organizing tasks in a simple checklist style. You can quickly move tasks up and down on the list (which I use for prioritizing), and even create sub-tasks for breaking down larger tasks into smaller pieces.
- Balsamiq Mockups – This is a simple and lightweight tool for creating drawings of user interfaces. It’s great for sketching out a new feature, brainstorming the layout of an interface, or even drawing up a quick sequence diagram. I’m terrible at drawing, so Balsamiq Mockups not only lets me create sketches that other people can actually understand, but it’s also handy because you can upload a sketch to a common location for other team members to access.
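The date-first naming for ‘Notes’ sub-folders only sorts chronologically if the date is written year-month-day. Here’s a small hypothetical helper showing that convention; the “yyyy-MM-dd” format and the helper names are my assumptions, not necessarily the exact format I use.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helpers for date-first folder names.
public static class NotesFolders
{
    // e.g. "2011-01-09 sprocket report"; invariant culture keeps the digits and separators stable.
    public static string FolderName(DateTime date, string description) =>
        date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + " " + description;

    // Year-month-day names sort chronologically as plain strings; descending puts the newest first.
    public static string[] NewestFirst(string[] folderNames) =>
        folderNames.OrderByDescending(name => name, StringComparer.Ordinal).ToArray();
}
```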
I can honestly say that using these tools (and having limited resources at home) has led me to cut my paper usage down to virtually none. If I ever were to return to a traditional office workplace (hopefully never!) I’d try to employ as many of these applications and techniques as I could to keep paper usage low. I feel far less cluttered and far better organized now.