Parsing Concerns–Part 2

In a previous post I explored some of the issues that can arise when parsing date and time values entered by users from different cultures and regions of the world. Correctly parsing and formatting dates is a common internationalization concern in most software projects, but in this post I want to explore another area where parsing and formatting can cause headaches: numbers.

Numbers Are The International Language, Right?

If your application deals with quantities of any kind, you probably have to accept numeric input from users and later display those values back to them. Consider the following numbers:

  • 2.782

  • 7,482

What do these numbers represent to you? If you’re from the United States, you’d probably interpret them as:

  • 2.782 = a decimal representing two whole units plus 782/1000 of a unit

  • 7,482 = an integer (whole number) representing seven thousand four hundred and eighty-two whole units

.NET software running on my servers in the United States would interpret those numbers the same way:

using System.Globalization;
using System.Threading;

//set the current culture to en-US
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
var firstNumber = decimal.Parse("2.782");   //2.782
//must explicitly allow for the thousands separator
var secondNumber = int.Parse("7,482", NumberStyles.AllowThousands);   //7482

The code above runs successfully and parses both numeric values without any loss of precision. So what happens if this same code runs under a culture other than en-US?

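//switch the current culture to de-DE (German)
Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE");
//does NOT throw: "." is read as the German thousands separator
var firstNumber = decimal.Parse("2.782");
//throws FormatException: "," is the German decimal separator, which AllowThousands does not permit
var secondNumber = int.Parse("7,482", NumberStyles.AllowThousands);
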
Now that the culture has been changed to de-DE (German), something interesting happens. The number formatting rules for the de-DE culture in .NET specify that the decimal symbol is a comma and the thousands separator is a period (basically the opposite of the en-US rules). When I first experimented with number parsing in an internationalized application, I expected both of these parse calls to throw an exception due to improper formatting. As I said in my last post about parsing, a thrown exception would be a good thing here, because it calls attention to parsing code that may not be ready to deal with values from different cultures.

Unfortunately, only the second call throws a FormatException. The first call parses “2.782” without complaint and treats the “.” as a thousands separator, because the decimal.Parse(string) overload uses NumberStyles.Number, which includes AllowThousands. The resulting .NET decimal value is 2782 (two thousand seven hundred and eighty-two), so the originally intended value has been silently corrupted and lost.
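The separator rules can be inspected directly. The following is a minimal sketch, assuming a recent .NET SDK (top-level statements) and a runtime with full culture data (for example, not running in globalization-invariant mode). It prints each culture’s separators from NumberFormatInfo, formats the same two values under both cultures, and reproduces the silent corruption by passing the culture to decimal.Parse as an IFormatProvider instead of changing the thread culture.

```csharp
using System;
using System.Globalization;

var enUS = new CultureInfo("en-US");
var deDE = new CultureInfo("de-DE");

decimal fraction = 2.782m;
int whole = 7482;

foreach (var culture in new[] { enUS, deDE })
{
    // Each culture defines its own decimal and group (thousands) separators.
    NumberFormatInfo nfi = culture.NumberFormat;
    Console.WriteLine($"{culture.Name}: decimal '{nfi.NumberDecimalSeparator}', group '{nfi.NumberGroupSeparator}'");

    // Formatting the same values shows the swapped separators on output.
    Console.WriteLine($"  {fraction.ToString("N3", culture)} / {whole.ToString("N0", culture)}");
}

// decimal.Parse(string, IFormatProvider) uses NumberStyles.Number, which includes
// AllowThousands, so under de-DE the "." is accepted as a group separator.
decimal parsed = decimal.Parse("2.782", deDE);
Console.WriteLine($"de-DE parse of \"2.782\" = {parsed.ToString(CultureInfo.InvariantCulture)}"); // prints 2782
```

Passing the culture explicitly behaves the same as setting Thread.CurrentThread.CurrentCulture, since the overloads without a provider fall back to the current culture.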

How Do You Deal With This?

The moral of my last post about parsing and formatting boiled down to “Know Your Audience”, and this post is no different. It’s critical to know where numeric values are coming from and which culture their source belongs to. It also helps to use the most precise parsing logic you can, so that there is little room for ambiguity. In the example above, if we use the decimal.Parse overload that accepts a NumberStyles value and pass NumberStyles.Float, the call running under the German culture will throw a FormatException: NumberStyles.Float does not include AllowThousands, so the “.” (the German thousands separator) is rejected instead of being silently accepted.

It’s easy to ignore all of those “extraneous” parameters to the Parse and formatting methods in .NET, but they can have a big impact on how your code behaves. When in doubt, look it up or fire up a quick LINQPad session and test out some variations. If there’s a silver-bullet solution to problems like this, I certainly haven’t found it, and every application or scenario likely calls for a slightly different approach. The point of this post was to call attention to an oddity of decimal parsing that can cause issues in multi-cultural applications.
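A minimal sketch of the stricter approach, assuming the caller knows which culture the input came from. TryParseStrictDecimal is a hypothetical helper, not a .NET API; it only wraps decimal.TryParse with NumberStyles.Float and an explicit IFormatProvider, so the result no longer depends on the thread’s current culture, and a stray thousands separator produces a failed parse rather than a different number.

```csharp
using System;
using System.Globalization;

var deDE = new CultureInfo("de-DE");
var enUS = new CultureInfo("en-US");

// NumberStyles.Float omits AllowThousands, so the German group separator "." is rejected.
bool okDotGerman = TryParseStrictDecimal("2.782", deDE, out _);
Console.WriteLine($"de-DE \"2.782\": {okDotGerman}"); // False

// The same value written with the German decimal separator parses as intended.
bool okCommaGerman = TryParseStrictDecimal("2,782", deDE, out decimal fromGerman);
Console.WriteLine($"de-DE \"2,782\": {okCommaGerman} -> {fromGerman.ToString(CultureInfo.InvariantCulture)}"); // True -> 2.782

// Input known to come from a US user is parsed with en-US rules, whatever the thread culture is.
bool okDotUS = TryParseStrictDecimal("2.782", enUS, out decimal fromUS);
Console.WriteLine($"en-US \"2.782\": {okDotUS} -> {fromUS.ToString(CultureInfo.InvariantCulture)}"); // True -> 2.782

// Hypothetical helper (not part of .NET): parse a decimal that must not contain
// thousands separators, using the culture the input is known to come from.
static bool TryParseStrictDecimal(string input, CultureInfo sourceCulture, out decimal value) =>
    decimal.TryParse(input, NumberStyles.Float, sourceCulture, out value);
```

TryParse is used here so that rejected input can be reported back to the user without exception handling. The trade-off is that legitimate input using thousands separators, such as “1.234,5” from a German user, is rejected as well.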

What’s Next?

While stricter parsing logic is good for eliminating ambiguity and reducing the risk of data being parsed incorrectly, it can lead to a poor user experience. Disallowing thousands separators or forcing a single decimal separator across all cultures is a bit user-hostile (and I’ve certainly been guilty of it). In a future post I want to explore some strategies for letting users enter the same data in a variety of ways that make sense to them while still maintaining strict parsing and validation.

Posted on Sunday, October 2, 2011 1:15 PM