Much has been written about the technical ins and outs of localizing .NET applications with ResX files, but I think that most treatments of the topic understate the difficulty involved in localizing a project of substantial size and complexity. You probably already know how to define and consume global and local resources in a “Hello World”-sized ASP.NET application and how to use the CurrentUICulture to ensure that the right version of a string is pulled from a ResX file to be shown to a user. Even if you don’t, those technical details are easy to look up and understand. In my opinion, the difficult part of making software available in other languages is keeping your localizable strings (text that your application presents to human users) organized.
For the sake of this post let’s talk about a “real world” web application that’s composed of more than just a few pages. Consider a project with the following structure:
- A UI layer consisting of some ASP.NET views
- A layer for “presentation logic” containing presenters/controllers, view models, input validation, etc.
- A layer for “backend” stuff like business rules, workflow orchestration, data access, etc.
How would we ensure that a newly added feature to this application could be translated and used by people in other countries in the future if needed?
(Note that for the purposes of this post I’m only discussing string localization and translation, which is only one aspect of preparing software for use in other countries and cultures.)
Application Feature: Sprocket Order Entry
You work for a software company building and maintaining the web application described above. Your software isn’t currently used anywhere but the United States, but everyone has always known that an international user base was a possibility in the future. This application has been around for a while and currently contains dozens of views and many shared code libraries that make up its “back end”. You are tasked with creating a new page for entering sprocket orders (we’ll refer to this as the Sprocket Order Entry feature). You might build this feature as follows:
- Create a new view and start laying out how you want things to look. You might use a mockup tool, or you might be spiking some HTML and CSS to see how things will come together. At this point you’re probably just using hard-coded English text for input prompts and other user-facing messages.
- You might then decide to try and wire up the view to the presentation logic. At this point you might be thinking about how to structure data that will be passed back and forth between the view and the server.
- Next you might tackle some of the back end stuff like enforcing business rules, coming up with a data access strategy, data normalization, raising domain events for other parts of the system to respond to, etc.
Keeping in mind that this page might be used in other countries some day, you know that you should prepare this new feature for localization. To that end you start looking for localizable strings that should be moved into ResX files. Here’s what you might find:
- Labels, prompts, column headers, etc. that the view needs. These now reside in a ResX file that is specific to the view in which they are used (e.g. SprocketOrderEntry.aspx.resx).
- Messages that can be returned by the presenter’s input validation logic. When you were writing the presenters you quickly hard-coded these messages wherever validation could fail and added some way to pass the strings back to the view. Now that you’re looking at setting up ResX files you decide to just create a “SprocketOrderEntryPresenterResources.resx” file that can live in the presentation library.
- A warning message for a condition that can be raised from the “back end” classes when no subscribers are listening for the domain event that you’re raising. This isn’t an error condition per se, but you want to somehow inform your salespeople that the fulfillment system might not have received the order that they just entered, but that it should find it as soon as it comes back online. You don’t have a lot of user-facing messages in the “back end” library, so you’ve opted to have one “MiscResources.resx” file there that can collect the one-off strings that sometimes need to be returned from the back end processing.
The feature is done and the localizable strings are all in ResX files so you’re done now, right? Technically yes, but you’ve spread your localizable strings for this feature over three different ResX files spanning several logical layers of your application. Putting aside the fact that user-facing messages are primarily a UI concern and probably belong exclusively in the UI layer, having your localizable strings spread out across different logical layers of your application can cause a lot of pain when the translation process begins. This pain can stem from a variety of scenarios. Let’s begin by looking at what might happen when the Sprocket Order Entry feature needs to be translated into a new language.
Translating A Single Application Feature
The sprocket order entry form you built has been in use internally at your company for a few months (or years) when you learn of the plan to open a branch office in Brazil. The office staff there really needs the order entry form in Brazilian Portuguese, so you start figuring out what it will take to get the page translated. It’s been a while since you’ve looked at or thought about the sprocket order entry feature (or maybe you aren’t even the developer that originally built it), but you know that the view for this page lives in a file called ‘SprocketOrderEntry.aspx’ and you see that there’s a SprocketOrderEntry.aspx.resx file in the project that appears to have the same text that you see when you bring up the page in your local development environment. There are, of course, many other .resx files in the codebase, but there’s not enough time or money to get everything translated right now, so you just gather up the strings from the .resx file that belongs to this one page and send them out to get translated. When the translations come back you create a new “SprocketOrderEntry.aspx.pt-BR.resx” file containing the translated messages and add it to your project. You take a quick peek at the page running under the pt-BR culture and see that it appears to be in Portuguese. It all looks good, so you ship it off to Brazil to be used.
After a couple of days you get a report back from Brazil that there’s some English text showing up on the order entry screen sometimes and no one knows what it means. When it shows up the orders don’t appear to be getting saved and people are getting frustrated. The report includes a screenshot of the message: “You checked the box for ‘Include Delivery Notes’ but did not enter anything into the ‘Delivery Notes’ field.” The orders aren’t being saved because a basic validation rule is failing, but the validation error message can’t be understood by the users. You fire off a reply to the manager in Brazil to explain the message’s meaning and behavior and set out to fix the errant English message.
Your initial thought is that the translators might have missed a string or two in the SprocketOrderEntry.aspx.resx file so you crack it open to have a look. To your surprise you don’t see that English message defined there so you move into the code for the view to see if it’s been hard-coded. After a few minutes with no luck you decide to fire off a “Find in files…” command in your IDE to try and track down where that message is coming from. The exact message text turns up in a file called ‘SprocketOrderEntryPresenterResources.resx’ in the presentation class library. You open that file to discover a handful of messages related to the sprocket order entry form that have not been translated. You assemble those strings and send them along to the translators. The strings that need to be translated this time around only represent a fraction of the strings that were translated in the first round, but the translators have a minimum fee so the overall effort ends up being more expensive than it had to be.
At this point you have translated the strings for the Sprocket Order Entry feature from two out of the three logical layers in your application. Of course it’s only a matter of time before one of the strings defined in the “backend” layer makes its way onto the page resulting in another round-trip to the translators.
How could this have been avoided? In this over-simplified example it might not have been that difficult to manually walk through and review all of the code in each possible execution path that the order entry form can initiate, but that could become a very time-consuming activity in an application of even moderate size and complexity. These reviews would also likely have to be repeated from time to time as the application changes and grows with new requirements and features. If you started calculating how much it would cost in developer resources to do these reviews, you might conclude that it would be cheaper and easier to just gather up all of the ResX files in your application and send them to the translators. Deciding to translate all of the localizable strings at once can certainly help avoid the translation churn described above (provided that no hard-coded localizable strings have been left behind by developers), but it’s not a silver bullet.
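One cheap guard against at least part of this churn is a mechanical check of the ResX files themselves. The sketch below is only an illustration: ResxAudit and its methods are hypothetical names of my own, and it assumes the usual Name.resx / Name.pt-BR.resx naming convention and that no other cultures exist in the tree yet. It reports every neutral ResX file that has no translated sibling, or whose sibling lacks some keys. It would have flagged SprocketOrderEntryPresenterResources.resx before anything shipped, but it cannot tell you which strings end up on which page, and it will not find text that is still hard-coded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; not part of .NET or any library.
public static class ResxAudit
{
    // Reads the name of every <data> entry from the XML text of a ResX file.
    // Non-string entries (images, etc.) are counted as well.
    public static HashSet<string> ReadKeys(string resxXml)
    {
        var names = XDocument.Parse(resxXml).Root
            .Elements("data")
            .Select(e => (string)e.Attribute("name"))
            .Where(name => !string.IsNullOrEmpty(name));
        return new HashSet<string>(names, StringComparer.Ordinal);
    }

    // Keys defined in the neutral file that the translated file lacks.
    public static List<string> MissingKeys(string neutralXml, string translatedXml)
    {
        var translated = ReadKeys(translatedXml);
        return ReadKeys(neutralXml)
            .Where(key => !translated.Contains(key))
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToList();
    }

    // Walks a source tree and reports every neutral ResX file that has no
    // sibling for the given culture, or whose sibling is missing keys.
    public static IEnumerable<string> Audit(string rootDir, string culture)
    {
        var suffix = "." + culture;
        foreach (var path in Directory.EnumerateFiles(rootDir, "*.resx", SearchOption.AllDirectories))
        {
            var baseName = Path.GetFileNameWithoutExtension(path);
            if (baseName.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
                continue; // this file is itself a translation

            var translatedPath = Path.Combine(Path.GetDirectoryName(path), baseName + suffix + ".resx");
            if (!File.Exists(translatedPath))
            {
                yield return path + ": no " + culture + " file";
                continue;
            }

            foreach (var key in MissingKeys(File.ReadAllText(path), File.ReadAllText(translatedPath)))
                yield return path + ": missing key " + key;
        }
    }
}
```

Calling ResxAudit.Audit with the solution folder and "pt-BR" yields one line per gap, which could be printed from a small console program or a build step.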
After a while the translation churn on the Sprocket Order Entry feature has died down and the company starts selling sprockets like mad in the Brazilian market. Soon enough there’s talk of establishing a new sprocket fulfillment center in Brazil to serve all of the orders coming in, which means that the fulfillment center staff will need to be able to use the Sprocket Fulfillment Center module in Portuguese. Having learned your lesson with the Sprocket Order Entry page, you start to carefully walk through all of the execution paths of the web pages that make up the Sprocket Fulfillment module to ensure that all of the localizable strings for the feature get translated in one pass. After only a few minutes it becomes clear to you that the Sprocket Fulfillment module relies on many other shared modules in your application, several of which contain localizable strings that may or may not end up on the Sprocket Fulfillment UI. Rather than spend the time and resources to track down each and every possible string, you decide it’s easier to just send all of the ResX files in the application out to the Portuguese translators. While translating everything will take longer and cost more, it seems like a better alternative than manually tracking down all of the strings that might get used, risking missing some, and requiring multiple round trips to the translators.
After a few days you get an e-mail from the translators asking for clarification about some of the strings to be translated. They have screenshots of all of the screens in the application to help provide context for the strings that they are translating, but they don’t have any easy way to determine the context in which the strings from some of the “shared” libraries are used. Of course, you don’t know off the top of your head where each of the strings in the shared libraries is used either, so you’re left to grep through code in your IDE to trace the usage of the more context-sensitive English messages so that the Portuguese translators can come up with good translations. This is exactly the kind of exercise that you were hoping to avoid by opting to translate all of the ResX files instead of just the ones for the specific feature set that was needed. The Sprocket Fulfillment module needs to be available in Portuguese ASAP, so you have no choice but to spend the hours (or days?) needed to provide the translators with the context for the shared strings.
This exercise is tedious, but you take some solace in the fact that it’s not something that you’ll need to do all the time; once you’ve determined where all of these strings are used you can save that information for the next time context-related questions come up. You send off answers to the translators’ questions and happily return to whatever it was that you were working on when the questions originally came in, hoping to not have to spend any more time on translation-related work for a while. Unfortunately, the next day you get some unexpected bad news from the translators.
When finding all of the places in the application where the shared localizable strings are used you found that many of them can surface on several different screens. This didn’t surprise or concern you at the time, as you figured that the translators could just translate these strings once and the translations would appear on all of the screens in which the strings are used. It turns out, however, that the Portuguese translation for some of these shared strings needs to differ slightly between the various places that the string is used. You now have to sink even more developer time into translations to split these shared messages into different UI-specific versions so that the proper Portuguese translation can be shown on each screen. Having localizable strings defined in shared code libraries has encouraged message reuse which can cause all kinds of headaches during the translation process.
From the MediaWiki documentation on localization:
“Although two concepts can be expressed with the same word in English, this doesn't mean they can be expressed with the same word in every language. "OK" is a good example: in English this is used for a generic button label, but in some languages they prefer to use a button label related to the operation which will be performed by the button. If you are adding multiple identical messages, please add message documentation to describe the differences in their contexts. Don't worry too much about the extra work for translators. Translation memory helps a lot in these while keeping the flexibility to have different translations if needed.”
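To make the “OK” example concrete, here is a minimal sketch of what splitting a shared string into UI-specific versions might look like. Everything in it is an assumption for illustration: ScreenText is a hypothetical class, the nested dictionaries stand in for per-view ResX files, the key names are invented, and the Portuguese strings are placeholders rather than vetted translations. The point is only that each screen owns its own key, so two messages that happen to share English text are free to diverge in another language.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for per-view ResX files. Keys are "<Screen>.<Message>".
public static class ScreenText
{
    private static readonly Dictionary<string, Dictionary<string, string>> ByCulture =
        new Dictionary<string, Dictionary<string, string>>
        {
            ["en-US"] = new Dictionary<string, string>
            {
                // Same English text, but defined once per screen.
                ["SprocketOrderEntry.Confirm"] = "OK",
                ["SprocketFulfillment.Confirm"] = "OK",
            },
            ["pt-BR"] = new Dictionary<string, string>
            {
                // Placeholder translations, named after the action performed.
                ["SprocketOrderEntry.Confirm"] = "Salvar pedido",
                ["SprocketFulfillment.Confirm"] = "Confirmar envio",
            },
        };

    // Returns the text for one screen's message in the requested culture,
    // falling back to the English text when no translation exists.
    public static string Get(string culture, string screen, string message)
    {
        var key = screen + "." + message;
        Dictionary<string, string> table;
        string text;
        if (ByCulture.TryGetValue(culture, out table) && table.TryGetValue(key, out text))
            return text;
        return ByCulture["en-US"][key];
    }
}
```

In a real project the lookup would go through each view’s own ResX file rather than an in-memory table, and each entry’s comment field is a natural place for the message documentation that the MediaWiki guidelines ask for.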
The “Right Way”
Most, if not all, of the issues described in this post could be alleviated by ensuring that all localizable strings were stored in the UI layer of the application. If each view/screen in the application defined all of the localizable strings that it needed, it would be very easy to translate individual pages without translating the entire application, easily provide translators with the context in which each string is used, and ensure that each user-facing message is defined on its own and can be translated on its own.
Anytime I need to add a localizable string to an application, I ask myself this question:
Where will the message represented by this string originate and where will it ultimately be consumed?
Note the difference between the terms “message” and “string” here: a message is a piece of information that the application needs to convey to the user, while a string is just a bit of text that can be used to convey that message. The Sprocket Order Entry application might convey the message that the ‘PO Authorized By’ field was left empty using the string “This customer requires that all POs be authorized prior to placing an order. Please enter a value for the ‘PO Authorized By’ field.” It might make perfect sense to have this message originate from a shared library in the business logic layer of the application, but the actual string won’t be consumed until control returns from that shared library all the way back to the view that originated the request.
It’s best to represent the message using an invariant number or code (e.g. an integer or enumeration value) for as long as possible until it needs to be interpreted by a human user on the other end of a display. By using invariant data to represent the message you’ll help ensure that all potential callers into the shared module will be able to convert that message into a string that is appropriate for the environment in which it will be displayed. For example, if the message needs to be displayed to a user in a web browser, it might make sense to show users a full sentence or two explaining what the message means and what action they might need to take. If the message needs to be displayed to a user on a handheld device, then a much shorter and/or specialized message might be more appropriate. Put another way, the closer the localizable strings are to the users of the application, the better.
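Here is a minimal sketch of that idea, under some loud assumptions: OrderMessage, SprocketOrderRules and OrderMessageText are hypothetical names, the dictionaries stand in for ResX files that would live in each UI project, and the handheld wording and the fulfillment message text are invented for illustration. The back end returns only enumeration values, and each front end decides how to word them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical invariant codes that the shared back-end library returns
// instead of user-facing text.
public enum OrderMessage
{
    PoAuthorizationRequired,
    FulfillmentSystemUnavailable
}

// Back end: reports what happened using codes only, so it needs no ResX file.
public static class SprocketOrderRules
{
    public static IReadOnlyList<OrderMessage> Validate(bool customerRequiresPoAuthorization, string poAuthorizedBy)
    {
        var messages = new List<OrderMessage>();
        if (customerRequiresPoAuthorization && string.IsNullOrWhiteSpace(poAuthorizedBy))
            messages.Add(OrderMessage.PoAuthorizationRequired);
        return messages;
    }
}

// UI layer: each front end owns the wording for every code. In a real
// application these tables would be entries in that front end's ResX files,
// read with ResourceManager.GetString under the current UI culture.
public static class OrderMessageText
{
    private static readonly Dictionary<OrderMessage, string> Web = new Dictionary<OrderMessage, string>
    {
        [OrderMessage.PoAuthorizationRequired] =
            "This customer requires that all POs be authorized prior to placing an order. " +
            "Please enter a value for the 'PO Authorized By' field.",
        [OrderMessage.FulfillmentSystemUnavailable] =
            "The fulfillment system may not have received this order yet. " +
            "It should find the order as soon as it comes back online.",
    };

    private static readonly Dictionary<OrderMessage, string> Handheld = new Dictionary<OrderMessage, string>
    {
        [OrderMessage.PoAuthorizationRequired] = "PO authorization required.",
        [OrderMessage.FulfillmentSystemUnavailable] = "Fulfillment offline. Order pending.",
    };

    public static string ForWeb(OrderMessage message) => Web[message];

    public static string ForHandheld(OrderMessage message) => Handheld[message];
}
```

A view would call SprocketOrderRules.Validate through its presenter, then pass each returned code to OrderMessageText.ForWeb (or the handheld equivalent) at the moment it renders. Translators then only ever see strings that belong to a specific screen.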