Bare Minimum Internationalization of Software

Internationalization isn’t only about dealing with other nationalities and languages. It’s about creating software for a multicultural world. Even if the software you’re testing won’t be translated entirely into another language, it still should meet some basic requirements for international visitors.

I love internationalization (or “i18n”). I think one of the key promises of the Internet is that it brings every kind of information or service to everybody anywhere in the world, regardless of location or nationality. Companies that market their software strictly to a domestic audience often think they don't have to be concerned with i18n. However, while the Internet allows us to reach out and connect with the rest of the world, it also allows the rest of the world to connect to us. Your efforts to sell software may be purely domestic, but the people it will find its way to, and the data that will find its way to it, are global.

Moreover, the notion of i18n as something that's only a concern when dealing with other nationalities is mistaken. Many a software team has had their first brush with character sets and internationalization when they discovered that their software turns the directional quotation marks (or “smart quotes”) emitted by Microsoft Word into gibberish. Indeed, the term “internationalization” is rather a misnomer. It's not preparing your software for going abroad; it's making it ready to work in a multilingual, multicultural world.
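The smart-quote problem is easy to reproduce. The following is a minimal sketch, not a description of any particular product's bug: it assumes the text was written out as UTF-8 and then read back as Windows-1252, which is one common way this kind of gibberish arises. The function name `garble` is made up for illustration.

```python
def garble(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


original = "“it’s"            # an opening smart quote and a curly apostrophe
garbled = garble(original)
print(garbled)                # → â€œitâ€™s

# In this particular case the damage is reversible, because no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)   # → True
```

Each curly character occupies three bytes in UTF-8, and each of those bytes is read back as a separate Windows-1252 character, which is why one quotation mark turns into three symbols.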

So, if you are testing a piece of software that's not going to be multilingualized or translated (at least, not yet), what is the bare minimum i18n it still needs to do?

First and foremost, the application must not crash or malfunction when fed Unicode or multibyte data. It may sound outrageous in this day and age, but some software reacts quite angrily to non-ASCII data. If you put an application on the public Internet, you can be certain that it will be accessed from all over the planet, whether that is your intent or not. And, as alluded to above, even in the case of an application deployed only on a company's intranet or desktop software with limited geographical distribution, "international" data will find its way into your application sooner or later. It needs to be able to cope.
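A first pass at this requirement can be as simple as pushing a handful of varied strings through the application and noting which ones make it fail. The sketch below is only an illustration: `handle_input` is a hypothetical stand-in for whatever entry point your application exposes, and the sample list is a starting point rather than a complete test set.

```python
# Hypothetical stand-in for the code under test. Replace it with a call into
# the real application (a form submission, an API request, a library call).
def handle_input(text: str) -> str:
    return text.strip()


SAMPLES = [
    "plain ASCII",
    "café",                # accented Latin letter
    "“smart quotes”",      # directional quotation marks
    "Ελληνικά",            # Greek
    "日本語のテキスト",      # Japanese
    "مرحبا",               # Arabic, written right to left
    "e\u0301",             # letter followed by a combining accent
    "😀",                  # emoji, outside the Basic Multilingual Plane
]


def find_failures(handler, samples):
    """Return (sample, exception) pairs for every sample the handler chokes on."""
    failures = []
    for sample in samples:
        try:
            handler(sample)
        except Exception as exc:  # any crash counts as a failure here
            failures.append((sample, exc))
    return failures


for sample, exc in find_failures(handle_input, SAMPLES):
    print(f"{sample!r} failed: {exc!r}")
```

A check like this only catches outright crashes. It will not notice input that is silently mangled or dropped, so it complements rather than replaces the input and output checks.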

Second, users must be able to use Unicode or multibyte data anywhere they can enter text into the application. This includes not only straightforward data fields such as the user's name, address, and so forth, but also things like passwords, especially since giving users a larger range of potential characters to select from increases the security of their passwords. In fact, any means by which textual data can be entered into the application needs to be Unicode aware. For instance, if the software you are testing can receive email or import Microsoft Word documents, those avenues of data ingress need to be tested as well. Things to watch for here include mojibake (garbled text, for instance characters landing in the database wrong), multibyte or non-ASCII characters being filtered out, and input that's silently ignored altogether. Autocompletion mechanisms are often problematic. When fed non-ASCII text, they may offer incorrect suggestions or none at all. With passwords in particular, it's a good idea to make sure that users can create a password that includes non-ASCII characters, can log in with that same password, and can't log in with some common failure modes, like a blank password, the password with all non-ASCII characters stripped out, and so forth.
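The password checks described above can be sketched as a small test helper. Everything named here is an assumption made for illustration: `ToyAccounts` is a hypothetical in-memory stand-in for your real login system, and `password_failures` expects only that the object under test offers `register` and `login` methods.

```python
import hashlib
import hmac
import os


class ToyAccounts:
    """Hypothetical in-memory stand-in for the real login system."""

    def __init__(self):
        self._users = {}

    def _digest(self, password: str, salt: bytes) -> bytes:
        # Hash the UTF-8 bytes of the password exactly as the user typed it.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 10_000)

    def register(self, user: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[user] = (salt, self._digest(password, salt))

    def login(self, user: str, password: str) -> bool:
        if user not in self._users:
            return False
        salt, expected = self._users[user]
        return hmac.compare_digest(expected, self._digest(password, salt))


def ascii_only(text: str) -> str:
    """Drop every non-ASCII character, mimicking an over-eager input filter."""
    return text.encode("ascii", "ignore").decode("ascii")


def password_failures(accounts, user="testuser", password="pässwörd-日本"):
    """Register a non-ASCII password and list any checks that go wrong."""
    accounts.register(user, password)
    problems = []
    if not accounts.login(user, password):
        problems.append("correct password rejected")
    if accounts.login(user, ""):
        problems.append("blank password accepted")
    if accounts.login(user, ascii_only(password)):
        problems.append("ASCII-stripped password accepted")
    return problems


print(password_failures(ToyAccounts()))  # → []
```

An empty list means all three checks passed. Against a real application, `register` and `login` would wrap whatever the product actually exposes, such as a signup form and a login form driven through a browser automation tool.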

Third, users must be able to get out of the application whatever data they put into it, intact. Non-ASCII text must display correctly, be able to be edited, and be processed by the software without getting mangled. Mojibake is the most common issue. In addition to text that's displayed directly by your program, be sure to check output avenues like sending email or exporting

About the author

Rick Scott

Rick Scott is a Canadian philosopher-geek who's profoundly interested in how we can collaborate to make technology work better for everyone. He's an incorrigible idealist, an open source contributor, and a staunch believer in testing, universal access, and the hacker ethic. When he's not in front of a computer, you'll find him out on the hiking trails, in the kitchen turning out cupcakes, or cleaning up his viola technique in the basement.
