Usually when starting a new web site, a company will design the site for use in only one language. As companies grow, however, it is often prudent to internationalize the site and make it accessible in many languages. If the site is to become a large commercial success in countries around the world, this step is necessary and crucial.
Still, too often companies show their naïvety by failing to prepare for internationalization and later incur large costs when the entire site has to be retrofitted to support multiple languages. Of course, if one were creating a site with something like Django, the site would be ready for translation into other languages from the start and would require no retrofit down the line.
One interesting aspect of designing a multilingual web site is deciding how to represent the language choice in the URLs. In this article I want to explore several possible methods for indicating a language choice for a given resource with a URL on a multilingual web site and decide on the best one.
Below is a non-exhaustive list of methods for selecting a language based on my own brainstorming and collaboration with my friend Ted. All methods involve accessing the resource at /bar/baz on the domain example.com.
(1) One can use two-letter language codes as sub-domains on the main site’s domain name.
(2) The directory structure of the resources on the site can contain the language code, whether these are actual directories or achieved through some programmatic means.
(3) It’s possible to just set a language variable in the querystring of the URL.
(4) A different domain for each country representing a language can be purchased.
(5) One could just have a UI element on the site itself to allow the selection of a language preference which is then stored in a cookie on the user’s computer. This removes the need for representation of the language in the URL itself.
Example cookie values:

lang=en-US
lang=en-GB
lang=de

Accept-Language HTTP Header (6)

Most user agents (such as web browsers) send an HTTP header, Accept-Language, that will indicate which natural languages the agent is ready to support. Often, international users will have their native language set first in this list of languages sent by the agent.
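To make the Accept-Language approach a little more concrete, here is a minimal Python sketch of how a server might pick a language from that header. The function names, the list of supported languages and the default of "en" are all my own assumptions for illustration, and the matching is deliberately simple: an exact match first, then the primary language (so en-us falls back to en).

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language range, quality) pairs, highest quality first."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # the weight defaults to 1 when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not wanted"
        ranges.append((tag.lower(), quality))
    # sorted() is stable, so ranges with equal quality keep the header order.
    return sorted(ranges, key=lambda pair: -pair[1])


def choose_language(header: str, supported: List[str], default: str = "en") -> str:
    """Pick the first acceptable language the (hypothetical) site supports."""
    by_lower = {code.lower(): code for code in supported}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # e.g. "en-us" falls back to "en"
        if primary in by_lower:
            return by_lower[primary]
    # Nothing matched (this sketch also ignores the "*" wildcard).
    return default
```

For example, choose_language("en-us, en;q=0.8", ["en", "de"]) falls back from en-us to en, while an empty header simply yields the default.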
Example header values:

Accept-Language: en-us or en
Accept-Language: en-gb
Accept-Language: de

Semi-colon Path Parameter at End of Path (7)

According to RFC 3986 – URI: Generic Syntax § 3.3, a path segment in a URI (or a URL) can contain semi-colons to delimit parameters and parameter values that apply to that segment. One could use this in the simplest way possible to tack on a language code to the end of the path to a resource.
Comma Path Parameter at End of Path (8)

RFC 3986 also notes that the comma is often used for similar purposes within a path segment, giving name,1.1 as an example. This form suits cases where only a value (and not a key) needs to be supplied, such as a language code.
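As a rough illustration of how the URL-based methods might look to the server, here is a Python sketch that pulls the language code out of each URL form. The exact URL shapes (for instance ;lang=de for the semi-colon form and ,de for the comma form), the parameter name lang, the function names and the SUPPORTED set are assumptions of mine, not something any specification prescribes.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical set of languages the site supports (an assumption for this sketch).
SUPPORTED = {"en", "en-us", "en-gb", "de"}


def _known(code: str) -> Optional[str]:
    """Return the lower-cased code if the site supports it, else None."""
    code = code.lower()
    return code if code in SUPPORTED else None


def from_subdomain(url: str) -> Optional[str]:
    """Method (1), assumed form: http://de.example.com/bar/baz"""
    host = urlsplit(url).hostname or ""
    return _known(host.split(".")[0])


def from_directory(url: str) -> Optional[str]:
    """Method (2), assumed form: http://example.com/de/bar/baz"""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return _known(segments[0]) if segments else None


def from_query(url: str) -> Optional[str]:
    """Method (3), assumed form: http://example.com/bar/baz?lang=de"""
    values = parse_qs(urlsplit(url).query).get("lang", [])
    return _known(values[0]) if values else None


def from_semicolon_param(url: str) -> Optional[str]:
    """Method (7), assumed form: http://example.com/bar/baz;lang=de"""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    _, _, params = last_segment.partition(";")
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name == "lang":
            return _known(value)
    return None


def from_comma_param(url: str) -> Optional[str]:
    """Method (8), assumed form: http://example.com/bar/baz,de"""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    _, sep, value = last_segment.rpartition(",")
    return _known(value) if sep else None
```

Each function returns None when the URL carries no recognised language code, so a site could fall back to a default or to another method in that case.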
Update (2007-02-15):
Mike Schinkel has outlined five methods which fall into two distinct categories for my purposes: including the language code in the resource name (now #9) and using a modified directory structure (my #2).
Given the resource at /bar/baz, the resource name is baz. This method suggests modifying the resource name to include the language code. Whether the code is added as a prefix or a suffix, and what the delimiter is (as long as the delimiter is a valid URL path character), is inconsequential.
This added method does not change my conclusions below.
As I said before, this list is non-exhaustive and any additions should be emailed to me. There isn’t going to be any “silver bullet” approach that gives a perfectly clean solution to the entire dilemma, but we can learn from the methods that seem to solve the problem in the best way with the most ease. It’s also possible that a combination of some of these techniques will achieve the best overall solution. Before making a decision, it seems prudent to see what sites that have already solved this problem are using.
These observations were made quickly and without detailed investigation. It is quite possible that these sites use combinations of (5) and (6) as well, but those are harder to test for. As with the list of methods, feel free to comment with other web sites and the methods they use.
After examining this list of possibilities, it is clear to me that Comma Path Parameter at End of Path (8) is the preferred method and I will certainly use that in future projects of my own. Semi-colon Path Parameter at End of Path (7) is similar, but there is no compelling reason I can see for using it over (8). After those two, the closest one is probably Language-specific Sub-domains (1). This method is used with great success by both Yahoo! and Wikipedia. It does have its fair share of issues, however, and that’s what takes it out of the running for my ideal solution.
Originally published: January 12, 2007
Archived at: http://h3h.net/technology/designing-urls-for-multilingual-websites
Let me know what you think by emailing me.