Designing URLs for Multilingual Websites

Usually when starting a new web site, a company will design the site for use in only one language. As companies grow, however, it is often prudent to internationalize the site and make it accessible in many languages. If the site is to become a large commercial success in countries around the world, this step is necessary and crucial.

Still, too often companies show their naïveté by failing to prepare for internationalization, and later incur large costs when the entire site has to be retrofitted to support multiple languages. Of course, if one were creating a site with a framework that has built-in internationalization support, such as Django, the application would be largely ready for translation into other languages and would need far less of a retrofit down the line.

One interesting aspect of designing a multilingual web site is deciding how to represent the language choice in the URLs. In this article I want to explore several possible methods for indicating a language choice for a given resource with a URL on a multilingual web site and decide on the best one.

Below is a non-exhaustive list of methods for selecting a language based on my own brainstorming and collaboration with my friend Ted. All methods involve accessing the resource at /bar/baz on the domain example.com.

Language-specific Sub-domains (1)

One can use two-letter language codes as sub-domains on the main site’s domain name.

Examples

  • en.example.com/bar/baz
  • de.example.com/bar/baz

Evaluation

  • It makes further use of sub-domains for other purposes cumbersome and unpleasant — think of api.de.example.com vs. de.api.example.com and the other restrictions it puts on future use of sub-domains in general.
  • Requires DNS management, which could get complicated.
  • Clean and simple.
  • Allows for direct permalinks.
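
As a rough sketch of how a server might read the language from the host name, here is a small Python function. The names SUPPORTED, DEFAULT_LANG and lang_from_host are my own assumptions for illustration, not part of any framework.

```python
SUPPORTED = {"en", "de"}  # hypothetical set of language sub-domains
DEFAULT_LANG = "en"       # assumed fallback when no language label is found


def lang_from_host(host, base_domain="example.com"):
    """Return the language code carried in the left-most sub-domain label."""
    host = host.split(":", 1)[0].lower()  # drop any port number
    suffix = "." + base_domain
    if host.endswith(suffix):
        label = host[: -len(suffix)].split(".")[0]
        if label in SUPPORTED:
            return label
    return DEFAULT_LANG
```

Note that this sketch only looks at the left-most label, so it accepts de.api.example.com but not api.de.example.com, which is exactly the kind of ordering decision the first evaluation point is about.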

Modified Directory Structure (2)

The directory structure of the resources on the site can contain the language code as its first segment, whether that segment is an actual directory on disk or is produced by URL routing in the application.

Examples

  • example.com/en-US/bar/baz
  • example.com/en-GB/bar/baz
  • example.com/de/bar/baz

Evaluation

  • Not very semantic — the resource being accessed at /bar/baz is not underneath the language code in a hierarchical sense.
  • May be difficult to maintain if site content is split apart in mirrored sections under each language directory.
  • Aesthetically confusing and ugly.
  • Allows for direct permalinks.
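
When the language directory is produced programmatically, the handling can be as small as the following Python sketch. SUPPORTED and split_lang_prefix are hypothetical names I made up for the example.

```python
SUPPORTED = {"en-US", "en-GB", "de"}  # hypothetical list of language directories


def split_lang_prefix(path, default="en-US"):
    """Split '/de/bar/baz' into ('de', '/bar/baz').

    Paths without a known language prefix are returned unchanged,
    together with the assumed default language.
    """
    first, _, rest = path.lstrip("/").partition("/")
    if first in SUPPORTED:
        return first, "/" + rest
    return default, path
```

Checking the first segment against a known set, rather than against a pattern such as "any two letters", avoids mistaking an ordinary directory for a language code.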

Language Code in Querystring (3)

It’s possible to just set a language variable in the querystring of the URL.

Examples

  • example.com/bar/baz?lang=en-US
  • example.com/bar/baz?lang=en-GB
  • example.com/bar/baz?lang=de

Evaluation

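  • Some caches and proxies handle URLs with querystrings conservatively, so pages may not be cached properly.
  • Search engines and other consumers of URLs may drop or rewrite the querystring in their links to the resource, losing the language choice.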
  • Not semantic when /bar/baz is considered as a static resource — one shouldn’t pass querystring variables for processing to a static resource.
  • Messy and difficult to maintain when using other (properly employed) querystring variables.
  • Ugly.
  • Allows for direct permalinks.
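
To illustrate how the language variable has to be picked out from among the other querystring variables, here is a minimal Python sketch using the standard urllib.parse module. The function name pop_lang and the default value are my own assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def pop_lang(url, default="en-US"):
    """Return (language, url_without_lang) for a URL such as '...?lang=de'."""
    parts = urlsplit(url)
    lang = default
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "lang":
            lang = value               # the last occurrence wins
        else:
            kept.append((key, value))  # leave other variables untouched
    clean = urlunsplit(parts._replace(query=urlencode(kept)))
    return lang, clean
```

For example, pop_lang("http://example.com/bar/baz?page=2&lang=de") separates the language from the page variable that the rest of the application cares about.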

Country-specific TLDs (4)

A separate domain can be registered under the country-code TLD of each country whose language the site supports.

Examples

  • example.us/bar/baz
  • example.co.uk/bar/baz
  • example.de/bar/baz

Evaluation

  • Expensive — recurring registration fees.
  • Country-specific TLDs speak more to localization than to simple language selection: several countries have more than one widely used language, and a country domain also brings up geographic concerns such as server location.
  • May be difficult to manage inter-language linking (e.g. example.us/bar/baz to example.de/bar/quux).
  • Requires DNS management.
  • Allows for direct permalinks.

Pure Cookie-based Preference (5)

One could just have a UI element on the site itself to allow the selection of a language preference which is then stored in a cookie on the user’s computer. This removes the need for representation of the language in the URL itself.

Examples

  • example.com/bar/baz with cookie (lang=en-US)
  • example.com/bar/baz with cookie (lang=en-GB)
  • example.com/bar/baz with cookie (lang=de)

Evaluation

  • Won’t work with user agents that don’t support cookies.
  • Doesn’t allow for direct permalinks to a resource in a specific language.
  • Invisible and therefore clean.
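
Reading such a preference on the server could look roughly like this in Python, using the standard http.cookies module. The cookie name lang comes from the examples above; the function name and the default are assumptions of mine.

```python
from http.cookies import SimpleCookie


def lang_from_cookie(cookie_header, default="en-US"):
    """Return the value of the 'lang' cookie from a raw Cookie header."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("lang")
    if morsel is None:
        return default  # no preference stored yet, or cookies unsupported
    return morsel.value
```

The fallback branch is where the first two evaluation points bite: without the cookie, the URL alone says nothing about the language.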

Use of Accept-Language HTTP Header (6)

Most user agents (such as web browsers) send an HTTP header, Accept-Language, that lists the natural languages the user prefers for the response, optionally weighted by preference. Often, international users will have their native language set first in this list of languages sent by the agent.

Examples

  • example.com/bar/baz with Accept-Language: en-us or en
  • example.com/bar/baz with Accept-Language: en-gb
  • example.com/bar/baz with Accept-Language: de

Evaluation

  • Won’t work with user agents that don’t send the Accept-Language HTTP header.
  • Makes the control of and transitioning between languages very difficult or impossible from the site’s point of view.
  • Won’t cater to e.g. German users who wish to read a site in English.
  • Also doesn’t allow for direct permalinks to a resource in a specific language.
  • Invisible and therefore clean.
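
For the curious, a simplified Python sketch of choosing a language from this header follows. It handles the comma-separated list and the q weights but ignores the wildcard and other details of the HTTP specification, and both function names are hypothetical.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language value, most preferred first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        tag, _, param = item.partition(";")
        tag, param = tag.strip(), param.strip()
        quality = 1.0  # the default weight when no q-value is given
        if param.startswith("q="):
            try:
                quality = float(param[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if tag and quality > 0:
            prefs.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def choose_language(header, supported, default="en-US"):
    """Pick the first preferred tag (or its primary language) that the site supports."""
    by_lower = {code.lower(): code for code in supported}
    for tag in parse_accept_language(header):
        tag = tag.lower()
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]
        if primary in by_lower:
            return by_lower[primary]
    return default
```

Even with the parsing done, the site can only follow what the agent sends, which is why the German reader who wants English is not served by this method alone.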

Semi-colon Path Parameter at End of Path (7)

According to RFC 3986 (URI: Generic Syntax), § 3.3, a path segment in a URI (or a URL) can use the reserved semi-colon character to delimit parameters and parameter values that apply to that segment. One could use this in the simplest way possible to tack on a language code to the end of the path to a resource.

Examples

  • example.com/bar/baz;en-US
  • example.com/bar/baz;en-GB
  • example.com/bar/baz;de

Evaluation

  • The example in RFC 3986 pairs semi-colon parameters with values, like example.com/bar/baz;lang=en-US, which is uglier and seems unnecessary.
  • Use of semi-colon parameters is unfamiliar to many developers and users, so they might fail in some agents or URL libraries.
  • Semantically rich — a resource is being retrieved at /bar/baz and extra information is being supplied via a parameter intended for such a purpose. This is different from the querystring approach because the semi-colon parameterized URL will be used as-is by search engines and other consumers of URLs.
  • Allows for direct permalinking.
  • Lends itself to easy generic full-stack handling in web application code — the language codes can be stripped on the way in and added on the way out, so as not to interfere with the internal URL handling in the least.
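
The "stripped on the way in and added on the way out" idea from the last point can be sketched in a few lines of Python. SUPPORTED, strip_lang and add_lang are names I invented for illustration.

```python
SUPPORTED = {"en-US", "en-GB", "de"}  # hypothetical list of language codes


def strip_lang(path, default="en-US"):
    """On the way in: split '/bar/baz;de' into ('/bar/baz', 'de')."""
    head, sep, tail = path.rpartition(";")
    if sep and tail in SUPPORTED:
        return head, tail
    return path, default  # no recognised parameter, leave the path alone


def add_lang(path, lang):
    """On the way out: append the language parameter to an internal path."""
    return f"{path};{lang}"
```

Between those two calls the application only ever sees /bar/baz, so its internal URL handling does not need to know about languages at all.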

Comma Path Parameter at End of Path (8)

RFC 3986 also notes that the comma is often used to delimit parameters on segments of the path in the URL. Its own example, name,1.1, uses the comma where only a value (and not a key) needs to be supplied.

Examples

  • example.com/bar/baz,en-US
  • example.com/bar/baz,en-GB
  • example.com/bar/baz,de

Evaluation

  • Also unfamiliar to developers and users, though it could cause fewer problems with URL libraries because of its slightly wider usage.
  • Matches the value-only usage illustrated in RFC 3986.
  • Semantically rich for the same reasons as the semi-colon approach.
  • Allows for direct permalinking.
  • Also lends itself to easy generic full-stack handling in web application code.
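
One way the generic handling could be wired into a Python web stack is as a small piece of WSGI middleware. This is only a sketch under my own assumptions: the class name, the SUPPORTED set and the app.lang environ key are all hypothetical.

```python
SUPPORTED = {"en-US", "en-GB", "de"}  # hypothetical list of language codes


class CommaLanguageMiddleware:
    """Strip a trailing ',<lang>' from the request path before the app sees it."""

    def __init__(self, app, default="en-US"):
        self.app = app
        self.default = default

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        head, sep, tail = path.rpartition(",")
        if sep and tail in SUPPORTED:
            environ["PATH_INFO"] = head  # the app sees the plain /bar/baz
            environ["app.lang"] = tail   # hypothetical key for the language
        else:
            environ["app.lang"] = self.default
        return self.app(environ, start_response)
```

Wrapping an existing application as CommaLanguageMiddleware(app) would leave its routing untouched while making the language available to any code that reads the environ.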

Update (2007-02-15):

Mike Schinkel has outlined five methods which fall into two distinct categories for my purposes: including the language code in the resource name (now #9) and using a modified directory structure (my #2).

Language Code in the Resource Name (9)

Given the resource at /bar/baz, the resource name is baz. This method suggests modifying the resource name to include the language code. Whether the code is added as a prefix or a suffix, and which delimiter is used (as long as it is a valid URL path character), is inconsequential.

Examples

  • example.com/bar/baz.en-US
  • example.com/bar/baz-en-GB
  • example.com/bar/de.baz

Evaluation

  • Poor semantics: when one wishes to request the resource at /bar/baz, the path component of the URL should be exactly /bar/baz, not something entirely different that must be modified to reflect the true resource being accessed.
  • Not differentiable as a resource from the same resource in a different language. In other words, /bar/baz.en-US is not an entirely different resource from /bar/baz.de, but rather a different way of serving the same resource, so the URL path components should be identical.
  • Allows for direct permalinking.

This added method does not change my conclusions below.

As I said before, this list is non-exhaustive and any additions should be emailed to me. There isn’t going to be any “silver bullet” approach that gives a perfectly clean solution to the entire dilemma, but we can learn from the methods that seem to solve the problem best and with the most ease. It’s also possible that a combination of some of these techniques will achieve the best overall solution. Before making a decision, it seems prudent to see what sites that have already solved this problem are using.

Case Studies: High-profile Multilingual Web Sites

  • amazon.com: Country-specific TLDs (4)
  • google.com: Modified Directory Structure (2), Country-specific TLDs (4)
  • yahoo.com: Language-specific Sub-domains (1)
  • ebay.com: Country-specific TLDs (4)
  • wikipedia.org: Language-specific Sub-domains (1)

These observations were made quickly and without detailed investigation. It is quite possible that these sites use combinations of (5) and (6) as well, but those are harder to test for. As with the list of methods, feel free to comment with other web sites and the methods they use.

After examining this list of possibilities, it is clear to me that Comma Path Parameter at End of Path (8) is the preferred method and I will certainly use that in future projects of my own. Semi-colon Path Parameter at End of Path (7) is similar, but there is no compelling reason I can see for using it over (8). After those two, the closest one is probably Language-specific Sub-domains (1). This method is used with great success by both Yahoo! and Wikipedia. It does have its fair share of issues, however, and that’s what takes it out of the running for my ideal solution.

Originally published:
January 12, 2007

Archived at:
http://h3h.net/technology/designing-urls-for-multilingual-websites