Proposal:Names localization
Names localization | |
---|---|
Proposal status: | Draft (under way) |
Proposed by: | Xificurk |
Tagging: | lang=* |
Applies to: | , , , |
Definition: | How to deal with names in different languages. |
Draft started: | 2012-07-31 |
Identifying the problem
The current status quo for specifying the name of an object is to simply use name=*, and to use name:lang=* if you want to specify a translation into a different language. Especially in areas where more than one language is commonly used, it is not really clear what the bare name tag should contain. One way of resolving a possible dispute is to go with the well-established "on the ground" rule.
The unclear meaning of the bare name tag is also problematic from a data consumer's point of view. There are several possibilities for how you might want to present the data:
- You don't care about different languages, you simply want some "default" name. For this purpose the bare name tag is fine.
- You would like to present the data primarily in a specific language, and only fall back to the "default" if it's not available. Again no problem: use name:lang=* and, if it is not available, fall back to name=*.
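The second case can be sketched in a few lines of Python. This is only an illustration, assuming an object's tags are available as a plain dict; the function name display_name is hypothetical and not part of any existing tool.

```python
def display_name(tags, lang):
    """Return name:<lang> if present, otherwise the bare name (or None)."""
    return tags.get(f"name:{lang}") or tags.get("name")


# Tag values as used in the examples of this proposal.
london = {"name": "London", "name:en": "London", "name:fr": "Londres"}

print(display_name(london, "fr"))  # Londres
print(display_name(london, "cs"))  # London (no name:cs, so the bare name is used)
```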
The problem arises when you would like to mix those two approaches, i.e. you want to provide the local names and, if possible, their translation into a specific language as well. For example, "London (Londres)": local signs say "London", but in French it is "Londres". This might seem easy (just use name=* and, where possible, add name:lang=*), but it is not. You obviously don't want to get "Paris (Paris)", because the local name is already French and it doesn't make sense to repeat it. Here the bare name tag is not that helpful, because you cannot tell what language(s) it contains. In the case of Paris, you could decide just on the basis that the name value is equal to name:fr, but consider e.g. Brussels, where that comparison fails. Again, you don't want to repeat the French name like this: "Bruxelles - Brussel (Bruxelles)".
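The pitfall can be reproduced with a naive implementation that only compares the bare name with the requested translation. The sketch below is hypothetical (the function naive_label and the dict representation of tags are assumptions made for illustration); it handles Paris correctly but still duplicates the French name for Brussels.

```python
def naive_label(tags, lang):
    """Combine the bare name with name:<lang>, skipping exact duplicates only."""
    local = tags.get("name")
    translated = tags.get(f"name:{lang}")
    if translated and translated != local:
        return f"{local} ({translated})"
    return local


london = {"name": "London", "name:fr": "Londres"}
paris = {"name": "Paris", "name:fr": "Paris"}
brussels = {"name": "Bruxelles - Brussel", "name:fr": "Bruxelles", "name:nl": "Brussel"}

print(naive_label(london, "fr"))    # London (Londres)
print(naive_label(paris, "fr"))     # Paris
print(naive_label(brussels, "fr"))  # Bruxelles - Brussel (Bruxelles), the unwanted repetition
```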
To summarize, the problem is that the bare name tag without the knowledge of what language(s) it contains is not very useful.
Proposed solution
Generally speaking, users should be encouraged to use the tags with a language specification, even when the bare name tag contains only one language. For example, London carries name:en=London in addition to its name=London.
Option 1: the simple one
Introduce a new tag lang=* that would hold a semicolon-separated list of local languages. Note that the object should have name:lang=* tags for all the listed languages. The bare name tag should hold the names in all of those languages.
Examples:
- Brussels: name=Bruxelles - Brussel, name:fr=Bruxelles, name:nl=Brussel, lang=fr;nl, ...
- Prague: name=Praha, name:cs=Praha, lang=cs, ...
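A data consumer could use the lang=* list from Option 1 roughly as follows. This is a minimal sketch under stated assumptions: tags are held in a plain dict, and the helper names local_languages and label are hypothetical. The name:en values are added here only to exercise the translated case.

```python
def local_languages(tags):
    """Split the semicolon-separated lang=* value into language codes."""
    return [code.strip() for code in tags.get("lang", "").split(";") if code.strip()]


def label(tags, lang):
    """Return 'local (translation)', omitting the translation when it is already local."""
    local = tags.get("name")
    translated = tags.get(f"name:{lang}")
    if local is None:
        return translated
    if not translated or translated == local or lang in local_languages(tags):
        return local
    return f"{local} ({translated})"


brussels = {
    "name": "Bruxelles - Brussel",
    "name:fr": "Bruxelles",
    "name:nl": "Brussel",
    "name:en": "Brussels",
    "lang": "fr;nl",
}
prague = {"name": "Praha", "name:cs": "Praha", "name:en": "Prague", "lang": "cs"}

print(label(brussels, "fr"))  # Bruxelles - Brussel
print(label(brussels, "en"))  # Bruxelles - Brussel (Brussels)
print(label(prague, "en"))    # Praha (Prague)
```

Because French is listed in lang=*, the French name is recognised as already present in the bare name and is not appended a second time.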
Pros:
- Very simple, easy to understand solution.
- No changes to current tagging scheme.
- For objects that have only one name (in the local language), it makes sense to continue using simply name=The Road without any additional tags.
Cons:
- Prone to errors, because you need to repeat the same values.
Option 2: don't repeat yourself
A more radical approach would be to completely deprecate the bare name=*. The lang=* tag would hold a "template" from which the default name could be generated by a single regular expression (substituting each language code with the appropriate name:lang=* value). The list of local languages could again be obtained from the template by a single regular expression (instead of splitting the string on semicolons as in Option 1, you split on the regular expression "[^a-z_]+").
Examples:
- Brussels: lang=fr - nl, name:fr=Bruxelles, name:nl=Brussel, ..., name=Bruxelles - Brussel (deprecated, but should be left in place for now for compatibility reasons)
- Prague: lang=cs, name:cs=Praha, ..., name=Praha (deprecated, but should be left in place for now for compatibility reasons)
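The two regular-expression steps of Option 2 might look like the sketch below. It assumes language codes consist only of lowercase letters and underscores, as implied by the "[^a-z_]+" separator pattern; the function names are hypothetical, and leaving a code untouched when its name:lang=* tag is missing is a choice made here, not something the proposal specifies.

```python
import re

LANG_CODE = re.compile(r"[a-z_]+")   # a language code inside the template
SEPARATOR = re.compile(r"[^a-z_]+")  # anything between two language codes


def template_languages(template):
    """List the language codes in a lang=* template, e.g. 'fr - nl' -> ['fr', 'nl']."""
    return [code for code in SEPARATOR.split(template) if code]


def default_name(tags):
    """Generate the default name by substituting each code with its name:<code> value."""
    template = tags.get("lang")
    if template is None:
        return tags.get("name")  # no template: fall back to the bare name
    # Assumption: a code without a matching name:<code> tag is left unchanged.
    return LANG_CODE.sub(lambda m: tags.get(f"name:{m.group(0)}", m.group(0)), template)


brussels = {"lang": "fr - nl", "name:fr": "Bruxelles", "name:nl": "Brussel"}
prague = {"lang": "cs", "name:cs": "Praha"}

print(template_languages(brussels["lang"]))  # ['fr', 'nl']
print(default_name(brussels))                # Bruxelles - Brussel
print(default_name(prague))                  # Praha
```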
Pros:
- No repetition of data => arguably less error-prone, and a tiny bit less load on OSM infrastructure.
- The algorithm for generating the default name could be implemented e.g. in osm2pgsql, so users who only care about the default names would not have to change a thing in their code.
- You can apply the same logic to other language-specific tags.
Cons:
- Less intuitive for data producers.
- A lot of objects have only one name (in the local language). This solution complicates their tagging a bit, because it suggests using e.g. name:en=The Road, lang=en instead of simply name=The Road.
Notes on transliterations
As this proposal basically relies on name:lang=*, it can accommodate the solution for transliterations as well, for details see Multilingual names.