Proposal talk:Default Language Format

From OpenStreetMap Wiki

Language:default or default:language?

This is from the author of the proposal: I've found out that language:xx has been used in the form language:en=main to describe the main language taught at a language school. It's not a very common tag, but there may be a small risk of confusion in using the same "namespace." Also, there is a proposal to have all defaults (eg maxspeed for highways) start with "default:xx", though this seems to be stalled or abandoned.

Please comment if you think it would be better to change the order of this tag to default:language=* instead of language:default=*. Thanks! --Jeisenbe (talk) 13:03, 27 September 2018 (UTC)

Minor phrasing nitpick, scripts

"If the language can be written in more than one script, a qualifier can be added to specify the script format." Usually it depends on the script whether it can express a certain language. It should be rephrased "If more than one script is common in the area, a qualifier..." --Dieterdreist (talk) 13:25, 24 September 2018 (UTC)

Thank you for the good suggestion. I don't quite understand what you mean by "Usually it depends on the script whether it can express a certain language." It is certainly possible to write an English word in Arabic script or Japanese Katakana, and Chinese can be written in Latin script (i.e. Pinyin). But I agree with your rephrasing. --Jeisenbe (talk) 00:22, 25 September 2018 (UTC)
"It is certainly possible to write an English word in Arabic script or Japanese Katakana, and Chinese can be written in Latin script" - that's what I meant: you can write any language (or maybe almost any) in more than one script. --Dieterdreist (talk) 14:54, 25 September 2018 (UTC)

Multiple values

"More than one language code can be listed, separated with a semicolon, if the local community uses more than one language on signs or by consensus." There was another idea (by Christoph Hormann) to tag typical format strings, which would IMHO be very useful for data consumers and would render the semicolon multivalue fields superfluous. --Dieterdreist (talk) 13:25, 24 September 2018 (UTC)

This proposal is meant to be nearly the same as Christoph Hormann's idea (http://blog.imagico.de/you-name-it-on-representing-geographic-diversity-in-names). The only difference is that the tag is language:default instead of language_format (from Christoph's blog), and I've also added details about the option to use more than one language tag, and to tag boundary=aboriginal_lands in addition to boundary=administrative. I'm not sure how Christoph's proposal makes it superfluous to have more than one language tagged. In Brussels, for example, they are going to need to be able to specify more than one language in the default language format.
I did consider putting the language code in the key; e.g. tag language:nl=default and language:fr=default for Brussels. But this leads to adding more tags, and I believe it is much harder to search for hundreds of distinct keys (language:<iso code> for hundreds of languages) than to select one tag with multiple values (language:default=<iso code>). --Jeisenbe (talk) 00:22, 25 September 2018 (UTC)
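
To make the semicolon-separated variant discussed above concrete, here is a minimal Python sketch of how a data consumer might use it. It assumes the tag is written as language:default=fr;nl on the boundary and that features carry matching name:<code> tags. The function names, the " - " separator and the sample street are illustrative assumptions, not part of the proposal.

```python
def default_languages(boundary_tags):
    """Split a semicolon-separated language:default value into codes."""
    raw = boundary_tags.get("language:default", "")
    return [code.strip() for code in raw.split(";") if code.strip()]


def compose_label(feature_tags, boundary_tags, separator=" - "):
    """Join the feature's name:<code> values in the boundary's listed order."""
    names = []
    for code in default_languages(boundary_tags):
        name = feature_tags.get(f"name:{code}")
        if name and name not in names:
            names.append(name)
    # Fall back to the plain name tag when no name:<code> tag matches.
    return separator.join(names) if names else feature_tags.get("name", "")


# Hypothetical sample data for a bilingual area.
brussels = {"boundary": "administrative", "language:default": "fr;nl"}
street = {
    "name": "Rue de la Loi - Wetstraat",
    "name:fr": "Rue de la Loi",
    "name:nl": "Wetstraat",
}
print(compose_label(street, brussels))
```

With a single key, the consumer reads one tag and splits its value; with language:<code>=default it would instead have to scan every language:* key on the boundary.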

Is this also about the spoken language?

Currently you seem to focus on signs, but what about places without signs, or without signs but local use of a specific language? Is spoken language relevant as well? --Dieterdreist (talk) 13:28, 24 September 2018 (UTC)

You are correct, in many places the name of a feature will only be used orally, including in my part of Indonesia where street signs are rare and no mountain or river has a sign. I had previously mentioned this but removed it to try to make the proposal page shorter and because several people on the tagging list wanted the focus to be on the names used on signs, for developed countries. But I'll add it back in now. --Jeisenbe (talk) 00:25, 25 September 2018 (UTC)

Why not adding tag variants for official language/s?

Official languages are easier to verify and merit tagging as well. The use of the tag will be more clearly restricted to administrative entities. The systematics are likely the same. I'd suggest adding something like official_language:default=code. --Dieterdreist (talk) 13:31, 24 September 2018 (UTC)

As stated in the "comments" section at the bottom of the page: "I initially included language:official=code and language:local=code in the draft proposal, but removed them to simplify the discussion." I agree that a tag like language:official=code would be useful, as would language:local=code for locally dominant non-official languages. I will be happy to propose those tags after this proposal has been voted on. It would not be sufficient to tag official languages only; for example, neither the UK nor the USA has an official language, but they clearly have a "default" national language (English), while some smaller administrative units have a different default language (Spanish in Puerto Rico, Navajo on the reservation in northeastern Arizona). --Jeisenbe (talk) 00:34, 25 September 2018 (UTC)

Tag format: name:language=* instead of language:default=*

I think the tag format should rather be name:language=* because the tag is supposed to specify the language (--> suffix) of the name (--> namespace) tags in a region, see Namespace. Note that, according to TagInfo, name:language=* has already been used 2,090 times.

By the way, thanks a lot for having created this proposal! In my opinion this tag would be very helpful. --SelfishSeahorse (talk) 17:29, 24 September 2018 (UTC)

Thank you for the link! I was not aware of this tag. It looks like usage is limited to Italy, Germany and Belgium, based on the map on taginfo. I think name:language=* could be confusing, because the other tags in the format name:xxx are actual names. A feature on a border, e.g. Mont Blanc, would then be tagged like this: "name:fr=Mont Blanc", "name:it=Monte Bianco", "name:language=fr, it" - this uses the same "namespace" for two different things. --Jeisenbe (talk) 00:43, 25 September 2018 (UTC)
I checked the pages for name:language, and the similar language:name; both proposed tags failed to be approved last year in voting, though both had some support. It's not easy to pick a perfect tag for this concept, unfortunately. "default:language", "language_format", "language:format", and "language:default" all could work, with default:language or language:default my main choices.
Would you agree to the use of default:language or language:default?--Jeisenbe (talk) 00:13, 28 September 2018 (UTC)
I could live with both, however default:language does have the advantage that it would get grouped with other default tags that follow the default:* syntax. --SelfishSeahorse (talk) 12:09, 28 September 2018 (UTC)
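
The namespace concern raised in this thread can be illustrated with a small Python sketch. It assumes a data consumer that collects localized names by scanning for name:<suffix> keys. The regular expression is a loose illustrative pattern rather than a real language-code validator, and the function names and the "fr;it" value are hypothetical.

```python
import re

# Loose pattern for a language code suffix: 2-3 lowercase letters plus
# optional hyphenated qualifiers (e.g. "fr", "zh-Hant"). This is an
# illustrative assumption, not a full language-tag validator.
CODE = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]+)*$")


def localized_names_naive(tags):
    """Treat every name:<suffix> key as a name in language <suffix>."""
    return {k.split(":", 1)[1]: v for k, v in tags.items() if k.startswith("name:")}


def localized_names_filtered(tags):
    """Keep only suffixes that look like language codes."""
    return {c: v for c, v in localized_names_naive(tags).items() if CODE.match(c)}


# Hypothetical tagging of a border feature using name:language=*.
mont_blanc = {
    "name:fr": "Mont Blanc",
    "name:it": "Monte Bianco",
    "name:language": "fr;it",
}
print(localized_names_naive(mont_blanc))
print(localized_names_filtered(mont_blanc))
```

The naive scan reports a name in a "language" called language, so every consumer would need a filter like the second function. A key outside the name: namespace, such as default:language or language:default, avoids that extra step.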

Please explain the difference with default_language key

Hi, thanks for working on multilingual support. Please add a section to explain how this effort is different from the default_language=* tag that's already in use on many large regions. Thanks! --Yurik (talk) 03:02, 26 September 2018 (UTC)

I'm sorry that I missed your wiki page. Perhaps I didn't see it because it has not been formally proposed?
I believe this proposal is substantially similar to your idea (and several other previous attempts), but I do think that support for multilingual names is important due to the situation on the ground in Brussels, Morocco, Hong Kong and other bilingual places that currently show two different names in the name=* tag.
I will ask the tagging list if we should change the name of this tag to default_language=code or language_default=code, because I've also discovered that language:code=main/yes is already used for language schools.--Jeisenbe (talk) 10:2, 25 September 2018 (UTC)

Avoid duplication

One thing that I would like to avoid at all cost is people going out and adding, for every street in Germany, a "name:de" tag that duplicates the current "name" tag. I have read through the proposal looking for language that would discourage this, but didn't find any. On the contrary, the proposal expresses the hope that at some point in the future, a "name" tag would be obsolete. I think we need to find a way to stop people from adding such duplication that adds no value. --Woodpeck (talk) 06:35, 26 September 2018 (UTC)

This avoidance of duplication should be done on linguistic grounds, not simply on comparing strings. The capital of France is Paris in many languages; the presence of name=Paris is no reason to suppress name:de=Paris, although it could be a reason to consider suppressing name:fr=Paris. --Csmale (talk) 07:29, 26 September 2018 (UTC)
How are database users to know what language the name=* tag is in? Should they assume that it is in German in Germany? This assumption is not always correct. It is much less safe to assume that the name=* is in the official language in the USA (where regions have names in Spanish, French, Navajo, etc.), or in Brazil, Indonesia, Nigeria, etc. This proposal will help, by adding a tag that specifies the default language in each area, but it won't be effective 100% of the time; some individual features have a name in a foreign language or a rare local language.
It can be useful to have name=* in an "international format"; for example name=Mississippi and name:en=Mississippi River instead of name=Mississippi River; in this case it makes sense to have both a name= and a name:en= tag.
However, I will edit the page to clarify the situations where a name:code= tag actually needs to be added: in places with multilingual names, or where many of the names are not in a default language. I hope this is satisfactory. --Jeisenbe (talk) 10:07, 25 September 2018 (UTC)
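
The two points in this thread (inferring the language of name=* from the surrounding area, and judging duplication by language, not by string equality alone) can be sketched in Python as follows. This is only an illustration under stated assumptions: the key is taken to be language:default with semicolon-separated codes, the enclosing boundaries are assumed to be already known and ordered from smallest to largest, and all function names and sample tags are hypothetical.

```python
def resolve_default_languages(enclosing_boundaries):
    """Return the codes from the smallest enclosing boundary that has the tag.

    `enclosing_boundaries` is a list of tag dicts ordered from the smallest
    area to the largest; finding that list is outside this sketch.
    """
    for tags in enclosing_boundaries:
        value = tags.get("language:default")
        if value:
            return [c.strip() for c in value.split(";") if c.strip()]
    return []


def redundant_name_keys(feature_tags, default_codes):
    """List name:<code> keys that merely repeat name=* in a default language."""
    name = feature_tags.get("name")
    return [
        f"name:{code}"
        for code in default_codes
        if name is not None and feature_tags.get(f"name:{code}") == name
    ]


# Illustrative data only: a smaller unit overriding a larger one.
boundaries = [
    {"name": "Puerto Rico", "language:default": "es"},
    {"name": "United States", "language:default": "en"},
]
print(resolve_default_languages(boundaries))

paris = {"name": "Paris", "name:fr": "Paris", "name:de": "Paris"}
print(redundant_name_keys(paris, ["fr"]))
```

In the Paris example only name:fr is reported as a candidate duplicate; name:de=Paris is left alone because German is not the default language of the area, even though the strings are identical.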

Language or locale?

What if a place has multiple names in the same language? Can the language code be extended to be a locale (language + territory)? I am thinking of the example of Londonderry/Derry in Northern Ireland. Both names are in English, but one could be said to use the Irish locale (en_IE) and the other the en_GB locale (yes, I know it is in the UK but not in GB). Also, Lille (France) is called Rijsel in Dutch by the Flemish (nl_BE), but the Dutch know it as Lille (nl_NL). One language, two names. The big assumption here is that the name is a function of the territory and not the language variant... --Csmale (talk) 07:46, 26 September 2018 (UTC)

I'm afraid this proposal won't solve the Londonderry-Derry naming dispute. But it won't make it any worse. It looks like this is a dispute about the name of the town in English, not really a dialectal or language difference? But yes, it is possible to have two language codes for two dialects of English, or for the Flemish and Netherlands Dutch languages / dialects.
This proposal is not making any new language codes. The plan is to re-use the same codes used for the name:code=* tags, with whatever qualifiers are necessary to show script or dialect variants, as already decided by the local community, generally the ISO code.
The default language format tag will indicate what language or languages are used locally. Since Lille is in France, I assume name:fr=* will be the default, and likewise I expect the name=* for Lille is in French, unless this is right on the border? But it's certainly important to add all the variant names in each language, including both Dutch and Flemish. This is a good example of how this tag is useful: database users from Flanders and the Netherlands could choose to display the name in French or their own language, or both names. --Jeisenbe (talk) 08:31, 26 September 2018 (UTC)
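
As a rough illustration of the locale question in this thread, a data consumer could look up a name by trying the most specific code first and then falling back to the bare language code. The Python sketch below assumes territory qualifiers are written with a hyphen in name:<code> keys (e.g. name:nl-BE). That spelling, the function names and the sample tags for Lille are assumptions made for the example, not established tagging.

```python
def fallback_chain(code):
    """Return a code and its progressively shorter prefixes: nl-BE, then nl."""
    parts = code.replace("_", "-").split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def best_name(tags, preferred_code):
    """Pick the most specific name:<code> available, else the plain name."""
    for code in fallback_chain(preferred_code):
        key = f"name:{code}"
        if key in tags:
            return tags[key]
    return tags.get("name")


# Hypothetical tagging for Lille, following the example in this thread.
lille = {
    "name": "Lille",
    "name:fr": "Lille",
    "name:nl": "Lille",
    "name:nl-BE": "Rijsel",
}
print(best_name(lille, "nl_BE"))
print(best_name(lille, "nl_NL"))
```

A Flemish user asking for nl_BE gets the territory-specific name, while a request for nl_NL falls back to the generic Dutch name because no name:nl-NL tag exists in the sample.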

Consider using RFC 5646

Consider using RFC 5646 rather than ISO 639. The RFC is essentially a superset of the ISO, so all existing ISO usages comply and are valid. However, the RFC is the result of long experience of delivering written text in all the languages, dialects and scripts of the world over the Internet, whereas the ISO is more of a philological/linguistic classification ignoring many subtleties of actual usage. Since we're concerned with the written word, especially when "painting the label" it is a good idea to know which script is in use with a particular language in any given locality.

The RFC also offers the ability to specify IPA pronunciations, which is of utility on devices with text-to-speech (so you can tap a button and have your phone say the name to a local person). An example would be en-scotland-fonipa (English, pronounced as spoken in Scotland). A contrived, fictitious example from the RFC is "tlh-Kore-AQ-fonipa" - the pronunciation of a Klingon name written in a Korean script, as used in Antarctica.

At the very least, consider permitting the -fonipa suffix on ISO 639 codes to allow for IPA values, but if you're going to go that far you might as well go the whole way. Brian de Ford (talk) 14:30, 27 September 2018 (UTC)

I'm sorry that the wording of the section on language codes was not clear. This proposal does not seek to change the way that language codes are currently used in name:code=* tags. The idea is that the code in the default language format tag should match that used in the majority of name:code=* tags in the area. So the text in this proposal is supposed to match what is said on the Names page and the Multilingual Names page.
I agree that the linked proposal is a good way to make new codes for language formats that have a non-standard script or dialect, and therefore don't have an ISO code.
Would you like to edit the Names or Multilingual Names pages to add this suggestion about making new codes? Perhaps ask on the Tagging mailing list for opinions first? If this proposal is approved, the Wiki page for the new tag will have a section about the language codes which matches the advice in Names and Multilingual Names.
Re: IPA pronunciation, this has been proposed in name:pronounciation. I don't think it affects this proposal directly, but it's a good idea! If you have the time, perhaps you can turn it into a full proposal and get it approved?
Note that the default language format proposal would also help with pronouncing names, even without adding a separate name tag in IPA, because many names and destination sign values can be properly pronounced by knowing the language and script alone. E.g. Spanish, Italian and other Romance languages, and Malaysian, Indonesian and Tagalog in Asia, have very standardized pronunciations. But the IPA-based name:xx:pronounciation tag would be helpful for many English names and some other languages. --Jeisenbe (talk) 00:45, 28 September 2018 (UTC)
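
For readers unfamiliar with the structure of the RFC 5646 tags mentioned in this thread, the following Python sketch splits a tag into language, script, region and variant subtags. It is a deliberately simplified reading of the RFC's syntax: it ignores extended language subtags, extensions, private-use and grandfathered tags, and it does not check subtags against the IANA registry. The function name is hypothetical.

```python
import re


def split_language_tag(tag):
    """Split a simple RFC 5646 style tag into its main subtags.

    Handles only language[-script][-region][-variant...]; anything else
    raises ValueError.
    """
    subtags = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,8}", subtags[0]):
        raise ValueError(f"unexpected language subtag in {tag!r}")
    result = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    rest = subtags[1:]
    # Script subtags are exactly four letters (e.g. Latn, Kore).
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        result["script"] = rest.pop(0).title()
    # Region subtags are two letters or three digits (e.g. AQ, 419).
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        result["region"] = rest.pop(0).upper()
    # Variants are 5-8 alphanumerics, or 4 characters starting with a digit.
    for sub in rest:
        if not re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub):
            raise ValueError(f"unexpected subtag {sub!r} in {tag!r}")
        result["variants"].append(sub.lower())
    return result


for example in ("en-scotland-fonipa", "tlh-Kore-AQ-fonipa", "zh-Hant"):
    parsed = split_language_tag(example)
    print(example, parsed, "IPA" if "fonipa" in parsed["variants"] else "")
```

A plain ISO 639 code such as "fr" passes through unchanged as the language subtag, which is consistent with the point above that existing codes remain valid under the RFC.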

Individual features may be lines/areas

Please note that individual features that may be named may also be lines/areas, so the default:language=* tag should not be limited to points and relations. Rmikke (talk) 17:54, 21 October 2018 (UTC)

What feedback do you have from communities in 2- or 3-language regions?

First of all, I think it is good that you are trying to overcome the problems in tagging the default language, so that the name:code=* can be assigned accordingly by renderers or in data analysis. In my perception this is not so much a problem in regions where only one language is the default and the name=* is in that default language.

However, I wonder whether the communities in dual- or multi-language regions were sufficiently involved in the discussion. There, for lack of better tagging, name=* is often used for the names in all the default languages, e.g. separated by a "-". So for the acceptance of your proposal, not only in the voting but in real tagging life, it is important that those communities apply the new default:language=* tagging. What feedback did you get, especially from multilingual regions?

--Rainero