Semi-colon value separator: Difference between revisions
Minh Nguyen (talk | contribs) m (→Relations) |
Minh Nguyen (talk | contribs) (→Space character padding: Updated this section to reflect developments since 2010) |
Revision as of 19:31, 5 January 2023
We use a semi-colon value separator (the ; character) in our tag values in some situations, but avoid it in others. It can be necessary when a single element needs to take multiple values for the same key.
Current applications (OSM "data consumers") can handle such semicolon-separated values (CSV) without problems, as long as they are used appropriately. Older software from the "early days" of OSM had more problems. When semicolon separation is used in tags where it is not expected, software might handle the value in unintended ways, such as treating the whole string as one value or considering only the first part of the list.
Examples for established uses
- Sections of a road that carry multiple references, e.g. ref=B500;B550 for a road signposted as both B500 and B550. Only do this if the identical section of road carries both ref values. If there is any point on this road section where the ref changes from one to the other, place a node and split the way at that point.
- Complex values that evidently cannot be represented using subkeys (notably when they are unordered lists of items) may use semicolons, e.g.:
- opening_hours=Tu-Fr 08:00-18:00;Mo 09:00-18:00;Sa 09:00-12:00;closed Aug
- turn=* lanes on roads can have several turn directions for the same lane, e.g. turn:lanes=left;through|through|through;right
- In the case of additional, describing tags, there is often no better way to tag diverse properties and combinations of them.
Here, semicolons are in wide use for these 'detail' tags where several values are common, e.g.:
When NOT to use
On important "top-level" tags that define what an element is, avoid ;-separated values whenever possible. Examples are highway=*, amenity=*, leisure=*, landuse=*, and natural=coastline.
Don't use them in your mapping, and don't propose them on the wiki if there are better ways of representing things. This is because use of semi-colons as value separators is contrary to the aim of keeping it simple both for data contributors (mappers) and data users. For the sake of new contributors and anyone trying to use the data (people building software for rendering, searching, "find my nearest cafe" mobile apps, etc) we should keep at least basic data directly usable.
In situations where you have multiple values, there are normally a couple of alternative approaches:
- Choose one of the values: Take the overriding "primary" value, and go with that. Example: You're mapping something which is a cafe but also a bar. It's much more helpful to just pick amenity=cafe or amenity=bar (look at the cafe/bar, and make a choice: Is it primarily a cafe, or primarily a bar?) It is not a good idea to map it as amenity=cafe;bar.
- Split the element: Separate things out into distinct features to allow them to be tagged separately with normal tags. Example: You're mapping a library which has a cafe inside it. Place a node for the cafe, and then either represent the library (a larger building) as an area instead, or just as a separate node. It is not a good idea to map it as amenity=library;cafe
In both examples, if you use ; in the amenity value, then that isn't going to show up in a "find my nearest cafe" mobile app any time soon. Even though it is entirely possible for systems to parse the value and split it by the ; character, almost all existing systems don't.
Syntax details
Space character padding
Normally, the delimiter is a semicolon without a space before or after, for example, ref=B500;B550. However, the opening hours and conditional restrictions syntaxes require a space after the semicolon.
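An older tagging style placed a space character after each of the ; characters in other contexts for human-readability, for example, ref=B500; B550. Potlatch automatically introduces a semicolon followed by a space when merging two ways. [https://github.com/openstreetmap/potlatch/blob/40b208026d0fe3261856c0fb5cba4266dbd1147c/way.as#L907][https://github.com/openstreetmap/potlatch2/blob/a38498b2a9433a405e3bf1a0be49ee77f6087b0b/net/systemeD/halcyon/connection/Entity.as#L459][https://github.com/systemed/potlatch3/blob/b2e550058af01c0ea159830e9a19ee846df06b70/net/systemeD/halcyon/connection/Entity.as#L464] However, this usage became insignificant by 2013. [https://github.com/openstreetmap/iD/issues/941] iD automatically removes a space after a semicolon, except in keys that require it. This is currently an inconsistency between JOSM and Potlatch (both versions) in their approach to automatic value separating.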
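The padding rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code of iD or any other editor: the helper names are hypothetical, and the test for "keys that require the space" is a guess that covers only opening_hours and keys ending in :conditional.

```python
import re


def keeps_padding(key: str) -> bool:
    """Assumption: only these keys need the space after the semicolon.

    The article names the opening hours and conditional restrictions
    syntaxes; the exact key patterns used here are illustrative.
    """
    return key == "opening_hours" or key.endswith(":conditional")


def normalize_padding(key: str, value: str) -> str:
    """Remove whitespace around semicolons unless the key's syntax needs it."""
    if keeps_padding(key):
        return value
    return re.sub(r"\s*;\s*", ";", value)


print(normalize_padding("ref", "B500; B550"))
print(normalize_padding("opening_hours", "Mo 09:00-18:00; Sa 09:00-12:00"))
```

A real tool would need a more complete list of keys that follow the opening hours or conditional syntax before rewriting any values.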
Escaping with ';;'
If a semi-colon exists in the actual value of the data, mappers should enter it as two consecutive semicolons (;;). This is an "escape character" approach used in computer programming and data formats. As this situation pretty much never occurs, it's really only mentioned here as a curiosity. Very few tools that use OpenStreetMap data will understand this.
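For the few tools that do want to honour the doubled-semicolon escape, a parser could look roughly like this. It is a minimal sketch with hypothetical function names; it treats every ;; pair as one literal semicolon and every single ; as a delimiter, which is one possible reading of the rule above.

```python
def split_escaped(value: str) -> list[str]:
    """Split on single semicolons, treating ';;' as a literal semicolon."""
    parts: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(value):
        if value[i] == ";":
            if i + 1 < len(value) and value[i + 1] == ";":
                current.append(";")  # escaped pair: keep one literal semicolon
                i += 2
                continue
            parts.append("".join(current))  # single semicolon: end of item
            current = []
        else:
            current.append(value[i])
        i += 1
    parts.append("".join(current))
    return parts


def join_escaped(parts: list[str]) -> str:
    """Join items with ';', doubling any semicolon inside an item."""
    return ";".join(part.replace(";", ";;") for part in parts)


print(split_escaped("a;;b;c"))
```

With this reading, an empty item between two delimiters cannot be expressed, and items that begin or end with a semicolon become ambiguous after joining, which may be one reason the convention is rarely implemented.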
Older separators
Prior to a community consensus on the use of the semi-colon (;), several other characters were suggested to separate values. These included: "/" (solidus), " " (space), "-" (hyphen), and "#" (number sign). The semicolon is now widely accepted as the character to use, and is supported by Potlatch and JOSM. Older variants can now be replaced.
Software support
Supporting CSV lists in software is not complicated. It mostly requires some text processing with substrings and regular expressions, which are available in every programming language. However, the developer has to implement it proactively, so support can only be expected where such usage is reasonably anticipated.
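As a rough illustration of how little text processing is involved, the following Python sketch splits a tag value into its parts. The function name is hypothetical, the ;; escape is ignored, and the function should not be applied to keys with their own grammar, such as opening_hours.

```python
def split_values(value: str) -> list[str]:
    """Split a semicolon-delimited tag value into its individual values.

    Whitespace around each part is stripped and empty parts are dropped,
    so "B500; B550" and "B500;B550" give the same result.
    """
    return [part.strip() for part in value.split(";") if part.strip()]


for ref in split_values("B500; B550"):
    print(ref)
```

A consumer that previously compared the whole string against a single expected value can instead test each element of the returned list.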
Data consumers
Query tools
- The current Overpass Query Language supports CSVs in tag values by
- supporting regular expressions with the tilde ~ operator, e.g. when searching a sub-string is unambiguous, node["cuisine"~"italian"] finds a cuisine=german;italian;mexican
- List Represented Set Operators such as lrs_in, e.g. way[highway=primary](if:lrs_in("B 1",t["ref"])); finds all primary ways tagged with ref="B 1" including ref="B 1;B 5" (but excluding ref="B 158" that would be found with a simple substring query ref~"B 1"), see LRS for further explanation.
- The historic XAPI, retired in 2017 (development ceased in 2012), apparently did not support regular expressions and substrings, causing users difficulties handling CSVs in the past.
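The difference between a substring search and a list-aware membership test can be reproduced outside Overpass. The Python sketch below only mirrors the behaviour described for the ref example above; it is not how Overpass implements lrs_in, and both function names are hypothetical.

```python
import re

refs = ["B 1", "B 1;B 5", "B 158"]


def substring_match(value: str, needle: str) -> bool:
    """Plain substring search, comparable to the query ref~"B 1"."""
    return re.search(re.escape(needle), value) is not None


def list_match(value: str, needle: str) -> bool:
    """Exact match against one element of the semicolon-delimited list."""
    return needle in value.split(";")


print([r for r in refs if substring_match(r, "B 1")])  # also matches "B 158"
print([r for r in refs if list_match(r, "B 1")])       # leaves out "B 158"
```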
Renderers
- OSM Carto as the style for the general map focuses on primary tags which rarely have CSVs. For the road shields generated from the ref=* tag, the values are pre-processed in SQL, replacing semicolons with a newline character, so that the individual refs show in separate lines on the shield.
- Mapbox Streets replaces ; with a spaced em dash ( — ) in any name=* or name:*=* tag. For primary keys such as amenity=* or shop=*, it considers only the portion up to the first semicolon and drops the rest.
- MapQuest Open used to interpret ref=* by placing each semicolon-delimited value on a separate shield (however, free access to open tiles was discontinued in 2016).
- OsmAnd supports CSV lists correctly in the following examples:
- In the map view, it alternates the different ref=* values on road shields.
- For refs in the "current road" widget and navigation instructions, it replaces the semicolon with a comma and space for good readability.
- It sends such comma to the text-to-speech engine to allow voice structuring of multiple refs.
- It parses complex opening hours, presents them in a convenient form and calculates from the current time if the facility is open or closed.
- It parses the turn:lanes and presents them in graphical form; while navigating it highlights the lane to choose.
- It nicely reformats the cuisine of a restaurant with multiple values, i.e. showing cuisine=german;italian;mexican as "German • Italian • Mexican".
- The Mapbox Directions API returns text and voice instructions that include the first or most relevant road name, ref, and destination and omit the rest from the sentence for brevity. The omitted names, refs, and destinations remain in other fields.
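Several of the display treatments listed above come down to a split followed by a join with a different separator. The Python sketch below imitates them for illustration only. None of it is taken from the code of OSM Carto, Mapbox or OsmAnd, and all function names are hypothetical.

```python
def ref_for_shield(ref: str) -> str:
    """One ref per line, as described for OSM Carto road shields."""
    return "\n".join(ref.split(";"))


def ref_for_instruction(ref: str) -> str:
    """Comma and space between refs, as described for OsmAnd instructions."""
    return ", ".join(ref.split(";"))


def cuisine_label(cuisine: str) -> str:
    """Capitalised values joined with a bullet, as in the OsmAnd example."""
    return " • ".join(part.capitalize() for part in cuisine.split(";"))


def primary_value(value: str) -> str:
    """Only the portion before the first semicolon, as Mapbox Streets does."""
    return value.split(";")[0]


def parse_turn_lanes(value: str) -> list[list[str]]:
    """Lanes are separated by '|', turn directions within a lane by ';'."""
    return [lane.split(";") for lane in value.split("|")]


print(cuisine_label("german;italian;mexican"))
print(parse_turn_lanes("left;through|through|through;right"))
```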
OSM Editors
Editors for OpenStreetMap data need a process to handle different values of the same key, when two or more objects are merged.
- iD prevents you in some cases from merging two elements with different values for a key (e.g. for highways). For some other tags it merges them using the semi-colon (e.g. leisure=park and leisure=water_park).
- JOSM presents the user with a modal warning box saying that values are conflicting, followed by a dialogue to resolve the conflict by choosing a particular value. Only when the user explicitly chooses to keep "all" tag values are they merged into a CSV list.
- Potlatch 1 (maintained until 2010) and Potlatch 2 (maintained until 2011) both join tag values with semicolons when merging ways that have tags with the same keys. In most cases this creates invalid tagging, and the result needs to be manually replaced by a single, valid value.
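The two merge strategies described above, joining every conflicting value or asking the user first, can be sketched in Python. This is a simplified illustration with hypothetical function names and does not reproduce the logic of any particular editor.

```python
def conflicting_keys(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Keys present on both objects with different values.

    An editor could show these to the user instead of merging blindly.
    """
    return sorted(key for key in a.keys() & b.keys() if a[key] != b[key])


def merge_tags(a: dict[str, str], b: dict[str, str]) -> dict[str, str]:
    """Join differing values with semicolons, skipping duplicates."""
    merged = dict(a)
    for key, value in b.items():
        if key not in merged:
            merged[key] = value
            continue
        parts = merged[key].split(";")
        for part in value.split(";"):
            if part not in parts:
                parts.append(part)
        merged[key] = ";".join(parts)
    return merged


park = {"leisure": "park", "name": "Example Park"}
pool = {"leisure": "water_park", "name": "Example Park"}
print(conflicting_keys(park, pool))
print(merge_tags(park, pool))
```

Checking for conflicts first keeps the semicolon join as an explicit choice, which avoids silently producing values such as highway=primary;secondary.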
Alternatives
If you're proposing a new scheme that would seem to require splitting values with semicolons, consider converting it to multiple tags with yes/no values.
Simple "yes/no" tags
Most "properties" or "attributes" of features are described with a simple key, without namespacing:
- lit=yes/no - specifies whether a street or parking lot is lit at night
- oneway=yes/no - added to highway=* to specify whether the highway is one-way
- drive_through=yes/no - specifies whether a feature such as a bank or restaurant offers drive-through service
Namespaced tags
It can be helpful to use a namespace if the property or attribute needs to be specifically related to a single feature, although this isn't always necessary.
For example, a hypothetical scheme for describing the books and items a library offers could be expressed as:
- amenity=library
- library:stock=books;newspapers;recorded_music
But it's probably better to rewrite the scheme to express the concepts as:
- amenity=library
- library:stock:books=yes
- library:stock:newspapers=yes
- library:stock:recorded_music=yes
payment=* and fuel=* are good examples of this second approach. Boolean-valued tags such as these can be extended with extra values later on if necessary, or even sub-namespaced meaningfully.
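Converting between the two forms is mechanical, as this Python sketch suggests. It uses the hypothetical library:stock scheme from the example above, and the function name is likewise only illustrative.

```python
def expand_to_yes_tags(tags: dict[str, str], key: str) -> dict[str, str]:
    """Replace a semicolon-delimited tag with one namespaced yes tag per value."""
    result = {k: v for k, v in tags.items() if k != key}
    for part in tags.get(key, "").split(";"):
        part = part.strip()
        if part:
            result[f"{key}:{part}"] = "yes"
    return result


library = {
    "amenity": "library",
    "library:stock": "books;newspapers;recorded_music",
}
print(expand_to_yes_tags(library, "library:stock"))
```

With the expanded form, a data consumer can test for library:stock:books=yes directly instead of parsing a list.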
Relations
Relations inherently support many-to-many relationships. For example, a highway=* way can be a member of multiple type=route relations. Each of these relations can store information about the respective routes in structured format, avoiding the need to devise a nested syntax for this information within the highway=* way's semicolon-delimited ref=* tag.
Other uses of semicolons
Occasionally, semicolons are used for purposes other than delimiting the values in a list:
- census:population=* uses a semicolon to delimit the population from the census year.