Data items

From OpenStreetMap Wiki
Revision as of 10:24, 17 October 2021 by Minh Nguyen (talk | contribs) (→‎How to help: Documented items w/o sitelinks)

Data items are a way to document all OSM metadata like keys and tags in every language on this wiki in a structured way, useful to both humans and tools.

  • Tools such as the iD editor and Taginfo can now get tag information by parsing data items instead of parsing the wiki markup. Eventually the data may include translations, tag suggestions, validation rules, common pitfalls, presets, and more.
  • Data consumers can get structured metadata to help process the main OSM database.
  • This wiki can now show data as info cards and tables, without information duplication and complicated template hackery.
  • All metadata can be analyzed using Sophox queries (see query examples) (Sophox is down as of December 2020[1]).

This page documents how to store structured tag metadata on this wiki using data items provided by the Wikibase extension - the same software that runs Wikidata (initial discussion). This project's goal is NOT to replace the OSM Wiki pages (including infoboxes), nor to use opaque IDs instead of the human readable key=value strings to tag features. We are only trying to improve metadata documentation, making it more useful to various tools.

Where to find them

Data items are located in the Item: namespace. Each data item has a page title consisting of a "Q" followed by a numeric ID.

Every Key: and Tag: page has a corresponding data item. Follow the "Data item" link in the "Tools" section of the page's sidebar, or enter the page's title into Special:ItemByTitle/wiki/.

How to help

  • Add tag descriptions and translations. See the following 3 minute video.
How to add tag descriptions
Looking for volunteers:
Add descriptions and translations
Community and content
  • Set up a wiki portal, possibly similar to Wikidata's community portal (but simpler), where the community can:
    • propose new properties
    • write guidelines/docs
    • discuss Wikibase data structures
  • Create Lua modules to generate tag tables, such as {{Template:Bridge:movable}}, {{Map Features:highway}}, or {{Template:Religions}}.
    • Implementation note: Wikibase only links Tags to the corresponding Key, but Keys do not list all possible Tags. To generate a table, we must have a list of items somewhere. We could create a new WB key property that lists all tags and use a bot to maintain it, or we could list all needed tags as template parameters, e.g. for highway, {{...|motorway|trunk|primary|secondary|...}}. A list passed as template parameters does not need to be localized, and it can specify the proper ordering of items (not available in WB). Lua code would use mw.wikibase.getEntityIdForTitle("Key:highway=motorway") to find the right data.
Technical
  • Add Wikibase support to external tools. Simple usage: get the localized description of a key/tag. Complex usage: let the user add a missing description or even edit an existing one, especially when the user is creating a new key.
  • Port simple validation rules, e.g. regex-based, to use Wikibase data.
  • Help parse various tables of tag data. Even if you can only generate plain files with data, user:Yurik can quickly import them.

Tag keys

Each OSM Key is stored as a separate page in the Item namespace. For example, see bridge:movable (Q104), which describes the key bridge:movable=*:

property type value example description
description string en - The mechanism by which a movable bridge moves to clear the way below.
uk - Механізм, що приводить в дію рухомий міст для вивільнення шляху під ним.
This is the primary way to describe the key, using proper sentences whose first word is capitalized and that end with a period. Must not contain any wiki markup or HTML. Must be shorter than 250 characters. When translating, it is usually enough to add just the description to the item. Any key:... and tag:...=... will be automatically shown as links.
label string en - bridge:movable Label usage is still undecided for the key/tag data items, so it is best not to rely on it for anything. A bot that is no longer active used to set the English label to the key itself, exactly the same as P16 below. Some languages have a nativekey (localized key) that was added to the labels as well. Do not add a copy of the English label to any other language. Note that, as with "en", a localized label must be unique within its language.
sitelink string Key:bridge:movable Links to the Key:... pages, even if the page does not exist. The sitelink is shown on top of the page on the other side as the page title.
instance of (P2)
that class of which this subject is a particular example and member (subject typically an individual member with a proper name label); different from P3 (subclass of)
Item key (Q7) Indicate the type of the item. Set to Q7 for keys.
permanent key ID (P16)
A string representing the key ID. Once set on a key data item, this value should never be changed.
string bridge:movable Shows the exact form of the key as used in OSM. Must never be changed once the item is created. Due to technical limitations, keys "Key:water tap", "Key:water_tap", and "Key:water_tap_" have identical wiki pages/sitelinks - "Key:water tap". In this case, set this property to multiple strings, but mark one of them with "Preferred rank".
applies to nodes (P33)
Indicates if this key or tag should be used on nodes. Use P26 qualifier to limit to a specific language region. Use P11 reference to link to discussions.

applies to ways (P34)

Indicates if this key or tag should be used on ways. Use P26 qualifier to limit to a specific language region. Use P11 reference to link to discussions.

applies to areas (P35)

Indicates if this key or tag should be used on areas (closed ways incl. multipolygons). Use P26 qualifier to limit to a specific language region. Use P11 reference to link to discussions.

applies to relations (P36)

Indicates if this key or tag should be used on relations (excluding multipolygon relations). Use P26 qualifier to limit to a specific language region. Use P11 reference to link to discussions.
Item is applicable (Q8000)
or

is not applicable (Q8001)

Specifies whether this key is allowed on nodes/ways/areas/relations. In the future we may want to use other statuses like approved (Q15), but this is not yet supported. See also limiting per locale below.
image (P28)  [ image caption (P47) ]
Image of a relevant illustration of the subject. (without "File:")
[A qualifier to add to an image to specify image caption in a specific language.]
string Noexit.jpg An image stored either on Wikimedia Commons (preferable) or on the OSM wiki, without the File: prefix. If people in different regions would only recognize the key when illustrated using different images, add multiple images and qualify each image with limited to region (P48). If the key's image contains anything language-specific, add another value and set limited to language (P26). Make sure to set the "Preferred rank" status on the default image. Use the image caption (P47) qualifier to give the caption in any language (the English caption is shown if the requested language is missing, or any other language if there is no English caption).
group (P25)
Indicates which group the given tag or a key belongs to. Target must have instance-of = group.
Item bridges (Q4712) The group this item belongs to. In the current model, each key belongs to just one group. In theory we could use it to attach multiple groups, changing the meaning of the "group" to something like a "label"/"meta-tag".
status (P6)  [ proposal discussion (P11) ]
Community acceptance status. Use reference to link to the proposal discussion page (P11).
[Link to the key or tag proposal page. Can be used as reference for status (P6).]
Item approved (Q15)
  reference link
community's approval status, together with a reference link to the discussion page (optional)
key type (P9)
Type of the key entity, e.g. enum, external id. Do not use this for groups or statuses.
Item well-known values (Q8) Describes the type of values this key is expected to have. If there is a well known list of values, use Q8. Other types are TBD.
Wikidata concept (P12)
this key or tag represents a concept described by the given Wikidata item more precisely than any other key or tag
Item Q787417 A link to the Wikidata item, stored as an external ID (string). Must be a Q-number.
value validation regex (P13)
Regular expression to test the validity of the tag's value. May also be used for role names. The wrapping ^( and )$ are assumed. Do not use for enum-like values, e.g. noexit=yes should be a tag, not a regex.
string [0-9]+ A regular expression that can be used to validate the value of this key. In this case the value must be one or more digits. Validators will require the entire expression to match the string, i.e. they will add ^( in front and )$ at the end.
See population (Q574) example.
documentation wiki pages (P31)
Wiki pages for this data item in different languages. There should be no more than one value per language. Use Special:SetSiteLink to set the item’s “wiki” sitelink to the English page title.
multilingual string Key:bridge:movable (English)

Cs:Key:bridge:movable (Czech)
...

Each value is the name of the wiki page in a specific language.

Note: After setting this property, use Special:SetSiteLink to set a sitelink to the English page name on the site wiki.

implies (P45)
Any feature with the current key/tag also implies this tag, even if they are not explicitly set.
Item access=no (Q4822)

Tag values

For keys like Key:highway, there is a list of the well-known values such as highway=residential, highway=service, highway=footway. These values are stored similarly to keys. See bridge:movable=bascule (Q888) that describes a bridge:movable=bascule. See all items that link to bridge:movable.

property type value example description
description string en - A type of movable bridge, a bascule bridge contains one or two spans, one end of which is free and swings upwards. A counterweight at the pivoting end of the span or spans balances the weight as the free end rises.
pl - Most zwodzony jest to rodzaj mostu w którym co najmniej jedno przęsło jest podnoszone. Mosty zwodzone mogą być jedno- lub dwuskrzydłowe.
This is the primary way to describe a tag using proper sentences (first word capitalized, ending with a period). Must not contain any wiki markup or HTML. Must be shorter than 250 characters. When translating, it is usually enough to add just the description to the item. Any key:... and tag:...=... will be automatically shown as links.
label string en - bridge:movable=bascule Label usage is still undecided for the key/tag data items, so it is best not to rely on it for anything. A bot that is no longer active used to set the English label to the tag itself, exactly the same as P19 below. Some languages have a nativekey=nativevalue (localized key/value) that was added to the labels as well. Do not add a copy of the English label to any other language. Note that, as with "en", a localized label must be unique within its language.
sitelink string Tag:bridge:movable=bascule Links to the Tag:... pages, even if the page does not exist. The sitelink is shown on top of the page on the other side as the page title.
instance of (P2) Item tag (Q2) Indicate the type of the item. Set to Q2 for tags.
permanent tag ID (P19) string bridge:movable=bascule Shows the exact form of the tag as used in OSM. Must never be changed once the item is created. Due to technical limitations, tags "Tag:water tap=yes", "Tag:water_tap=yes", and "Tag:water_tap=yes_" have identical wiki pages/sitelinks - "Tag:water tap=yes". In this case, set this property to multiple strings, but mark one of them with "Preferred rank" (small up arrow left of value).
key for this tag (P10) Item bridge:movable (Q104) Every tag item links to the corresponding key item, making it easier to query and validate.

Tags may also use applies to nodes (P33), applies to ways (P34), applies to areas (P35), applies to relations (P36), image (P28), group (P25), status (P6), value validation regex (P13), documentation wiki pages (P31), implies (P45). See their descriptions in the Tag keys section above.

Relations

Similar to keys and tags, here is an example restriction relation (Q16054) copied from Relation:restriction.

property type value example description
description string ... This is the primary description of the relation, using proper sentences whose first word is capitalized and that end with a period. Must not contain any wiki markup or HTML. Must be shorter than 250 characters. When translating, it is usually enough to add just the description to the item. Any key:... and tag:...=... will be automatically shown as links.
label string en - restriction relation Short relation description. Do not add a copy of the English label to any other languages.
sitelink string Relation:restriction Links to the corresponding Relation:... wiki page, even if the page does not exist. The sitelink is shown on top of the page on the other side as the page title.
instance of (P2) Item relation type (Q6) Indicate the type of the item. Must be set to Q6 for relations.
permanent relation type ID (P41)
A string representing the relation type ID, e.g. "route". Once set on a relation data item, this value should never be changed.
string restriction Shows the exact form of the relation type as used in OSM. Must never be changed once the item is created. Due to technical limitations, sitelinks "Relation:destination sign", "Relation:destination_sign", and "Relation:destination_sign_" have an identical wiki sitelink - "Relation:destination sign". In this case, set this property to multiple strings, but mark one of them with "Preferred rank".
tag for this relation type (P40)
For a given relation item, links to the corresponding tag item, e.g. type=multipolygon.
Item type=restriction (Q16013) Every relation item links to the corresponding type=* tag item, making it easier to query and validate.

Relations may also use image (P28), group (P25), status (P6), documentation wiki pages (P31). See their descriptions in the Tag keys section above.

Relation roles

Members of a relation can be labeled with "roles", e.g. "inner" and "outer" ways in a multipolygon relation. Each role for each relation type has its own data item. For example, see boundary=admin_centre (Q16060).

property type value example description
description string ... This is the primary description of the relation role, using proper sentences whose first word is capitalized and that end with a period. Must not contain any wiki markup or HTML. Must be shorter than 250 characters. When translating, it is usually enough to add just the description to the item. Any key:... and tag:...=... will be automatically shown as links.
label string en - boundary admin center role Short relation role description. Do not add a copy of the English label to any other languages.
sitelink string Relation:boundary=admin centre Links to the Relation:<relation>=<role> wiki page, even if the page does not exist. If the role is empty, use Relation:relation=  form. If the role has a variable portion, e.g. route=platform:<number>, set sitelink to the fixed part -- Relation:route=platform:, and use value validation regex (P13) to validate the variable part. The sitelink is shown on top of the page on the other side as the page title.
instance of (P2) Item relation member role (Q4667) Indicate the type of the item. Must be set to Q4667 for the relation member roles.
permanent relation role ID (P21)
A string in a "relationtype=role" format. Should only be set on relation role items. Once set on a role item, the value should never be changed.
string boundary=admin_centre Shows the exact form of the relation role as used in OSM but preceded by "<relationtype>=". Must never be changed once the item is created. Due to technical limitations, sitelinks "Relation:boundary=admin_centre", "Relation:boundary=admin centre", and "Relation:boundary=admin_centre_" have an identical wiki sitelink - "Relation:boundary=admin centre". In this case, set this property to multiple strings, but mark one of them with "Preferred rank".
belongs to relation type (P43)
For a given relation role (e.g. "inner"), links to the corresponding relation type (e.g. "multipolygon")
Item boundary relation (Q16019) Every relation member role links to the corresponding relation item, making it easier to query and validate.
value validation regex (P13)
Regular expression to test the validity of the tag's value. May also be used for role names. The wrapping ^( and )$ are assumed. Do not use for enum-like values, e.g. noexit=yes should be a tag, not a regex.
string platform:[0-9]+ A regular expression that can be used to validate the role, including its variable part. In this case the variable part must be one or more digits, e.g. for the route=platform:<number> role. Validators will wrap the expression as ^(platform:[0-9]+)$ (for the given example).
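
The wrapping rule for value validation regex (P13) can be sketched as follows. This is a minimal illustration, not the code of any particular validator: the function name value_matches is hypothetical, and it assumes the stored expressions are compatible with Python's re dialect, which the page does not specify.

```python
import re


def value_matches(pattern: str, value: str) -> bool:
    """Return True if the whole value matches a P13-style pattern.

    Hypothetical helper: the stored pattern has no anchors, so the
    wrapping ^( and )$ described for P13 is added here.
    """
    anchored = re.compile("^(" + pattern + ")$")
    return anchored.match(value) is not None


# A key value pattern and a role pattern taken from the examples on this page.
print(value_matches("[0-9]+", "12345"))                # digits only
print(value_matches("[0-9]+", "12a"))                  # trailing letter
print(value_matches("platform:[0-9]+", "platform:3"))  # role with a number
```

Wrapping the stored expression in a group before anchoring keeps alternations such as a|b from escaping the anchors.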

Relation member roles may also use applies to nodes (P33), applies to ways (P34), applies to areas (P35), applies to relations (P36), image (P28), group (P25), status (P6), documentation wiki pages (P31). See their descriptions in the Tag keys section above.

Storing geographical differences

A phone booth looks very different depending on the geographical region, e.g. the country. To indicate that an image, or any other value of the data item, is specific to a location, use the limited to region (P48) qualifier with a geographical region item.

A geographical region item is a data item with the instance of (P2) = geographic region (Q19531), and it contains a geographic code (P49) property set to one or more country codes.

The limited to region (P48) qualifier should eventually replace the limited to language (P26).

Storing locale differences

Most translated Key:... and Tag:... pages tend to have mismatching parameters like status, group, or the types of elements it should be used on. While some were deliberate results after a careful local community evaluation (see noexit (Q501)), many other cases are simply stale and need to be fixed, or possibly removed from the template's parameters to let it use the underlying data item.

All locale differences are stored using limited to language (P26). The value with no qualifiers is the default. It should have the "Preferred rank", but it is OK to keep the "Normal rank" when there are no other values for the property. All language-specific values must use limited to language (P26) and have the "Normal rank". Each value must be used only once, possibly with multiple qualifier values (e.g. a property on access:lhv (Q33) can have only one is applicable (Q8000) and one is not applicable (Q8001)). Each language qualifier can only be used once for the whole property. A language must not be listed if its value is the same as the default. If there is no value without qualifiers, the default is not set (e.g. the English page has no onRelation= parameter).

Property | Rank | Value | Qualifier | Meaning
group (P25) | Preferred rank | bridges (Q4712) | no qualifiers | This value is used for the English page and all other language pages except those explicitly listed below.
group (P25) | Normal rank | properties (Q4671) | limited to language (P26): Italian-language documentation (Q7798), Finnish-language documentation (Q7791) | This value is only used for the Italian and Finnish pages.
group (P25) | Normal rank | placement (Q4707) | limited to language (P26): Czech-language documentation (Q7785) | This value is only used for the Czech pages.
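
The lookup implied by these rules can be sketched as below. The statement layout (a list of dictionaries with value, rank, and languages fields) and the function name value_for_language are simplifications invented for illustration; they are not the Wikibase JSON format. The sample data mirrors the group (P25) example above.

```python
from typing import Optional

# Hypothetical, simplified statements for group (P25).
# An empty "languages" list means the statement has no P26 qualifier.
GROUP_STATEMENTS = [
    {"value": "Q4712", "rank": "preferred", "languages": []},
    {"value": "Q4671", "rank": "normal", "languages": ["Q7798", "Q7791"]},
    {"value": "Q4707", "rank": "normal", "languages": ["Q7785"]},
]


def value_for_language(statements: list, language_item: str) -> Optional[str]:
    """Pick the value for one documentation language.

    A statement qualified with the language wins; otherwise the
    unqualified statement is the default. Returns None when no default
    is set and the language is not listed.
    """
    default = None
    for statement in statements:
        if language_item in statement["languages"]:
            return statement["value"]
        if not statement["languages"]:
            default = statement["value"]
    return default


print(value_for_language(GROUP_STATEMENTS, "Q7798"))  # Italian documentation
print(value_for_language(GROUP_STATEMENTS, "Q7785"))  # Czech documentation
```

The sketch ignores the rank field because the rules allow the unqualified default to keep the normal rank when it is the only value.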

Meta item

There are many data items which are neither a Key nor a Tag:

OSM Concepts: element (Q9), key (Q7), tag (Q2), status (Q11), group (Q12)
Statuses of type status (Q11): de facto (Q13), in use (Q14), approved (Q15), rejected (Q16), voting (Q17), draft (Q18), abandoned (Q19), proposed (Q20), obsolete (Q5060), deprecated (Q5061), discardable (Q7550), imported (Q21146)
Statuses of type applicability of a key/tag to a specific element type (Q8010): is applicable (Q8000), is not applicable (Q8001)
Special: OpenStreetMap concept (Q10), sandbox (Q2761)

Item creation process

A bot has created all significantly used keys and tags, and continued creating these items when they were detected in the OSM database (taginfo API) or on the wiki. The bot used to:

  • create an item for any key with 10+ usages if it matches ^[a-z0-9]+([-:_\.][a-z0-9]+)*$ (i.e. sequence of one or more words separated by single dashes, colons, underscores or periods, where words contain only lower case English letters and numbers), or for any 1000+ usages regardless of the key syntax (see talk page)
  • set item's label to be the same as the key
  • set item's description from the corresponding wiki page's info card (if available, from all languages)
  • set used-on, recommended tags, implies, and any other easy-to-figure-out data from the info cards.
  • NOT update any field modified by a user, e.g. if the French description has been changed by a user, the bot should not change it.
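
The usage and syntax thresholds from the first rule can be sketched as follows. This is an illustration of the stated rule only, not the bot's actual source code; the function name bot_would_create_item is hypothetical.

```python
import re

# Key syntax quoted in the list above: words of lower case English letters
# and digits, separated by single dashes, colons, underscores or periods.
KEY_SYNTAX = re.compile(r"^[a-z0-9]+([-:_\.][a-z0-9]+)*$")


def bot_would_create_item(key: str, usage_count: int) -> bool:
    """Apply the 10+ usages with valid syntax, or 1000+ usages, rule."""
    if usage_count >= 1000:
        # Heavily used keys qualify regardless of their syntax.
        return True
    return usage_count >= 10 and KEY_SYNTAX.fullmatch(key) is not None


print(bot_would_create_item("bridge:movable", 10))  # valid syntax, enough usages
print(bot_would_create_item("Bridge", 500))         # upper case, under 1000 usages
```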

Eventually, it would be better for OSM tools (iD, JOSM, ...) to ask the user for the metadata, and use MW API to create new items.

Current methods for creating new items

  • Special:NewItem -- first search for an existing item so that duplicate items are not added: use Wiki search in namespace "Item" only (do not prefix with `Key:` or `Tag:`) and Special:ItemByTitle/wiki/ (use the `Key:` or `Tag:` prefix, e.g. `Key:highway` or `Tag:highway=residential`).
  • The JOSM `osmwiki-dataitem` plugin -- right click on the tag entry, choose `Create new WikiData item for <key or tag>`, and follow the prompts. If only `Open WikiData for <key or tag>` is visible, then either the data item already exists or the key is not marked as a well-defined key (for example, `note` or `name` will not be marked as `P9=Q8` (key type=well defined)). The plugin currently only supports the `Tag` and `Key` types.

While the JOSM wikidata plugin is currently in alpha, it does add more information than the Special:ItemByTitle/wiki page, so it may be easier and better to use the plugin over the special page at this time.

In the event that Special:ItemByTitle/wiki is used, please add the following information:

Item deletion process

Data items can be deleted manually by administrators. They might delete a Data Item if

  • there is no regular wiki page here (aside from discussions or user pages) that describes a key/tag/rel/... and
  • there is no proposal associated with the Data Item and
  • the Data Item does not qualify for creation according to the Item Creation Process.

API access and querying

  • The easiest way for an external tool to get all the data about a key is to use this API call:
https://wiki.openstreetmap.org/w/api.php?action=wbgetentities&sites=wiki&titles=Key:bridge:movable&languages=en|fr
Use languages to filter labels and descriptions to the needed languages.
Add &format=json&formatversion=2 to get the actual JSON instead of HTML.
Due to MediaWiki limitations, the titles value should be ("Key:" + key).replace('_', ' ').trim(), i.e. every underscore is replaced with a space and surrounding whitespace is trimmed. Use permanent key ID (P16) to get the actual form of the key. Make sure to get the "preferred" value, in case more than one value is present.
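
The request described above can be sketched as below, using only the standard library and without sending anything over the network. The function names are hypothetical. The layout read by preferred_key_id (claims, then P16, then mainsnak, datavalue, value, plus a rank field per statement) is the usual Wikibase entity JSON and is assumed, not confirmed on this page, to be what this wiki returns.

```python
from typing import Optional
from urllib.parse import urlencode

API_ENDPOINT = "https://wiki.openstreetmap.org/w/api.php"


def sitelink_title(key: str) -> str:
    """Build the sitelink title: underscores become spaces, ends are trimmed."""
    return ("Key:" + key).replace("_", " ").strip()


def entity_url(key: str, languages: list) -> str:
    """Build a wbgetentities URL that asks for JSON in the given languages."""
    params = {
        "action": "wbgetentities",
        "sites": "wiki",
        "titles": sitelink_title(key),
        "languages": "|".join(languages),
        "format": "json",
        "formatversion": "2",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def preferred_key_id(entity: dict) -> Optional[str]:
    """Return the P16 value, taking a preferred-rank statement if one exists.

    Assumes the standard Wikibase statement layout (see the lead-in).
    """
    statements = entity.get("claims", {}).get("P16", [])
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    for statement in preferred or statements:
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
        if value is not None:
            return value
    return None


print(entity_url("bridge:movable", ["en", "fr"]))
```

Building the query with urlencode percent-encodes the colon and the | separator, which avoids quoting problems when keys contain unusual characters.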

Tracking changes to data items

To track changes to data items, you can add a data item to your Watchlist like any other page on the wiki. You can also configure your Watchlist to automatically include changes to the data item associated with any wiki page you are watching, by opening the "Filter changes (use menu or search for filter name)" dropdown and checking the "Data item edits" checkbox. To make these changes permanent, click the bookmark button or check the "Show data item edits in your watchlist" checkbox in your watchlist preferences.

Changes to data items are included in Special:RecentChanges, Special:RecentChangesLinked, and Special:Watchlist by default. To filter out all edits to data items, click the Namespaces button in the filter panel, check "Item", and click "Exclude selected". To filter out changes to labels, descriptions, or aliases but include changes to statements, click the Tags button in the filter panel, check "Data item terms", and click "Exclude selected". (This latter filter is useful for ignoring translations and is implemented via an "abuse filter" and corresponding tag.)

To make any of these changes permanent, click the bookmark button.

Quality control

There are several additional extensions designed to validate Wikibase data, and find items that do not pass validation. Installing such capabilities may not be done in the first deployment stage.

Note that, due to the limitations listed in the section below, watchlisting data items is dysfunctional. As a result, fewer editors monitor data items. If a data item conflicts with the infobox on the corresponding OSM Wiki page, it is very likely that the data item is the incorrect one.

Limitations and known issues

  • Wikibase's "Commons File" properties do not yet support files stored on this wiki. Instead, we use a regular string property to store the image name, and use a gadget (see your preferences) to show strings as images.
  • The sitelink in the upper right corner does not show whether the Tag:* or a Key:* page exists or not.
  • All sitelinks must use spaces instead of underscores. API sitelink search does not work otherwise. See permanent key ID (P16) and permanent tag ID (P19) for the correct value. Note that regular Mediawiki Key:* and Tag:* pages have the same issue, and use a special hack to change the title.
  • MediaWiki removes spaces/underscores from the key, so Key:_abc_ would become Key: abc. There is no way to have two items with sitelinks Key:_abc and Key:_abc_ -- they are treated as the same, and fail.
  • Data item titles and talk page titles are not human readable, e.g. Item:Q5007 vs Tag:amenity=shelter
  • Item "Q" numbers collide with those used on the main Wikidata site, e.g. wikidata Q5007 vs dataitem Q5007. Despite numerous attempts by the OSM community, and a working implementation by Yurik, the fix for this issue was declined by the maintainers of the Wikibase software. They suggest that the distinction should be made via a prefix.
  • The Wikibase software and the Wikidata project are sometimes confused with these data items, which in the past have been called "the wikibase" or "wiki data items" by some users.

Watchlist

  • Adding a data item to the Special:Watchlist will generate more watchlist notifications, including ones for all language translations. There are ways to work around it (TODO: add watchlist instructions).
  • Because of this, many experienced wiki users and mappers are not following the data item pages, so any mistakes are less likely to be fixed as quickly as mistakes on Tag and Key wiki pages.

Editing

The current data item editing experience should be improved, especially in these areas:

  • Most data items were created by bot, and used to be updated by bot, unless a human user has edited the item. The bot source code is available, but undocumented.
  • Editing a data item should be made simpler by a direct edit button on the key/tag/relation page, without navigating to the data item page itself. An experimental data item editor can already be enabled in user preferences, but there is more work to be done in polishing this feature. (Source code)
  • Currently there is no bulk-editing interface available, other than writing a bot. We should enable the Quick Statements tool to simplify such operations.
  • The current system of having the tag description in two places (wiki page and data items) has been creating some problems. The difference is shown by a small icon, and can also be queried in Sophox (TODO: add query link here).
  • The data item user interface still has some bugs, and the javascript may load slowly on some systems, resulting in a less than ideal experience. Hopefully this issue will be fixed upstream (TODO: link to Phabricator issue)
  • It is not currently possible to copy the data item content, edit it outside browser as text, and copy it back, or to make multiple changes to a data item at one time.
  • In some cases, editing an article page (for example to delete a wrong field from the infobox) will result in the data item taking over: after a user removes the parameter, the same wrong data is still displayed. As a result, people editing infobox parameters now also need to edit data items.[2]
  • Editing a data item does not immediately update the wiki page; ?action=purge is needed, which is confusing for a typical user (especially in the cases mentioned above where the wiki page was polluted by invalid data from the data item).

History

  • Using data items to generate infoboxes makes old page revisions confusing to view, because they reflect the current state of the data item rather than its state at the time of the edit.

Blanking infoboxes

It is technically feasible to delete parameters from infoboxes on OSM Wiki pages and display the infobox using just the data item. However, there is no general support for making this kind of edit. There may be support for it among the people translating into a given language.

See also

References