Automated edits/MaliMrav-Bot

From OpenStreetMap Wiki


Introduction

Hi, I am MaliMrav, and you can contact me through https://openstreetmap.org. My script/bot account MaliMrav-Bot will be used to make automated edits to map data that would otherwise be quite difficult to make manually.

I will respect the Automated Edits code of conduct.

Problem statement (before edits)

I noticed several problems with OSM data and its use in Serbia. So far, these include:

  1. unintentional mixing of the Cyrillic and Latin alphabets
  2. search not working properly in the Organic Maps app

My intention is to try to solve these problems using both manual edits (from my main account, MaliMrav) and automated edits (from the account MaliMrav-Bot).

Background

Naming used in Serbia usually follows this pattern:

  • name: Serbian Cyrillic or Latin name
  • name:sr: Serbian Cyrillic name
  • name:sr-Latn: Serbian Latin name
  • name:en: English name
  • int_name: Serbian Latin without diacritics

Not all of these tags are present on every element.
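The Cyrillic name (name:sr) and the Latin name (name:sr-Latn) are related mechanically: going from Serbian Cyrillic to Serbian Latin is a deterministic letter-by-letter mapping (the reverse direction is not, because of digraphs such as lj, nj and dž). The Python sketch below illustrates that mapping only; it is not necessarily how MaliMrav-Bot derives its values. The names `to_latin` and `CYR_TO_LAT` are hypothetical, and digraphs are always emitted in title case (Lj, Nj, Dž), so all-caps input such as "ЉУБА" would need extra handling.

```python
# Serbian Cyrillic -> Serbian Latin, lowercase letters (30 letters of the alphabet).
_LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Add uppercase forms; digraphs become title case ("Љ" -> "Lj", "Џ" -> "Dž").
CYR_TO_LAT = {**_LOWER, **{c.upper(): l.capitalize() for c, l in _LOWER.items()}}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Serbian Latin; other characters pass through."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)
```

For example, `to_latin("Улица Иве Андрића")` gives "Ulica Ive Andrića", a candidate value for name:sr-Latn when only name:sr is present.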

Unintentional mixing of Cyrillic and Latin

So far, I have found two kinds of problems with mixing Cyrillic and Latin:

  1. Some letters in the Cyrillic and Latin alphabets have identical glyphs, so people sometimes confuse them and unintentionally mix them. For instance, one street was entered as "Ive Andrića", but was later corrected to "Ivа Andrića". In this real example, the first letter "a" was unintentionally written in Cyrillic.
  2. Some people put the Latin name in the "name:sr" tag and/or the Cyrillic name in the "name:sr-Latn" tag by mistake.
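Both kinds of mistakes can be detected mechanically, because the Unicode name of every letter starts with its script (for example "CYRILLIC SMALL LETTER A" versus "LATIN SMALL LETTER A"). The following is a minimal Python sketch, assuming an element's tags are available as a plain dict; the function names are hypothetical, and only letters are inspected (digits and punctuation are ignored). Some mixed names may be intentional, so matches are best treated as candidates for review rather than automatic fixes.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Return the scripts ("CYRILLIC", "LATIN") of the letters in text."""
    found = set()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
            script = unicodedata.name(ch, "").split(" ", 1)[0]
            if script in ("CYRILLIC", "LATIN"):
                found.add(script)
    return found


def check_names(tags: dict[str, str]) -> list[str]:
    """List script problems of the two kinds described above for one element."""
    problems = []
    for key, value in sorted(tags.items()):
        is_name_tag = key in ("name", "int_name") or key.startswith("name:")
        # Problem 1: a single value that mixes both alphabets.
        if is_name_tag and scripts_in(value) == {"CYRILLIC", "LATIN"}:
            problems.append(f"{key} mixes Cyrillic and Latin: {value!r}")
    # Problem 2: a value in the tag meant for the other alphabet.
    if "LATIN" in scripts_in(tags.get("name:sr", "")):
        problems.append("name:sr contains Latin letters")
    if "CYRILLIC" in scripts_in(tags.get("name:sr-Latn", "")):
        problems.append("name:sr-Latn contains Cyrillic letters")
    return problems
```

For the street above, `check_names({"name": "Iv\u0430 Andri\u0107a"})` (where `\u0430` is the Cyrillic "а") reports that name mixes Cyrillic and Latin.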

Search not working properly in Organic Maps app

Some people in Serbia use Cyrillic keyboards on their phones, while others use Latin (some use both). Data in OSM is partly in the Cyrillic and partly in the Latin alphabet. Organic Maps does not index name:sr-Latn tags, nor does it automatically transliterate the other tags.

As a result, some elements in the OSM database can only be found with Cyrillic search strings, and some only with Latin ones. Users cannot know in advance which alphabet will find a given element, so many elements effectively cannot be found.

To mitigate this problem, the idea is to make sure that both Cyrillic and Latin names are present (wherever possible) in the appropriate tags.

Most of the work will be populating int_name tags on all named elements that do not have one.
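A Python sketch of how candidates for a new int_name might be selected. It assumes, hypothetically, that int_name is derived from name:sr-Latn, falling back to name when that is already Latin, and that đ/Đ become dj/Dj, a common ASCII convention. Unicode NFD decomposition handles č, ć, š and ž, but not đ, which has no canonical decomposition. Elements whose only name is Cyrillic are skipped here, since they first need a Latin form. The output rows follow the old:/new: table layout from the Technical description; the function names, and the use of an empty old: value to mean "tag absent", are assumptions.

```python
import unicodedata
from typing import Optional


def strip_diacritics(latin: str) -> str:
    """Serbian Latin -> plain ASCII letters, e.g. "Andrića" -> "Andrica"."""
    # Assumption: đ/Đ are written as dj/Dj; NFD cannot do this because
    # đ has no canonical decomposition.
    latin = latin.replace("đ", "dj").replace("Đ", "Dj")
    decomposed = unicodedata.normalize("NFD", latin)
    # Drop the combining marks (caron, acute) left over after decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_latin_only(text: str) -> bool:
    """True if every letter in text is a Latin-script letter."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in text if ch.isalpha()
    )


def propose_int_name(tags: dict[str, str]) -> Optional[str]:
    """Hypothetical rule: derive int_name from name:sr-Latn, else from a Latin name."""
    if "name" not in tags or "int_name" in tags:
        return None  # unnamed, or int_name already set
    latin = tags.get("name:sr-Latn")
    if latin is None and is_latin_only(tags["name"]):
        latin = tags["name"]
    if latin is None or not is_latin_only(latin):
        return None  # Cyrillic-only name, or mixed scripts: needs a separate fix
    return strip_diacritics(latin)


def int_name_rows(elements):
    """Yield old:/new: rows for (type, osm_id, tags) tuples that need int_name."""
    for osm_type, osm_id, tags in elements:
        proposed = propose_int_name(tags)
        if proposed is not None:
            # Assumption: an empty old: value records that the tag was absent.
            yield {"type": osm_type, "osm_id": osm_id,
                   "old:int_name": "", "new:int_name": proposed}
```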

Technical description

  1. A PBF extract for Serbia is obtained from Geofabrik.
  2. It is imported into a local database using osm2pgsql or Osmosis.
  3. Using SQL scripts, I find the elements I want to edit. I store them in a specially formatted table that contains a type column (node/way/relation), an osm_id column, and columns matching the name tags, prefixed with "old:" and "new:".
  4. One script can export part of that table into a TSV file.
  5. The TSV file can be sent for review or approval if necessary. It can also be edited manually.
  6. Another script reads the TSV file and, for every row, reads the element of the given type and id. It then checks whether all values in the columns prefixed with "old:" (e.g. "old:int_name") match the corresponding tags (e.g. "int_name") in the element, and only if so proceeds with the replacement: all values in the columns prefixed with "new:" (e.g. "new:int_name") are written into the element that was read. Meta fields (timestamp, uid, user, changeset) are removed or modified. All updated elements are added to a changelog.
  7. The changelog is sent to the OSM API.
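Step 6 hinges on a compare-before-replace check: a row is applied only if the element still carries the values recorded in its old: columns, so anything edited since the database export is left alone. A minimal Python sketch of that check on a tag dictionary, assuming (hypothetically) that an empty old: cell means the tag must still be absent and an empty new: cell means the tag is left unchanged; `read_rows` and `apply_row` are illustrative names. The TSV is parsed with `csv.QUOTE_NONE` so that quotation marks inside names stay literal.

```python
import csv
import io
from typing import Optional


def read_rows(tsv_text: str) -> list[dict[str, str]]:
    """Parse the exported TSV; QUOTE_NONE keeps quotes in names literal."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def apply_row(row: dict[str, str], tags: dict[str, str]) -> Optional[dict[str, str]]:
    """Return the updated tags, or None if any old: value no longer matches."""
    for column, expected in row.items():
        if column.startswith("old:"):
            key = column[len("old:"):]
            # Assumption: an empty old: cell means the tag must still be absent.
            if tags.get(key, "") != (expected or ""):
                return None  # element changed since the export: skip it
    updated = dict(tags)
    for column, value in row.items():
        if column.startswith("new:") and value:
            # Assumption: an empty new: cell leaves the tag unchanged.
            updated[column[len("new:"):]] = value
    return updated
```

Returning None instead of raising lets the caller simply count skipped rows and continue with the rest of the file.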
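For steps 6 and 7, the modified elements have to be wrapped in an osmChange document. The Python sketch below uses `xml.etree.ElementTree` and assumes the elements were fetched from the API as OSM XML. It drops timestamp, uid and user, points changeset at the newly opened changeset, and keeps version, which the API uses to reject edits to elements that changed in the meantime. Uploading the result (an HTTP POST to /api/0.6/changeset/{id}/upload, with authentication) is left out; the function names and the generator string are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Iterable


def set_tags(element: ET.Element, tags: dict[str, str]) -> None:
    """Replace all <tag k=... v=...> children of an OSM element."""
    for old in element.findall("tag"):
        element.remove(old)
    for key, value in sorted(tags.items()):
        ET.SubElement(element, "tag", k=key, v=value)


def build_osmchange(elements: Iterable[ET.Element], changeset_id: int) -> bytes:
    """Wrap modified <node>/<way>/<relation> elements in an osmChange <modify> block."""
    root = ET.Element("osmChange", version="0.6", generator="example-sketch")
    modify = ET.SubElement(root, "modify")
    for original in elements:
        element = copy.deepcopy(original)
        for attr in ("timestamp", "uid", "user"):
            element.attrib.pop(attr, None)  # meta fields the API sets itself
        element.set("changeset", str(changeset_id))  # the changeset opened for this upload
        # "version" is kept: the API rejects the upload if the element changed since.
        modify.append(element)
    return ET.tostring(root, encoding="utf-8")
```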

Example Changesets

https://www.openstreetmap.org/changeset/157373687


Forum Announcement

https://community.openstreetmap.org/t/bot-za-dopunu-name-tagova/119610