User:Strainubot
This is an automated account. Please contact Strainu if you have any questions.
Import info about Romanian villages from Wikipedia
Purpose of the proposal
This proposal is meant to import Wikipedia article links and postal codes for villages in Romania, i.e. to add the following tags to the POIs representing villages:
- wikipedia=ro:title
- postal_code=xxxxxx (added only if the tag does not exist on OSM and a value exists on Wikipedia; if both exist and the two values differ, raise an error)
- postal_code:source=Romanian Wikipedia (if we added the postal_code tag)
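The tagging rules above can be sketched as a small function. This is only an illustration, not the bot's actual code: the names merge_village_tags and PostalCodeConflict are hypothetical, and the OSM tags are assumed to be available as a plain dictionary.

```python
class PostalCodeConflict(Exception):
    """Raised when OSM and Wikipedia disagree on the postal code."""


def merge_village_tags(osm_tags, wiki_title, wiki_postal_code=None):
    """Return a copy of osm_tags with the proposed tags applied.

    osm_tags: existing tags of the village (assumed to be a dict).
    wiki_title: title of the Romanian Wikipedia article.
    wiki_postal_code: postal code from the article, or None if absent.
    """
    tags = dict(osm_tags)  # never modify the caller's dictionary
    tags["wikipedia"] = "ro:" + wiki_title

    if wiki_postal_code:
        existing = tags.get("postal_code")
        if existing is None:
            # Only OSM lacks the code: add it and record where it came from.
            tags["postal_code"] = wiki_postal_code
            tags["postal_code:source"] = "Romanian Wikipedia"
        elif existing != wiki_postal_code:
            # Both sources have a value and they differ: do not guess.
            raise PostalCodeConflict(
                "OSM has %s, Wikipedia has %s" % (existing, wiki_postal_code)
            )
    return tags
```

Note that postal_code:source is written only when the bot itself added the postal code, so an existing, matching value on OSM is left untouched.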
Definitions
- Wikipedia articles about villages: articles in the Romanian Wikipedia containing the {{CutieSate}} template
- Coordinates available: the Wikipedia article contains at least 1 link to http://toolserver.org/~geohack/geohack.php from where we can extract latitude and longitude with at least 1 degree accuracy.
- wiki-latitude (wlat) and wiki-longitude (wlon) refer to the coordinates of a given village, rounded down to a whole degree
- Postal code available: the Wikipedia article contains the {{CutieSate}} template and the codpoștal parameter has a valid value (e.g. 080732)
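A minimal sketch of how these definitions could be turned into code. Several things here are assumptions rather than part of the proposal: that the geohack link carries the coordinates in a params query argument of the form D_M_S_N_D_M_S_E (minutes and seconds optional), that a valid postal code is exactly six digits, and all function names.

```python
import math
import re
from urllib.parse import urlparse, parse_qs

# Assumption: a valid codpoștal value is exactly six digits, e.g. 080732.
# Both spellings of the letter (ș and ş) are accepted.
POSTAL_RE = re.compile(r"\|\s*codpo[șş]tal\s*=\s*(\d{6})\b")


def parse_geohack_params(url):
    """Return (lat, lon) in decimal degrees from a geohack URL, or None."""
    params = parse_qs(urlparse(url).query).get("params", [""])[0]
    coords, numbers = [], []
    for part in params.split("_"):
        if part in ("N", "S", "E", "W"):
            if not numbers:
                return None
            # degrees + minutes/60 + seconds/3600
            value = sum(n / 60 ** i for i, n in enumerate(numbers))
            coords.append(-value if part in ("S", "W") else value)
            numbers = []
            if len(coords) == 2:
                break
        else:
            try:
                numbers.append(float(part))
            except ValueError:
                return None
    return tuple(coords) if len(coords) == 2 else None


def wiki_degrees(lat, lon):
    """Return (wlat, wlon): the coordinates rounded down to whole degrees."""
    return math.floor(lat), math.floor(lon)


def extract_postal_code(wikitext):
    """Return the codpoștal value from the article wikitext, or None."""
    match = POSTAL_RE.search(wikitext)
    return match.group(1) if match else None
```

For example, a link ending in params=44_30_N_26_15_E_type:city would give 44.5 and 26.25, hence wlat=44 and wlon=26.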
Import process
The import process differs slightly depending on the data available in Wikipedia.
Both coordinates and postal code available in Wiki article
- Extract the wlat, wlon from the wiki page
- Download the bbox=[wlon,wlat,wlon+1,wlat+1] from OSM
- Find a village with the same name (TODO: node or way?)
- If not found, extend the bbox to [wlon-1,wlat-1,wlon+1,wlat+1] and search again
- TODO: if still not found, should I use Nominatim? It seems pretty risky.
- Add the link to Wikipedia and the postal code to the selected node
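The two bounding boxes used in the steps above can be written down as follows. This is a sketch with hypothetical function names; the URL follows the map call of the OSM API 0.6 (bbox=left,bottom,right,top), and the base address is an assumption that can be swapped for another server.

```python
def initial_bbox(wlat, wlon):
    """One-degree cell containing the village: (left, bottom, right, top)."""
    return (wlon, wlat, wlon + 1, wlat + 1)


def extended_bbox(wlat, wlon):
    """Fallback box from the proposal, grown by one degree to the west and south."""
    return (wlon - 1, wlat - 1, wlon + 1, wlat + 1)


def map_url(bbox, base="https://api.openstreetmap.org/api/0.6/map"):
    """Build the download URL for a bbox given as (left, bottom, right, top)."""
    return base + "?bbox=" + ",".join(str(value) for value in bbox)
```

The public API limits the area of a single bbox request, so a box this large may have to be split into smaller tiles or fetched from another data source; that part is left out of the sketch.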
Only coordinates available in Wiki article
- Extract the wlat, wlon from the wiki page
- Download the bbox=[wlon,wlat,wlon+1,wlat+1] from OSM
- Find a village with the same name (TODO: node or way?)
- If not found, extend the bbox to [wlon-1,wlat-1,wlon+1,wlat+1] and search again
- TODO: if still not found, should I use Nominatim? It seems pretty risky.
- Add the link to Wikipedia to the selected node
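The "find a village with the same name" step could look roughly like this when run over the downloaded OSM XML. The sketch assumes that villages are nodes tagged place=village with a matching name tag; whether ways should also be considered is still an open TODO above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_villages(osm_xml, name):
    """Return (node id, tags) for every place=village node with this name."""
    root = ET.fromstring(osm_xml)
    matches = []
    for node in root.iter("node"):
        # Collect the node's <tag k="..." v="..."/> children into a dict.
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if tags.get("place") == "village" and tags.get("name") == name:
            matches.append((node.get("id"), tags))
    return matches
```

Returning a list rather than a single node lets the caller treat zero matches (extend the bbox) and several matches (log and skip) differently.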
Only postal code available in Wiki article
- Search for the article title (usually as: name, county) in Nominatim limiting the results to Romania
- If not found, remove the county name and search again
- If more than one found, log and quit - don't risk anything
- Select the corresponding result (TODO: node or way?)
- Check that the "is_in:county" tag contains the same county as the Wikipedia article
- Add the postal_code=* to the entity
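A sketch of the Nominatim-based lookup described above, without the actual network call. The q, format and countrycodes parameters are part of the Nominatim search API; the server address, the helper names and the assumption that the article title has the form "name, county" are illustrative only.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


class AmbiguousResult(Exception):
    """Raised when a search returns more than one candidate."""


def search_url(query):
    """Build a search URL limited to Romania, returning JSON."""
    return NOMINATIM_SEARCH + "?" + urlencode(
        {"q": query, "format": "json", "countrycodes": "ro"}
    )


def candidate_queries(title):
    """Full title first, then the title with the county part removed."""
    queries = [title]
    name = title.split(",")[0].strip()
    if name != title:
        queries.append(name)
    return queries


def pick_single(results):
    """Return the only result, None if there is none, else refuse to choose."""
    if not results:
        return None
    if len(results) > 1:
        raise AmbiguousResult("%d results, skipping" % len(results))
    return results[0]


def county_matches(tags, wiki_county):
    """Check that is_in:county contains the county named in the article."""
    return wiki_county.casefold() in tags.get("is_in:county", "").casefold()
```

A caller would try each query from candidate_queries in order, stop at the first non-empty answer, and add postal_code only when pick_single returns a result and county_matches is true.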
No data available in Wiki article
TODO: should we handle this case? It might lead to some errors, but more than half the articles are in this situation.