Import/Catalogue/AddressImport RAFVG
About
This page describes the import of addresses from the dataset provided by Regione Autonoma Friuli Venezia-Giulia (RAFVG), Italy. The dataset covers 196 municipalities and roughly 433K nodes.
The import has been discussed on the regional OSM mailing list. This wiki page is the result of consensus there.
Goals
The goal of this import is to use the dataset provided by RAFVG to improve the addresses available in OSM. It will not be a completely blind import: wherever possible, data will be checked by local mappers.
Schedule
Imports will be performed after the local community has reviewed the shared .osm files linked in the Elenco dei Comuni table on the Italian wiki page; progress will be tracked there.
Two pilot imports have been performed. The municipalities were chosen because of their limited size.
Stregna
Changeset 25763431; approx. 300 nodes.
Issues were raised because of highway names missing on the ground. A solution under discussion is to use the OSM tag addr:place instead of addr:street.
Sacile
Changeset 28443255; approx. 5,000 nodes.
OSM Inspector raised an issue due to upper/lowercase differences. Nominatim can handle them anyway, but the QA tool requires an exact, case-sensitive match.
Another issue arose from an error in the wiki and a missing discussion: as a consequence, the tag value separator ";" was used instead of "/". The wiki has been corrected; TODO: replace the separator.
Import Data
Background
Address format
House numbering follows the European scheme.
An address in RAFVG is determined by its street name and house number; where present, subordinate and in-house values are included.
The subordinate is mostly noted with a suffix letter, but can be any alphanumeric string (e.g. "A", "A3", "Z7"). Subordinates usually arise when a new house is built between existing houses with consecutive house numbers. The in-house value is mostly noted with suffix numbers.
The RAFVG dataset table structure is detailed in the Italian wiki page.
Postal codes (codici di avviamento postale, CAP) are not included in the RAFVG dataset; each address will inherit its municipality's postcode from the national Indirizzi della PA dataset (IODL 2.0); the derived, filtered content is reported in the Italian wiki page. The sole exception is the municipality of Trieste, which spans more than one postcode and will be imported in a separate process.
Legal
- Data source site: http://irdat.regione.fvg.it/consultatore-dati-ambientali-territoriali/
- Data license: IODL 2.0
- Type of license: "dato pubblico"
- ODbL Compliance verified: yes
Import Type
The dataset will be imported municipality by municipality.
Due to upload constraints, high-density areas will be split where needed [TBD 50K node constraint?].
Prior to upload, each .osm file will be published in the Italian wiki page to be manually checked by the local team.
Data Preparation
The data is provided as a shapefile consisting of a collection of point features, one for each house number. The projection is Gauss-Boaga.
Prior to OSM XML conversion, some issues require intermediate shapefiles to be generated. The actions needed are:
- re-projection to WGS84 CRS (ogr2ogr)
- geometry type from multi to single (ogr2ogr)
- municipality extraction (ogr2ogr)
- record standardization (TBD SQL tool)
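The ogr2ogr actions above could be combined into a single call. The following is a minimal sketch that only builds the argument list, without running it. The file names are hypothetical, the municipality filter assumes NOME_COMUN holds the name in the form shown, and EPSG:3004 (Monte Mario / Italy zone 2) is an assumption about which Gauss-Boaga zone the source uses; check it against the shapefile's .prj before use.

```python
from typing import List


def build_ogr2ogr_command(src: str, dst: str, municipality: str,
                          source_srs: str = "EPSG:3004") -> List[str]:
    """Return an ogr2ogr argument list; the command is not executed here."""
    # Double any single quote so the name is safe inside the WHERE clause.
    safe_name = municipality.replace("'", "''")
    return [
        "ogr2ogr",
        "-f", "ESRI Shapefile",
        "-s_srs", source_srs,       # ASSUMPTION: Gauss-Boaga zone of the source
        "-t_srs", "EPSG:4326",      # re-projection to WGS84
        "-nlt", "POINT",            # geometry type from multi to single
        "-explodecollections",      # one feature per point
        "-where", f"NOME_COMUN = '{safe_name}'",  # municipality extraction
        dst,                        # ogr2ogr expects the destination first
        src,
    ]


# Hypothetical file names, for illustration only.
cmd = build_ogr2ogr_command("civici.shp", "sacile.shp", "SACILE")
print(" ".join(cmd))
```

The list can be passed to subprocess.run once the parameters have been verified on a sample municipality.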
Record standardization
To minimize conflation work caused by mismatches with existing OSM street names (odonyms), replacements are applied to the records. The most common replacements are:
- first name expansion (e.g. "P. DIACONO" or "DIACONO P." > "PAOLO DIACONO")
- abbreviation expansion (e.g. "P.LE" > "PIAZZALE")
- Roman numeral conversion (e.g. "VII" > "SETTE")
- accent and apostrophe checking
The replacements have been compiled beforehand by the team and concern both SPECIE and DENOMINAZI. The replacement table will be published.
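As an illustration of how such a replacement table might be applied, here is a minimal sketch. The three dictionaries contain only the examples quoted above and stand in for the team's real table, which is not reproduced here; the function name and the split into three lookups are assumptions.

```python
from typing import Tuple

# Illustrative entries only, taken from the examples above.
ABBREVIATIONS = {"P.LE": "PIAZZALE"}
FIRST_NAMES = {"P. DIACONO": "PAOLO DIACONO", "DIACONO P.": "PAOLO DIACONO"}
ROMAN_NUMERALS = {"VII": "SETTE"}


def standardize(specie: str, denominazi: str) -> Tuple[str, str]:
    """Apply the replacement tables to one record's SPECIE and DENOMINAZI."""
    specie = specie.strip().upper()
    specie = ABBREVIATIONS.get(specie, specie)

    name = denominazi.strip().upper()
    # Whole-name lookup for abbreviated first names.
    name = FIRST_NAMES.get(name, name)
    # Roman numerals are replaced only when they are whole words.
    name = " ".join(ROMAN_NUMERALS.get(word, word) for word in name.split())
    return specie, name


print(standardize("P.LE", "DIACONO P."))
```

Accent and apostrophe checking is not covered by this sketch and would still need a manual or dictionary-based pass.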
Tagging Plan
Each source record has the following fields (those used for tagging appear in the tag mapping below):
- COD_ISTAT: municipality code, defined by ISTAT (www.istat.it)
- NOME_COMUN: municipality name
- ID_STRADA: street ID
- SPECIE: Toponym type (AKA "Denominazione Urbanistica Generica")
- DENOMINAZI: street name
- NUM_CIV: house number
- BARRATO: house number, subordinate
- INTERNO: house number, in-house
- DATA_AGG: record modification date
- DATA_INS: record creation date
- X: WGS84/ETRS89 longitude
- Y: WGS84/ETRS89 latitude
- ID1: unique ID (used for indexing)
The intermediate shapefile will be converted to OSM XML using ogr2osm. The ogr2osm translation file will handle:
- selective capitalization (e.g. "VIA DEL LAVORO" > "Via del Lavoro", "VIA GIACOMO LEOPARDI" > "Via Giacomo Leopardi")
- tag mapping
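The selective capitalization could work roughly as follows. This is a sketch, not the project's translation file: the list of words kept in lowercase is an assumption and is certainly incomplete, and elided forms such as "DELL'" are not handled.

```python
# ASSUMPTION: articles and prepositions that stay lowercase inside a name.
LOWERCASE_WORDS = {
    "di", "del", "della", "dei", "degli", "delle",
    "da", "dal", "al", "alla", "e", "in",
}


def selective_capitalize(name: str) -> str:
    """Capitalize each word except connecting words after the first one."""
    result = []
    for position, word in enumerate(name.lower().split()):
        if position > 0 and word in LOWERCASE_WORDS:
            result.append(word)
        else:
            result.append(word.capitalize())
    return " ".join(result)


print(selective_capitalize("VIA DEL LAVORO"))
print(selective_capitalize("VIA GIACOMO LEOPARDI"))
```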
Tag mapping for final upload will assign:
- addr:housenumber < NUM_CIV
- addr:street < SPECIE | DENOMINAZI
- addr:postcode (fed by Elenco dei Comuni)
- addr:city < NOME_COMUN
- source = "RAFVG"
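The mapping above can be expressed as a small function of the kind an ogr2osm translation file would contain. This standalone sketch does not use the ogr2osm API itself; the function name, the postcode lookup keyed by COD_ISTAT, and the placeholder values in it are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical lookup fed by the Elenco dei Comuni table.
# Placeholder values only, not real codes.
POSTCODES: Dict[str, str] = {"000000": "00000"}


def map_tags(attrs: Dict[str, str]) -> Dict[str, str]:
    """Translate one source record into OSM address tags."""
    parts = (attrs.get("SPECIE", "").strip(), attrs.get("DENOMINAZI", "").strip())
    tags = {
        "addr:housenumber": attrs.get("NUM_CIV", "").strip(),
        "addr:street": " ".join(p for p in parts if p),
        "addr:city": attrs.get("NOME_COMUN", "").strip(),
        "addr:postcode": POSTCODES.get(attrs.get("COD_ISTAT", "").strip(), ""),
        "source": "RAFVG",
    }
    # Drop empty values so no blank tags reach the OSM XML.
    return {key: value for key, value in tags.items() if value}


record = {
    "COD_ISTAT": "000000",
    "NOME_COMUN": "Comune di Esempio",
    "SPECIE": "Via",
    "DENOMINAZI": "Roma",
    "NUM_CIV": "12",
}
print(map_tags(record))
```

BARRATO and INTERNO are left out because the mapping above does not say how they are combined with the house number.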
Conflation
Conflation will be performed in JOSM. Existing OpenStreetMap data will be merged using a semi-automatic conflation plugin. The procedure is detailed in the Italian wiki page.
Dedicated upload account
The account RAFVG import will be used to upload revised .osm files.
Changeset Tags
Changeset will be tagged with:
Data Translation
Ogr2osm will be used to convert the shapefile to OSM XML format using the above tagging plan.
Source scripts for ogr2osm will be stored at https://github.com/rafvgimport/translations
Data Transformation Results
OSM XML files repository: https://github.com/rafvgimport/osm
Data Merge Workflow
Addresses already in OSM will be extracted using the Overpass query herein defined.
Addresses already present will be kept.
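For orientation, a query of this kind could look like the one built below. This is not the query defined by the import team, only a sketch in Overpass QL: it assumes the municipality is mapped as an area with admin_level=8 and a matching name tag, and it only looks at nodes and ways carrying addr:housenumber.

```python
def build_overpass_query(municipality: str, timeout: int = 120) -> str:
    """Build an Overpass QL query for existing addresses in one municipality."""
    return (
        f"[out:xml][timeout:{timeout}];\n"
        # ASSUMPTION: the municipality boundary is tagged admin_level=8.
        f'area["name"="{municipality}"]["admin_level"="8"]->.comune;\n'
        "(\n"
        '  node["addr:housenumber"](area.comune);\n'
        '  way["addr:housenumber"](area.comune);\n'
        ");\n"
        "out meta;\n"
        # Recurse down so the nodes of matching ways are included.
        ">;\n"
        "out meta qt;\n"
    )


query = build_overpass_query("Sacile")
print(query)
```

The printed text can be pasted into Overpass Turbo or used with JOSM's download dialog to load the existing addresses as a separate layer.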
Team Approach
Import will be managed by the following OSM users:
- Cascafico
- marcodena
- Stefano Salvador
- Marco_T
- damjang
- Bredy
Workflow
Step by step instructions:
- Run ogr2ogr to reproject the data and extract the nodes inside the municipality
- Perform record standardization (QGIS, SQLite or similar)
- Run ogr2osm to export the data as OSM XML
- Run the Overpass query to export the existing addresses
- Merge the ogr2osm output with the existing addresses in JOSM
- Upload the changeset to OSM
Each changeset should be small enough to be uploaded in one go.
If problems arise during the import, the changeset will be reverted using the JOSM Reverter plugin.
Conflation
See #Data Merge Workflow.
QA
Street names
After the import, addr:street values may differ slightly from the names of the corresponding streets.
These differences should be caught using OSM Inspector.
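A quick local pre-check can flag the most common kind of difference, upper/lowercase, before the data reaches the QA tools. The sketch below is not how OSM Inspector works; it simply compares two lists of names that are assumed to have been extracted beforehand, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Optional


def mismatched_streets(addr_streets: Iterable[str],
                       highway_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each unmatched addr:street value to a name differing only in case.

    The value is None when no highway name matches even case-insensitively.
    """
    exact = set(highway_names)
    folded = {name.casefold(): name for name in exact}
    report: Dict[str, Optional[str]] = {}
    for street in set(addr_streets):
        if street in exact:
            continue  # exact match, nothing to report
        report[street] = folded.get(street.casefold())
    return report


report = mismatched_streets(
    ["Via Del Lavoro", "Via Roma", "Via Ignota"],  # illustrative values
    ["Via del Lavoro", "Via Roma"],
)
print(report)
```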
Unmarked streets
The result can be used to locate areas where streets are missing.
Missing roads will be created in JOSM using PCN 2012 aerial imagery.
Unnamed streets
The result can be used to derive street names for unnamed streets when all the nodes along the street have the same addr:street value.
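The "all nodes agree" rule can be stated in a few lines. In this sketch the list of addr:street values found along one unnamed street is assumed to be collected beforehand (for example by selecting nearby address nodes in JOSM); the function name is hypothetical.

```python
from typing import Iterable, Optional


def unanimous_street_name(addr_streets: Iterable[str]) -> Optional[str]:
    """Return the street name if every address node agrees, otherwise None."""
    values = {value.strip() for value in addr_streets if value and value.strip()}
    if len(values) == 1:
        return values.pop()
    return None  # no addresses at all, or conflicting names


print(unanimous_street_name(["Via Roma", "Via Roma", "Via Roma"]))
print(unanimous_street_name(["Via Roma", "Via Udine"]))
```

A name derived this way is still only a candidate and should be checked by a local mapper before it is applied to the highway.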
Missing road names will be identified using the OpenStreetMap NoName Map Overlay:
tms:http://tile3.poole.ch/noname/{zoom}/{x}/{y}.png
OSM Inspector can also be used to find these streets.