BCN Addresses Import
Goals
The goal is to import the addresses from the Barcelona city council database (addr:housenumber, addr:street, addr:postcode).
In Barcelona there are 32782 addresses already in OSM, we'll merge them.
Schedule
- December 12 2016: We started the preparation and the discussion. See the discussion in the talk-es list (in spanish)
- XXXX: Proposal sent to the imports list.
- XXXX: Import started.
- XXXX: QA finished.
- XXXX: Report sent to talk-es and to Barcelona city council open data site.
Import Data
Data description
We'll mix two files: one to get the street names and the other to get the housenumbers and postcodes. There are 171,637 housenumbers with 4,099 different street names.
-Carrerer de la ciutat de Barcelona: http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/taulacarrers/
-Direccions postales elementales: http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/tauladirele/
We'll use the csv files for processing.
Background
ODbL Compliance verified: YES
The license is CC BY 3.0 (in Spanish). It requires attribution of the source, so all nodes and the changeset will include source=Ajuntament de Barcelona. The Barcelona City Council (Ajuntament de Barcelona) is already on the contributors list.
The city council asked for help using OSM in a council project (decidim.Barcelona). We'll ask for a explicit permission to the data owner (Ajuntament de Barcelona) to get the confirmation that we comply with their attribution clause using their data in OSM if we clearly specify the source and the source:date.
Compatibility with the ODbL was already discussed in talk-es mailing list.
Import Type
The import will be done manually. The conflation between OSM data and AJ BCN data will be done using the Tasks Manager, so it can be assigned to different volunteers.
Data Preparation
Data Reduction & Simplification
As mentioned before, the data files we have chosen is in csv format. Some of the fields aren't relevant and will be ignored.
Tagging Plans
Here are the original fields, their meaning and how they will be converted to the resulting OSM file:
Original Fields | Meaning | OSM tag | Comments |
---|---|---|---|
CODI_VIA | Street ID | ajbcn:street_id=* | We will import this code for the updating processes. |
NUMPOST | Housenumber | addr:housenumber=* | |
LLEPOST | Housenumber complement with letter | addr:housenumber=* | When it is complementary from NUMPOST |
TIPUSNUM | Type of number | Not importable | |
DTE | District | Not importable | |
DIST_POST | Postal district | addr:postcode=* | We'll add the first 3 digits, that are the same for all the city (080) |
SECC_CENS | Not importable | ||
SECC_EST | Not importable | ||
BARRI | Neighbourhood | Not importable | |
ED50_COORD_X | Not importable | ||
ED50_COORD_Y | Not importable | ||
ETRS89_COORD_X | X coordinates (EPSG:25831) | X | Reprojection to EPSG:4326 needed |
ETRS89_COORD_Y | Y coordinates (EPSG:25831) | Y | Reprojection to EPSG:4326 needed |
ORD | Not importable |
Original Fields | Meaning | OSM tag | Comments |
---|---|---|---|
CODI_CARRER | Street ID | ajbcn:street_id=* | We will import this code for the updating processes. |
CODI_CARRER_INE | Internal reference number from the INE (Spanish National Stats Institute) | Not importable | |
TIPUS_VIA | Type of street | Not importable | |
NOM_OFICIAL | Official name of the street | addr:street=* | |
NOM_CURT | Short name | Not importable | |
NRE_MIN | Minimum number of housenumber in this street | Not importable | |
NRE_MAX | Maximum number of housenumber in this street | Not importable |
To all the nodes, we will add the following tags:
- source:date=YYYY-MM-DD (we'll get the modification date from their RDF file)
- source=Ajuntament de Barcelona
Changeset Tags
We will use the following changeset tags:
- comment=#BCN_adresses_import and any notes that the mapper may add.
- source=Ajuntament de Barcelona
- source:date=YYYY-MM-DD (we'll get the modification date from their RDF file)
- url=https://wiki.openstreetmap.org/wiki/BCN_Addresses_Import (this page)
- created_by=JOSM/version (This tag is created automatically by JOSM)
Data Transformation
Data is in csv format. We will:
- Download, mix, transform and reproject the data with this script https://github.com/kresp0/direccionesbcn2osm.sh
#!/bin/bash
# Descarga, transforma y reproyecta las direcciones de Barcelona
# a partir los datos abiertos del Ajuntament al formato XML de OSM.
# Santiago Crespo 2016
# https://creativecommons.org/publicdomain/zero/1.0/
OUT_FILE=direcciones-BCN.osm
TMPDIR=/tmp/direcciones-bcn
ORIG_PWD="$PWD"
rm -rf $TMPDIR
mkdir $TMPDIR
cd $TMPDIR
# Download the rdf with the source:date information:
wget "http://opendata.bcn.cat/opendata/es/catalog/SECTOR_PUBLIC/data-catalog/0/RDF" -O rdf
FECHA_CALLES=`grep -A 3 CARRERER rdf | grep "dct:modified" | awk -F '>' '{print $2}' | awk -F 'T' '{print $1}'`
FECHA_DIRECCIONES=`grep -A3 INFRAESTRUCTURES/TAULA_DIRELE rdf | grep "dct:modified" | awk -F '>' '{print $2}' | awk -F 'T' '{print $1}'`
if [ "$a" != "$b" ]; then
echo "ERROR: FECHA_CALLES y FECHA_DIRECCIONES no coinciden!"
echo "FECHA_CALLES = $FECHA_CALLES"
echo "FECHA_DIRECCIONES = $FECHA_DIRECCIONES"
echo "No sé que poner en source:date"
exit 1
fi
# Download the csv file with the addresses
wget http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/tauladirele/ -O direcciones.html
wget `grep csv direcciones.html | grep http | awk -F '"' '{print "http://opendata.bcn.cat"$2}'` -O TAULA_DIRELE.csv
perl -pe 's/ETRS89_COORD_X/x/g' TAULA_DIRELE.csv | perl -pe 's/ETRS89_COORD_Y/y/g' > t ; mv t TAULA_DIRELE.csv
# Reproject from EPSG:25831 to EPSG:4326:
echo '<OGRVRTDataSource>
<OGRVRTLayer name="TAULA_DIRELE">
<SrcDataSource>TAULA_DIRELE.csv</SrcDataSource>
<GeometryType>wkbPoint</GeometryType>
<LayerSRS>+init=epsg:25831 +wktext</LayerSRS>
<GeometryField encoding="PointFromColumns" x="x" y="y"/>
</OGRVRTLayer>
</OGRVRTDataSource>' > direcciones-bcn.vrt
ogr2ogr -lco GEOMETRY=AS_XY -overwrite -f CSV -t_srs EPSG:4326 DIRECCIONES-BCN.csv direcciones-bcn.vrt
# Download the csv file with the complete street names
wget http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/taulacarrers/ -O calles.html
wget `grep csv calles.html | grep http | awk -F '"' '{print "http://opendata.bcn.cat"$2}'` -O CARRERER.csv
# Remove the first line
tail -n +2 DIRECCIONES-BCN.csv > t ; mv t DIRECCIONES-BCN.csv
tail -n +2 CARRERER.csv > t ; mv t CARRERER.csv
# Add headers
echo '<?xml version="1.0" encoding="UTF-8"?>' > $OUT_FILE
echo '<osm version="0.6" generator="direccionesbcn2osm.sh 1.0">' >> $OUT_FILE
COUNTER=0
while IFS=$';' read -r -a VIA; do
echo "Procesando: ${VIA[3]}"
while IFS=$',' read -r -a DIRECCIONES; do
# Si CODI_VIA es = CODI_CARRER
if [ "${VIA[0]}" = "${DIRECCIONES[2]}" ]; then
let COUNTER=COUNTER-1
echo ' <node id="'$COUNTER'" lat="'${DIRECCIONES[1]}'" lon="'${DIRECCIONES[0]}'">' >> $OUT_FILE
echo ' <tag k="ajbcn:street_id" v="'${DIRECCIONES[2]}'"/>' >> $OUT_FILE
########## TODO: JUNTAR NÚMERO Y LETRA SI TIENE LETRA
echo ' <tag k="addr:street" v="'${VIA[3]}'"/>' >> $OUT_FILE
NUMERO=$(echo ${DIRECCIONES[3]} | sed 's/^0*//') # Remove leading zeroes
echo $NUMERO
echo ' <tag k="addr:housenumber" v="'$NUMERO'"/>' >> $OUT_FILE
echo ' <tag k="addr:postcode" v="080'${DIRECCIONES[7]}'"/>' >> $OUT_FILE
echo ' <tag k="source" v="Ajuntament de Barcelona"/>' >> $OUT_FILE
echo ' <tag k="source:date" v="'$FECHA_DIRECCIONES'"/>' >> $OUT_FILE
# echo ' <tag k="source" v="Infraestructura de dades espacials de l\'Ajuntament de Barcelona - Geoportal"/>' >> $OUT_FILE
# echo ' <tag k="source" v="Carto BCN / Ajuntament de Barcelona"/>' >> $OUT_FILE
echo ' </node>' >> $OUT_FILE
fi
done < $TMPDIR/DIRECCIONES-BCN.csv
done < $TMPDIR/CARRERER.csv
echo '</osm>' >> $OUT_FILE
cp $OUT_FILE $ORIG_PWD/ && echo "Creado el archivo $ORIG_PWD/$OUT_FILE :)"
- Generate a polygon from the nodes using the convex hull tool in QGIS and use it to generate the working area on the TM
- Filter the nodes to import that are already near a mapped node in OSM using the osmsplitdup script
- Split the file with the possibly non-duplicated nodes to have one file for each task on the TM using the osmboxes script
Data Import Workflow
Team Approach
Import will be undertaken by experienced OSM volunteers, following a strict workflow.
References
The import will be discussed in the Talk-Es list and in the Imports list.
Workflow
As most (if not all) of the volunteers will be Spanish speaking, they can follow the detailed workflow in Spanish language on the TM.
The workflow will be as follows:
- 1. Install the TODO list JOSM plugin if the volunteer don't have it.
- 2. Create, if we don't have one yet, an import specific user account, like username_bcn_housenumbers_import (you will need a different email account from the account you used for your regular OSM account). Change the OSM username to that specific account in the JOSM preferences.
- 3. Configure JOSM to allow remote access.
- 4. Login into the Tasks Manager.
- 5. Select one square to work on.
- 6. Download the OSM data and the housenumbers data by clicking on the JOSM button and the link on the extra instructions.
- 7. Select all nodes on the addr:housenumber layer and add them to the ToDo list.For each node we first check its correctness, correcting any errors or typos we may still encounter. If the node is clearly wrong or suspicious of being wrong, it won't be imported in the first place, and it will be added to the comments, so it can be checked afterwards by other mappers to take a decision about it. The mapper will delete that node to be sure it won't be imported. Also we will conflate the node data with the existing if there is already one node or create a new one if not. Don't replace existing hand mapped information without on-the-ground verification.
- 8. If the housenumber is already on OSM, we'll not import it but we will check the extra information to add to this node.
- 9. After the import is done, there will be a review process using the TM.
Updates
Each year I'll download the dataset, do a diff and update the data to OSM manually. Until I get bored. yopaseopor (talk) 23:29, 11 December 2016 (UTC)
Reverse plan
In case of any trouble, JOSM reverter will be used.
After the import
Report with problems and errors
I will make made a report with all the problems and errors detected and sent to the Barcelona city council Open Data administrators. yopaseopor (talk) 23:29, 11 December 2016 (UTC)