This step serves to describe how the acquisition and conversion of the original data was performed, converting them into OSM tag pattern. This step is only made once.
Data download
The data was provided by e-mail, but can be accessed through this link.
Neighborhood Addition
The parcels file does not provide the neighborhood of each unit, but the city hall also provides another SHP with the official limits of city neighborhoods. Therefore, using QGIS, neighborhoods were added for each parcel, using the function Vector-Manage Data-Associate attributes by location.
Conversion of address numbers
The conversion described here is from the CSV file due to the greater familiarity with this type of file. However, the same procedure can be done with QGIS using SHP.
- Open Microsoft Excel (or any other spreadsheet editor) without opening the file
- Go to the Data tab and click on Get Data - From Text/CSV
- Select the 65001: UTF-8 encoding in the File Source tab so that all accents marks are read correctly
- Load the data
- Delete all columns, keeping only the following columns: WKT, nomelog (street name), endereço_n (house number) , and bairro (neighborhood)
- Save the file in CSV format by clicking on File - Save As - CSV (Comma delimited). Warning: do not choose the CSV UTF-8 option as the accent mark problem will persist.
- Close Excel and open the saved file directly (without performing the import procedure) by double clicking on the file.
General data cleaning
Initially we need to clean and fix the data first. For this, we will carry out the following procedure:
- Using Excel's filter, we removed nonexistent symbols such as commas, periods, and additional zeros in both the street name and house number columns.
- Next, we convert the names from uppercase to lowercase using the function =PROPER().
- Then we make the adjustments, like De (Of) to de (of) and so on.
- With street names, we expand and correct street types, such as Av to Avenida (Avenue), using Excel's Replace function.
Standardization of street names and house numbers
Finally, we must correct the street names. We suggest doing this by neighborhood (smallest to largest, as suggested in the work plan), so that the work is more organized and can be done concurrently with the import.
- We initially put the first column (name of the streets) in alphabetical order.
- Then we filter the desired neighborhood.
- We then corrected the street names manually. Anyway, as the names are arranged in alphabetical order, just correct the first value and copy the name to the cells below.
- We also make adjustments to any address that is not correct.
- Finally, we separately save each revised neighborhood in a new CSV.
Doing this in this way, separating the work by neighborhoods, the work is not so heavy and it is possible to do this in stages (fix a neighborhood, the import is carried out, then correct another neighborhood successively).
Data conversion to OSM standard
Finally, with the data already cleaned and standardized, we have to convert this CSV file to an OSM file. For this, we will use QGIS:
- After opening QGIS, go to menu - Layer - Add Layer - Add Delimited Text Layer (Ctrl + Shift + T).
- Open previously created file, select semicolon as delimiter, first record has field names, and in Geometry Definition select Well known text (WKT), then select last column that contains polygon data, and select the type of geometry as polygon.
- When adding the polygons with the addresses, we now have to convert them to points and leave it in a file ready to be read in JOSM. For this, we do it like this:
- Menu - Vector - Geometries - Centroids
- Select data layer and perform processing.
- With the nodes layer ready, export the file as SHP. It can be saved to another type of file, but current experience recommends saving to SHP so JOSM can open it without problems.
|