Corine Land Cover Romania 2006
This page aims to describe the phases of the CLC land cover import for Romania.
Important : this page is a draft; no data processing is performed at this stage on the real OpenStreetMap servers or data.
The target
The CLC land cover import for Romania uses the publicly available land cover data from the European Environment Agency. Additional data is available for major cities under the designation 'Urban Atlas'; while that data has almost the same format, this document mainly addresses the CLC import.
The solution was accepted unanimously, although only a very small number of voters expressed an opinion (just like in modern politics). It ensures that existing land cover data is not deleted: it is retagged for later analysis and a possible revert, while the new data is imported. See Romania CLC Import for more details.
The approach
The following stages have currently been established :
Stage | Description | Status | Notes | Who's involved (feel free to volunteer to any stage) |
---|---|---|---|---|
1 | Request for permission | Completed | See ro-talk list for details | stefanu |
2 | Data preparation | Completed | Data is provided by the EEA as archived shapefiles. A one-pass coordinate conversion is required so that the data can be processed easily by a C++ application built on the shapelib library; the conversion is performed with the 'ogr2ogr' utility from the GDAL library. | stefanu |
3 | OSM data corrections | Completed | Analysis of the existing data on the OSM server to look for inconsistencies (such as wrong landuse and natural tags), and correction of objects that are obvious errors. For now, this covers wrong tagging and layer corrections (overlapping polygons placed on different layers because of rendering issues are converted into relations). | stefanu |
4 | Data conversion | Completed (see the bot algorithm details below) | Data has been converted to OSM XML format by a dedicated utility written in C++ using the shapelib library. The conversion currently creates one OSM file per CLC land cover code, which together form one large OSM file (approx. 1.5 GB). See the data testing section below for details. | stefanu |
5 | Retagging bot development | Completed | Retagging bot development uses Java code from the tile splitter application by Steve Ratcliffe & Chris Miller. It uses simplified XML parsing and objects along with a simple OSM API client. This may well serve for developing a full API client, since I found none while working on this. | stefanu |
6 | Retagging bot testing | Completed | Using the server at http://api06.dev.openstreetmap.org/ for this purpose | stefanu |
7 | Import bot development | Completed | Simple OSM elements imported and manipulated using a simple OSM API Java component aimed at this purpose. | stefanu |
8 | Import bot testing | Completed | Using the server at http://api06.dev.openstreetmap.org/ for this purpose | stefanu |
9 | Prepare existing OSM data for retagging | Completed | Some work has been done for this stage along with the development of a filtering application needed for the Romanian Garmin map. See data testing section below for details. | stefanu |
10 | Run the retagging bot | As needed, on a per-landuse type basis | | stefanu |
11 | Run the import bot | As needed, on a per-landuse type basis | CLC code being imported : 311 (Broad-leaved forest). CLC codes imported so far : 112 (Discontinuous urban fabric), 124 (Airports), 131 (Mineral extraction sites), 221 (Vineyards), 312 (Coniferous forest), 313 (Mixed forest). | stefanu |
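
The coordinate conversion of stage 2 could be scripted roughly as follows. This is only a sketch, not the batch files actually used: it assumes that each shapefile ships with a .prj file so that ogr2ogr can detect the source coordinate system, and the function names and directory layout are hypothetical. The `-f` and `-t_srs` options are standard ogr2ogr options.

```python
import subprocess
from pathlib import Path


def build_reprojection_command(src: Path, dst: Path) -> list[str]:
    """Build an ogr2ogr call that reprojects one shapefile to WGS84 (EPSG:4326)."""
    return [
        "ogr2ogr",
        "-f", "ESRI Shapefile",   # output driver
        "-t_srs", "EPSG:4326",    # target coordinate system: WGS84 lat/lon
        str(dst),                 # ogr2ogr takes the destination first...
        str(src),                 # ...and the source second
    ]


def reproject_all(src_dir: Path, dst_dir: Path, run: bool = False) -> list[list[str]]:
    """Return (and optionally run) one command per shapefile found in src_dir."""
    commands = []
    for src in sorted(src_dir.glob("*.shp")):
        cmd = build_reprojection_command(src, dst_dir / src.name)
        commands.append(cmd)
        if run:
            subprocess.run(cmd, check=True)  # requires GDAL to be installed
    return commands
```

With `run=False` the function only lists the commands, which makes it easy to review them before starting a conversion that takes hours.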
Data tagging
CLC code | CLC description 1 | CLC description 2 | CLC description 3 | Tags | Number of polygons | Notes |
---|---|---|---|---|---|---|
111 | Artificial surfaces | Urban fabric | Continuous urban fabric | landuse=residential | 0 | No data |
112 | Artificial surfaces | Urban fabric | Discontinuous urban fabric | landuse=residential | 10743 | Imported |
121 | Artificial surfaces | Industrial, commercial and transport units | Industrial or commercial units | landuse=retail;industrial | 2707 | |
122 | Artificial surfaces | Industrial, commercial and transport units | Road and rail networks and associated land | landuse=industrial | 74 | |
123 | Artificial surfaces | Industrial, commercial and transport units | Port areas | landuse=harbour | 30 | |
124 | Artificial surfaces | Industrial, commercial and transport units | Airports | aeroway=aerodrome | 26 | Imported |
131 | Artificial surfaces | Mine, dump and construction sites | Mineral extraction sites | landuse=quarry | 213 | Imported |
132 | Artificial surfaces | Mine, dump and construction sites | Dump sites | landuse=landfill | 112 | |
133 | Artificial surfaces | Mine, dump and construction sites | Construction sites | landuse=construction | 47 | |
141 | Artificial surfaces | Artificial, non-agricultural vegetated areas | Green urban areas | leisure=park | 103 | |
142 | Artificial surfaces | Artificial, non-agricultural vegetated areas | Sport and leisure facilities | leisure=park | 125 | |
211 | Agricultural areas | Arable land | Non-irrigated arable land | landuse=farm | 10651 | |
212 | Agricultural areas | Arable land | Permanently irrigated land | landuse=farm | 0 | No data |
213 | Agricultural areas | Arable land | Rice fields | landuse=farm | 16 | |
221 | Agricultural areas | Permanent crops | Vineyards | landuse=vineyard | 3233 | Imported |
222 | Agricultural areas | Permanent crops | Fruit trees and berry plantations | landuse=orchard | 3265 | |
223 | Agricultural areas | Permanent crops | Olive groves | landuse=orchard; trees=olives | 0 | No data |
231 | Agricultural areas | Pastures | Pastures | landuse=meadow | 17692 | |
241 | Agricultural areas | Heterogeneous agricultural areas | Annual crops associated with permanent crops | landuse=farm | 0 | No data |
242 | Agricultural areas | Heterogeneous agricultural areas | Complex cultivation patterns | landuse=farm | 9274 | |
243 | Agricultural areas | Heterogeneous agricultural areas | Land principally occupied by agriculture, with significant areas of natural vegetation | landuse=farm | 10718 | |
244 | Agricultural areas | Heterogeneous agricultural areas | Agro-forestry areas | landuse=farm | 0 | No data |
311 | Forest and semi natural areas | Forests | Broad-leaved forest | landuse=forest; wood=deciduous | 11421 | |
312 | Forest and semi natural areas | Forests | Coniferous forest | landuse=forest; wood=coniferous | 2956 | Imported |
313 | Forest and semi natural areas | Forests | Mixed forest | landuse=forest; wood=mixed | 2490 | Imported |
321 | Forest and semi natural areas | Scrub and/or herbaceous vegetation associations | Natural grasslands | natural=grassland | 1771 | |
322 | Forest and semi natural areas | Scrub and/or herbaceous vegetation associations | Moors and heathland | natural=heath | 331 | |
323 | Forest and semi natural areas | Scrub and/or herbaceous vegetation associations | Sclerophyllous vegetation | natural=scrub | 0 | No data |
324 | Forest and semi natural areas | Scrub and/or herbaceous vegetation associations | Transitional woodland-shrub | natural=wood; wood=mixed | 7509 | |
331 | Forest and semi natural areas | Open spaces with little or no vegetation | Beaches, dunes, sands | natural=beach | 143 | |
332 | Forest and semi natural areas | Open spaces with little or no vegetation | Bare rocks | natural=rock | 56 | |
333 | Forest and semi natural areas | Open spaces with little or no vegetation | Sparsely vegetated areas | natural=scrub | 179 | |
334 | Forest and semi natural areas | Open spaces with little or no vegetation | Burnt areas | landuse=rock | 0 | No data |
335 | Forest and semi natural areas | Open spaces with little or no vegetation | Glaciers and perpetual snow | natural=glacier | 0 | No data |
411 | Wetlands | Inland wetlands | Inland marshes | natural=wetland; wetland=marsh | 1515 | |
412 | Wetlands | Inland wetlands | Peat bogs | natural=wetland; wetland=bog | 2 | |
421 | Wetlands | Maritime wetlands | Salt marshes | natural=wetland; wetland=saltmarsh | 3 | |
422 | Wetlands | Maritime wetlands | Salines | landuse=salt_pond | 0 | No data |
423 | Wetlands | Maritime wetlands | Intertidal flats | tidal=yes | 0 | No data |
511 | Water bodies | Inland waters | Water courses | waterway=riverbank | 582 | |
512 | Water bodies | Inland waters | Water bodies | natural=water | 895 | |
521 | Water bodies | Marine waters | Coastal lagoons | natural=water | 9 | |
522 | Water bodies | Marine waters | Estuaries | natural=coastline | 0 | No data |
523 | Water bodies | Marine waters | Sea and ocean | natural=coastline | 1 | |
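
As an illustration of how the table above might be used by a conversion tool, the sketch below holds a subset of the mapping in a dictionary and splits the 'key=value; key=value' notation into individual OSM tags. The names `CLC_TAGS`, `parse_tags` and `tags_for` are hypothetical and are not taken from the actual shp2osm code.

```python
# Subset of the mapping table above, keyed by CLC level-3 code.
CLC_TAGS = {
    "112": "landuse=residential",
    "124": "aeroway=aerodrome",
    "221": "landuse=vineyard",
    "311": "landuse=forest; wood=deciduous",
    "312": "landuse=forest; wood=coniferous",
    "411": "natural=wetland; wetland=marsh",
}


def parse_tags(spec: str) -> dict[str, str]:
    """Split 'k1=v1; k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    tags = {}
    for part in spec.split(";"):
        key, sep, value = part.strip().partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {part!r}")
        tags[key] = value
    return tags


def tags_for(clc_code: str) -> dict[str, str]:
    """Return the OSM tags for a CLC code, or raise KeyError if unmapped."""
    return parse_tags(CLC_TAGS[clc_code])
```

For example, `tags_for("311")` yields the two tags `landuse=forest` and `wood=deciduous`. The entry for code 121 (`landuse=retail;industrial`) does not follow the key=value pattern, so this parser rejects it; a choice between the two values would have to be made first.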
Data testing
Some data testing has been performed while using the CLC data for the creation of a Garmin compatible map. OSM files created at stage 3 have been successfully pipelined through the tile splitter and mkgmap, and resulted in an IMG file (Garmin proprietary format) targeted at compatible devices.
Steps taken in preparation of data
1. translation of CLC data; a couple of batch files were created to translate all shapefiles to the WGS84 coordinate system; takes a couple of hours to complete.
2. filtering of CLC data and dump to OSM format; a custom C++ app was written solely for this purpose; takes a couple of hours to complete.
3. merging of OSM data; the standard DOS command 'copy' does a very good job at this; takes minutes to complete; however, there is a problem with the generated file: its very last byte needs to be dropped; this is done with a very simple C++ app.
4. import into a temporary database for duplicate merging; another small custom C++ app was created for this purpose; the very same app also does the reverse operation at step 6; this also takes a few hours to complete.
5. duplicate merging; two stored procedures in the PostgreSQL database fulfill this goal.
6. export from the database into OSM format again; the very same custom app from step 4 does the reverse; this also takes a few hours to complete.
7. import to the final import database; yet another custom app, this time written in Java (made with bits and pieces from the map tiling app targeted at Garmin devices), does the job; this also takes a few hours to complete.
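
The idea behind the duplicate merging can be sketched in memory as follows. This is not the PostgreSQL procedure used for the import, only an illustration under stated assumptions: ways are given as lists of (lat, lon) pairs, coordinates are compared after rounding to 7 decimal places (the resolution at which OSM stores coordinates), and new nodes receive negative IDs, a common convention for objects not yet uploaded.

```python
def merge_duplicate_nodes(ways, precision=7):
    """Give every distinct coordinate one node ID and rewrite ways to use it.

    ways: list of ways, each a list of (lat, lon) tuples.
    Returns (nodes, way_refs): nodes maps node ID -> (lat, lon) and
    way_refs lists, per way, the node IDs in order.
    """
    ids_by_coord = {}
    nodes = {}
    way_refs = []
    for way in ways:
        refs = []
        for lat, lon in way:
            key = (round(lat, precision), round(lon, precision))
            if key not in ids_by_coord:
                node_id = -(len(ids_by_coord) + 1)  # negative ID: not yet uploaded
                ids_by_coord[key] = node_id
                nodes[node_id] = key
            refs.append(ids_by_coord[key])
        way_refs.append(refs)
    return nodes, way_refs
```

Two polygons that share an edge then reference the same node IDs along that edge instead of each carrying their own copies.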
Algorithms used
Tools and applications used (details)
- ogr2ogr (external link); this small utility from GDAL is used to translate from the original coordinate system to WGS84.
- shp2osm; custom built C++ helper app based on the shapelib library to filter and dump data from the shapefiles into OSM format.
- shp2db; custom built C++ helper app to import data from the shp2osm output into a very simple, temporary database, so that duplicate nodes can be removed. Also used to reconstruct the OSM file after duplicate nodes have been removed.
- one PostgreSQL stored procedure that eliminates node duplicates
- filter; custom built Java helper app that initially started as a filtering app for the daily planet extract for Romania. Now, this app performs several functions :
  - filter : reads a daily country extract and removes all landcover polygons, while resampling object IDs; this function was needed to generate a Garmin map with no landcover data.
  - border : extracts the Romanian border from the daily country extract; this was needed so that the shp2osm helper app uses a proper polygon for determining land use data that does not exceed the country borders.
  - polygon : converts the country border saved as an OSM file into polygon format.
  - retag : function designed for retagging, but the test function is used instead.
  - upload : function designed for uploading CLC data, but the test function is used instead.
  - setup-db : creates the master import database used by the import process from the OSM file.
  - test : originally designed as a test function to be used with the dev server at openstreetmap.org, it now holds all the import code and acts as the retagging and uploading procedure.
- one PostgreSQL stored procedure that is used to examine import statistics
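
How a border polygon can be used to keep only data inside the country may be sketched with a standard ray-casting test. This is an assumption about the general technique, not a description of what shp2osm actually does; the function names are hypothetical, and checking vertices only is a rough filter that can misjudge polygons crossing a concave part of the border.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) pairs?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def polygon_within_border(polygon, border):
    """Keep a land cover polygon only if every vertex lies inside the border."""
    return all(point_in_polygon(lon, lat, border) for lon, lat in polygon)
```

A polygon with at least one vertex outside the border ring is rejected as a whole; clipping it at the border instead would need a proper geometry library.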
Technical info
This section describes in further detail the algorithm used by the bots outlined in previous sections. It is not intended to be a fully detailed software specification, but rather to explain the concepts used.
- How to find out what elements need retagging
  - Based on the code from the tile splitter by Steve Ratcliffe ([1]), a map filtering application has been developed; its main purpose is to filter a daily extract of the planet file for Romania, remove all land cover elements, and then resample all element IDs; the same utility can be used to dump the filtered elements so that they can be used as input by the retagging bot.
  - UPDATE 01/11/2010 : the filtering app has been modified to do the dumping; several helper functions have also been added to reach the main goal.
- How to do the retagging
  - The best way to do this seems to be direct API access; the bot should take the element IDs from the filtering app, add or modify the tags, then update each element directly on the server; given the restrictions on the size of changesets, the state of the operation needs to be persistent; this will also enable pause/resume. osmosis also sounds like a good idea, but it requires the setup of a database and additional processing.
  - UPDATE 01/11/2010 : Direct API access it is; a very simple OSM API 0.6 Java component has been developed. It is not aimed at being 100% compliant with all the API functions, but rather at doing its job of importing landuse data.
- How to store the OSM data prior to importing
  - Given the volume of the data, the restrictions on changesets and the need for the bot state to be persistent, one idea is to save one OSM file per way/relation and to process them one at a time; this avoids setting up a database and the related access code. However, a database (not necessarily a PostGIS database) may solve other issues. See below for details.
  - UPDATE 01/11/2010 : The data preparation stage is completed. All landuse data has been filtered using a polygon created from the Romanian boundaries on the OSM server, duplicate points have been removed and all elements have been imported into a database. This database will serve as the data source for the import bot and will store the current import state, allowing the bot to resume, since the import is expected to be a lengthy operation.
- How to save the state of the import
  - The simplest way to do this seems to be to keep two folders on disk : one for the ways that have already been uploaded to the server and one for those waiting to be uploaded; once a changeset has successfully been closed, the included ways/relations are moved from one folder to the other; the only issue here is that a large number of small files will be stored on disc. With a database, this is even simpler.
  - UPDATE 01/11/2010 : See the above paragraph for details; basically the import state is saved in a database.
- What is needed to develop the bots
  - Time ;) Besides that, there is still a preliminary decision to make : C++ or Java. Or something else if anyone has a better idea. More to come on this issue later.
Known and possible issues
- Large numbers of small files will be stored on disc; will have to determine if it is the best approach; a database may work around this issue. No longer an issue; all data is stored in a database.
- Nodes are duplicated in 'touching' ways; two ways that have a common segment will each have their own nodes, thus resulting in a large number of nodes, almost all being duplicates; some intelligent algorithm must be used to avoid this; normally, this is where a database comes in handy. No longer an issue; node duplicates have been removed prior to import.