2015 Sierra Leone village mapping data - data processing
Background
A quick guide to taking the data collected by Ivan’s ‘motorbike mappers’ and uploading it to OSM.
The priority is to get the data ready for OSM, not for analysis. This means the purely ‘spatial data’ (e.g. location, village name, administrative ward) takes precedence over the demographic/social/health data collected (e.g. chief’s mobile number, number of health workers).
The process assumes that you have been provided with the CSV sheet or shapefile of the data collected through the data collection form.
To get the data ‘ready’ for OSM, it first needs to be cleaned and processed as outlined below. You will be left with two shapefiles: one of villages that are new to OSM and one of villages that already exist in OSM. These will be used to update OSM by merging and by manual editing, respectively. The manual editing of existing villages could become a new task in the HOT OSM Tasking Manager.
Process
Step 1: Download the data
Check you have:
- CSV or XLS file
- Shapefile (optional)
Step 2: Clean the data
Depending on your preference, you can edit either the cells (spreadsheet) or the attribute table (shapefile). We recommend editing the CSV or XLS, as it is easier to edit values and apply formatting changes there.
The process to clean the data is as follows:
- ● Delete any non-spatial columns, including:
- ○ CHIEF_NAME
- ○ CHIEF_PHONE_NUMBER
- ○ HEALTH_WORKER_NAME
- ○ HEALTH_WORKER_PHONE
- ○ HAS_HC
- ○ HC_TYPE
- ○ TYPE_HC
- ○ NEAREST_HC_LOCATION
- ○ NEAREST_HC_TYPE
- ○ HAS_SCHOOL
- ○ SURVEY_START_TIME
- ○ SURVEY_END_TIME
- ○ DEVICE_ID
- This is all good information - but not relevant to OSM, hence the deletion!
- ● You should be left with:
- ○ LOCATION
- ○ LATITUDE
- ○ LONGITUDE
- ○ VILLAGE_NAME
- ○ ALT_VILLAGE_NAME
- ○ DISTRICT
- ○ CHIEFDOM
- ○ WARD
- ○ CONSTITUENCY
- ○ NUMBER_OF_HOUSEHOLDS
- ● Next, check the remaining data for:
- ○ No entries / Empty cells
- ■ Look out for entries that have a location but no other information: these are surveys that were started but contain no actual data!
- ○ Consistency in spelling, typos, and formatting
- ■ E.g. Tonkolili vs. tonkolili, Kholifa Rowala vs. Kholifa Rowalla
- ○ Consistency in names and data
- ■ E.g. Tonkolili vs. Tonkolili District vs. Poor Tonkolili
- ● Once cleaned, save your changes as a CSV file, since this is what you will import in Step 3.
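If you prefer to script the cleaning rather than edit the spreadsheet by hand, the checks above can be sketched in Python. This is a minimal sketch, assuming the data has been exported as a CSV using the column names listed above; the function names and the `CANONICAL_NAMES` lookup table (and its entries) are hypothetical, and which spelling counts as correct is your call. It keeps only the listed columns, sets aside rows that have a location but nothing else, and tidies name variants.

```python
import csv
import io

# Columns to keep (the "You should be left with" list above).
KEEP_COLUMNS = [
    "LOCATION", "LATITUDE", "LONGITUDE", "VILLAGE_NAME", "ALT_VILLAGE_NAME",
    "DISTRICT", "CHIEFDOM", "WARD", "CONSTITUENCY", "NUMBER_OF_HOUSEHOLDS",
]
COORD_COLUMNS = {"LOCATION", "LATITUDE", "LONGITUDE"}
NAME_COLUMNS = ["VILLAGE_NAME", "ALT_VILLAGE_NAME", "DISTRICT", "CHIEFDOM",
                "WARD", "CONSTITUENCY"]

# Hypothetical lookup: known variant (lower-case) -> preferred spelling.
CANONICAL_NAMES = {
    "tonkolili": "Tonkolili",
    "tonkolili district": "Tonkolili",
}


def normalise(value):
    """Collapse stray whitespace and map known variants to one spelling."""
    cleaned = " ".join(value.split())
    return CANONICAL_NAMES.get(cleaned.lower(), cleaned)


def clean_rows(csv_text):
    """Return (kept, empty): cleaned rows, and rows with a location only."""
    kept, empty = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the listed columns; absent or short-row values become "".
        slim = {col: (row.get(col) or "").strip() for col in KEEP_COLUMNS}
        if not any(v for k, v in slim.items() if k not in COORD_COLUMNS):
            empty.append(slim)  # survey started but no actual data
            continue
        for col in NAME_COLUMNS:
            slim[col] = normalise(slim[col])
        kept.append(slim)
    return kept, empty


def to_csv(rows):
    """Serialise cleaned rows back to CSV text in a fixed column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=KEEP_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Rows returned in `empty` correspond to the "location but no other info" entries described above.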
Step 3: Create a shapefile from the data
- ● Using ArcGIS or QGIS, import the cleaned CSV dataset as a vector (point) layer (see the ArcGIS/QGIS help for how to do this).
- ○ Note: use the _location_longitude column as the X field and the _location_latitude column as the Y field.
- ○ The coordinate system is WGS84 (EPSG:4326).
- ● This should plot your table data as point data. You can add a base map to check that the data is in the right place.
- ○ Go to Plugins → Manage & Install Plugins → select ‘OpenLayers’ to install the base map plugin. Then go to Web → OpenLayers and choose from the options.
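As a cross-check outside the GIS, the plotting step can be sketched in plain Python by turning each row into a GeoJSON point. This sketch assumes the coordinate columns are named `_location_latitude` and `_location_longitude` as noted above; the function name `rows_to_geojson` is hypothetical. Rows with missing or impossible WGS84 coordinates are returned separately so they can be fixed rather than silently dropped.

```python
import csv
import io
import json


def rows_to_geojson(csv_text, lat_field="_location_latitude",
                    lon_field="_location_longitude"):
    """Build a GeoJSON FeatureCollection of WGS84 points from survey rows.

    Returns (feature_collection, bad_rows).
    """
    features, bad = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            bad.append(row)  # column missing, empty or not a number
            continue
        # Reject values outside the valid WGS84 range (NaN fails too).
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            bad.append(row)
            continue
        props = {k: v for k, v in row.items() if k not in (lat_field, lon_field)}
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}, bad
```

Writing the result with `json.dump` to a `.geojson` file gives a layer that QGIS can open directly alongside the base map.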
Step 4: Process the village data
- ● Install the QuickOSM plugin (go to Plugins → Manage & Install Plugins → select QuickOSM)
- ● Run a QuickOSM query to download the OSM data for all place names within the extent of the village survey shapefile.
- ○ Click ‘Web’ and run ‘QuickOSM’.
- ○ Use the following options:
- ■ Key: select ‘place’
- ■ Value: ‘village’
- ■ In → Extent of layer: select your village survey layer
- ○ Run query; this should add the OSM village place data to your map.
- ○ Repeat the query with the values ‘hamlet’ and ‘town’.
- ● Because the data is in WGS84, whose units are degrees, QGIS cannot buffer it in meters directly. You will now need to reproject your shapefiles and open a new map canvas in order to complete the next step. To do so:
- ○ Right click on the shapefile and click ‘Save as’.
- ○ When saving, select the CRS: World_Azimuthal_Equidistant.
- ○ Give the file a name that distinguishes it from the WGS84 version (e.g. name_AZM).
- ○ This will allow you to create a buffer around your data in meters rather than degrees.
- ○ Do this for both files: the village survey layer and the OSM village place data.
- ● Open a new map canvas and add the two files.
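QuickOSM builds its requests for the Overpass API. For reference, a roughly equivalent query can be generated from the extent of the survey points, as sketched below. This is an illustration, not QuickOSM’s own output: it assumes the points are (longitude, latitude) pairs in WGS84, the helper name `overpass_place_query` is hypothetical, and it only fetches place nodes, whereas some places may be mapped as ways or areas.

```python
def overpass_place_query(points, values=("village", "hamlet", "town")):
    """Build an Overpass QL query for OSM place nodes within the points' extent.

    points: iterable of (longitude, latitude) pairs in WGS84.
    values: place=* values to request (the three queried above).
    """
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point to compute an extent")
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    # Overpass bounding boxes are written (south, west, north, east).
    bbox = f"{min(lats)},{min(lons)},{max(lats)},{max(lons)}"
    # One regular expression matches all requested place values at once.
    pattern = "|".join(values)
    return (
        "[out:json][timeout:60];\n"
        f'node["place"~"^({pattern})$"]({bbox});\n'
        "out body;"
    )
```

Fetching all three values in one query avoids running the search three times, at the cost of returning everything in a single layer.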
Step 5: Distinguish between pre-existing and new villages
- ● Run a buffer around the OSM village place data file.
- ○ Head to ‘Vector’ → ‘Geoprocessing Tools’ → ‘Buffer’
- ○ Enter the following options:
- ■ Input vector layer: Your osm village place data file
- ■ Buffer distance: 500 (the distance is in the layer’s units, which are now meters, so don’t add a unit sign)
- ■ Output shapefile: Save it somewhere you can find it.
- ■ Tick ‘Add to canvas’
- ● Next, you will need to extract the village survey layer data that intersects with the buffer.
- ○ Head to ‘Vector’ → ‘Research Tools’ → ‘Select by location’
- ○ Enter the following options:
- ■ Select features in: village survey layer data
- ■ That intersect with: buffer layer
- ■ Tick ‘Add result to canvas’
- ○ You should now have three layers: village survey layer data, osm village place data, and villages within 500m of osm data layer (i.e. these are likely to be duplicate POIs).
- ● The next step is to extract the ‘new villages’, i.e. POIs that do not currently exist in OSM. To do so, you’ll take the ‘difference’ between the shapefiles.
- ○ Head to ‘Vector’ → ‘Geoprocessing Tools’ → ‘Difference’
- ○ Enter the following options:
- ■ Input vector layer: Your village survey layer data file
- ■ Difference layer: villages within 500m of osm data layer
- ■ Output shapefile: New villages to add. Save it somewhere you can find it.
- ■ Tick ‘Add to canvas’
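The buffer, select-by-location and difference steps together amount to one test: is there an OSM place within 500 m of this survey point? The sketch below expresses that test directly with the haversine great-circle distance on WGS84 coordinates. This is a swapped-in technique, not what QGIS does (QGIS buffers in the projected CRS), so results for points right at the 500 m edge may differ slightly; the function names are hypothetical. It compares every survey point with every OSM point, which is fine for a few thousand villages.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) just above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def split_by_osm(survey, osm, radius_m=500.0):
    """Split survey points into (near_osm, new) lists.

    survey, osm: lists of (longitude, latitude) tuples.
    near_osm plays the role of the 'villages within 500m of osm data' layer;
    new plays the role of the 'new villages to add' layer.
    """
    near, new = [], []
    for lon, lat in survey:
        if any(haversine_m(lon, lat, o_lon, o_lat) <= radius_m
               for o_lon, o_lat in osm):
            near.append((lon, lat))
        else:
            new.append((lon, lat))
    return near, new
```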
Step 6: Check the data!
This step is extremely important to ensure that we don’t duplicate or damage existing OSM data.
- ● For both of the newly created files (villages within 500m of osm data layer and new villages to add), check for duplicates within the survey data.
- ● The simplest way to do this is to scan the data visually for any POIs sitting very close to one another.
- ● Duplicates may have been caused by two different surveyors surveying the same village. Whilst you may have spotted these in the earlier cleaning stage, a few may have slipped through.
- ● To help with the scanning, you can use a grid: this divides the data into squares that can each be checked relatively easily.
- ○ To create a grid:
- ■ Head to ‘Vector’ → ‘Research Tools’ → ‘Vector Grid’
- ■ Enter the following options:
- ● Extent layer: the full village survey layer (you’ll need to click ‘Update extents from layer’)
- ● X and Y: 5000.00 (5 km squares, provided the layer is in the reprojected, meter-based CRS)
- ● Output shapefile: 5000_Grid. Save it somewhere you can find it.
- ● Output: as polygon
- ● Tick ‘Add to canvas’
- ○ You can now use this grid to help scan the data for duplicates (e.g. divvy up the squares between yourself and anyone else involved).
- ● You may want to scan the data first and select any squares that appear to contain duplicates. You can then save these squares as a separate file and use them to focus your checking.
- ○ If you spot a possible duplicate (in either layer), use the ‘i’ (Identify Features) tool in QGIS to inspect the data. If both points have the same village name (and other details), delete one of them.
- ○ For any duplicates within the new villages to add layer, use a satellite imagery base layer (OpenLayers → Bing) to help decide which point to keep (i.e. the one most accurately placed within the village).
- ○ For any duplicates within the villages within 500m of osm data layer, again use the satellite imagery base layer (OpenLayers → Bing) to decide which point to keep. Do not delete the original point in the osm village place data layer; you’ll need it to guide you when updating the data set.
- ● You are now ready to upload the new villages and update the existing ones in OSM.
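The grid-based scan for duplicates can also be sketched in code. This minimal sketch assumes you have exported each survey point’s ID, village name and projected x/y coordinates in meters (for example with the `$x` and `$y` expressions in the QGIS field calculator on the _AZM layer). The 200 m `max_dist` threshold and the function names are hypothetical, since this guide does not define how close counts as ‘very close’. Points are bucketed into 5000 m squares, and each square is compared with its eight neighbours so that pairs straddling a square edge are not missed. It only flags candidates; the final decision still needs the visual check against satellite imagery.

```python
import math
from collections import defaultdict


def grid_cell(x, y, size=5000.0):
    """(column, row) index of the grid square containing point (x, y)."""
    return (math.floor(x / size), math.floor(y / size))


def candidate_duplicates(points, max_dist=200.0, size=5000.0):
    """Flag pairs of survey points closer than max_dist meters.

    points: list of (point_id, village_name, x, y), with x/y in meters.
    Returns (id_a, id_b, distance_m, same_name) tuples, each pair once.
    """
    if max_dist > size:
        raise ValueError("max_dist must not exceed the grid size")
    cells = defaultdict(list)
    for i, (_, _, x, y) in enumerate(points):
        cells[grid_cell(x, y, size)].append(i)

    results = []
    for i, (id_a, name_a, xa, ya) in enumerate(points):
        cx, cy = grid_cell(xa, ya, size)
        # Check this square and its 8 neighbours, so pairs that straddle a
        # square edge are still compared.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), []):
                    if j <= i:
                        continue  # report each pair only once
                    id_b, name_b, xb, yb = points[j]
                    dist = math.hypot(xa - xb, ya - yb)
                    if dist <= max_dist:
                        same = name_a.strip().lower() == name_b.strip().lower()
                        results.append((id_a, id_b, round(dist, 1), same))
    return results
```

Pairs with `same_name` set to `True` are the strongest candidates for deletion; close pairs with different names may still be genuinely separate villages.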