Bash/Script for cleaning up the descriptive text LINZ layer
- About
This page was last updated 12 years ago and contains out-of-date information. See LINZ for the latest info. - Captured time
- 2012
The descriptive text LINZ layer has an assortment of unique place names and generic names like "School" or "Hospital". We can convert the generic names to OSM tags before uploading. These are all individual nodes. See also LINZ attribute matching and LINZ geo_name matching.
- Note: the new version of the web app exports .osc files, and we'll try to do this on the tag matching site instead of a shell script. That's all a bit academic though, as this layer was uploaded in its entirety in 2010 and has been slowly and manually merged into the nearby unlabeled ground features since then.
First download the descrip_text.osm.gz export from the LINZ-2-OSM web app. (I renamed it chat_descrip_text.osm.gz locally to show that it's the Chatham Islands data.)
Decompress it:
gzip -d descrip_text.osm.gz
Create a sorted list of unique names:
grep 'k="name"' descrip_text.osm | sort | uniq -c | \
  sort -nr | cut -f1,4 -d'"' | sed -e 's/<[^"]*"//'
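To see what that pipeline produces, here is a minimal sketch that runs it on a tiny hand-made file. The three nodes, their IDs and coordinates are made up for illustration and are not taken from a LINZ export; the `sample` and `counts` variable names are likewise just placeholders.

```shell
# Hypothetical three-node sample standing in for descrip_text.osm.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<osm version="0.6">
  <node id="-1" lat="-44.0" lon="-176.5">
    <tag k="name" v="Airstrip" />
  </node>
  <node id="-2" lat="-44.1" lon="-176.4">
    <tag k="name" v="Airstrip" />
  </node>
  <node id="-3" lat="-44.2" lon="-176.3">
    <tag k="name" v="Hall" />
  </node>
</osm>
EOF

# Same pipeline as above: count identical name tags, most frequent first,
# then strip the XML so only the count and the name value remain.
counts=$(grep 'k="name"' "$sample" | sort | uniq -c | sort -nr \
  | cut -f1,4 -d'"' | sed -e 's/<[^"]*"//')
echo "$counts"

rm -f "$sample"
```

With this sample, Airstrip (count 2) is listed ahead of Hall (count 1). The exact column padding depends on the local `uniq` implementation.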
Finally search and replace some common generic values:
sed -i \
  -e 's/k="name" v="Aerodrome"/k="aeroway" v="aerodrome"/' \
  -e 's+k="name" v="Airstrip"+k="aeroway" v="aerodrome" />\n <tag k="type" v="airstrip"+' \
  -e 's/k="name" v="Camp"/k="tourism" v="camp_site"/' \
  -e 's+k="name" v="Fire lookout"+k="man_made" v="tower" />\n <tag k="tower:type" v="observation"+' \
  -e 's/k="name" v="Fire station"/k="amenity" v="fire_station"/' \
  -e 's/k="name" v="Grave"/k="historic" v="grave"/' \
  -e 's/k="name" v="Hall"/k="amenity" v="public_hall"/' \
  -e 's/k="name" v="Hospital"/k="amenity" v="hospital"/' \
  -e 's/k="name" v="Hotel"/k="tourism" v="hotel"/' \
  -e 's/k="name" v="Hut"/k="tourism" v="alpine_hut"/' \
  -e 's/k="name" v="Landfill"/k="landuse" v="landfill"/' \
  -e 's/k="name" v="Power generation"/k="power" v="generator"/' \
  -e 's/k="name" v="Quarry[ ]*"/k="landuse" v="quarry"/' \
  -e 's/k="name" v="Racecourse"/k="highway" v="raceway"/' \
  -e 's/k="name" v="Racetrack"/k="leisure" v="track"/' \
  -e 's/k="name" v="Reservoir"/k="landuse" v="reservoir"/' \
  -e 's/k="name" v="School"/k="amenity" v="school"/' \
  -e 's/k="name" v="Sch"/k="amenity" v="school"/' \
  -e 's/k="name" v="Silo"/k="man_made" v="silo"/' \
  -e 's/k="name" v="Substation"/k="power" v="sub_station"/' \
  -e 's/k="name" v="Substn"/k="power" v="sub_station"/' \
  -e 's/k="name" v="University"/k="amenity" v="university"/' \
  -e 's/k="name" v="Weir"/k="waterway" v="weir"/' \
  -e 's/k="name" v="Well"/k="man_made" v="well"/' \
  -e 's/k="name" v="(disused)"/k="disused" v="yes"/' \
  descrip_text.osm
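Because `sed -i` rewrites the file in place, it can help to preview a rule on a single line first. The following is only a sketch: the sample line is made up, the variable names `line`, `preview` and `remaining` are placeholders, and it assumes GNU sed, which expands `\n` in the replacement text to a newline (other sed implementations may not).

```shell
# Hypothetical one-line sample; the real input would be descrip_text.osm.
line='    <tag k="name" v="Airstrip" />'

# Run one rule without -i so the result goes to stdout instead of the file.
# Assumes GNU sed: \n in the replacement becomes a real newline, so the
# single name tag is split into two tags.
preview=$(printf '%s\n' "$line" | sed \
  -e 's+k="name" v="Airstrip"+k="aeroway" v="aerodrome" />\n <tag k="type" v="airstrip"+')
printf '%s\n' "$preview"

# Count generic name tags still present; 0 means the rule matched.
# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining=$(printf '%s\n' "$preview" | grep -c 'k="name" v="Airstrip"' || true)
echo "remaining: $remaining"
```

The same `grep -c` check can be pointed at the real file after the full `sed -i` run to confirm that a given generic name no longer appears as a `name` tag.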
Top 100 repeated names from the mainland:
3524 Airstrip
2185 Sch
1034 Quarry
798 Hall
488 Hut
329 Marae
320 Gravel pit
256 Camp
229 Substation
204 Landfill
196 Reservoir
191 Rapids
140 Silo
128 Mill
110 Hospital
109 Cableway
100 Substn
83 Oxidation ponds
73 Power generation
71 Racecourse
69 Silos
64 Gas valve
51 Walkwire
50 Oxidation pond
48 Disused mine
45 Huts
45 Gun club
43 Weir
40 Rifle range
38 Well
38 Quarries
38 Old dam
38 Derelict
36 Abattoir
36 (disused)
33 Pipeline
31 Water treatment plant
27 Siphon
27 Shelter
26 Aerodrome
25 Rock bivouac
25 Old gold workings
24 Surf club
24 Factory
23 Derelict hut
21 Marina
19 Gravel pits
19 Fire lookout
18 Limeworks
18 Forest headquarters
17 Showgrounds
16 Reservoirs
16 Gas compound
16 (historic)
16 (derelict)
15 Intake
15 Disused gold workings
15 Aerial hazard
14 Old well
13 Spillway
13 Numerous disused gold workings
13 Grave
12 Thermal area
12 Racetrack
12 Disused
11 Prison
11 Camping ground
11 Airstrips
10 Motor camp
9 Wildlife refuge
9 Visitor centre
9 Lodge
8 Vehicle access along beach at low tide
8 University
8 Surge chamber
8 Flume
8 Fertilizer works
8 Disused railway
8 Derelict buildings
8 Airport
7 Gun emplacements
6 Water intake
6 Suspension bridge
6 Speedway
6 Settling pond
6 Quicksand
6 Pumice pit
6 Old tunnel
6 Meteorological station
6 Gold workings
6 Bivouac
5 Shingle works
5 Sale yards
5 Riverbed subject to rapid flooding
5 Old dams
5 Old battery
5 Numerous sinkholes
5 Numerous rock outcrops
5 Gas well
5 Fuel tanks
...
and 421 more names with 5 or fewer occurrences, some* more important than others.
[*] e.g. "INTERMITTENT LIVE FIRING"
Placement
On the NZOGPS mailing list, Peter S wrote:
> The point coords describe the left, vertical lower case center location of
> the label as it was applied to the 260 series maps, and usually to be found
> in the most blank spot on the map near the proper location, the offset can
> be literally kilometers away.