Import/Boston Street Address Management (SAM) Import
Current Status
2016-04-01: Awaiting response from City of Boston regarding attribution requirements since the data is licensed under CC BY 4.0.
2016-09-29: Restarted trying to reach City of Boston GIS over the phone to get a definite answer on attribution requirements.
About
Boston, Massachusetts is relatively well mapped, but its buildings usually lack addr:housenumber=* and addr:street=* tags. Since the city is not built on a grid, the streets wander, which makes finding the right house with OpenStreetMap a challenge. The City of Boston provides a sizeable amount of data through its Open Government initiative. One of the freely available datasets is the Live Street Address Management (SAM) dataset, which identifies the location of the buildings on the map. We can use that.
As part of the import, existing buildings that span multiple tax parcels will be split provided that the roof pattern looks different among different parts of the building.
Import Plan Outline
Since a large number of Boston buildings were imported from a LIDAR/orthoimagery survey (or traced from Bing), a sizable number of OSM buildings are actually two or more real-world buildings. The first step will therefore be to correct them on the map itself, which can be done while waiting for approval.
Import will be performed by neighborhood in these stages:
- Import unique buildings.
- Regenerate a set of .osc, .osn files, post the links here for ongoing review by interested parties.
- Upload ONLY unique buildings into OSM (no fixmes, no notes, etc.).
- Modify the remaining buildings so that they become unique where possible (verify with the Tax Parcel Map/Bing and roughly with MassGIS Data - 2015 WorldView Orthoimagery); otherwise skip them.
- Add missing buildings, verify they are unique.
- Repeat from the second step (regenerating the .osc and .osn files).
- When there are no more buildings to split, import -addresses.osc. This will put address nodes onto the buildings that are confined to a single tax parcel, yet have more than one address. The number of such buildings is relatively low (see the data below).
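The stages above sort each building into one of a few outcomes. A minimal Python sketch of that decision follows. It assumes each building comes with the list of SAM addresses that fall inside it and the number of tax parcels it spans. The `Category` enum and `classify_building` are hypothetical names for illustration, not taken from the import scripts.

```python
from enum import Enum


class Category(Enum):
    UNIQUE = "unique"          # one address: the way can be tagged directly
    ADDRESS_NODES = "nodes"    # several addresses, one parcel: add address nodes
    NEEDS_SPLIT = "fixme"      # several addresses, several parcels: manual review
    NO_ADDRESS = "none"        # no SAM address found for this building


def classify_building(addresses, parcel_count):
    """Classify a building by its distinct SAM addresses and parcel count.

    Both inputs are assumed to be precomputed elsewhere (hypothetical).
    """
    distinct = set(addresses)
    if not distinct:
        return Category.NO_ADDRESS
    if len(distinct) == 1:
        return Category.UNIQUE
    if parcel_count <= 1:
        return Category.ADDRESS_NODES
    return Category.NEEDS_SPLIT
```

Buildings in the `NEEDS_SPLIT` category would be the ones that are split by hand before the next regeneration of the files.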
Goals
Add the missing house numbers and associate the building ways with the street they are on. Improve navigation for offline applications and provide correct destination information for OSRM, GraphHopper, etc. With addresses sometimes allocated seemingly at random, having the correct addr:housenumber=* values is extremely helpful.
Schedule
Total time: 2 weeks from the date of approval.
- Initial upload for Jamaica Plain (the current location of User:Rye), corrections (1 day).
- Unique addr:housenumber upload for all the neighborhoods (1 day).
- Splitting the buildings according to the Boston Tax Parcel map, adding missing buildings, and uploading the changes in two steps (rest of the time).
Import Data
Background
Data source site: http://bostonopendata.boston.opendata.arcgis.com/datasets/b6bffcace320448d96bb84eabb8a075f_0
Tax Parcel Basemap: http://app01.cityofboston.gov/ParcelViewer/?pid=1103032000 (http://gis.cityofboston.gov/arcgis/rest/services/Basemaps/base_map_webmercatorV2/MapServer)
Tax Parcel Shapefile: http://boston.maps.arcgis.com/home/item.html?id=cd6d9058ee9d4475b924751a2bb9d263
MassGIS Data - 2015 WorldView Orthoimagery: http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/colororthos2015wv.html
Data license: CC BY 4.0 - Open and Protected Data Policy, awaiting attribution clarification as per Import/ODbL_Compatibility.
Type of license (if applicable): CC BY 4.0
ODbL Compliance verified: in progress
Script repository: https://bitbucket.org/roman-yepishev/osm-boston-sam-import (Heavily Work In Progress, spaghetti code by User:Rye).
CartoDB visualization: https://rye.cartodb.com/viz/e81047a4-fbfb-11e5-a30c-0ef24382571b/embed_map
OSM Data Files
MediaFire folder with all the data: http://www.mediafire.com/folder/j7lb1tuhwd17i
These files are for reference only and to increase the bus factor (in case User:Rye is hit by a bus). These are now ready for upload (no safeguards of any kind).
Unique buildings: ways that can be uniquely tagged and require no further manual interaction - .osc file.
Address nodes: nodes that bear addr:* tags for the buildings that are within the same tax parcel yet have multiple numbers - .osc file.
Fixme buildings: (not for upload into OSM) ways that are tagged with more than one address, and ways that are already tagged with an address that does not match City data - .osc file.
Missing/fixme building markers: (not for upload into OSM) missing buildings, buildings that are most likely not split properly (same as first item in Fixme buildings) according to both tax parcels and SAM, and buildings with an unknown street or a street name that does not match SAM exactly. .gpx is for field verification, .osn for JOSM.
Name / Poly | Unique buildings | Address nodes | Fixme buildings | Missing/fixme building markers |
---|---|---|---|---|
Allston / Brighton | 6509 | 30 | 85 | 250 .gpx, .osn.gz |
Back Bay | 1516 | 4 | 12 | 34 .gpx, .osn.gz |
Bay Village | 234 | 0 | 17 | 39 .gpx, .osn.gz |
Beacon Hill | 1233 | 2 | 12 | 51 .gpx, .osn.gz |
Charlestown | 1765 | 14 | 16 | 95 .gpx, .osn.gz |
Chinatown | 158 | 3 | 6 | 20 .gpx, .osn.gz |
Dorchester | 13835 | 51 | 22 | 306 .gpx, .osn.gz |
East Boston | 4272 | 32 | 128 | 692 .gpx, .osn.gz |
Fenway / Kenmore | 350 | 5 | 85 | 558 .gpx, .osn.gz |
Financial District/Downtown | 290 | 2 | 22 | 63 .gpx, .osn.gz |
Government Center/Faneuil Hall | 38 | 0 | 1 | 22 .gpx, .osn.gz |
Hyde Park | 7617 | 8 | 6 | 140 .gpx, .osn.gz |
Jamaica Plain | 4143 | 7 | 59 | 177 .gpx, .osn.gz |
Mattapan | 5535 | 15 | 33 | 199 .gpx, .osn.gz |
Mission Hill | 864 | 8 | 26 | 204 .gpx, .osn.gz |
North End | 968 | 9 | 6 | 49 .gpx, .osn.gz |
Roslindale | 6748 | 9 | 40 | 239 .gpx, .osn.gz |
Roxbury | 6322 | 217 | 196 | 788 .gpx, .osn.gz |
South Boston | 5617 | 84 | 42 | 284 .gpx, .osn.gz |
South End | 2741 | 8 | 32 | 158 .gpx, .osn.gz |
West End | 117 | 0 | 5 | 26 .gpx, .osn.gz |
West Roxbury | 8275 | 36 | 22 | 135 .gpx, .osn.gz |
Last updated: 2016-04-11 18:08:18-0400
Previews
File | Description | Size (bytes) | Last updated |
---|---|---|---|
Massachusetts-latest+boston-sam_2.obf | OsmAnd map + unique Boston SAM numbers | 211408445 | 2016-04-11 16:38:38 |
massachusetts-latest+boston-sam.osm.pbf | OSM PBF export + unique Boston SAM numbers | 222951460 | 2016-04-11 10:27:25 |
Import Type
Although split across many uploads, this is a one-time action. The scripts are released into the public domain to enable updates of this kind in the future. Future updates will require the same steps to be taken.
JOSM will be used to review the changes, resolve conflicts and upload the resulting updates as a separate 'ryebread_bos_sam_import' user.
Data Preparation
Data Reduction & Simplification
Only updated way tags will be uploaded. If there are missing ways in OSM and the buildings are found in Bing/Boston Tax Parcel basemap, these buildings will be added after the initial import of unique addresses.
Tagging Plans
Excluded from processing (won't be updated, won't appear in fixme/notes):
- Buildings that have source:addr=survey.
- Buildings that have at least one entrance=yes node with addr:housenumber=* defined.
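The two exclusion rules above can be expressed as a small predicate. This is a sketch only: it assumes the building's tags and the tags of each of its member nodes are available as plain dictionaries, and `is_excluded` is a hypothetical helper name.

```python
def is_excluded(building_tags, member_node_tags):
    """Return True if the building must be left untouched by the import.

    building_tags: dict of the way's tags.
    member_node_tags: iterable of tag dicts, one per node of the way.
    """
    # Rule 1: addresses that were surveyed on the ground are never overwritten.
    if building_tags.get("source:addr") == "survey":
        return True
    # Rule 2: an entrance node that already carries a house number.
    for tags in member_node_tags:
        if tags.get("entrance") == "yes" and "addr:housenumber" in tags:
            return True
    return False
```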
The following tags will be added to buildings affected by the import:
Since the street names may be different in various official sources, a separate mapping table is set up. The script adds a note/gpx waypoint pointing to a building that is about to be tagged with an unknown street name.
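A minimal sketch of such a mapping lookup is shown below. The entries in `STREET_NAME_MAP`, the upper-case abbreviated form of the SAM names, and the `resolve_street` helper are all assumptions for illustration; the real mapping table lives in the import scripts.

```python
# Hypothetical mapping entries; the real table is maintained separately.
STREET_NAME_MAP = {
    "CENTRE ST": "Centre Street",
    "BOYLSTON ST": "Boylston Street",
}


def resolve_street(sam_name, mapping=STREET_NAME_MAP):
    """Map a SAM street name to its OSM name.

    Returns (osm_name, note). For an unknown street, osm_name is None and
    note holds the text for a note/gpx waypoint on the building.
    """
    # Normalize case and whitespace before the lookup.
    key = " ".join(sam_name.upper().split())
    osm_name = mapping.get(key)
    if osm_name is None:
        return None, f"Unknown street name from SAM: {sam_name!r}"
    return osm_name, None
```

Returning the note text instead of raising an error keeps the run going, so every unknown street in a neighborhood can be collected in a single pass.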
Changeset Tags
tag | value |
---|---|
comment | Import Boston SAM addresses (Neighborhood Name) |
source | Boston Street Address Management |
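
As an illustration, the changeset tags in the table above could be built per neighborhood as follows. `changeset_tags` is a hypothetical helper; only the two tag values come from the table.

```python
def changeset_tags(neighborhood):
    """Build the changeset tags for one neighborhood upload."""
    return {
        "comment": f"Import Boston SAM addresses ({neighborhood})",
        "source": "Boston Street Address Management",
    }
```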
Data Transformation
osmconvert will be used to convert the .pbf export to .osm. MariaDB is used to store both the OSM data and the SAM addresses. XML is processed with an actual XML parser library.
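The source does not name the XML parser library. The sketch below assumes Python's standard xml.etree.ElementTree and shows how building ways and their tags might be pulled out of a converted .osm file. `SAMPLE_OSM` and `building_ways` are illustrative names, and the sample data is made up.

```python
import xml.etree.ElementTree as ET

# Tiny made-up extract in OSM XML format, for illustration only.
SAMPLE_OSM = """<osm version="0.6">
  <way id="1001">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="1"/>
    <tag k="building" v="yes"/>
    <tag k="addr:housenumber" v="12"/>
  </way>
  <way id="1002">
    <nd ref="4"/><nd ref="5"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def building_ways(osm_xml):
    """Return {way_id: tags} for every way that has a building tag."""
    root = ET.fromstring(osm_xml)
    result = {}
    for way in root.iter("way"):
        tags = {tag.get("k"): tag.get("v") for tag in way.findall("tag")}
        if "building" in tags:
            result[int(way.get("id"))] = tags
    return result
```

For a full state extract, a streaming parser such as `ET.iterparse` would likely be a better fit than loading the whole document into memory.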
Data Transformation Results
See OSM Data Files above.
Data Merge Workflow
Team Approach
I'm going to perform this work solo.
References
Buildings that only have 'fixme' added by the system will not be uploaded (as that's bad data).
Workflow
- Import a fresh massachusetts_latest.pbf file into MariaDB.
- Run boston-sam-generate-osm.py "Neighborhood Name".
- Open the resulting polygon, notes file and .osc file in JOSM with the Boston Tax Parcel Basemap as imagery.
- Evaluate the changes. Validate the street names.
- For buildings that have more than one address:
  - Using Bing and the Tax Parcel Basemap, verify that the buildings are actually split. If they are, split them in OSM.
  - If the buildings aren't split on the Tax Parcel Basemap and the aerial image does not show a party wall, manually mark the building with "fixme=survey required" and remove any addr: tags.
  - Upload the OSM building changes as a regular user.
  - Re-fetch data for the neighborhood and re-run the import script.
- Upload the resulting dataset in chunks of 100 objects as 'Boston SAM Import' user.
- If needed, revert the changesets produced by 'Boston SAM Import' user.
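The upload in chunks of 100 objects can be sketched as a simple generator. `chunks` is a hypothetical helper; the actual upload is done through JOSM, so this only illustrates how a list of changed objects would be divided.

```python
def chunks(items, size=100):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Small chunks keep each changeset easy to review and, if needed, to revert on its own.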
Conflation
A building with source:addr=survey will not be modified.
If a building already has the correct addr:housenumber=*, addr:city=*, addr:postcode=*, and addr:street=*, no changes will be uploaded. If a building has a number that differs from the one in SAM, the Tax Parcel Basemap, Bing, and search engines will be used to verify the address. If the new address is not verifiable, it stays in the -fixmes.osm. If the existing postcode starts with the ZIP retrieved from SAM, it is left unchanged, as ZIP+4 format may have been used during manual entry.
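The conflation rules above can be sketched as one decision function. It assumes the existing OSM tags and the SAM-derived tags are both plain dictionaries, and that the SAM side always carries all four address keys. `plan_update` and its return values are hypothetical names for illustration.

```python
ADDR_KEYS = ("addr:housenumber", "addr:street", "addr:city", "addr:postcode")


def plan_update(existing, sam):
    """Return (action, changes) where action is "skip", "verify" or "update"."""
    # Surveyed addresses are never modified.
    if existing.get("source:addr") == "survey":
        return "skip", {}
    # A conflicting house number needs manual verification first.
    current_number = existing.get("addr:housenumber")
    if current_number is not None and current_number != sam["addr:housenumber"]:
        return "verify", {}
    changes = {}
    for key in ADDR_KEYS:
        old, new = existing.get(key), sam[key]
        # Keep an existing postcode that extends the SAM ZIP (e.g. ZIP+4).
        if key == "addr:postcode" and old and old.startswith(new):
            continue
        if old != new:
            changes[key] = new
    return ("update", changes) if changes else ("skip", {})
```

A building that ends up with `"verify"` would remain in the fixme file until its address has been checked by hand.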
Since present-day Boston absorbed a number of formerly separate cities, some street names are not unique within the official city borders (e.g. Boylston Street in both the Jamaica Plain and Back Bay neighborhoods). The street mapping script will therefore sometimes map the wrong SAM street ID. This is not an issue, since addr:street=* references the street name, not an ID.
QA
- Using JOSM, verify that no buildings about to be uploaded have +addr: in fixme.
- Verify that each addr:housenumber=* on a given addr:street=* maps to exactly one building way.
- At the moment the collection script does not support buildings represented as multipolygon relations; these will trigger a false positive for a missing building and will have to be checked manually. As of 2016-03-19 there are 48 relations tagged as building in the Boston area being processed.
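The one-to-one check between building ways and addresses can be sketched as a duplicate finder. It assumes the buildings of a single neighborhood are given as a mapping from way ID to tags, and `duplicate_addresses` is a hypothetical helper name. Running it per neighborhood matters, because street names are not unique across the whole city.

```python
from collections import defaultdict


def duplicate_addresses(buildings):
    """Return {(street, housenumber): [way_ids]} for addresses on several ways.

    buildings: dict mapping way ID to that way's tag dict.
    """
    seen = defaultdict(list)
    for way_id, tags in buildings.items():
        number = tags.get("addr:housenumber")
        street = tags.get("addr:street")
        if number and street:
            seen[(street, number)].append(way_id)
    return {addr: ids for addr, ids in seen.items() if len(ids) > 1}
```

An empty result means every street and house number pair in the dataset is attached to exactly one way.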
See also
Thread in talk-us@: https://lists.openstreetmap.org/pipermail/talk-us/2016-March/015994.html