Utah/CacheValleyAddressImport
Scope
Import address data for Cache Valley, UT. This covers Logan, North Logan, Hyde Park, Smithfield, Mendon, Wellsville, Paradise, Hyrum, Providence, and other local municipalities.
Future imports could reuse much of this procedure to import addresses for any part of Utah.
Community Buy-In
Please notify Xvtn of any issues with past or future progress in this ongoing import.
- OSM Diary post 2022-06-06 - Initial thoughts, comments from local mappers.
- OSM Diary post 2022-10-05 - Specifically for this import
- OSM-US Slack - #local-utah channel
- imports@openstreetmap.org - Much discussion, mostly regarding the use of a foreign key.
License Approval
The source dataset from UGRC is licensed under the Creative Commons Attribution 4.0 International License. (Utah GIS License Page)
According to the Utah OSM Wiki page, "the UGRC office has signed a waiver so we can use those datasets to improve OSM." A copy of the waiver letter is shown here and on the UT wiki page.
Documentation
Data Flow
UGRC Shapefile -> QGIS Filter Data -> OGR2OSM Translation Script to OSM format -> JOSM Quality Check -> Upload to OSM Database
Please see the README in the translation script's GitHub repository for detailed technical instructions.
Data Format, Conflation
The data will first be imported as individual address point nodes. This approach has several benefits:
- The import for an area could be considered "done" with the address points. They are usable by geocoders, search engines, etc.
- Some buildings contain multiple addresses; these should be left as points anyway.
- It's easy to manually merge an address point and building outline where it's obvious they belong together.
Next, where appropriate, address points are merged with other features:
- Using the JOSM Conflation plugin, move address tags to nodes and areas where it can reasonably be assumed they represent the same feature: buildings, certain amenities, etc.
- Manually review cases where it isn't possible to automatically merge.
- Where possible, do armchair mapping to resolve conflicts with obvious solutions.
- Use notes or FIXME tag to queue up conflicts for in-person review.
- Check on the ground to resolve remaining issues.
- These steps are all performed before uploading to OSM.
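The merge rule above can be illustrated with a minimal Python sketch. It is not the JOSM Conflation plugin's algorithm. It only shows the decision described here, under the assumption that "obviously belongs together" means exactly one address point falls inside a building that has no address tags yet. The data layout (`pos`, `outline`, `tags`) and the function names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def conflate(addresses, buildings):
    """Return (merged, leftover).

    addresses: list of {"pos": (x, y), "tags": {...}}   (hypothetical layout)
    buildings: list of {"outline": [(x, y), ...], "tags": {...}}
    merged holds buildings that received address tags; leftover holds
    address points that stay as nodes or need manual review.
    """
    hits = {i: [] for i in range(len(buildings))}
    leftover = []
    for addr in addresses:
        containing = [i for i, b in enumerate(buildings)
                      if point_in_polygon(addr["pos"], b["outline"])]
        if len(containing) == 1:
            hits[containing[0]].append(addr)
        else:
            leftover.append(addr)  # outside every building, or ambiguous

    merged = []
    for i, addrs in hits.items():
        building = buildings[i]
        has_address = any(k.startswith("addr:") for k in building["tags"])
        if len(addrs) == 1 and not has_address:
            merged.append({"outline": building["outline"],
                           "tags": {**building["tags"], **addrs[0]["tags"]}})
        else:
            # Several addresses in one building, or a tag conflict: keep points.
            leftover.extend(addrs)
    return merged, leftover
```

Anything that ends up in `leftover` corresponds to the manual-review, armchair-mapping, and on-the-ground steps listed above.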
Tag Format
- Required:
  - addr:{housenumber, street, postcode, city, country, state}
- Optional / as applicable:
  - addr:unit - for apartment units, etc. According to the wiki, the secondary unit designator is not commonly used, but mail can sometimes not be delivered to addresses missing it. This seems like something to concatenate and include, e.g. addr:unit=Apt 3 rather than addr:unit=3.
  - name - from the LandmarkNa field, if present.
  - fixme - when a potential issue is detected by the translation script. For example, one address in Clarkston has no street suffix, so the script has done what it can (addr:street=North 300) and flagged it for manual review.
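As a rough illustration of the tag rules above, here is a minimal Python sketch of how one source record might be mapped to OSM tags. It is not the actual translation script. Apart from LandmarkNa, every input field name (`number`, `street_prefix`, `street_name`, `street_suffix`, `unit_type`, `unit_id`, `city`, `zip`) is a hypothetical placeholder, and the fixme wording is invented for the example.

```python
def build_address_tags(record):
    """Map one source record (a dict) to OSM address tags.

    All input keys except "LandmarkNa" are hypothetical placeholders.
    """
    def clean(key):
        return (record.get(key) or "").strip()

    street_parts = [clean("street_prefix"), clean("street_name"),
                    clean("street_suffix")]
    tags = {
        "addr:housenumber": clean("number"),
        "addr:street": " ".join(p for p in street_parts if p),
        "addr:city": clean("city"),
        "addr:postcode": clean("zip"),
        "addr:state": "UT",
        "addr:country": "US",
    }

    # Keep the designator with the unit value: "Apt 3" rather than "3".
    unit = " ".join(p for p in (clean("unit_type"), clean("unit_id")) if p)
    if unit:
        tags["addr:unit"] = unit

    if clean("LandmarkNa"):
        tags["name"] = clean("LandmarkNa")

    # Flag records the script cannot fully resolve, e.g. a missing street suffix.
    if not clean("street_suffix"):
        tags["fixme"] = "Street has no suffix; review manually"
    return tags


# Made-up record mirroring the Clarkston case described above.
tags = build_address_tags({"number": "1", "street_prefix": "North",
                           "street_name": "300", "city": "Clarkston"})
print(tags["addr:street"])  # → North 300
print("fixme" in tags)      # → True
```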
Future Updates
Rather than using foreign keys, which are now discouraged, updates will be handled mainly using the same process as the initial import: Select areas which have missing or incorrect data, combine automatically where possible, manually review conflicts as necessary.
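One way to find missing or incorrect data without a foreign key is to match on the address itself. The Python sketch below assumes that approach; the choice of matching tags and the function names are illustrative and not part of the agreed procedure.

```python
MATCH_KEYS = ("addr:housenumber", "addr:street", "addr:unit", "addr:city")


def address_key(tags):
    """Normalized tuple used to match records without a foreign key."""
    return tuple(tags.get(k, "").strip().casefold() for k in MATCH_KEYS)


def plan_update(source, existing):
    """Compare fresh source tags against tags already in OSM.

    Returns (to_add, to_review): source addresses with no match in OSM,
    and OSM addresses with no match in the source (possibly outdated or
    hand-corrected, so they need a human decision).
    """
    existing_keys = {address_key(t) for t in existing}
    source_keys = {address_key(t) for t in source}
    to_add = [t for t in source if address_key(t) not in existing_keys]
    to_review = [t for t in existing if address_key(t) not in source_keys]
    return to_add, to_review
```

Entries in `to_add` can go through the same point import and conflation steps as the initial import, while `to_review` feeds the manual conflict review.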
Other Details
- I (Xvtn) am prepared to do this import myself. I'd love to develop a plan to collaborate with other mappers if there is interest.
- The import will be done in smaller chunks, for example small municipalities such as Clarkston, UT. Changeset size will be limited to a single municipality.
- The "bot" account used will be Xvtn_Import.
Sample Data
Data files from before (.gpkg) and after (.osm) running the translation script are available in the Git repository. Note that the .osm files include no manual work; they are straight out of the script.
Timeline / Schedule / Progress
- Preparation
- 2022-10-05 - Initial Proposal
- 2022-10-12 - Approval from imports@osm
- Import Data
- 2022-10-13 - Import Clarkston
- 2022-10-16 - Import Newton
- 2022-10-18 - Import Providence
- 2022-10-24 - Import North Logan
- 2022-12-13 - Import Richmond
- 2023-05-14 - Import Logan
- Ongoing
- Updates
- Need to finish importing data first