Talk:Utah/CacheValleyAddressImport
Initial discussion topics
Hi, a couple of points (haha) for discussion perhaps:
- Have you considered doing a combined import for addresses & building footprints (also available from UGRC)?
- Are you aware of https://wiki.openstreetmap.org/wiki/Utah/Naming_Conventions ? Any thoughts here?
Do you already have a sample in .osm format we could look at?
- Martijn van Exel (talk) 19:22, 5 October 2022 (UTC)
I do have a few .osm samples for review - Any pointers on the best way to get them out there? (Only a couple MB in total)
- Xvtn (talk) 17:55, 6 October 2022 (UTC)
Hmm, if you have Dropbox or something similar you could share a folder on there. If it's a small sample you could zip it and upload it directly to this wiki, I think? The last option is to commit it as a changeset to https://master.apis.dev.openstreetmap.org/ and share the changeset link (but you would need to create an account on there, it's separate from osm.org)
Martijn van Exel (talk) 18:38, 6 October 2022 (UTC)
I've added some to the git repo. link
- Xvtn (talk) 17:36, 12 October 2022 (UTC)
Combined Import - Addresses and Building Footprints
Since this is the first import I've been involved in, I was hoping to keep it simple. That's why I proposed the scope initially be limited to Cache Valley, and to just address points. Also, CV already has a pretty good portion of its building footprints traced manually by yours truly. (It's kind of relaxing for me so I don't mind the repetitive nature!) Would there be any major downsides to doing addr / bldg imports separately since they come from different datasets?
- Xvtn (talk) 17:55, 6 October 2022 (UTC)
Not at all, I agree you should keep it simple. Especially if there's a lot of manually traced buildings already!
Martijn van Exel (talk) 18:38, 6 October 2022 (UTC)
Naming Conventions, Tag Format
The translation script follows the conventions laid out in Utah's naming conventions wiki page. Specifically, this seems like the relevant bit: "For addr:street=*, use the full name with expanded prefix and suffix: West 100 North, East 14600 South."
-Xvtn (talk) 17:55, 6 October 2022 (UTC)
Awesome! Martijn van Exel (talk) 18:38, 6 October 2022 (UTC)
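For readers following along, here is a minimal sketch of the kind of expansion the quoted convention describes. It is not the actual translation script: the function name, the argument names, and the abbreviation tables are assumptions made for illustration only.

```python
# Hypothetical lookup tables; a real script would need a fuller list.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
STREET_TYPES = {"ST": "Street", "AVE": "Avenue", "DR": "Drive", "RD": "Road", "LN": "Lane"}


def expand_street(prefix_dir, street_name, street_type="", suffix_dir=""):
    """Build an addr:street value with expanded prefix and suffix."""
    # Capitalize each word of the name; names like "McDonald" would need special handling.
    name = " ".join(word.capitalize() for word in street_name.split())
    parts = [
        DIRECTIONS.get(prefix_dir.strip().upper(), prefix_dir.strip()),
        name,
        STREET_TYPES.get(street_type.strip().upper(), street_type.strip().capitalize()),
        DIRECTIONS.get(suffix_dir.strip().upper(), suffix_dir.strip()),
    ]
    # Drop empty components so grid-style streets do not get stray spaces.
    return " ".join(part for part in parts if part)


print(expand_street("W", "100", "", "N"))
print(expand_street("N", "MAIN", "ST"))
```

The two calls cover a grid-style street with a prefix and suffix direction, and a named street with a street type.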
Updates
Another question is how to deal with future address updates. The source dataset from UGRC gets updated quite frequently as Utah is growing rapidly and new addresses are assigned all the time. Would it make sense to keep a reference to the source record on each address point to facilitate future incremental updates?
- Martijn van Exel (talk) 19:36, 5 October 2022 (UTC)
Great point. It looks like there are a couple of potential candidates for unique identifiers that would allow skipping previously imported data. (The UGRC site links to this document describing the various fields.)
- OBJECTID: 6-digit number, this isn't mentioned in the document. We would want to be sure they are truly unique and static before using that, of course.
- UTAddPtID: Example "LOGAN | 263 E 870 N APT MAIN" - Seems like these would be unique, but some points need correction - Do you correct the ID or leave it?
- ParcelID: Probably a bad option since multiple addresses can fall on a single parcel
Also, if a new user mistakenly or intentionally removes or changes this ID, how do we prevent a new point from being imported on top of it during the next update? Some sort of duplicate detection may be necessary (this also relates to Greg Troxel's concerns about duplicates on the Imports mailing list.)
-Xvtn (talk) 17:55, 6 October 2022 (UTC)
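One possible shape for the duplicate detection mentioned above, as a rough sketch only: treat a candidate as a duplicate when an existing object has the same house number and street and lies within some distance. The record layout (dicts with `lat`, `lon`, `tags`), the function names, and the 50 m threshold are all assumptions, not part of the import plan.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def address_key(tags):
    """Normalized (housenumber, street) pair used for matching."""
    return (
        tags.get("addr:housenumber", "").strip().lower(),
        tags.get("addr:street", "").strip().lower(),
    )


def is_duplicate(candidate, existing, max_distance_m=50.0):
    """True if an existing object has the same address within max_distance_m."""
    key = address_key(candidate["tags"])
    for obj in existing:
        if address_key(obj["tags"]) != key:
            continue
        distance = haversine_m(candidate["lat"], candidate["lon"], obj["lat"], obj["lon"])
        if distance <= max_distance_m:
            return True
    return False
```

Matching on the address tags rather than on a stored source ID means the check still works if someone deletes or edits the ID, at the cost of missing duplicates whose street name was spelled differently.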
One other thought: How closely should we try to track the source dataset? For example, how to handle entries that are removed?
-Xvtn (talk) 18:01, 6 October 2022 (UTC)
It's definitely not *easy* to do incremental updates. My thought would be to make a "best effort" by adding a unique identifier from the source and then use MapRoulette or similar to manually triage future updates from the UGRC source. If someone removed the source ID, the building would appear again, but it would be easy for a human to dismiss that duplicate. We could handle removals in the same manner, by having human mappers inspect it and hopefully be able to confirm based on aerial imagery. I suspect / hope the incremental updates would not be more than a few hundred or at most a couple thousand new / removed buildings at a time...
Martijn van Exel (talk) 18:38, 6 October 2022 (UTC)
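To make the "best effort" idea concrete, the sketch below compares two snapshots of the source dataset and lists the records that appeared or disappeared, which could then be handed to human reviewers. It assumes each record is a dict carrying the UTAddPtID field discussed above and that this field is unique per record; the function name and the sample values other than the one quoted earlier are hypothetical.

```python
def diff_snapshots(previous, current, id_field="UTAddPtID"):
    """Return (added, removed) records between two source snapshots."""
    prev_by_id = {record[id_field]: record for record in previous}
    curr_by_id = {record[id_field]: record for record in current}
    # Sort the IDs so the review queue is stable from run to run.
    added = [curr_by_id[i] for i in sorted(curr_by_id.keys() - prev_by_id.keys())]
    removed = [prev_by_id[i] for i in sorted(prev_by_id.keys() - curr_by_id.keys())]
    return added, removed


old_snapshot = [
    {"UTAddPtID": "LOGAN | 263 E 870 N APT MAIN"},
    {"UTAddPtID": "LOGAN | 100 W 200 N"},  # made-up example record
]
new_snapshot = [
    {"UTAddPtID": "LOGAN | 263 E 870 N APT MAIN"},
    {"UTAddPtID": "LOGAN | 300 E 400 N"},  # made-up example record
]
added, removed = diff_snapshots(old_snapshot, new_snapshot)
print(len(added), "added,", len(removed), "removed")
```

Because the comparison runs between two copies of the source data, it does not by itself require storing the source ID on OSM objects.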
After a lengthy discussion on the imports@openstreetmap.org list, it seems foreign keys of any kind are heavily discouraged. Therefore I've updated the wiki page to reflect the new plan - see there. I really need to learn how to use MapRoulette - from all the praise I hear, you've done awesome work on that tool.