Talk:Import/Orange County, California Buildings
Hi!
I was getting ready to import the same building and address dataset into OSM. How far have you gotten?
My examination of that data leads me to believe that the building outlines are rather poor and will require a lot of cleanup. For more on that, I created a GitHub project with a description, observations and scripts. The project is at https://github.com/n76/OSM_OC_Buildings (I was just going to create an import page in the OSM wiki with the same write-up when I discovered yours). --n76
Hi!
I am glad that you discovered this import page, and to hear that you are interested in working with the building data for Orange County. We have not gone very far yet with adding the buildings data to OSM so it is a good opportunity to coordinate. Our plan was not to do a 'bulk import' of the data but rather for OSM mappers to use new tools in JOSM and RapiD to add individual features with manual review and editing. I updated the import page to include a link to the new tools if you did not also see the ArcGIS Datasets page.
I have manually edited a few dozen features in the Orange, CA area to test out the new tools in RapiD, and it was working well for me. I agree with your statement in the GitHub project that "each building needs to be examined for flaws individually prior to inclusion into OSM." We have done preparation work similar to what you describe (e.g. address formatting) to prepare the feature attributes for inclusion in OSM. You can preview and download this data from the processed buildings link on this page if you want to check that out.
You might want to consider using the new JOSM plugin that has access to this data, and determine whether it would be an efficient way for you to continue mapping with these features. The plugin is intended to make it easier for OSM mappers in the community to discover and use this type of existing GIS data for editing. It would be good to know if this works well for your purposes.
Thanks again for coordinating! --Dkensok (talk) 15:31, 28 July 2020 (UTC)
That would be the MapWithAI data. I've looked at that for the area around my home and don't see any of the address data or height information in it.
Would you have any issue if I set up my own import page and proceed with the workflow I outlined in the GitHub page? It seems that when people working with the RapiD data and I look at an area, there will be one of two conditions: either the area has already had the data imported (in which case skip it) or the data has not yet been imported (in which case import the data). My first goal would be to get the data for the city I live in done, so that is where I'd start. --n76
Yes, that would be the MapWithAI data. If you are not seeing any address or height info, I wonder whether you are looking at the Microsoft Buildings rather than the Orange County, California Buildings, which is one of the new ArcGIS datasets. The Orange County buildings should have both the address and height info, as shown in this graphic from RapiD. I suspect you are not using the latest development version of the MapWithAI plugin that references the new data.
However, I don't have an issue if you want to do a separate import of the data with the workflow that you described (e.g. for your city). I suspect the results would be similar. If other OSM mappers are using the Orange County Buildings data through the new MapWithAI tools in JOSM or RapiD, then they should see any features that you import and the tools should avoid adding duplicate features. --Dkensok (talk) 20:18, 28 July 2020 (UTC)
Volunteer Data Quality Review
Geometry
I'm seeing an unfortunate number of building polygons that overlap each other. I might filter these out, kick them onto another layer and resolve them manually. Or just omit them.
- - - We will review overlapping buildings further and determine a path forward. In the meantime, a mapper who runs across an overlap can decide whether to add the feature or not. --Jshimota (talk) 22:01, 7 August 2020 (UTC)
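As a possible starting point for the filtering idea above, here is a minimal sketch that flags candidate overlapping pairs so they can be moved to a separate layer for manual review. It only compares axis-aligned bounding boxes, so it is a coarse first pass and not a true polygon intersection test; every flagged pair would still need a real geometry check or a visual review. The function names and the input layout (a dict of building id to a list of (x, y) vertices) are assumptions for illustration, not part of the dataset or any tool mentioned here.

```python
from itertools import combinations


def bbox(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    """True if two boxes share interior area.

    Strict comparisons are used so that boxes which only touch along an
    edge (e.g. buildings sharing a wall) are not flagged.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def candidate_overlaps(buildings):
    """Return sorted id pairs whose bounding boxes overlap.

    `buildings` is a hypothetical dict: building id -> list of (x, y).
    This compares every pair, which is fine for a small area but would
    need a spatial index for a county-sized dataset.
    """
    boxes = {bid: bbox(ring) for bid, ring in buildings.items()}
    return [
        (p, q)
        for p, q in combinations(sorted(boxes), 2)
        if bboxes_overlap(boxes[p], boxes[q])
    ]
```

Pairs returned by candidate_overlaps could be written to a review layer, and everything else kept on the main layer.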
Shape__Area & Shape__Length
What are these tags used for? I couldn't find anything about them in your documentation.
- - - With regard to the Shape_Length and Shape_Area, those fields will not be used for tags. They are required fields for the geometry but should not be exposed through the OSM Editor tools that access this layer, nor should they appear in the popup for the live feature layer. --Jshimota (talk) 21:57, 7 August 2020 (UTC)
Addresses
- Many missing addr_city tags (empty or just whitespace).
- Many missing addr_housenumbers, and many of those do have addr_streets, which seems like a problem.
- A suspicious number of addr_street names are missing suffixes. I'd check a small sample of them and verify they are correct.
- Only 3 streets have any punctuation (suspiciously low). I suggest searching (punctuation and capitalization agnostic) for nearby highways in OSM and adopting the punctuation from those features' names.
- Unit numbers have some oddities: 'FRNT', 'BLDG', 'OFC', 'PICO', 'STE', 'LBBY', 'AVE', 'RD', 'WAY'.... I think I found those missing street suffixes.
- You have some highly duplicated addresses, the worst being 230x '1201 West Valencia Drive Fullerton'. Duplicates are indicative of a problem. I would investigate a few instances and make a plan from there.
Blackboxlogic (talk) 02:34, 1 August 2020 (UTC)
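Several of the checks in the list above can be repeated with a short script. The following is a rough sketch, assuming each building is available as a dict with the keys addr_housenumber, addr_street, addr_city and addr_unit (the actual field names in the processed layer may differ). The set of suffix-like unit values is taken from the oddities reported above and is not exhaustive.

```python
from collections import Counter

# Unit values from the report above that look like street suffixes.
SUFFIX_LIKE_UNITS = {"AVE", "RD", "WAY"}


def norm(value):
    """Normalize a field for comparison: None -> '', trimmed, lowercased."""
    return "" if value is None else str(value).strip().lower()


def audit_addresses(records):
    """Count suspicious address patterns and collect duplicated addresses.

    Returns (counts, duplicates) where `duplicates` maps a normalized
    (housenumber, street, city) tuple to the number of buildings using it.
    """
    counts = Counter()
    seen = Counter()
    for rec in records:
        number = norm(rec.get("addr_housenumber"))
        street = norm(rec.get("addr_street"))
        city = norm(rec.get("addr_city"))
        unit = norm(rec.get("addr_unit"))

        if not city:
            counts["blank_city"] += 1
        if street and not number:
            counts["street_without_housenumber"] += 1
        if unit.upper() in SUFFIX_LIKE_UNITS:
            counts["suffix_like_unit"] += 1
        if number and street:
            seen[(number, street, city)] += 1

    duplicates = {addr: n for addr, n in seen.items() if n > 1}
    return dict(counts), duplicates
```

Duplicates are only reported, not removed, since some of them may turn out to be legitimate once a few instances have been inspected.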
- - - The many missing and suspicious address field values come from the source data, and better values are not available. However, the data has now been spatially joined to US Census populated place polygons, and where the addr_city value was previously blank, the Census city value has been added where available. This now-standard processing step was not included at the time the Orange County data was processed. The addr_unit field has also been edited to remove any oddities. The duplicate building addresses will be reviewed, keeping in mind that there are many cases (e.g. multi-building complexes) where duplicate addresses are valid. --Jshimota (talk) 22:03, 7 August 2020 (UTC)
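For anyone wanting to reproduce or spot-check the city backfill described above, here is an illustrative sketch of the idea, not the actual processing that was run. It assumes each building dict carries a representative (x, y) under a hypothetical key "point", and that place polygons are simple rings without holes given as a dict of place name to vertex list. Containment is tested with the standard ray casting method.

```python
def point_in_ring(x, y, ring):
    """Ray casting test: True if (x, y) lies inside the polygon ring.

    Counts how many edges a ray going in the +x direction crosses;
    an odd count means the point is inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fill_blank_cities(buildings, places):
    """Fill blank addr_city values from the containing place polygon.

    Existing non-blank values are never overwritten. Returns the number
    of buildings that were filled in.
    """
    filled = 0
    for b in buildings:
        if str(b.get("addr_city") or "").strip():
            continue  # already has a city
        x, y = b["point"]
        for name, ring in places.items():
            if point_in_ring(x, y, ring):
                b["addr_city"] = name
                filled += 1
                break
    return filled
```

Buildings that fall outside every place polygon keep their blank value, which matches the "where available" caveat.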