User:BubbaJuice/PimaCountyAddressImport
Summary
This is a work in progress. Information is subject to change. This page describes a theoretical import of over 400,000 address points into Pima County, Arizona. If you have any questions or comments please post here.
License
The source of this import is the Pima County parcel data, which is open data. In reply to an email sent to the Pima County Information Technology Department, they stated:
"Hi [My Name],
Since we have established our Open Data portal, we have not yet had the chance to update our overall disclaimer/terms of service. By this email, we state that GIS data obtained from our Open Data portal can be used for all purposes deemed appropriate by the user, including commercial use. We are also providing the following disclaimer –
We give no warranty, expressed or implied, as to the accuracy, reliability, or completeness of the data. This disclaimer applies both to the direct use of the data and any derivative products produced with the data.
Any type of boundary, linear or point locations contained within this data, or displayed within this product, are approximate, and should not be used for authoritative or legal location purposes. Users should independently research, investigate, and verify all information to determine if the quality is appropriate for their intended purpose. If legally-defensible boundaries or locations are required, they should first be established by an appropriate state-registered professional.
The information contained in the data is dynamic and may change over time. It is the responsibility of the data user to use the data appropriately and consistent with the intent stated in the metadata.
Per A.R.S. 37-178: A public agency that shares geospatial data of which it is the custodian is not liable for errors, inaccuracies or omissions and shall be held harmless from and against all damage, loss or liability arising from any use of geospatial data that is shared.
Thanks!
-Steve"
This confirms that the data is open and can be used for any purpose, including commercially, which is compatible with the OpenStreetMap database license.
Source
First, I obtained the compressed GEOJSONL file from https://openadresses.io.
Preparing the GEOJSON
I uncompressed the source file using 7zip, then used Notepad to convert it from GEOJSONL to regular GEOJSON. To do this, I added
{"type":"FeatureCollection","features": [
as the first line and
]}
as the last line. I also added a comma (,) to the end of every feature line except the final one (the second-to-last line of the file), using a simple find and replace. The file can now be opened in JOSM. I opened it in JOSM and saved it again so that it would be written out in a cleaner format.
I then reopened the file in JOSM and removed the unneeded tags hash, unit, district, and id. The tag addr:state=AZ was added to all of the points. postcode was converted to addr:postcode=*, city to addr:city=*, street to addr:street=*, and number to addr:housenumber=*.
Next, I fixed up the street names. I first used find and replace in Notepad to expand the directions in the street names to their long form. For example, I searched for
"addr:street": "N
and replaced it with
"addr:street": "North
and repeated this for each of the directions. I then did the same for the endings of the road names. For example, I searched for
RD"
and replaced it with
Road"
I did this with all of the endings I could manually find in the file. All of the ones I can remember are listed below:
- Street (ST)
- Road (RD)
- Highway (HY)
- Boulevard (BL)
- Court (CT)
- Avenue (AV)
- Stravenue (SV)
- Trail (TR)
- Drive (DR)
- Place (PL)
- Way (WY)
- Lane (LN)
- Circle (CI)
- Loop (LP)
- Path (PH)
- Parkway (PW)
- North (N)
- East (E)
- South (S)
- West (W)
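As an alternative to running one find and replace per abbreviation, the expansion could be scripted. The following is a minimal Python sketch, not the method actually used for this import. The function name expand_street is hypothetical, and it assumes that a direction only ever appears as the first word of a street name and a street type only as the last word, which may not hold for every record.

```python
# Abbreviation tables copied from the list above.
DIRECTIONS = {"N": "North", "E": "East", "S": "South", "W": "West"}
SUFFIXES = {
    "ST": "Street", "RD": "Road", "HY": "Highway", "BL": "Boulevard",
    "CT": "Court", "AV": "Avenue", "SV": "Stravenue", "TR": "Trail",
    "DR": "Drive", "PL": "Place", "WY": "Way", "LN": "Lane",
    "CI": "Circle", "LP": "Loop", "PH": "Path", "PW": "Parkway",
}


def expand_street(name: str) -> str:
    """Expand a leading direction and a trailing street type (hypothetical helper)."""
    words = name.split()
    if len(words) < 2:
        # A single word such as "BROADWAY" is left untouched.
        return name
    # Assumption: only the first word can be a direction prefix.
    if words[0].upper() in DIRECTIONS:
        words[0] = DIRECTIONS[words[0].upper()]
    # Assumption: only the last word can be a street-type ending.
    if words[-1].upper() in SUFFIXES:
        words[-1] = SUFFIXES[words[-1].upper()]
    return " ".join(words)
```

Words in the middle of the name are not changed, so the casing of the rest of the name still has to be fixed in a later step.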
Next, I searched for MOUNT LEMMON and replaced it with Summerhaven. After all of this has been done, we have to use the Notepad++ feature found under Edit > Convert Case to > Proper Case (Alt+U). All of the text needs to be selected (Ctrl+A) for this to work. This puts every word in proper case, which finishes the casing of the street names and city names. It also breaks parts of the GEOJSON file, because the keys are capitalized as well. We need to edit line 3 to change Featurecollection back to FeatureCollection. We then use find and replace to fix the following lines:
- "Type": "Feature" becomes "type": "Feature"
- "Geometry": { becomes "geometry": {
- "Type": "Point" becomes "type": "Point"
- "Coordinates": [ becomes "coordinates": [
- "Generator": "Josm" becomes "generator": "JOSM"
- "Features": [ becomes "features": [
- "Properties": { becomes "properties": {
After these are fixed, you can fix all of the tag keys and the value of addr:state=* using find and replace or JOSM.
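The conversion, tag renaming, and casing steps in this section could also be done in a single script, which avoids breaking the GEOJSON keys because only the tag values are re-cased. The following Python sketch is an illustration under stated assumptions, not the process used above: the function names are hypothetical, each input line is assumed to be one GEOJSON Feature with its attributes under properties, and every property that is not renamed is dropped (this covers hash, unit, district, and id, but also anything else present). Abbreviation expansion is left out.

```python
import json

# Source keys mapped to OSM keys, as described in this section.
RENAME = {
    "number": "addr:housenumber",
    "street": "addr:street",
    "city": "addr:city",
    "postcode": "addr:postcode",
}


def proper_case(text: str) -> str:
    """Uppercase the first character of each word and lowercase the rest."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split())


def convert_feature(feature: dict) -> dict:
    """Build a new Feature carrying only the renamed address tags."""
    props = feature.get("properties") or {}
    tags = {"addr:state": "AZ"}
    for src, dst in RENAME.items():
        value = str(props.get(src) or "").strip()
        if not value:
            continue  # skip empty fields instead of writing empty tags
        if src == "city" and value.upper() == "MOUNT LEMMON":
            value = "Summerhaven"
        if src in ("street", "city"):
            value = proper_case(value)  # only values are re-cased, never keys
        tags[dst] = value
    return {"type": "Feature", "geometry": feature["geometry"], "properties": tags}


def geojsonl_to_collection(lines) -> dict:
    """Turn an iterable of GEOJSONL lines into one FeatureCollection dict."""
    features = [convert_feature(json.loads(line)) for line in lines if line.strip()]
    return {"type": "FeatureCollection", "features": features}
```

The result can be written out with json.dump and opened in JOSM. Note that proper_case turns 6TH into 6th, which may differ from what the Notepad++ Proper Case feature produces.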
Fixing Address Points
Now we have a GEOJSON file that can be opened in JOSM and edited freely. All of the locations in the dataset are based on addresses connected with parcels. Many of these parcels are undeveloped, so their address points serve no purpose and sit in the middle of the desert. To fix this, we can go through clumps of addresses and identify which parcels are developed and which are not. Address points on undeveloped parcels should be deleted. Many street parcels also have addresses connected to them; these are incorrect and should be deleted as well. We also need to fix incorrect addr:city=* values for Oro Valley and South Tucson. Finally, many of the points are not located where the address actually is. We can manually check each address point and move it to its correct position, at the approximate middle of the building or area associated with it.
Remove Duplicate Addresses
Addresses that are already in the OpenStreetMap database need to be removed from the local file. I am unsure how to do this. Help would be appreciated.
All of the addresses in Pima County can be downloaded with the following Overpass query:
[out:xml][timeout:90];
{{geocodeArea:Pima County}}->.searchArea;
(
nwr["addr:housenumber"](area.searchArea);
);
(._;>;);
out meta;
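One possible approach to the duplicate check, offered only as a sketch: read the tags from the XML returned by the query above, then drop every local feature whose house number and street already exist in that data. The function names are hypothetical. Matching on house number and street alone is an assumption that can produce false matches where the same street name and number occur in two different places in the county, so the result should be reviewed by hand.

```python
import xml.etree.ElementTree as ET


def tags_from_osm_xml(xml_text: str) -> list:
    """Collect the tag dicts of all OSM elements that carry a house number."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root:
        if element.tag in ("node", "way", "relation"):
            tags = {t.get("k"): t.get("v") for t in element.findall("tag")}
            if "addr:housenumber" in tags:
                result.append(tags)
    return result


def address_key(tags: dict):
    """Comparison key: lowercased house number and whitespace-normalized street."""
    number = str(tags.get("addr:housenumber", "")).strip().lower()
    street = " ".join(str(tags.get("addr:street", "")).lower().split())
    if not number or not street:
        return None  # incomplete addresses are never treated as duplicates
    return (number, street)


def drop_existing(import_features: list, osm_tag_sets: list) -> list:
    """Keep only the import features whose key is not already present in OSM."""
    existing = {address_key(t) for t in osm_tag_sets}
    existing.discard(None)
    return [
        f for f in import_features
        if address_key(f.get("properties") or {}) not in existing
    ]
```

This only works if street names in the local file have already been expanded and cased the same way as in OpenStreetMap, since the comparison ignores case and extra spaces but not abbreviations.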
Merging Address Points with Buildings
I am planning on using the Conflation plugin in JOSM to merge address nodes into nearby buildings tagged building=house. Only houses are used because buildings like sheds should not carry the addresses. Another problem is that buildings which should be mapped as separate, adjoining ways are sometimes mapped as one large way, in which case only one of the addresses would be merged and the result would be inaccurate. You can download all of the buildings into a separate layer using the following Overpass query:
[out:xml][timeout:90];
{{geocodeArea:Pima County}}->.searchArea;
(
nwr["building"](area.searchArea);
);
(._;>;);
out meta;
The settings of the plugin are for it to conflate within 30 meters. The reference is the address layer. The subject is the houses layer.
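To illustrate what a 30 meter threshold means in practice, the Python sketch below finds the nearest building position within 30 meters of an address point using the haversine formula. It is not how the plugin is implemented; it only demonstrates the distance test. The function names are hypothetical, buildings are assumed to be reduced to a single (lon, lat) point such as a centroid, and the brute-force loop would be too slow for the full dataset without a spatial index.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_within(point, candidates, max_distance_m=30.0):
    """Return the index of the closest candidate within max_distance_m, or None."""
    best_index, best_distance = None, max_distance_m
    for i, (lon, lat) in enumerate(candidates):
        d = haversine_m(point[0], point[1], lon, lat)
        if d <= best_distance:
            best_index, best_distance = i, d
    return best_index
```

An address point for which nearest_within returns None has no candidate building within the threshold and would stay a standalone node.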
I will make a MapRoulette task to merge all other buildings with address points.
Download
Want to check out my local version of the dataset? Check out this GitHub page.