User:B1tw153/GNIS Imports
The State of GNIS Imports into OSM
(This page is a copy of a personal diary entry posted on 2024-06-05.)
Some History of GNIS Imports
GNIS is a database developed by USGS that contains information about the official, standardized names for geographic features in the US.
From 2008 to 2011, there was an effort to import basic data into the US map, including records from GNIS and records from TIGER and NHD that are cross-referenced to GNIS. You can see some of the history of these imports based on the tags they used.
The gnis:feature_id=* tag was typically used for imports of many types of GNIS records. The gnis:id=* tag was used for imports of nodes for Populated Places, i.e. place=city, town, village, hamlet, etc. The tiger:PLACENS=* tag was used for imports of civil boundaries from the US Census Bureau datasets and contained a GNIS Feature ID value. The NHD:GNIS_ID=* tag was used for imports of waterways and other hydrographic features from the National Hydrography Dataset and also contained a GNIS Feature ID value.
An effort by @watmildon in late 2023 normalized the various tags to use the gnis:feature_id=* key. You can see this in the drop in other tag usage and the bump in gnis:feature_id=* usage. However, it is easy to see that there have been no substantial new imports of records referring to GNIS since about 2011.
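To make the tag normalization concrete, here is a minimal sketch of how the three legacy keys could be folded into gnis:feature_id for a single element. It is not the procedure actually used in 2023. The function name `normalize_gnis_tags` and the rule of leaving conflicting values untouched for manual review are assumptions made for illustration.

```python
# Legacy keys named above; each one carried a GNIS Feature ID value.
LEGACY_KEYS = ("gnis:id", "tiger:PLACENS", "NHD:GNIS_ID")
TARGET_KEY = "gnis:feature_id"


def normalize_gnis_tags(tags):
    """Return (new_tags, needs_review) for one element's tag dict.

    Hypothetical helper: if every GNIS-related key agrees on one value,
    the legacy keys are dropped and the value is stored under
    gnis:feature_id. If the values disagree, nothing is changed and the
    element is flagged for a human to look at.
    """
    values = {
        value.strip()
        for key, value in tags.items()
        if key in LEGACY_KEYS or key == TARGET_KEY
    }
    values.discard("")

    if not values:
        return dict(tags), False  # nothing to normalize
    if len(values) > 1:
        return dict(tags), True   # conflicting IDs, leave for manual review

    new_tags = {k: v for k, v in tags.items() if k not in LEGACY_KEYS}
    new_tags[TARGET_KEY] = values.pop()
    return new_tags, False


if __name__ == "__main__":
    print(normalize_gnis_tags({"place": "town", "gnis:id": "123"}))
    print(normalize_gnis_tags({"gnis:feature_id": "1", "NHD:GNIS_ID": "2"}))
```

The comparison is an exact string match, so values that differ only in padding or formatting would be flagged for review rather than merged.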
The imports from 2008 to 2011 did not bring all the GNIS records into OSM. The imports for some features were much more complete than others. And the GNIS database has not been static since 2011.
Updates to GNIS
USGS adds, updates, and removes records in GNIS all the time. New records are added when new features are created, such as when a new reservoir is built or a new municipality is incorporated. Records are updated when official names change, and records are corrected when there are errors in the data that are resolved by references to definitive sources.
Although GNIS actively maintains records for historical features that no longer exist (e.g. a summit removed by strip mining), USGS will remove records where there is no evidence that the feature ever existed. More commonly, a record is removed because GNIS has two records that refer to the same feature: the data is consolidated in one record and the other record is removed.
USGS also made two big changes to GNIS data in the last few years.
In 2021, USGS made a major reorganization of GNIS data. Many record classes relating to man-made features were removed from GNIS. Notably, this included all of the records for buildings. The final version of the old data set containing these records was archived and is still available but is no longer updated. The current data set retains all the records for natural features but also includes civil boundaries maintained by the US Census Bureau and reservoirs included in the NHD data set.
In 2023, USGS published new names for some 650 features that previously used a derogatory term for Native American women in their names. I, along with a group of other mappers, manually updated all these features in OSM to reflect the revised names.
GNIS and OSM Today
Let's take a look at how much of the GNIS data was originally imported from 2008 to 2011. For this comparison, we're looking at features mapped in OSM versus the archived GNIS data set, which has all the same types of records as the original imports but also has 10 years of updates and corrections that happened after the imports.
Some of the classes were more thoroughly imported than others but no class was completely imported. Overall, about 45% of the GNIS records in the archived data set are mapped in OSM and 1,249,416 records were not imported. One of the biggest gaps is the Stream class, which covers waterway features in OSM. That's understandable because the GNIS and NHD data can't be imported directly into OSM without some manual editing.
Any effort to map additional features in OSM using GNIS data should use the current data set, not the archived one, since the records in the archived data set are now out of date. Here's how much of the current GNIS data set is mapped in OSM.
Only 42% of the current GNIS records are mapped in OSM and 566,328 of the current records are not mapped. Waterways are a big part of that gap, but other natural features like lakes, valleys, and springs are not well mapped.
Notably, only 27% of the civil boundaries in GNIS have been mapped in OSM. The GNIS records for civil boundaries are synchronized with TIGER boundary data from the US Census Bureau, so it looks like OSM is missing a whole lot of administrative boundary data.
As I mentioned above, GNIS gets updated all the time. There have not been any substantial new imports of GNIS data into OSM since 2011. But there have been 15,952 new GNIS records that were created in 2012 or later. These are new features that OSM hasn't kept up with.
Stale GNIS Data in OSM
Sometimes GNIS records are withdrawn, particularly when the record was a duplicate but sometimes when the feature never existed. There are 4,270 instances of withdrawn GNIS records that have been mapped in OSM.
An additional 6,255 elements in OSM have gnis:feature_id tag values that are not present in either the archived or current GNIS data sets. Many of these IDs likely belong to records that have also been withdrawn. Each of these tags should be reviewed and corrected.
GNIS actively collects records of "historical" features, which are features that once existed but no longer exist. Since OSM is a map of things that are currently present, features that no longer exist should not be mapped in OSM. However, OSM has 7,271 features that match GNIS records marked as historical!
Some of these historical features could be correctly mapped with lifecycle prefixes, but it seems likely that many of these historical features should not be mapped in OSM at all.
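The three kinds of stale data described in this section can be told apart with a simple lookup. The sketch below is not the analysis that produced the counts above. It assumes the GNIS IDs have already been loaded into four Python sets (`current_ids`, `archived_ids`, `withdrawn_ids`, `historical_ids`), and all of those names, along with the function names, are hypothetical.

```python
from collections import Counter


def classify_gnis_id(feature_id, current_ids, archived_ids,
                     withdrawn_ids, historical_ids):
    """Label one gnis:feature_id value found on an OSM element."""
    if feature_id in withdrawn_ids:
        return "withdrawn"    # record was removed from GNIS
    if feature_id not in current_ids and feature_id not in archived_ids:
        return "unknown"      # not in either published data set
    if feature_id in historical_ids:
        return "historical"   # GNIS says the feature no longer exists
    return "ok"


def summarize(osm_ids, current_ids, archived_ids,
              withdrawn_ids, historical_ids):
    """Count how many OSM values fall into each category."""
    return Counter(
        classify_gnis_id(fid, current_ids, archived_ids,
                         withdrawn_ids, historical_ids)
        for fid in osm_ids
    )


if __name__ == "__main__":
    # Toy data for illustration only; these are not real GNIS IDs.
    current = {"100", "101", "102"}
    archived = {"100", "200"}
    withdrawn = {"300"}
    historical = {"102"}
    osm = ["100", "102", "200", "300", "999"]
    print(summarize(osm, current, archived, withdrawn, historical))
```

Checking for withdrawn records first keeps that category separate from IDs that are simply unknown, which mirrors how the two counts are reported above. Elements labelled "historical" still need a human decision between a lifecycle prefix and removal.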
The Future of GNIS and OSM
One key lesson of the GNIS imports into OSM is that the work doesn't end when the data is imported. Source data changes over time, and for very good reasons! Anywhere that OSM has imported data, we need to find ways to keep that data up to date with changes in the original source.
GNIS presents a huge opportunity for OSM in the US, but the scale of the tasks needed to bring OSM and GNIS into alignment is also huge.
- GNIS records not mapped in OSM: 566,328
- Historical GNIS features mapped in OSM: 7,271
- OSM features with invalid GNIS IDs: 6,255
- Withdrawn GNIS records present in OSM: 4,270
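As a quick check on the scale, the figures in the list above can be split into features still to be mapped and existing elements that need review. This is only arithmetic on the numbers already given, and it assumes the four categories do not overlap.

```python
# Counts copied from the list above.
backlog = {
    "GNIS records not mapped in OSM": 566_328,
    "Historical GNIS features mapped in OSM": 7_271,
    "OSM features with invalid GNIS IDs": 6_255,
    "Withdrawn GNIS records present in OSM": 4_270,
}

# Features that would have to be added to OSM.
to_map = backlog["GNIS records not mapped in OSM"]

# Existing OSM elements that need review or correction.
to_review = sum(backlog.values()) - to_map

total = to_map + to_review

if __name__ == "__main__":
    print(f"to map:    {to_map:,}")
    print(f"to review: {to_review:,}")
    print(f"total:     {total:,}")
```

Under that assumption, roughly 17,800 existing elements need review, against more than half a million features that are not yet mapped.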
There is little opportunity to import missing GNIS records into OSM because many of the GNIS records lack the detailed geometry that OSM requires and because the standards for import quality are much higher now than they were in 2011. But where we have hundreds of thousands of features to map, or thousands of features that need review and correction, these tasks are also not practical for purely manual editing.
Instead, my hope is that projects like the recoGNISer will provide automated assistance to make manual editing simpler and faster.