Draft Geocoding Guideline

From OpenStreetMap Wiki

Community Guideline – Geocoding

Background

When addresses or geographic locations from user input or third party databases are geocoded with OpenStreetMap data, what obligations apply?

Definitions: Geocoding, Geocoding Results, and Geocoder

“Geocoding” refers to the process of querying a geo-database by entering a latitude/longitude pair to search for a corresponding place (city, state, etc.) or address, or by entering an address, POI, or other place to search for a corresponding lat/long. The resulting response is a “Geocoding Result”. “Forward Geocoding” uses address data to query the map database and return a lat/long coordinate result. “Reverse Geocoding” uses a lat/long coordinate to query the map database and return a corresponding address result.

Geocoding Results can be latitude/longitude pairs (as is typical in forward Geocoding Results) and/or full or partial addresses and/or point of interest names (as is typical in reverse Geocoding Results). A latitude/longitude pair may come from a “Direct Hit” -- in which case the data returned exactly matches the data of a feature in the geo-database used for geocoding -- or it may be an “Indirect Hit”, in which case the data is inferred or derived from other features but does not directly match any feature in the database. The most common type of Indirect Hit is an interpolated address.

For example, suppose a Geocoding user queries “120 Main St, Anytown, Big State, USA” and there is a node in the geo-database for that address. A Geocoding Result consisting of the lat/long of that node would be a Direct Hit. Suppose instead that the database contains nodes for 110 Main St and 150 Main St, but not for 120 Main St. A Geocoder might return a point between the two known nodes as an estimate of the requested location. This point (an interpolated result) would be an Indirect Hit.
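The Direct/Indirect distinction can be illustrated with a minimal Python sketch. It assumes a street's address nodes are available as a simple list of house numbers with coordinates (`AddressNode`, `GeocodingResult`, and `geocode_house` are hypothetical names, not part of any real geocoder) and uses straight-line interpolation between the nearest known numbers on either side. Real geocoders are more involved; for example, they may follow the street geometry or account for odd/even numbering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressNode:
    """Hypothetical address node: a house number plus its coordinates."""
    housenumber: int
    lat: float
    lon: float


@dataclass
class GeocodingResult:
    lat: float
    lon: float
    hit: str  # "direct" or "indirect"


def geocode_house(nodes: list[AddressNode], housenumber: int) -> Optional[GeocodingResult]:
    """Return a Direct Hit if the number exists; otherwise interpolate (Indirect Hit)."""
    # Direct Hit: the returned data exactly matches an existing feature.
    for node in nodes:
        if node.housenumber == housenumber:
            return GeocodingResult(node.lat, node.lon, "direct")

    # Indirect Hit: derive a point from the nearest known numbers on each side.
    lower = [n for n in nodes if n.housenumber < housenumber]
    upper = [n for n in nodes if n.housenumber > housenumber]
    if not lower or not upper:
        return None  # nothing to interpolate between

    a = max(lower, key=lambda n: n.housenumber)
    b = min(upper, key=lambda n: n.housenumber)
    t = (housenumber - a.housenumber) / (b.housenumber - a.housenumber)
    return GeocodingResult(
        a.lat + t * (b.lat - a.lat),
        a.lon + t * (b.lon - a.lon),
        "indirect",
    )
```

With nodes for 110 and 150 Main St, a query for 120 returns a point a quarter of the way from 110 to 150, flagged as an Indirect Hit; a query for 110 returns that node's own coordinates as a Direct Hit.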

Geocoding Results may be stored (either permanently or temporarily) together with the external data used for querying.

A “Geocoder” is a software program that provides Geocoding functionality.

The Guideline

If a Geocoder uses the OSM database to produce Geocoding Results and the Geocoder is Publicly Used, then the OSM database is being Publicly Used under the ODbL. Individual Geocoding Results are insubstantial database extracts: those based on a Direct Hit contain an insubstantial amount of raw OSM data, while those based on an Indirect Hit contain no raw OSM data at all, only transformations of or inferences from OSM data.

Consequently, if:

1. no modifications (or only trivial transformations) have been made to the OSM database used by a Geocoder; and

2. the Geocoding Results are not used to create a new database that contains the whole or a substantial part of the original OSM database, then

the share-alike obligations of the ODbL are not triggered (per Section 4.4.b of the ODbL). If the Geocoder is Publicly Used, then the provider of the Geocoder (or the application incorporating the Geocoder, if it is part of a larger application) must provide attribution to OpenStreetMap as described in Section 4.3 of the ODbL.

Since individual Geocoding Results are insubstantial extracts, they may be stored and used together with other proprietary or third party data without having a share-alike impact on such other data, provided the Geocoding Results have not been aggregated to create a new database that contains the whole or a substantial part of the OSM database.

If Geocoding Results are used to create a new database that contains the whole or a substantial part of the contents of the OSM database, this new database would be considered a Derivative Database and would trigger share-alike obligations under Section 4.4.b of the ODbL. For example, systematically reverse engineering the whole or a substantial part of the OSM database through Geocoding would result in the creation of a Derivative Database and thus would trigger share-alike with respect to that Derivative Database. This limitation applies regardless of whether the systematic reverse engineering uses Direct or Indirect Hits. (A sufficient number of Indirect Hits could in theory be used to infer OSM data, although the Geocoding Results themselves contain no such data.)

If Geocoding is performed using a Derivative Database (i.e., a modified version of the OSM database, rather than the unmodified OSM database), then Publicly Using the Geocoder will trigger share-alike obligations on the Derivative Database that was used to obtain the Geocoding Results. For the same reasons described above, share-alike obligations would not apply to the individual Geocoding Results themselves.
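The decision logic of the preceding paragraphs can be summarized in a short Python sketch. It is an illustrative reading of this draft, not a legal test: the three boolean inputs (`publicly_used`, `database_modified`, `results_form_substantial_db`) and the names `GeocoderSetup` and `obligations` are simplifications assumed for the example, and real cases turn on facts that do not reduce to flags.

```python
from dataclasses import dataclass


@dataclass
class GeocoderSetup:
    publicly_used: bool                # Is the Geocoder Publicly Used?
    database_modified: bool            # More than trivial changes to the OSM data?
    results_form_substantial_db: bool  # Results aggregated into the whole/a substantial part of OSM?


def obligations(setup: GeocoderSetup) -> dict[str, object]:
    """Summarize obligations under one reading of this draft (illustrative only)."""
    share_alike_on: list[str] = []
    if setup.publicly_used and setup.database_modified:
        # Publicly Using a Geocoder backed by a Derivative Database.
        share_alike_on.append("Derivative Database used by the Geocoder")
    if setup.results_form_substantial_db:
        # Aggregated results form a new Derivative Database (Section 4.4.b).
        share_alike_on.append("new database built from Geocoding Results")
    return {
        # Section 4.3 attribution applies when the Geocoder is Publicly Used.
        "geocoder_attribution_required": setup.publicly_used,
        "share_alike_on": share_alike_on,
        # Individual Geocoding Results are insubstantial extracts in every case.
        "individual_results_share_alike": False,
    }
```

For example, a Publicly Used Geocoder running on an unmodified OSM database yields only the attribution requirement, while one running on a modified database also puts share-alike on that Derivative Database, but never on the individual results.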

A collection of Geocoding Results is not a substantial extract of the OSM database provided:

1. only names, addresses, and/or latitude/longitude information are included in the Geocoding Results, and

2. the collection is not a systematic attempt to aggregate all or substantially all Primary Features of a given type (as defined in the Collective Database Guideline) within a geographic area city-sized or larger.

Furthermore, if only names are provided in Geocoding Results from OSM -- in particular, latitude/longitude information from OSM is not included in the Geocoding Results -- a collection of such results is not a substantial extract. A collection of Geocoding Results will be considered a systematic attempt to aggregate data if it is used as a general purpose geodatabase, regardless of how the original aggregation was accomplished.
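The safe harbor above can be expressed as a hypothetical check over a description of a stored collection. In this sketch, all names (`ResultCollection`, `within_safe_harbor`, the field labels) are illustrative assumptions, and `False` means only that the safe harbor does not apply, not that the collection is necessarily a substantial extract. The sketch checks the names-only rule before the systematic-aggregation test, which is one reading of the text, and it treats use as a general-purpose geodatabase as systematic regardless of how the collection was built.

```python
from dataclasses import dataclass

# Fields the safe harbor allows in each result: names, addresses, lat/lon.
ALLOWED_FIELDS = {"name", "address", "lat", "lon"}


@dataclass
class ResultCollection:
    fields: set[str]                          # OSM-derived fields stored per result
    covers_all_of_a_feature_type: bool        # all/substantially all Primary Features of one type
    area_is_city_sized_or_larger: bool
    used_as_general_purpose_geodatabase: bool


def within_safe_harbor(c: ResultCollection) -> bool:
    """True if, on this reading, the collection is not a substantial extract.

    False means only that this safe harbor does not apply; the collection
    then needs the ordinary ODbL / Substantial Guideline analysis.
    """
    # Names-only collections (no OSM lat/lon) are not substantial extracts.
    if c.fields <= {"name"}:
        return True
    # Any field beyond names, addresses, and lat/lon falls outside the safe harbor.
    if not c.fields <= ALLOWED_FIELDS:
        return False
    systematic = c.used_as_general_purpose_geodatabase or (
        c.covers_all_of_a_feature_type and c.area_is_city_sized_or_larger
    )
    return not systematic
```

Under this sketch, names and coordinates for all pizzerias in a small town stay within the safe harbor, while the same fields for all pizzerias in a city, or any collection used as a general-purpose geodatabase, do not.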

“Names” in this context includes alternative names, identifiers, or translations regardless of the exact tag used. For example, while https://wiki.openstreetmap.org/wiki/Key:name generally provides names and alternative names, other tags such as ref and network (https://wiki.openstreetmap.org/wiki/Key:ref, https://wiki.openstreetmap.org/wiki/Key:network) may provide the name of a highway commonly referred to by its number, and https://wiki.openstreetmap.org/wiki/Key:brand may provide commonly understood names, such as for a hotel chain.
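As a small illustration of this definition, a geocoder that wants to return names only might filter a feature's tags before building its result. The helper `name_like_tags` and its key list are hypothetical and deliberately non-exhaustive: they cover the tags mentioned above (`name`, `ref`, `network`, `brand`) plus `alt_name` and `name:<lang>` translations, which are common OSM name variants.

```python
# Illustrative, non-exhaustive set of tag keys treated as "names" here.
NAME_KEYS = {"name", "alt_name", "ref", "network", "brand"}


def name_like_tags(tags: dict[str, str]) -> dict[str, str]:
    """Keep only tags that count as names, including name:<lang> translations."""
    return {
        key: value
        for key, value in tags.items()
        if key in NAME_KEYS or key.startswith("name:")
    }
```

Tags such as `addr:street` or `stars` are dropped, so results built from the filtered tags carry only name information.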

Examples

(1) Searching on a non-OSM map

A map-based navigation application offers map search functionality. Users enter addresses and point of interest names in a search box, which are used to search OpenStreetMap for corresponding lat/long coordinates. Search query results are cached both server side and on-device for performance reasons and displayed on a non-OSM map.

The cached Geocoding Results do not trigger share-alike obligations because they are an insubstantial extract or contain no OSM data. The application developer is, however, required to credit OpenStreetMap as described in Section 4.3 of the ODbL.

(2) Adding location names to photos

A mobile photo application uses the current device location (a latitude/longitude coordinate) to perform a Reverse Geocode search in OpenStreetMap to find a corresponding location name. The resulting place names (city, neighborhood, street name or POI name) are embedded into the photo images (e.g., in the form of JPEG headers).

The Geocoding Results do not trigger share-alike obligations because the embedded Geocoding Results are an insubstantial extract. The application developer is, however, required to credit OpenStreetMap as described in Section 4.3 of the ODbL.

(3) Geocoding store locations

A geo-services vendor provides Geocoding services using OSM data. A retail store operator uses the Geocoder to populate lat/long coordinates for some of the entries in its database of store locations (Forward Geocoding).

The geo-services vendor is required to credit OpenStreetMap as described in Section 4.3 of the ODbL because it is Publicly Using the OSM database. The Geocoding Results added to the store location database are not subject to the share-alike requirements, and the store location database records themselves need not include attribution to OpenStreetMap, because the Geocoding Results used are an insubstantial extract or contain no OSM data.

(4) Enriching an OSM-based geocoding database

A geo-services vendor provides Geocoding services using an improved OpenStreetMap Derivative Database through a public web API. The Derivative Database is Publicly Used, so the vendor must comply with the share-alike provisions of Section 4.4 of the ODbL.

(5) Using OSM-based Geocoding Results together with non-OSM Geocoding Results

Users of a navigation application send an address search query to a cloud-based Geocoder. The Geocoder has access to two separate map databases: one contains solely OSM data, and the other contains non-OSM data. If the OSM database returns an accurate match for the address, that location is sent back to the navigation application; if not, the non-OSM database is searched and its result is returned. (The same analysis applies when the non-OSM database is searched before the OSM database or when the two are searched concurrently.) The OSM-based Geocoding Results are an insubstantial extract or contain no OSM data, so they do not trigger share-alike obligations. They can be stored together with the non-OSM-based Geocoding Results without any share-alike impact on those results, so long as the aggregated collection of results does not contain the whole or a substantial part of the OSM database. The cloud-based Geocoder is, however, required to credit OpenStreetMap as described in Section 4.3 of the ODbL.
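The lookup order in this example can be sketched as follows. The two lookup functions stand in for the OSM-only and non-OSM databases and are assumptions for illustration (`Hit` and `geocode_with_fallback` are hypothetical names); plain dictionaries serve as lookups in the usage note. Recording a `source` on each result is a design choice of this sketch rather than a requirement of the guideline: it lets an operator track how many cached results came from OSM, which helps check that the aggregated collection does not grow into a substantial part of the OSM database.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hit:
    lat: float
    lon: float
    source: str  # "osm" or "other"; lets cached OSM-derived results be counted


# A lookup maps an address string to (lat, lon), or None if not found.
Lookup = Callable[[str], Optional[tuple[float, float]]]


def geocode_with_fallback(address: str, osm_lookup: Lookup, other_lookup: Lookup) -> Optional[Hit]:
    """Try the OSM-only database first, then fall back to the non-OSM database."""
    for source, lookup in (("osm", osm_lookup), ("other", other_lookup)):
        coords = lookup(address)
        if coords is not None:
            return Hit(coords[0], coords[1], source)
    return None  # neither database knows the address
```

For example, with `osm = {"120 Main St": (40.0, -74.999)}` and `other = {"5 Oak Ave": (40.1, -74.9)}`, calling `geocode_with_fallback("5 Oak Ave", osm.get, other.get)` returns a `Hit` whose `source` is `"other"`, because the OSM lookup found nothing.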


Draft Geocoding Guideline FAQ

Why did the LWG decide to take up work on a geocoding guideline?

As OpenStreetMap has matured and the use of online mapping technology has continued to grow, the major use cases for OpenStreetMap data have expanded. A geocoding guideline will help keep OpenStreetMap’s legal guidance up to date with current uses.

When the OpenStreetMap community first adopted the Open Database License (ODbL) and the License Working Group drafted the early accompanying guidelines, we were primarily focused on visual maps. The license and accompanying guidelines made clear that users could make maps under any license so long as they shared data improvements back with the project. Our early guidelines also clarified questions like how non-ODbL data could be layered on OpenStreetMap-based maps (the Horizontal Layers guideline).

The basic licensing approach underlying the license and these guidelines was to protect the OpenStreetMap database itself, while enabling broad and flexible use of that database. This basic approach laid the foundation for broad adoption of OpenStreetMap and steady growth of the editor community. Numerous individuals, governments, companies, and non-profits have made their own OpenStreetMap-based maps, while contributing underlying data back to the OpenStreetMap database.

Increasingly, however, map data is used for a variety of purposes beyond visual display. Search -- more technically referred to as geocoding -- is one of the most important of these purposes. Whether via Nominatim or one of the many other OSM-based geocoders, we have seen an increasing desire to use OpenStreetMap in this way. But the application of the ODbL to geocoding is substantially less clear than for visual maps.

We believe encouraging more use of OpenStreetMap benefits the project, and increasing clarity around the license encourages use. The more people who use OpenStreetMap, the more exposure we get, the more contributors we welcome into our community, and the better the map gets. Given the uncertainty about how the ODbL applies to geocoding, and the importance of this use case, we believe the time has come to provide legal guidance specific to this use case.


How did you draft the guideline?

We started with two basic goals:

1. Follow the text and spirit of the ODbL. This is a license clarification and interpretation, so it must be consistent with the license text, as well as the community goal of collaborative creation. This includes ensuring the protection of the core OpenStreetMap database.

2. Facilitate broad use of the database. We believe broad use of OpenStreetMap ultimately benefits the project and expands the community of contributors.

In addition, we considered the way commercial map databases (e.g. HERE, TomTom) tend to license their data, to the extent we were able to gather such information. Our goal, insofar as possible, is to make OpenStreetMap at least as permissive as commercial, proprietary databases.

What does the guideline say?

Please read it!

What’s the approval process?

The LWG has discussed the proposed guideline at length and has also shared it with the Board for initial thoughts (not a vote or formal feedback).

Now, we’re sharing the guideline on the talk listserv to gather community feedback.

Following the community input, the LWG will decide on next steps. Those steps could include further revisions, further consultation, and/or a board vote. A board vote will be the final step before any guideline is approved.

Would this guideline allow someone to just export the database through geocoding, and thereby circumvent the ODbL?

No. The proposed guideline says that using geocoding against OSM to create a database that contains the whole or a substantial part of the contents of the OSM database results in a Derivative Database, which must be shared. It emphasizes that attempts to systematically extract data at scale through geocoding result in a substantial extract or Derivative Database subject to the ODbL.

How does the draft guideline relate to other existing guidelines?

The Geocoding Guideline supplements the existing guidelines, providing guidance specific to geocoding and clearing up issues left ambiguous by prior guidelines.

It is most closely related to the Collective Database Guideline and the Substantial Guideline.

The Collective Database Guideline, which the OSMF Board approved last year, sets out circumstances under which OpenStreetMap data may be stored with other data without triggering share alike. In so doing, it defines the concept of “primary feature type”, which the draft Geocoding Guideline also uses.

The Substantial Guideline, which is older, describes certain circumstances under which extracts of OSM data would be “insubstantial” and therefore not trigger share alike. In particular, it sets (i) a specific quantitative safe harbor (100 features) below which extraction is always permissible, and (ii) a broader, but less specific protection for non-systematic exports. The draft Geocoding Guideline builds principally on the second category, providing additional clarity specific to the geocoding context.

The draft guideline restricts “systematic” aggregation, but couldn’t non-systematic geocoding also yield a large database of OSM material if the use is high volume? Is this a loophole?

While non-systematic geocoding uses could incidentally result in a large database, no loophole exists. The draft Guideline makes clear that any attempt to use such an incidental collection of OSM data as a general map database would be treated as a systematic extraction. The draft Guideline states, “A collection of Geocoding Results will be considered a systematic attempt to aggregate data if it is used as a general purpose geodatabase, regardless of how the original aggregation was accomplished.”

For example, suppose a taxi company collected the location of vehicle stops and reverse-geocoded the database to determine the names and locations of popular drop-off POIs for internal analytics and planning. That is not “a systematic attempt to aggregate all or substantially all Primary Features of a given type . . . within a geographic area city-sized or larger”, so it is permissible without triggering share-alike obligations. If, however, the company decided to resell this database as part of a general POI database to others, that would become a systematic attempt to aggregate many features across the city, and its database would likely be a Derivative Database subject to the ODbL.

The draft guideline permits even “systematic” aggregation at smaller than city scale. Why?

Below a certain scale, the line between systematic and non-systematic searches becomes difficult to discern. For example, depending on the configuration of a geocoder, a search for “pizza” centered on a small town might indeed return all pizzerias in that town. By providing a clear lower limit on the application of share-alike obligations, we can eliminate many disputes of this nature, without creating any meaningful threat to the goals of share-alike. While sub-city-sized data sets (e.g. all the pizzerias in a small town) may be of local use, they are unlikely to be broadly marketable assets or divert meaningful contributions from the main OpenStreetMap database. Moreover, the protection provided by the guideline for aggregation of sub-city scale geocoding results applies only where the data in geocoding results is limited to feature names and latitude/longitude information. It is not a backdoor to a general database export, even at sub-city scale.

Can you explain how the draft guideline handles attribution, and why?

As described above, our goals were to (1) follow the text and spirit of the ODbL and (2) facilitate broad use of OpenStreetMap data. In addition, following OpenStreetMap’s general approach to attribution -- that users “should expect to credit OpenStreetMap in the same way and with the same prominence as would be expected by any other map supplier” [1] -- we also looked at attribution norms for commercial map databases.

Taking these three considerations into account, the basic approach adopted by the draft Guideline is as follows:

  • Geocoders that use OpenStreetMap data must credit OpenStreetMap
  • Applications that incorporate a geocoder must credit OpenStreetMap
  • A geocoded database need not maintain attribution attached to the database, provided it is not a Derivative Database

This approach follows the text of the ODbL. In our view, the most reasonable interpretation of the text and structure of the license as applied to geocoding is that geocoders are public uses of OpenStreetMap data, while geocoding results are small data extracts, which may or may not be substantial depending on the type and volume of data involved. The approach to attribution above flows directly from this interpretation.

In addition, this approach facilitates broad use of the database while protecting the database itself. It permits geocoded data to be used freely and with minimal friction, provided no substantial export is made. It also ensures attribution at the point OpenStreetMap data is used.

Finally, the approach in the draft guideline is similar to the terms of commercial providers. This supports our view that the draft strikes an appropriate balance between feasibility and brand awareness.

The alternative approach to attribution, which we previously considered, was declaring all geocoding results or collections of geocoding results to be Produced Works. Produced Works are a defined category in the ODbL that require attribution but not share-alike. Declaring all geocoding results or collections of geocoding results to be Produced Works sits uneasily with the text of the ODbL, however. In addition, since Produced Works are free of share-alike obligations, this approach could permit even worldwide, systematic collections of geocoding results to be used without share-alike. At the same time, declaring geocoding results to be Produced Works would impose a broad and impractical attribution requirement, covering even small databases and individual geocoding results. For example, even a single photograph tagged with the city in which it was taken would be covered under this interpretation.

To get around some of the problems with the Produced Work approach, we also considered whether there was a way to apply attribution requirements to “medium-sized” databases, while keeping the smallest geocoded databases free of all obligations and attaching both attribution and share-alike only to large and systematic collections of geocoded results. In other words, we considered adopting a spectrum of obligations depending on the amount of geocoding. An example of a database that might be covered by attribution (but not share-alike) under this approach would be a database of store locations geocoded against OpenStreetMap. Under such an approach, the database operator might be asked to keep an OpenStreetMap identification permanently associated with that database and to provide OpenStreetMap attribution in any future use of that data, while no share-alike obligations would apply.

While we recognize the appeal of such a sliding-scale approach, we ultimately did not see a path that was consistent with the text of the ODbL or widely practicable. We think it would be a stretch to try to read a category of “intermediary databases” like this into the Produced Works definition. Moreover, we believe this approach would substantially reduce the usability of OpenStreetMap for geocoding. Creating an additional definitional line between “small”, “medium” and “large/systematic” extracts would increase rather than reduce complexity and, depending on the nature of the definition, add uncertainty around the obligations.

Notably, no commercial terms we have found impose an attribution obligation on databases geocoded using their data. Thus, attaching attribution obligations to geocoded databases themselves -- under either of the Produced Work approaches described above -- would require users to add and monitor a new layer of technical and legal protections around these databases, unique to OpenStreetMap. We believe this would create a substantial obstacle to broader use of OpenStreetMap.