BigQuery dataset
Google created public dataset with OpenStreetMap data snapshot accessible from BigQuery. This dataset could be used to replace OverpassAPI to a certain extent. Google provides this data for free with limitation 1TB/mo of free tier processing.
Data should updat weekly but there's an issue with this. The GitHub project contains the Apache Beam-based ETL software used to load the database. Quality control concerns and bugs may be reported as GitHub issues.
The dataset contains:
- OSM primary datatypes (nodes, ways, relations)
- GDAL derived datatypes (lines, multilinestrings, multipolygons, other relations, points) using ogr2ogr
- Geofabrik semantic "Layered GIS" derived datatypes using SQL
- Both the OSM full history and weekly planet updates
To query OpenStreetMap Public Dataset one need to have Google Account and Google Console project.
How to setup Google Console project
Assume you have Google Account. Open console.cloud.google.com and login with Google username and password.
Press "Select a project" button in header:
In opened dialog click "New Project" button:
Google Cloud Project initialization takes some time. Wait for success notification. After that you can use this project for free.
How to query dataset
Open BigQuery UI, enter query in text area and press "Run" button.
Tables in Dataset
BigQuery tables are created using an ETL as illustrated in the project's GitHub repo.
Further there are two sets of tables.
- One set for all OSM historical data (history file, with tables prefixed by `history_`.
- One set for all OSM current data (planet file, with tables prefixed by `planet_`.
They can be grouped by ETL phase.
Phase 1 - OSM objects
Load the OSM native objects.
In addition to (nodes, ways, relations), changesets are also loaded.
In combination with the `history_layers` table, changesets can be used to analyze what *types of objects* were edited within a given time window.
Planet objects:
- `geo_openstreetmap.planet_nodes` - TODO
- `geo_openstreetmap.planet_ways` - TODO
- `geo_openstreetmap.planet_relations` - TODO
- `geo_openstreetmap.planet_chagesets` - TODO
History objects:
- `geo_openstreetmap.history_nodes` - TODO
- `geo_openstreetmap.history_ways` - TODO
- `geo_openstreetmap.history_relations` - TODO
- `geo_openstreetmap.history_changesets` - TODO
Phase 2 - GDAL objects
Convert OSM objects (nodes, ways, relations) to GDAL objects (points, lines, multilinestrings, polygons, multipolygons). Each object contains the original OSM id or way_id for reference.
Planet objects:
- `geo_openstreetmap.planet_features` - TODO
History objects:
- `geo_openstreetmap.history_features` - TODO (apparently missing, see relevant GitHub issue).
Phase 3 - Layered GIS
Use SQL-defined Layered GIS filters to add semantic information to GDAL objects. OSM id / way_id references are also present for linking back to original data.
Planet objects:
- `geo_openstreetmap.planet_layers` - TODO
History objects:
- `geo_openstreetmap.history_layers` - TODO
Pros and Cons of using BigQuery OSM dataset
Pros
- Could be faster than Overpass API
- More complex queries using powerful SQL
- Running queries from command line using bq cli
- Export to CSV, JSON, BigQuery table
- Could be used for analytics and basic statistics
- In addition to primary datatypes (nodes, ways, relations), data are also pre-processed to GDAL types using ogr2ogr.
- GDAL objects are typecast from OSM tags using Layered GIS spec for more compact and powerful queries.
Cons
- SQL is not designed to query OSM data and sometimes query looks more complicated than with Overpass QL
- No OSM map visualization of result set
- Limited support of GeoJSON
Query samples
Query 1: nodes with tag man_made=lighthouse
SELECT ST_ASGEOJSON(geometry) as feature_geojson
FROM `bigquery-public-data.geo_openstreetmap.planet_nodes`
WHERE ('man_made', 'lighthouse') in (select (key,value) from unnest(all_tags))
LIMIT 10
Alternate query:
SELECT ST_ASGEOJSON(geometry) as feature_geojson
FROM `bigquery-public-data.geo_openstreetmap.planet_nodes` planet_nodes, planet_nodes.all_tags all_tags
WHERE all_tags.key = 'man_made' AND all_tags.value = 'lighthouse'
LIMIT 10
Query 2: hospitals with no phone tag
Query map features with tag 'amenity=hospital' and no key 'phone'.
SELECT feature_type, osm_id, osm_timestamp, geometry
FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE ('amenity', 'hospital') IN (SELECT (key, value) FROM UNNEST(all_tags))
AND 'phone' NOT IN (SELECT key FROM UNNEST(all_tags))
LIMIT 10;
Query 3: multipolygons with more tags
Query multipolygons with tags boundary=administrative and admin_level=4.
SELECT feature_type, osm_id, osm_timestamp, geometry
FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type="multipolygons"
AND ('boundary', 'administrative') IN (SELECT (key, value) FROM UNNEST(all_tags))
AND ('admin_level', '4') IN (SELECT (key, value) FROM UNNEST(all_tags))
LIMIT 10;
Query 4: ways with tag 'highway' and arbitary value
Query ways with key 'highway'.
SELECT id, version, username, changeset, osm_timestamp
FROM `bigquery-public-data.geo_openstreetmap.planet_ways`
WHERE 'highway' IN (SELECT key FROM UNNEST(all_tags))
LIMIT 10;
Other version of the same query:
SELECT id, version, username, changeset, osm_timestamp
FROM `bigquery-public-data.geo_openstreetmap.planet_ways` planet_ways, planet_ways.all_tags as all_tags
WHERE all_tags.key = 'highway'
LIMIT 10;
Query 5: buildings with more than 5 levels
Query ways with key 'building' and 'building:levels' > 5.
SELECT id, version, username, changeset, osm_timestamp
FROM `bigquery-public-data.geo_openstreetmap.planet_ways`
WHERE 'building' IN (SELECT key FROM UNNEST(all_tags))
AND EXISTS (SELECT key FROM UNNEST(all_tags)
WHERE key = 'building:levels'
AND SAFE_CAST(value as INT64) > 5)
LIMIT 10;
Query 6: query over regexp
Select all shops opened 7 days a week using regexp over "opening_hours" field (Note: don't use this query on real project, "opening_hours" tag should be parsed other way).
SELECT feature_type, osm_id, osm_timestamp, geometry
FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE 'shop' IN (SELECT key FROM UNNEST(all_tags))
AND EXISTS (SELECT key FROM UNNEST(all_tags)
WHERE key = 'opening_hours'
AND REGEXP_CONTAINS(value, r'Mo-Su .+'))
LIMIT 10;
Query 7: query all bridges in Netherland
Select all features with tag "man_made=bridge" and filter by Netherlands boundary poly.
This sample uses WITH expression to find bounding polygon first.
-- Define bounding_area by selecting administrative boundary multipolygon. One can use hardcoded OSM relation ID here.
WITH bounding_area as (SELECT geometry from `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type="multipolygons"
AND ('name:en', 'Netherlands') in (SELECT (key, value) from unnest(all_tags))
AND ('boundary', 'administrative') in (SELECT (key, value) from unnest(all_tags))
AND ('admin_level', '3') in (SELECT (key, value) from unnest(all_tags))
)
SELECT feature_type, osm_id, osm_timestamp, planet_features.geometry
FROM `bigquery-public-data.geo_openstreetmap.planet_features` planet_features, bounding_area
WHERE ('man_made', 'bridge') IN (SELECT (key, value) FROM UNNEST(all_tags)) -- Select features with 'man_made=bridge' tag
AND ST_DWithin(bounding_area.geometry, planet_features.geometry, 0) -- Filter only features within bounding_area
LIMIT 10;
Query 8: count number of bus stops in San Francisco grouped by network operator
Select all points with tag "highway=bus_stop". Group them by "network" tag and count items in each group.
-- Define bounding area of San Francisco
WITH bounding_area AS (SELECT geometry FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type="multipolygons"
AND ('wikidata', 'Q62') IN (SELECT (key, value) FROM unnest(all_tags))
)
SELECT count(*) AS stops_count,
(SELECT value FROM unnest(all_tags) WHERE key='network') AS bus_network -- Extract value of "network" tag
FROM `bigquery-public-data.geo_openstreetmap.planet_features` planet_features, bounding_area
WHERE feature_type = 'points'
AND ('highway', 'bus_stop') IN (SELECT (key, value) FROM UNNEST(all_tags)) -- Select bus stops
AND ST_DWithin(bounding_area.geometry, planet_features.geometry, 0) -- Filter only features within bounding_area
GROUP BY bus_network
ORDER BY stops_count DESC;
Resultset:
stops_count | bus_network |
---|---|
5900 | Muni |
120 | Muni;GGT |
84 | PresidiGo |
64 | AC Transit |
58 | GGT |
48 | Muni;SamTrans |
38 | SFRP |
... | ... |
Query 9: calculate length of ways with 'highway' tag in Japan
Calculate length and count of all roads in Japan grouped by type.
-- Define bounding area of Japan
WITH bounding_area AS (SELECT geometry FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type="multipolygons"
AND ('wikidata', 'Q17') IN (SELECT (key, value) FROM unnest(all_tags))
)
SELECT SUM(ST_LENGTH(planet_features.geometry)) AS highway_length,
format("%'d", CAST(SUM(ST_LENGTH(planet_features.geometry)) AS INT64)) AS highway_length_formatted,
count(*) as highway_count,
(SELECT value FROM unnest(all_tags) WHERE key='highway') AS highway_type -- Extract value of "highway" tag
FROM `bigquery-public-data.geo_openstreetmap.planet_features` planet_features, bounding_area
WHERE feature_type = 'lines'
AND 'highway' IN (SELECT key FROM UNNEST(all_tags)) -- Select highways
AND ST_DWithin(bounding_area.geometry, planet_features.geometry, 0) -- Filter only features within bounding_area
GROUP BY highway_type
ORDER BY highway_length DESC;
Resultset:
highway_length | highway_length_formatted | highway_count | highway_type |
---|---|---|---|
515544744.3 | 515,544,744 | 2651488 | residential |
462757205.9 | 462,757,206 | 2460587 | unclassified |
208945963.9 | 208,945,964 | 637984 | path |
132285333 | 132,285,333 | 441447 | tertiary |
86336294.03 | 86,336,294 | 312449 | track |
73082827.68 | 73,082,828 | 158325 | secondary |
60538681.24 | 60,538,681 | 117100 | primary |
59189791.17 | 59,189,791 | 488955 | service |
58793335.55 | 58,793,336 | 115185 | trunk |
35814200.76 | 35,814,201 | 368688 | footway |
... | ... | ... | ... |
Query 10: find highways withing bounding box
Find all ways in bounding box defined by four points:
-- Define bounding_area from polygon
WITH bounding_area AS (
SELECT ST_MAKEPOLYGON(ST_MAKELINE(
[ST_GEOGPOINT(47.3812, -21.9073), ST_GEOGPOINT(47.3812, -21.4249),
ST_GEOGPOINT(48.0734, -21.4249), ST_GEOGPOINT(48.0734, -21.9073)
]
)) AS geometry )
SELECT feature_type, osm_id, osm_timestamp, planet_features.geometry
FROM `bigquery-public-data.geo_openstreetmap.planet_features` planet_features, bounding_area
WHERE 'highway' IN (SELECT key FROM UNNEST(all_tags)) -- Select features with 'highway' key
AND ST_DWithin(bounding_area.geometry, planet_features.geometry, 0) -- Filter only features within bounding_area
LIMIT 100;
How to visualize with BigQuery
If you need to visualize huge amount of data with millions of geo objects you can do it with BigQuery:
Step 1: Prepare data
Select features (farmlands). Instead of each polygon use centroid because it's easier to work with points. Snap each point to grid. Count number of points at each grid node.
SQL Query |
WITH bounding_area as (SELECT geometry from `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type="multipolygons"
AND ('wikidata', 'Q1019') IN (SELECT (key, value) FROM unnest(all_tags))
),
centroid as (
SELECT osm_way_id,
ST_ASTEXT(ST_SNAPTOGRID(ST_CENTROID(planet_ways.geometry), 0.05)) as centroid_point -- Snap to grid at 0.05°
FROM `bigquery-public-data.geo_openstreetmap.planet_features` planet_ways, bounding_area
WHERE planet_ways.feature_type in ('lines', 'multipolygons')
AND ('landuse', 'farmland') in (select (key,value) from unnest(all_tags)) -- Select farmlands
AND planet_ways.geometry is not null
AND ST_CONTAINS(bounding_area.geometry, planet_ways.geometry) -- Filter only features within bounding_area
)
select ST_GEOGFROMTEXT(centroid_point) as centroid_geometry, count(*) as num_features
from centroid
GROUP BY centroid_point
|
Step 2: Add to map
Now when we have points and number of features near that points we can add the data to map
Step 3: Adjust styles
Depending on features count we should adjust color and size of grid nodes: