osm-parquetizer
Jump to navigation
Jump to search
The project intends to provide a way to get the OpenStreetMap data available in a Big Data friendly format as Parquet (https://parquet.apache.org/).
Currently any PBF (PBF_Format) file is converted into three parquet files, one for each type of entity from the original PBF (Nodes, Ways and Relations).
Source Code: https://github.com/adrianulbona/osm-parquetizer
In order to get started:
git clone https://github.com/adrianulbona/osm-parquetizer.git
cd osm-parquetizer
mvn clean package
java -jar target/osm-parquetizer-X.X.X-SNAPSHOT.jar path_to_your.pbf
For example, by running:
java -jar target/osm-parquetizer-1.0.1-SNAPSHOT.jar romania-latest.osm.pbf
In a few seconds (on a decent laptop) you should get the following files:
-rw-r--r-- 1 adrianbona adrianbona 145M Apr 3 19:57 romania-latest.osm.pbf
-rw-r--r-- 1 adrianbona adrianbona 372M Apr 3 19:58 romania-latest.osm.pbf.node.parquet
-rw-r--r-- 1 adrianbona adrianbona 1.1M Apr 3 19:58 romania-latest.osm.pbf.relation.parquet
-rw-r--r-- 1 adrianbona adrianbona 123M Apr 3 19:58 romania-latest.osm.pbf.way.parquet
The parquet files have the following schemas:
node
|-- id: long (nullable = true) |-- version: integer (nullable = true) |-- timestamp: long (nullable = true) |-- changeset: long (nullable = true) |-- uid: integer (nullable = true) |-- user_sid: string (nullable = true) |-- tags: array (nullable = true) | |-- element: struct (containsNull = true) | | |-- key: string (nullable = true) | | |-- value: string (nullable = true) |-- latitude: double (nullable = true) |-- longitude: double (nullable = true)
way
|-- id: long (nullable = true) |-- version: integer (nullable = true) |-- timestamp: long (nullable = true) |-- changeset: long (nullable = true) |-- uid: integer (nullable = true) |-- user_sid: string (nullable = true) |-- tags: array (nullable = true) | |-- element: struct (containsNull = true) | | |-- key: string (nullable = true) | | |-- value: string (nullable = true) |-- nodes: array (nullable = true) | |-- element: struct (containsNull = true) | | |-- index: integer (nullable = true) | | |-- nodeId: long (nullable = true)
relation
|-- id: long (nullable = true) |-- version: integer (nullable = true) |-- timestamp: long (nullable = true) |-- changeset: long (nullable = true) |-- uid: integer (nullable = true) |-- user_sid: string (nullable = true) |-- tags: array (nullable = true) | |-- element: struct (containsNull = true) | | |-- key: string (nullable = true) | | |-- value: string (nullable = true) |-- members: array (nullable = true) | |-- element: struct (containsNull = true) | | |-- id: long (nullable = true) | | |-- role: string (nullable = true) | | |-- type: string (nullable = true)
Alternatives to osm-parquetizer
openstreetmap_h3 high performance tool to transform PBF dump into Apache Arrow/PostGIS