OSMbin (file format)/version 1.0
This is version 1.0 of the OSMbin (file format).
status
This specification is complete!
This file format requires features introduced with API v0.6!
changes to version 0.9
The following has changed since version 0.9 of this file format:
- introduction of a properties file containing the version of the format
- introduction of version numbers
- storing relation IDs in nodes and ways
- storing long attribute values inside the .obm files (spanning multiple attribute slots)
The format consists of multiple files contained in a single directory:
osmbin.properties
This file contains CRLF-delimited name=value pairs.
Currently only the following names and values are defined:
- "osmbin.version=v1.0\n" (or v0.9 for older versions)
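Because the file is a list of name=value pairs, a Java reader can plausibly use `java.util.Properties`, whose `load(Reader)` accepts `\n`, `\r` and `\r\n` line endings. The following is a sketch under that assumption; `OsmBinProperties` and its methods are hypothetical names, not the reference implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical helper for reading osmbin.properties. */
class OsmBinProperties {
    /** The only key defined by this version of the spec. */
    static final String VERSION_KEY = "osmbin.version";

    /** Returns the declared format version (e.g. "v1.0" or "v0.9"), or null if it is missing. */
    static String readVersion(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in); // parses name=value lines
        return props.getProperty(VERSION_KEY);
    }

    /** True if the directory declares the v1.0 layout described on this page. */
    static boolean isVersion10(String version) {
        return "v1.0".equals(version);
    }
}
```

A caller would open the file inside the OSMbin directory (for example with `Files.newBufferedReader`) and refuse to continue if `isVersion10` returns false.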
attrnames.txt
This file consists of all attribute keys in UTF-8 encoding, delimited by "\n". The first key gets the ID Short.MIN_VALUE + 2 = -32766, the second -32765, and so on.
The + 2 is required because in *.obm files the value
- Short.MIN_VALUE denotes an unused attribute slot
- Short.MIN_VALUE + 1 denotes the continuation of an attribute entry spanning multiple attribute slots
Java-code of a reference-implementation
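The ID arithmetic above can be sketched as follows. `AttrNames` is a hypothetical class, not the reference implementation; it assumes the file contents have already been decoded from UTF-8 and that a key's ID is determined by its line number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the attrnames.txt key <-> attrID mapping. */
class AttrNames {
    static final short UNUSED = Short.MIN_VALUE;                     // -32768: unused slot
    static final short CONTINUATION = (short) (Short.MIN_VALUE + 1); // -32767: continued value
    static final short FIRST_ID = (short) (Short.MIN_VALUE + 2);     // -32766: ID of the first key

    private final List<String> keys = new ArrayList<>();
    private final Map<String, Short> ids = new HashMap<>();

    /** Builds the mapping from the "\n"-delimited contents of attrnames.txt. */
    AttrNames(String fileContents) {
        if (fileContents.isEmpty()) return;
        for (String key : fileContents.split("\n")) {
            ids.putIfAbsent(key, (short) (FIRST_ID + keys.size())); // ID follows the line number
            keys.add(key);
        }
    }

    Short idOf(String key) {
        return ids.get(key);
    }

    /** Returns the key for an attrID, or null for UNUSED, CONTINUATION or unknown IDs. */
    String keyOf(short id) {
        int index = id - FIRST_ID;
        return index >= 0 && index < keys.size() ? keys.get(index) : null;
    }

    /** Appends a new key; 2-byte IDs allow at most 65534 keys (-32766..32767). */
    short add(String key) {
        Short existing = ids.get(key);
        if (existing != null) return existing;
        int id = FIRST_ID + keys.size();
        if (id > Short.MAX_VALUE) throw new IllegalStateException("no attrIDs left");
        keys.add(key);
        ids.put(key, (short) id);
        return (short) id;
    }
}
```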
nodes.obm
This file stores fixed-size records of nodes.
Java-code of a reference-implementation
Layout of a record:
- nodeID [4 byte signed integer]
- nodeVersion [4 byte signed integer]
- latitude [4 byte signed integer as used in OSM]
- longitude [4 byte signed integer as used in OSM]
- attrID1 [2 byte short integer]
- attrValue1 [32 character string in 16-bit Unicode big-endian (64 bytes), padded to 32 characters with appended spaces ' ']
- wayID1 [4 byte signed integer]
- wayID2 [4 byte signed integer]
- wayID3 [4 byte signed integer]
- relationID1 [4 byte signed integer]
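A sketch of reading and writing one record with `java.nio.ByteBuffer`, whose default byte order is big-endian and therefore matches the big-endian UTF-16 attribute value. `NodeRecord` is a hypothetical name. The field sizes add up to 4*4 + 2 + 64 + 3*4 + 4 = 98 bytes, so, assuming the file has no header, record n would start at byte n * 98.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of one fixed-size nodes.obm record. */
class NodeRecord {
    static final int VALUE_CHARS = 32;
    static final int WAY_SLOTS = 3;
    static final int RECORD_SIZE = 4 * 4 + 2 + 2 * VALUE_CHARS + WAY_SLOTS * 4 + 4; // 98 bytes
    /** Padding char; the layout list says spaces, while the attrValue1 rule mentions 0 chars. */
    static final char PAD = ' ';

    int nodeId, version, lat, lon;
    short attrId = Short.MIN_VALUE; // no attribute
    String attrValue = "";
    int[] wayIds = {Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE};
    int relationId = Integer.MIN_VALUE;

    /** Writes the record at the buffer's current position, in layout order. */
    void writeTo(ByteBuffer buf) {
        if (attrValue.length() > VALUE_CHARS) {
            throw new IllegalArgumentException("split long values across continuation slots");
        }
        buf.putInt(nodeId).putInt(version).putInt(lat).putInt(lon).putShort(attrId);
        for (int i = 0; i < VALUE_CHARS; i++) {
            buf.putChar(i < attrValue.length() ? attrValue.charAt(i) : PAD); // 2 bytes per char
        }
        for (int w : wayIds) buf.putInt(w);
        buf.putInt(relationId);
    }

    /** Reads one record from the buffer's current position. */
    static NodeRecord readFrom(ByteBuffer buf) {
        NodeRecord r = new NodeRecord();
        r.nodeId = buf.getInt();
        r.version = buf.getInt();
        r.lat = buf.getInt();
        r.lon = buf.getInt();
        r.attrId = buf.getShort();
        char[] chars = new char[VALUE_CHARS];
        for (int i = 0; i < VALUE_CHARS; i++) chars[i] = buf.getChar();
        r.attrValue = new String(chars); // still padded; strip according to the attrValue1 rule
        for (int i = 0; i < WAY_SLOTS; i++) r.wayIds[i] = buf.getInt();
        r.relationId = buf.getInt();
        return r;
    }
}
```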
Semantics of nodeID:
- the nodeID Integer.MIN_VALUE (0x80000000) is used to denote an unused record
- this nodeID is authoritative. If there is disagreement with the .idx or .id2 files, then the .idx or .id2 files are incorrect and need to be regenerated.
Semantics of attrID1:
- if attrID1 is Short.MIN_VALUE then the record stores no attribute.
- if attrID1 is Short.MIN_VALUE + 1 then the attribute value is to be appended to the value of the previous attribute.
- attrValue1: all characters with the value 0 (beware: a char has 2 bytes here, as the encoding is UTF-16) are to be removed. They are used to pad the value to the length of the attrValue1 field.
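The attrID rules above can be sketched as a small decoder that skips unused slots, appends continuation slots to the previous attribute, and removes 0 characters from each value. The `Slot` type and the `keyOf` callback (which would look IDs up in attrnames.txt) are hypothetical; the sketch follows the attrValue1 rule about 0 padding rather than the space padding mentioned in the record layout.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical sketch: turns attribute slots (in file order) into key/value pairs. */
class AttributeSlots {
    static final short UNUSED = Short.MIN_VALUE;
    static final short CONTINUATION = (short) (Short.MIN_VALUE + 1);

    /** One attrID/attrValue pair exactly as read from a record, padding included. */
    static final class Slot {
        final short attrId;
        final String rawValue;
        Slot(short attrId, String rawValue) { this.attrId = attrId; this.rawValue = rawValue; }
    }

    /** keyOf resolves an attrID to its key from attrnames.txt. */
    static Map<String, String> decode(List<Slot> slots, IntFunction<String> keyOf) {
        Map<String, String> result = new LinkedHashMap<>();
        String lastKey = null;
        for (Slot s : slots) {
            if (s.attrId == UNUSED) continue;            // empty slot
            String value = s.rawValue.replace("\0", ""); // drop padding chars with value 0
            if (s.attrId == CONTINUATION) {
                if (lastKey == null) throw new IllegalStateException("continuation without attribute");
                result.put(lastKey, result.get(lastKey) + value); // append to the previous value
            } else {
                lastKey = keyOf.apply(s.attrId);
                result.put(lastKey, value);
            }
        }
        return result;
    }
}
```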
Semantics of wayID:
- the wayID Integer.MIN_VALUE marks an unused entry
Overall semantics:
- the sort order of attributes and ways is application-dependent. No assumptions are to be made. However, it is advised to store important attributes like 'highway' first.
- if a record is too small to hold all attributes or ways, the next record is used as well (this is marked by the next record having the same nodeID).
- if this situation arises while updating the node and the next record is not free, the node is moved to a new location in this file.
- if there is a discrepancy between the wayIDs here and the nodeIDs in the ways file, the ways file is authoritative and this entry must be corrected (rule for repairing broken files).
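How consecutive records that share a nodeID might be merged when scanning nodes.obm sequentially. This is a sketch over already-decoded records; the `Rec` type and `waysPerNode` method are hypothetical, and only the wayID slots are shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: merges runs of nodes.obm records that belong to the same node. */
class NodeRecordGrouping {
    static final int UNUSED = Integer.MIN_VALUE; // unused record / unused way slot

    /** Minimal stand-in for a decoded record: nodeID plus its wayID slots. */
    static final class Rec {
        final int nodeId;
        final int[] wayIds;
        Rec(int nodeId, int... wayIds) { this.nodeId = nodeId; this.wayIds = wayIds; }
    }

    /** Collects all wayIDs per node; a record with the same nodeID as its predecessor continues it. */
    static Map<Integer, List<Integer>> waysPerNode(List<Rec> file) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        int previous = UNUSED;
        for (Rec r : file) {
            if (r.nodeId == UNUSED) { // free record ends any run
                previous = UNUSED;
                continue;
            }
            if (r.nodeId != previous) {
                result.put(r.nodeId, new ArrayList<>()); // a new node starts here
            }
            List<Integer> ways = result.get(r.nodeId);
            for (int w : r.wayIds) {
                if (w != UNUSED) ways.add(w); // skip empty way slots
            }
            previous = r.nodeId;
        }
        return result;
    }
}
```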
nodes.idx
This file stores an index of node-ID to record-number in nodes.obm.
We use an unbalanced tree of order 16. The record-format is as follows:
middle-node:
- recordNumber[32bit] of (current value>>4)
- recordNumber[32bit] of (current value>>4+1)
...
- recordNumber[32bit] of (current value>>4+15)
leaf:
- record-number of indexed OSM-Object 1 [32bit]
- record-number of indexed OSM-Object 2 [32bit]
...
- record-number of indexed OSM-Object 16 [32bit]
Notes:
- for an empty recordNumber the value Integer.MIN_VALUE is used.
- a record consisting only of Integer.MIN_VALUE entries is an empty record.
- A record is a leaf if and only if it has the depth of 32/4+1.
- The root-node has the recordNumber of 0 and is thus stored at the beginning of the file.
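A lookup and insert over this 16-way tree could look like the following in-memory sketch. `NibbleIndex` is hypothetical and keeps the 16-entry records in a `List` instead of in nodes.idx. It assumes each level consumes one 4-bit nibble of the 32-bit ID, most significant nibble first, giving seven levels of middle-nodes and one level of leaves; the note above counts the leaf depth as 32/4+1, so the depth convention of the reference implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory sketch of the nodes.idx tree (record number = list index). */
class NibbleIndex {
    static final int EMPTY = Integer.MIN_VALUE;
    static final int LEVELS = 32 / 4; // one level per 4-bit nibble of a 32-bit ID (assumption)

    private final List<int[]> records = new ArrayList<>();

    NibbleIndex() {
        records.add(emptyRecord()); // the root is record 0
    }

    private static int[] emptyRecord() {
        int[] r = new int[16];
        Arrays.fill(r, EMPTY);
        return r;
    }

    /** Nibble of the ID used at the given level, most significant first. */
    private static int nibble(int id, int level) {
        return (id >>> (28 - 4 * level)) & 0xF;
    }

    /** Stores the nodes.obm record number for id, creating middle-nodes and leaves as needed. */
    void put(int id, int obmRecord) {
        int current = 0;
        for (int level = 0; level < LEVELS - 1; level++) {
            int slot = nibble(id, level);
            int child = records.get(current)[slot];
            if (child == EMPTY) {
                child = records.size();
                records.add(emptyRecord());
                records.get(current)[slot] = child;
            }
            current = child;
        }
        records.get(current)[nibble(id, LEVELS - 1)] = obmRecord; // leaf entry
    }

    /** Returns the nodes.obm record number for id, or EMPTY if the ID is not indexed. */
    int get(int id) {
        int current = 0;
        for (int level = 0; level < LEVELS - 1; level++) {
            current = records.get(current)[nibble(id, level)];
            if (current == EMPTY) return EMPTY;
        }
        return records.get(current)[nibble(id, LEVELS - 1)];
    }

    int recordCount() {
        return records.size();
    }
}
```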
nodes.id2
(uses less storage space than draft 1)
- We use a KD-Tree as an AB-Tree here.
- Each tree-node stores one node, as in a KD-Tree.
- Each tree-node with an even depth (root = depth 0 = even) stores children with the same or lower latitude in its left subtree and children with a larger latitude in its right subtree.
- Each tree-node with an odd depth stores children with the same or lower longitude in its left subtree and children with a larger longitude in its right subtree.
- If the tree is empty, it does not even have a root node.
- There is no separation of inner nodes vs. leaf-nodes.
- We have fixed-size records
- A latitude and longitude of Integer.MIN_VALUE denote an empty record.
Record-format:
- 4 byte latitude of the center
- 4 byte longitude of the center
- 4 byte - recordNumber in nodes.obm of the node stored in this tree-node
- 4 byte - recordNumber in nodes.id2 of the left child or Integer.MIN_VALUE
- 4 byte - recordNumber in nodes.id2 of the right child or Integer.MIN_VALUE
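Inserting into and querying this KD-tree can be sketched in memory as below. `KdIndex` is a hypothetical class that stores each record as an int array {lat, lon, obmRecord, left, right} in a `List`, with the list index playing the role of the record number in nodes.id2. The bounding-box query illustrates how such a tree is typically searched; it is not defined by this specification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory sketch of the nodes.id2 KD-tree. */
class KdIndex {
    static final int NONE = Integer.MIN_VALUE; // "no child"
    private static final int LAT = 0, LON = 1, OBM = 2, LEFT = 3, RIGHT = 4;
    private final List<int[]> recs = new ArrayList<>(); // record number = list index

    /** Adds a point; the first point inserted becomes the root (record 0). */
    void insert(int lat, int lon, int obmRecord) {
        int[] rec = {lat, lon, obmRecord, NONE, NONE};
        if (recs.isEmpty()) {
            recs.add(rec);
            return;
        }
        int current = 0;
        for (int depth = 0; ; depth++) {
            int[] node = recs.get(current);
            int axis = depth % 2 == 0 ? LAT : LON;             // even depth: latitude, odd: longitude
            int side = rec[axis] <= node[axis] ? LEFT : RIGHT; // same or lower goes left
            if (node[side] == NONE) {
                node[side] = recs.size();
                recs.add(rec);
                return;
            }
            current = node[side];
        }
    }

    /** Returns the nodes.obm record numbers of all points inside the box (bounds inclusive). */
    List<Integer> query(int minLat, int minLon, int maxLat, int maxLon) {
        List<Integer> out = new ArrayList<>();
        if (!recs.isEmpty()) visit(0, 0, new int[] {minLat, minLon}, new int[] {maxLat, maxLon}, out);
        return out;
    }

    private void visit(int idx, int depth, int[] min, int[] max, List<Integer> out) {
        int[] n = recs.get(idx);
        if (n[LAT] >= min[LAT] && n[LAT] <= max[LAT] && n[LON] >= min[LON] && n[LON] <= max[LON]) {
            out.add(n[OBM]);
        }
        int axis = depth % 2 == 0 ? LAT : LON;
        // left subtree holds keys <= split, right subtree keys > split
        if (n[LEFT] != NONE && min[axis] <= n[axis]) visit(n[LEFT], depth + 1, min, max, out);
        if (n[RIGHT] != NONE && max[axis] > n[axis]) visit(n[RIGHT], depth + 1, min, max, out);
    }
}
```

Because the tree is not rebalanced, its shape depends on insertion order; a file-backed version would follow the left/right recordNumbers in the same way.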
ways.obm
This file stores fixed-size records of ways.
Layout of a record:
- wayID [4 byte signed long]
- wayVersion [4 byte signed integer]
- minLatitude [4 byte signed long as used in OSM]
- minLongitude [4 byte signed long as used in OSM]
- maxLatitude [4 byte signed long as used in OSM]
- maxLongitude [4 byte signed long as used in OSM]
- attrID1 [2 byte short integer]
- attrValue1 [32 character string in 16-bit Unicode big-endian (64 bytes)]
- attrID2 [2 byte short integer]
- attrValue2 [32 character string in 16-bit Unicode big-endian (64 bytes)]
- attrID3 [2 byte short integer]
- attrValue3 [32 character string in 16-bit Unicode big-endian (64 bytes)]
- attrID4 [2 byte short integer]
- attrValue4 [32 character string in 16-bit Unicode big-endian (64 bytes)]
- attrID5 [2 byte short integer]
- attrValue5 [32 character string in 16-bit Unicode big-endian (64 bytes)]
- attrID6 [2 byte short integer]
- attrValue6 [32 character string in 16-bit Unicode big-endian (64 bytes)]
- nodeID1 [4 byte signed long]
- nodeID2 [4 byte signed long]
- nodeID3 [4 byte signed long]
- nodeID4 [4 byte signed long]
- nodeID5 [4 byte signed long]
- nodeID6 [4 byte signed long]
- nodeID7 [4 byte signed long]
- nodeID8 [4 byte signed long]
- relationID1 [4 byte signed integer]
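Byte offsets within a ways.obm record, taking the widths in brackets literally (every field described as "4 byte signed long" is treated as 4 bytes). `WayRecordLayout` is a hypothetical helper; with six attribute slots and eight node slots a record comes to 456 bytes.

```java
/** Hypothetical helper: byte offsets inside one ways.obm record. */
class WayRecordLayout {
    static final int INT = 4, SHORT = 2, VALUE = 64; // VALUE: 32 UTF-16 chars
    static final int ATTR_SLOTS = 6, NODE_SLOTS = 8;

    static final int OFF_WAY_ID = 0;
    static final int OFF_VERSION = 4;
    static final int OFF_BBOX = 8;                                          // minLat, minLon, maxLat, maxLon
    static final int OFF_ATTRS = OFF_BBOX + 4 * INT;                        // 24
    static final int OFF_NODES = OFF_ATTRS + ATTR_SLOTS * (SHORT + VALUE);  // 420
    static final int OFF_RELATION = OFF_NODES + NODE_SLOTS * INT;           // 452
    static final int RECORD_SIZE = OFF_RELATION + INT;                      // 456

    static int attrIdOffset(int slot) { return OFF_ATTRS + slot * (SHORT + VALUE); }
    static int attrValueOffset(int slot) { return attrIdOffset(slot) + SHORT; }
    static int nodeIdOffset(int slot) { return OFF_NODES + slot * INT; }
    static long recordOffset(long recordNumber) { return recordNumber * RECORD_SIZE; }
}
```

With a `ByteBuffer` holding one record, `buf.getInt(WayRecordLayout.nodeIdOffset(i))` would read the i-th node slot.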
Semantics of wayID:
- the wayID Long.MIN_VALUE is used to denote an unused record
- this wayID is authoritative. If there is disagreement with the .idx or .id2 files, then the .idx or .id2 files are incorrect and need to be regenerated.
Semantics of attrID:
- attrID and attrValue have the same semantic as for nodes.obm
Overall semantics:
- the nodeID Long.MIN_VALUE marks an unused entry
- the sort order of attributes and ways is application-dependent. No assumptions are to be made. However, it is advised to store important attributes like 'highway' first.
- if a record is too small to hold all attributes or nodes, the next record is used as well (this is marked by the next record having the same wayID).
- if this situation arises while updating the way and the next record is not free, the way is moved to a new location in this file.
- if there is a discrepancy between the nodeIDs here and the wayIDs in the nodes file, this file is authoritative and the entry of the node must be corrected.
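The repair rule above can be sketched as a consistency check: derive from ways.obm which wayIDs each node should list, then report the nodes whose stored wayIDs differ. The maps stand in for decoded file contents; `WayBackReferences` and its methods are hypothetical names. Sets are used so that a closed way, which lists its first node twice, yields a single back-reference.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of the ways.obm -> nodes.obm back-reference check. */
class WayBackReferences {
    /** From the authoritative wayID -> nodeIDs lists, derive the wayIDs each node should list. */
    static Map<Integer, Set<Integer>> expectedWaysPerNode(Map<Integer, List<Integer>> nodesOfWay) {
        Map<Integer, Set<Integer>> expected = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> e : nodesOfWay.entrySet()) {
            for (int nodeId : e.getValue()) {
                expected.computeIfAbsent(nodeId, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return expected;
    }

    /** Node IDs whose stored wayIDs disagree with ways.obm and therefore must be rewritten. */
    static Set<Integer> nodesToRepair(Map<Integer, List<Integer>> nodesOfWay,
                                      Map<Integer, Set<Integer>> storedWaysOfNode) {
        Map<Integer, Set<Integer>> expected = expectedWaysPerNode(nodesOfWay);
        Set<Integer> allNodes = new TreeSet<>(expected.keySet());
        allNodes.addAll(storedWaysOfNode.keySet());
        Set<Integer> broken = new TreeSet<>();
        for (int nodeId : allNodes) {
            Set<Integer> want = expected.getOrDefault(nodeId, Collections.emptySet());
            Set<Integer> have = storedWaysOfNode.getOrDefault(nodeId, Collections.emptySet());
            if (!want.equals(have)) broken.add(nodeId);
        }
        return broken;
    }
}
```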
ways.idx
This file stores an index of way-ID to record-number in ways.obm.
The structure is analogous to nodes.idx.
relations.obm
This file stores fixed-size records of relations.
Layout of a record:
- relationID [4 byte signed long]
- relationVersion [4 byte signed integer]
- minLatitude [4 byte signed long as used in OSM]
- minLongitude [4 byte signed long as used in OSM]
- maxLatitude [4 byte signed long as used in OSM]
- maxLongitude [4 byte signed long as used in OSM]
- attrID1 [2 byte short integer]
- attrValue1 [32 character string in 16-bit Unicode big-endian (64 bytes)]
- elementID1 [4 byte signed long]
- elementType1 [4 byte signed long] (ordinal of the v0.5 EntityType enum)
- roleID1 [4 byte signed long] (stored like an attribute-name in attrnames.txt)
- elementID2 [4 byte signed long]
- elementType2 [4 byte signed long]
- roleID2 [4 byte signed long] (stored like an attribute-name in attrnames.txt)
- elementID3 [4 byte signed long]
- elementType3 [4 byte signed long]
- roleID3 [4 byte signed long] (stored like an attribute-name in attrnames.txt)
- elementID4 [4 byte signed long]
- elementType4 [4 byte signed long]
- roleID4 [4 byte signed long] (stored like an attribute-name in attrnames.txt)
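Member-slot offsets within a relations.obm record, again taking the bracketed widths literally as 4 bytes each. `RelationRecordLayout` is a hypothetical helper; the record size comes to 138 bytes. A roleID read from these offsets would be resolved through attrnames.txt, like an attribute key.

```java
/** Hypothetical helper: byte offsets inside one relations.obm record. */
class RelationRecordLayout {
    static final int INT = 4;
    static final int OFF_RELATION_ID = 0, OFF_VERSION = 4, OFF_BBOX = 8;
    static final int OFF_ATTR_ID = OFF_BBOX + 4 * INT;        // 24
    static final int OFF_ATTR_VALUE = OFF_ATTR_ID + 2;        // 26; 64 bytes of UTF-16BE follow
    static final int OFF_MEMBERS = OFF_ATTR_VALUE + 64;       // 90
    static final int MEMBER_SLOTS = 4;
    static final int MEMBER_SIZE = 3 * INT;                   // elementID, elementType, roleID
    static final int RECORD_SIZE = OFF_MEMBERS + MEMBER_SLOTS * MEMBER_SIZE; // 138

    static int elementIdOffset(int slot) { return OFF_MEMBERS + slot * MEMBER_SIZE; }
    static int elementTypeOffset(int slot) { return elementIdOffset(slot) + INT; }
    static int roleIdOffset(int slot) { return elementIdOffset(slot) + 2 * INT; }
}
```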
notes:
- the relationID Long.MIN_VALUE marks an unused record
- the elementID Long.MIN_VALUE marks an unused member entry
- this relationID is authoritative. If there is disagreement with the .idx file, then the .idx file is incorrect and needs to be regenerated.
- attrID and attrValue have the same semantics as for nodes.obm
- the sort order of attributes and ways is application-dependent. No assumptions are to be made. However, it is advised to store important attributes like 'highway' first.
- if a record is too small to hold all attributes or members, the next record is used as well (this is marked by the next record having the same relationID).
- if this situation arises while updating the relation and the next record is not free, the relation is moved to a new location in this file.
relations.idx
This file stores an index of relation-ID to record-number in relations.obm.
The structure is analogous to nodes.idx.