Osmfilter version 0
osmfilter version 0 is a tool that provides an easy way of shrinking a .osm XML file in situations where you need only a few tags. It has proven useful for filtering an OSM XML file before importing it into a PostgreSQL database with osm2pgsql. The import result will be the same, but the import process becomes significantly faster, in particular because osmfilter is written in C and is quite fast.
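For instance, a minimal pre-filter-and-import sequence could look like this (the file names, the tag selection and the database name 'gis' are only placeholders):
cat a.osm | ./osmfilter -k"amenity=restaurant =bar =cafe" >filtered.osm
osm2pgsql -d gis filtered.osm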
Note that there is a newer filter program (osmfilter version 1) which offers more options and is easier to use if object dependencies need to be taken into account.
Download
The following downloads are available:
- binary for Linux 32 bit
- binary for Linux 64 bit
- binary for Windows
- source code (newest version, might be a beta)
(As usual: There is no warranty, to the extent permitted by law.)
Program Description
This program operates as a filter for OSM XML data. Only objects containing certain tags will be copied from standard input to standard output. Use the command-line parameter -k to determine which objects you want to have in standard output.
For example:
-k"key1=val1 key2=val2 key3=val3" -k"amenity=restaurant =bar =pub =cafe =fast_food =food_court =nightclub" -k"barrier=" -K/ -k"description=something with blanks/name=New York"
Limitations: the maximum number of key/value pairs is 1000, and the maximum length of keys and values is 100 characters. The -t option invokes a test mode which prints a list of the accepted search strings to standard output.
To suppress certain records, please use the -d option. For example:
-d"highway=path =footway =cycleway railway=rail"
All objects containing at least one of the mentioned values will be dropped, regardless of whether they are part of a relation which is not dropped. In other words, key/value pairs given with the -d parameter overrule those defined with the -k parameter.
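For example, to keep all highway objects except paths and footways (the file names and the tag selection are only examples):
cat a.osm | ./osmfilter -k"highway=" -d"highway=path =footway" >new.osm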
Considering Dependencies
To get dependent elements, e.g. the nodes of a selected way or the ways of a selected relation, you need to feed the input OSM XML file to the program more than once. You need to do this at least 3 times to get the nodes of a way which is referred to by a relation.
If you want to ensure that relations which are referred to by other relations are also processed correctly, you must input the file a 4th time. If more than one level of relation hierarchy needs to be considered, you will need to do this a 5th or 6th time.
If you feed the input file into osmfilter more than once, you must tell the program the exact beginning and ending of the preprocessing sequence. For example:
cat lim a.osm a.osm a.osm a.osm lim a.osm | ./osmfilter -k"lit=yes" >new.osm
where 'lim' is a file containing this sequence as a delimiter:
- <osmfilter_pre/>
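Such a delimiter file can be created, for example, with:
echo "<osmfilter_pre/>" >lim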
If you have a compressed input file, you can use bzcat instead of cat. In this case, be sure to have compressed the 'lim' file as well.
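A run with bzip2-compressed files could then look like this (the file names are only examples):
echo "<osmfilter_pre/>" | bzip2 >lim.bz2
bzcat lim.bz2 a.osm.bz2 a.osm.bz2 a.osm.bz2 a.osm.bz2 lim.bz2 a.osm.bz2 | ./osmfilter -k"lit=yes" >new.osm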
To speed up the filter process, the program uses some main memory for a hash table. By default, it uses 320 MiB for storing a flag for every possible node, 60 MiB for the way flags, and 20 MiB for the relation flags. Every byte holds the flags for 8 ID numbers, i.e., in 320 MiB the program can store 2684 million flags. As there are fewer than 1000 million node IDs at present (Oct 2010), 120 MiB would suffice.
For example, you can decrease the hash sizes to 130, 12 and 2 MiB using this option:
-h130-12-2
But keep in mind that the OSM database is continuously expanding. For this reason the program's own default values are higher than shown in the example, and it may be appropriate to increase them in the future. If you do not want to bother with the details, you can enter the total amount of memory, and the program will divide it up by itself. For example:
-h1000
These 1000 MiB will be split into three parts: 800 for nodes, 150 for ways, and 50 for relations.
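It is assumed here that the -h option can simply be added to a regular filter call, for example (the file names and the tag selection are only placeholders):
cat a.osm | ./osmfilter -h1000 -k"lit=yes" >new.osm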
Because hashes are used, it is not necessary to provide all of the suggested memory; the program will operate with less hash memory too. In this case, however, the filter will be less effective, i.e., some nodes and some ways will be left in the output file although they should have been excluded.
The maximum value the program accepts for the hash size is 4000 MiB. If you exceed the maximum amount of memory available on your system, the program will try to reduce the requested amount and display a warning message.
Optimizing the Performance
As there are no nodes which refer to other objects, preprocessing
does not need the node section of the OSM XML file. Nearly the same
applies to ways, so the ways are needed only once in preprocessing -
in the last run.
If you want to enhance performance, you should consider pre-filtering the OSM XML file. Pre-filtering can be done using the drop options. For example:
cat a.osm | ./osmfilter --drop-nodes >wr.osm
cat wr.osm | ./osmfilter --drop-ways >r.osm
cat lim r.osm r.osm wr.osm lim a.osm | ./osmfilter -k"lit=yes" >new.osm
If a drop option is used, no other filtering takes place in that run, i.e., the -k parameter will be ignored.
Example of Use
The project OpenGastroMap.de uses osmfilter to speed up the database import. That makes it possible to run the application on a small virtual Internet server. Here are the details: OpenGastroMap/install#Tool_osmfilter.