User:Colin Marquardt/OSM Tech
Discussion in #osm in early December 2009:
22:34:11 jburgess * emarsden: I know some of the basics about quadtiles, like that they are used as a simple 2d index on the lat/lon data
22:34:18 _dodger_ * emarsden: Do you have any specific questions? I could point you to the code for the quadtree.
22:38:01 _dodger_ * emarsden: Did you get our answers before disconnecting?
22:40:28 emarsden * _dodger_: I did not
22:41:02 _dodger_ * <jburgess> emarsden: I know some of the basics about quadtiles, like that they are used as a simple 2d index on the lat/lon data
22:41:07 _dodger_ * <_dodger_> emarsden: Do you have any specific questions? I could point you to the code for the quadtree.
22:42:38 emarsden * OK, thanks. I take it that the quadtree is used to find the ways/nodes/relations in response to a map call?
22:43:07 zereb * emarsden: only the nodes. the ways and relations are found from linkage on the way_nodes and relation_members tables
22:44:08 jburgess * emarsden: they are only used for nodes & gps_points
22:45:25 emarsden * OK, so the geom VARCHAR field contains the quadtree information, which is a string of A/B/C/D equivalent characters?
22:45:46 jburgess * no, I think it is just a 32bit int
22:46:12 zereb * emarsden: the geom field is used in the mapnik rendering database. the node just has an integer field called tile.
22:46:51 emarsden * where can I find the schema?
22:47:12 _dodger_ * http://svn.openstreetmap.org/applications/utils/osmosis/trunk/script/contrib/apidb_0.6.sql
22:47:34 jburgess * see also http://wiki.openstreetmap.org/wiki/Database_schema
22:47:56 zereb * or even http://wiki.openstreetmap.org/wiki/Osmosis#DB_Schema
22:47:58 jburgess * though neither is guaranteed to be completely accurate (the definitive source is the ruby migrations)
22:48:43 jburgess * I believe 'tile' is the one which contains the quadtile info
22:49:46 jburgess * to generate the number, the world is first split into a 65k x 65k xy grid. Then the bits of these two 16 bit numbers are interleaved to generate the 32 bit tile number
22:50:29 _dodger_ * Also called the Morton number http://en.wikipedia.org/wiki/Morton_number_(number_theory)
22:50:32 zereb * indeed. it contains a 32bit morton code created in this http://trac.openstreetmap.org/browser/sites/rails_port/lib/quad_tile/quad_tile.c
22:50:43 balrog-kun was just going to mention Morton numbers
22:51:04 _dodger_ * But all that has very little to do with PostgreSQL - and that's what you were initially asking about.
22:51:17 jburgess * then when we need to look up the data in a given bounding box we calculate the (small) number of tile numbers covered and fetch the data with matching tile numbers
22:51:54 zereb * well, it turns out that a 32bit tile code lookup using a btree index is about twice as fast as using a gist/rtree index on the points
22:52:09 emarsden * great information, thanks
22:52:52 emarsden * I was going to ask what the gbtree* functions in apidb.sql correspond to
22:53:30 _dodger_ * zereb: That's quite fast! I wasn't aware of that. I always thought that the R-Tree performed quite well. On the other hand, don't the morton/quadtile numbers need to be recalculated/redistributed?
22:55:55 zereb * _dodger_: exactly. and, for a fairly even distribution, they're extremely efficient
22:56:35 zereb * in fact, i think the reported numbers were for queries only, no updates going on. i think it has more to do with the amount of optimisation that postgres' btree implementation has had
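For reference, the scheme jburgess and zereb describe sketches out like this in Ruby (the linked quad_tile.c is the definitive implementation; the function names here are illustrative):

# Scale lon/lat onto a 65536 x 65536 grid and interleave the bits of
# the two 16-bit coordinates into a single 32-bit Morton number.
def xy_to_tile(x, y)
  tile = 0
  15.downto(0) do |i|
    tile = (tile << 1) | ((x >> i) & 1)  # x bit first...
    tile = (tile << 1) | ((y >> i) & 1)  # ...then the matching y bit
  end
  tile
end

def tile_for_point(lat, lon)
  xy_to_tile(((lon + 180.0) * 65535 / 360.0).round,
             ((lat + 90.0)  * 65535 / 180.0).round)
end

# The bounding-box lookup jburgess mentions: enumerate the tile numbers
# a lat/lon box covers, then fetch rows whose tile column matches.
def tiles_for_area(min_lat, min_lon, max_lat, max_lon)
  x_range = ((min_lon + 180.0) * 65535 / 360.0).round..((max_lon + 180.0) * 65535 / 360.0).round
  y_range = ((min_lat + 90.0)  * 65535 / 180.0).round..((max_lat + 90.0)  * 65535 / 180.0).round
  x_range.flat_map { |x| y_range.map { |y| xy_to_tile(x, y) } }.sort
end

Because nearby points interleave to nearby Morton numbers, a small box yields a handful of mostly consecutive tile values; a real query would collapse these into BETWEEN ranges so a plain btree index can serve it.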
22:57:27 zereb * emarsden: the gbtree* functions are used to implement the btree/gist index which is used on the changeset bounding boxes. i don't think they're used for the tile code lookup - that's just a vanilla btree
23:02:23 emarsden * and the database is sometimes returning > 10M tuples per second. Impressive!
23:03:41 zereb * it's averaging 2.73M over the past week, spiking to 195M http://munin.openstreetmap.org/openstreetmap/smaug.openstreetmap-pg_openstreetmap_stat_database.html
23:04:05 zereb * it's fast :-)
23:04:12 emarsden * I'm looking at the stats to try to determine the balance between reads & writes, but it isn't clear to me
23:05:51 zereb * looks like it's fetching about 3,500 times as many tuples as it's inserting, updating or deleting
23:05:57 emarsden * this is the ratio between "tuples fetched" + "tuples returned" and "tuples inserted/updated/deleted"?
23:06:32 zereb * i guess so.
23:08:00 zereb * the iostat says the disk array is averaging about 30k blocks/s read and about 3.4k blocks/s write
23:08:40 emarsden * so about 99% of reads are served from RAM?
23:09:18 emarsden * that's strange
23:10:02 zereb * it may not be that the tuples come from ram, there could just be more contiguous reads than writes
23:10:20 emarsden * ok
23:11:30 zereb * it does keep about 28Gb of cached data, so i wouldn't be surprised if most reads are served from ram
23:12:46 emarsden * where the entire database on disk is around 100Gb?
23:13:28 zereb * about 1.3Tb on disk
23:14:15 zereb * 1.23Tb, actually
23:14:56 emarsden * is that mostly history tables, or user diary spam, or?
23:16:07 elasticdog * does anyone know how much storage space a complete set of rendered Mapnik tiles will take up? (zoom 0-19)
23:16:19 _dodger_ * I'd guess that a big part of that comes from the GPS track points and the various indexes. But zereb will surely have a more detailed answer :)
23:18:11 zereb * elasticdog: over 4 exabytes, by a back-of-the-envelope calculation
23:18:25 elasticdog * I don't think that's right :-)
23:18:49 _dodger_ * elasticdog: That question was recently answered on one of the mailing lists (talk or dev) and someone said 240 GB but I have no idea how accurate that is.
23:19:12 bobkare * no way it's just 240GB down to z19
23:19:13 zereb * well, there are 4^n tiles at zoom level n, so sum(4^n, n=0..19) * 14kb (average tile size), which is 4.6 exabytes
23:19:21 jburgess * _dodger_: higher than that, the main tile server currently has over 500GB of tiles
23:19:52 dhansen * zereb: is an empty .png 14k? 70% of the world's tiles will be solid blue :)
23:20:09 jburgess * I think the current average tile size is closer to 1kB, so maybe in the 400TB region
23:20:19 zereb * dhansen: that's true - let's assume they're not stored and knock off 70% of that
23:20:20 _dodger_ * jburgess: Okay. I really have no idea - was just citing the mailing list. So elasticdog, ignore what I was saying, trust jburgess
23:20:25 zereb * so about 1 exabyte?
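The back-of-the-envelope sums are easy to reproduce; as zereb notices shortly afterwards, the "exabyte" figures come from a dropped factor of ~1000, and the totals actually land in petabytes and terabytes:

# Reproducing the estimates from the discussion above.
tiles = (0..19).inject(0) { |s, n| s + 4**n }  # => 366_503_875_925 tiles for z0..z19

tiles * 14_000       # ≈ 5.1e15 bytes ≈ 4.6 petabytes (zereb's "4.6 exabytes")
tiles * 0.3 * 1_000  # ≈ 1.1e14 bytes ≈ 110 TB with 70% water-only tiles dropped
4**19 * 1_000        # ≈ 2.7e14 bytes ≈ 274 TB, jburgess's z19-only 2^19 * 2^19 * 1kB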
23:20:40 elasticdog * hehe...had this conversation with a co-worker yesterday, and we had ranges from very small to roughly 3 petabytes
23:20:48 jburgess * plus the compute power to render them all
23:21:21 elasticdog * I'm in the middle of a full render, but only up to zoom 12
23:21:43 zereb * jburgess: based on a 1kb average and dropping 70% water tiles, that's still 100 petabytes
23:21:50 apmon * elasticdog: Good luck with the exabyte ;-)
23:21:55 zereb * or have i dropped a power of 1024 somewhere?
23:22:28 zereb * yeah, i dropped one - 100 Tb
23:22:45 elasticdog * how are you calculating the full number of tiles? 4 ^ 18 ?
23:23:14 zereb * no, in ruby: (0..19).inject(0) {|s,x| s + 4**x }
23:23:30 jburgess * I make 2^19 * 2^19 * 1kB = 274TB
23:24:06 zereb * i then multiplied by 0.3 on the assumption that you wouldn't bother rendering or storing water-only tiles
23:24:21 elasticdog * but it's not smart enough to not render water tiles, right?
23:24:49 jburgess * don't forget that the 1kB number is including the current water tiles we have. Without them the average might be higher
23:24:51 elasticdog * I've found that each zoom level is roughly 3x the previous...so far, which I'm assuming is due to compression of the *.meta files?
23:25:36 apmon * elasticdog: metatiles are not compressed. They are just 64 .png files concatenated
23:25:59 jburgess * it is true though that as you zoom in you don't get much more data, it is just more spread out
23:26:05 apmon * The difference, I suspect, is that the higher the zoom, the simpler the tile. So it will compress in png form better
23:26:11 jburgess * so the higher zoom tiles probably do compress better
23:27:05 elasticdog * well, I'm hoping the real number will be in the 500 GB range
23:27:14 bobkare * also as zoom level increases there will be more blank tiles
23:27:46 jburgess * elasticdog: 500GB is possible, but you can not pre-render everything
23:27:51 elasticdog * it would be nice if it were smart enough to recognize all water tiles and render only one and have the rest point to it
23:28:14 elasticdog * jburgess: why not?
23:28:38 elasticdog * ohh...nevermind, you're saying that I can't prerender it all in under 500GB
23:28:41 jburgess * see the estimate above... rendering every tile gives you too much data
23:29:10 elasticdog * so, when doing a full render, what level is realistic to cut it off at?
23:29:21 jburgess * but if you just render what people look at & cap the maximum cache size then you can handle a web site like OSM.org with 500GB
23:29:23 bobkare * osmarender stops at z17
23:29:39 jburgess * provided you can render on demand when someone requests a tile that is not cached
23:29:50 bobkare * but it takes loads of distributed renderers many weeks to render everything
23:29:52 jburgess * you maintain the illusion that you have everything
23:30:34 elasticdog * I've got a fairly standard mod_tile/renderd setup, but that won't exactly do it on demand, just queue it up
23:30:36 jburgess * bobkare: do you know how much space the osmarender tiles take in total?
23:31:03 bobkare * no, don't remember. i think i've seen some numbers floating around, probably on the ML
23:32:07 bobkare * but z17 is the cutoff point where it stops being sane to prerender
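On apmon's point that metatiles are just 64 concatenated PNGs: a minimal sketch of pulling one tile back out of a .meta file, assuming mod_tile's on-disk layout (a 4-byte "META" magic, then count/x/y/z as native 32-bit ints, then count (offset, size) pairs, then the PNG data). The layout details are an assumption here, not something stated in the log:

# Extract the PNG for tile (tx, ty) from the metatile covering its 8x8 block.
def tile_from_meta(path, tx, ty)
  File.open(path, "rb") do |f|
    raise "not a metatile" unless f.read(4) == "META"
    count, _x, _y, _z = f.read(16).unpack("l4")
    index = f.read(8 * count).unpack("l#{2 * count}").each_slice(2).to_a
    offset, size = index[(tx % 8) * 8 + (ty % 8)]
    f.seek(offset)
    f.read(size)  # raw PNG bytes for this one tile
  end
end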
23:33:10 elasticdog * jburgess: by max cache, do you mean the max cache that are rendered on the fly, or total tiles cached?
23:33:30 jburgess * I was talking about the maximum stored on disk
23:33:43 elasticdog * is there a place to limit that amount?
23:34:19 jburgess * nothing automatic, but you could probably write some scripts to periodically purge old tiles
23:34:32 elasticdog * gotcha
23:34:42 zereb * emarsden: very roughly, the nodes table is about 130Gb, current_nodes is 56Gb
23:35:02 elasticdog * so I'll probably kill this thing after z16...thanks for the advice everyone
23:35:42 emacsen * Hey OpenStreetMappers... Anyone here know if there's a way to make josm output an osmchange file
23:35:49 emacsen * or any change delta?
23:36:09 jburgess * elasticdog: there is a table on the wiki showing how many tiles have been viewed for each zoom level... I'll see if I can find it
23:36:27 jburgess * elasticdog: http://wiki.openstreetmap.org/wiki/Tile_Disk_Usage
23:37:08 jburgess * so up to around z12 pretty much all the tiles get looked at; past that, the ratio that actually gets viewed drops very rapidly
23:37:32 zereb * emarsden: all in all it looks like the current_* tables are about 330Gb and the history ones about 690Gb. but very roughly speaking...
23:38:07 apmon * zereb: So gpx tables aren't all that big?
23:38:10 emarsden * zereb: thanks
23:38:27 elasticdog * jburgess: THANKS! I've been looking for a page like that...great data to know
23:38:44 ris * so we've run out of space and I should stop mapping?
23:38:58 zereb * apmon: gps points is apparently about 100Gb
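jburgess's purge-script suggestion at 23:34:19 could look something like this sketch; the cache path, the .meta suffix and the 30-day cutoff are placeholders, and it assumes the filesystem keeps access times so recently served tiles survive:

require "find"

CACHE_DIR = "/var/lib/mod_tile"        # placeholder path
CUTOFF    = Time.now - 30 * 24 * 3600  # arbitrary 30-day threshold

Find.find(CACHE_DIR) do |path|
  next unless File.file?(path) && path.end_with?(".meta")
  File.delete(path) if File.atime(path) < CUTOFF
end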