Key:boundary/statistics
Following the debate on trac ticket 1332, I wanted to create statistics on the size of boundary objects (closed ways, and relations whose member ways can be arranged to form one large closed way) for the different admin_levels. Having done so, I want to share the results with others.
Getting the data
To get the necessary data, I queried Xapi for every object tagged with an admin_level from 1 to 10, using this little shell script:
for i in $(seq 1 10); do
    wget -O "admin_level_$i.osm" "http://osmxapi.hypercube.telascience.org/api/0.5/*%5Badmin_level=$i%5D"
done
Determining way area
To determine the area, I wrote a Perl script that parses the .osm file and handles two cases.
Closed ways
For closed ways I calculate the size using the formula for the area of a polygon from Wikipedia.
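The polygon area formula referred to here is commonly known as the shoelace formula. The following is a minimal sketch of it in Python (not the author's Perl code). The function name `polygon_area` and the representation of a way as a list of `(x, y)` coordinate pairs are assumptions made for illustration.

```python
def polygon_area(points):
    """Return the unsigned area of a simple polygon (shoelace formula).

    `points` is a list of (x, y) pairs in order around the ring. The first
    point may or may not be repeated at the end, as in a closed OSM way;
    a repeated point only adds a zero term to the sum.
    """
    n = len(points)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# A unit square given as a closed way (first node repeated at the end).
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(polygon_area(square))
```

When the coordinates are longitude and latitude in degrees, the result is in square degrees.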
"Closed" Relations
For a relation I try to order the ways such that way i+1 has its first node where way i has its last, and the last way ends at the same point where the first way started. If this is possible, I call the relation "closed". For such a relation I create a temporary way by concatenating the nodes of the individual ways and calculate the area using the same formula as above.
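The ordering step can be sketched as follows, in Python rather than the author's Perl. The name `chain_ways` and the representation of each way as a list of node IDs are assumptions. The sketch follows the description literally: it connects ways head to tail only and never reverses a way, so a relation with a member drawn in the opposite direction counts as not closed. Whether the original script reverses ways is not stated.

```python
def chain_ways(ways):
    """Try to order ways head-to-tail into one closed ring.

    `ways` is a list of node-ID lists. Returns the concatenated node list
    (first node repeated at the end) if the ways form a closed ring,
    otherwise None. Ways are never reversed.
    """
    if not ways or any(len(way) < 2 for way in ways):
        return None
    ring = list(ways[0])
    remaining = [list(way) for way in ways[1:]]
    while remaining:
        for index, way in enumerate(remaining):
            if way[0] == ring[-1]:
                ring.extend(way[1:])  # skip the shared connecting node
                del remaining[index]
                break
        else:
            return None  # no remaining way starts where the ring ends
    return ring if ring[0] == ring[-1] else None


# Three ways given out of order that together form one ring.
print(chain_ways([[1, 2, 3], [5, 1], [3, 4, 5]]))
```

The greedy choice of the first matching way is enough for simple rings. Relations in which several ways start at the same node would need backtracking.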
The Code
The code itself is partially copied from osmarender/perl and is available from http://www.petschge.de/osm/stats/areastats.pl. The raw output (before averaging) is available at http://www.petschge.de/osm/stats/admin_level_results.tar.bz2.
Averaging
The code discussed above prints one line for every closed way and relation. To make the data more useful, I calculated the average and standard deviation of the size for each admin_level using the following short Perl script:
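#!/usr/bin/perl
use strict;
use warnings;

# Read one area per line from STDIN; accumulate the sum and the sum of squares.
my ($areasum, $areasquaresum, $linecount) = (0, 0, 0);
while (my $line = <STDIN>) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines
    $areasum       += $line;
    $areasquaresum += $line * $line;
    $linecount++;
}
die "no input lines\n" unless $linecount;

# Population variance E[x^2] - E[x]^2; clamp small negative rounding errors to 0.
my $average  = $areasum / $linecount;
my $variance = $areasquaresum / $linecount - $average * $average;
my $s        = sqrt($variance > 0 ? $variance : 0);
print "average is $average, stddev is $s\n";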
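Written as a formula, the script computes the mean and the population standard deviation. The symbols are introduced here for illustration: $x_i$ is the area on input line $i$ and $N$ is the number of lines.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
s = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2} - \bar{x}^{2}}
```

Because the sum is divided by $N$ rather than $N-1$, this is the population form and not the sample form. The difference is negligible for admin_levels with thousands of objects, but noticeable for those with only a handful.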
The results
admin_level | # of closed ways | # of closed relations | average area (deg²) | standard deviation of area (deg²) |
---|---|---|---|---|
1 | 3 | 0 | 0.0069 | 0.0098 |
2 | 124 | 62 | 0.1 | 1.3 |
3 | 2 | 0 | 0.0079 | 0.0079 |
4 | 95 | 16 | 0.17 | 0.93 |
5 | 1 | 3 | 0.215 | 0.039 |
6 | 696 | 114 | 0.004 | 0.016 |
7 | 7 | 2 | 0.030 | 0.078 |
8 | 26206 | 1581 | 0.0004 | 0.0085 |
9 | 12 | 6 | 0.00038 | 0.00064 |
10 | 36 | 26 | 0.0008 | 0.0027 |
All areas are given in units of arc degrees squared.
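Square degrees do not correspond to a fixed ground area, because a degree of longitude shrinks towards the poles. The following Python sketch shows a rough conversion to km². It assumes that the areas were computed directly from longitude/latitude values, that the Earth is a sphere with a mean radius of 6371 km, and that the object is small enough for a single representative latitude to apply. The function name and the latitude passed in are hypothetical and not part of the original analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def square_degrees_to_km2(area_deg2, latitude_deg):
    """Approximate the ground area in km^2 of a small area given in square degrees.

    One degree of latitude spans about pi * R / 180 km everywhere; one degree
    of longitude spans that distance scaled by cos(latitude).
    """
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    return area_deg2 * km_per_degree ** 2 * math.cos(math.radians(latitude_deg))


# Average admin_level 8 area from the table, at an assumed latitude of 50 degrees.
print(square_degrees_to_km2(0.0004, 50.0))
```

Under these assumptions, the admin_level 8 average of 0.0004 square degrees corresponds to roughly 3 km² at 50° latitude.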
Discussion
A couple of points are apparent from the data:
Few closed objects
Especially at low admin_levels, few objects are closed. This might be related to the unresolved situation of maritime borders. One relation for each object with a low admin_level, collecting all the ways that form its border, would really help.
- Why not help resolve Maritime borders so we can get more closed objects? --Skippern 16:02, 23 February 2009 (UTC)
Even/odd difference
Even admin_levels are far more common than odd ones: admin_level 8 has 27,787 closed objects in the table above, while admin_level 7 has only 9.
High standard deviation
The standard deviation is quite high, exceeding the average for most admin_levels. This is possibly due to the low number of closed objects or to some bug in my code.
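A standard deviation larger than the mean does not by itself indicate a bug. It is also what a strongly skewed sample produces, for example many small areas alongside a few very large ones. The Python sketch below illustrates this with made-up numbers, not taken from the real data, using the same population formula as the averaging script.

```python
import math


def mean_and_stddev(values):
    """Return (mean, population standard deviation) of a non-empty list."""
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean * mean
    return mean, math.sqrt(max(variance, 0.0))  # clamp rounding noise


# Hypothetical sample: 99 tiny areas and one very large one.
areas = [0.0001] * 99 + [1.0]
mean, stddev = mean_and_stddev(areas)
print(f"average is {mean}, stddev is {stddev}")
```

In this sample the single large object pushes the standard deviation to nearly ten times the mean. Comparing the median with the mean in the raw output would be one way to check whether the real data is skewed like this.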