Automated edits/TTmechanicalupdates/Fix issue with duplicated inner polygons in Canada
Who
TomTom team using TTmechanicalupdates bot account.
The team can be contacted at OSM@tomtom.com.
Why
Phase 2
During the first iteration of the bot it was discovered that:
- some cases involve 3 or more duplicated ways, whereas our bot initially handled only pairs (exactly 2 duplicated ways) --> 151,339 issues were skipped in the first bot iteration
- the issue of "double inner polygons" also comes from sources other than CanVec --> 21,628 issues were skipped because their source tag value did not contain CanVec. (This is further explained in the 'Discussion - Phase 1' section of this document.)
Still, these cases are incorrect and need to be fixed. We will therefore slightly modify the code to handle such scenarios and run a second phase of the bot. We expect to automatically solve up to 80% of the above issues. (There might still be individual cases that cannot be fixed in a mechanical way.)
Phase 1
Based on the Osmose Rule 1170 Class 1 "Double inner polygon" (the geometry of the multipolygon inner ring is duplicated: one way is in a relation but has no tags, and another has tags but is not part of the relation), we have detected approximately 570,000 such issues in Canada alone.
Verification of the data showed that these issues are mainly caused by imports of the CanVec source. According to the CanVec OSM wiki documentation, the issue is already known; it is mentioned there as the "Duplicate land features" issue. According to the OpenStreetMap wiki, if the inner way represents something in itself (e.g., a forest with a hole where the hole is a lake), then the inner way must be tagged as such.
Examples
Phase 2
Example of what will be fixed in Phase 2:
The above example shows a case where there are more than 2 duplicated ways. Relation 3602683 has way 268798550 as an inner ring, which has exactly the same geometry as and shares the same nodes with inner ring way 268797546 of relation 3602591. There is also a duplicated way 268797439 sharing the same nodes, without an association to any relation. Ways which are part of the relations have no tags and the duplicating way has tags assigned.
Example of what will not be fixed:
Cases like this one won't be fixed because the two duplicating ways 261328743 and 261328705 have different tags. It is impossible to determine in an automated way which of them is correct and which should be removed.
Phase 1
The first example shows the relation with ID=3656347. This relation has 116 members, one of which, the way with ID=274712968 (hereinafter referred to as the Inner Ring Way), has a duplicating way with ID=274712970 (hereinafter referred to as the Duplicating Way). The Duplicating Way uses the same nodes as the Inner Ring Way but is not a member of the relation (ID=3656347). In addition, the Inner Ring Way has no tags assigned, while the Duplicating Way does, which suggests that the Duplicating Way should be a member of relation ID=3656347.
Other examples:
- Example without "source" tag: https://osmose.openstreetmap.fr/en/map/#item=1170&zoom=15&lat=65.18924&lon=-123.42072&level=3&tags=&fixable=
- Example from CanVec import (same source on way and on relation): http://osmose.openstreetmap.fr/en/map/#item=1170&zoom=17&lat=44.143906&lon=-80.737051&level=3&tags=&fixable=&issue_uuid=9b5e022e-c03b-f008-0d92-45cf50afbcec
- Example from CanVec import (different source on way and on relation → won't be updated if we go with the approach of updating only ways/relations with the same "source" value): http://osmose.openstreetmap.fr/en/map/#item=1170&zoom=17&lat=43.294247&lon=-80.402761&level=3&tags=&fixable=&issue_uuid=71119888-3738-f37a-417c-689aae2fbc26
Algorithm
Phase 2
The bot takes violations from Osmose (rule id 1170, class 1) as input data. For each violation, data from OSM is fetched and violations are verified one more time.
Violations with common way ids (separate violations in Osmose, with repeating way id) are grouped into one changeset.
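The grouping step can be sketched as a small union-find over violations. This is a minimal Python sketch, not the bot's actual code: representing a violation as a set of way ids and the name `group_violations` are assumptions made for illustration.

```python
from collections import defaultdict


def group_violations(violations):
    """Group violations that share at least one way id.

    `violations` is a list of sets of way ids (hypothetical representation).
    Returns a sorted list of groups, each a list of violation indices.
    """
    parent = list(range(len(violations)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # way id -> index of the first violation mentioning it
    for idx, way_ids in enumerate(violations):
        for way_id in way_ids:
            if way_id in first_seen:
                # Same way id seen before: merge the two groups.
                parent[find(idx)] = find(first_seen[way_id])
            else:
                first_seen[way_id] = idx

    groups = defaultdict(list)
    for idx in range(len(violations)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Illustrative input: two violations share way 268797439, a third is unrelated.
example = [
    {268798550, 268797439},
    {268797546, 268797439},
    {274712968, 274712970},
]
print(group_violations(example))  # [[0, 1], [2]]
```

Each resulting group would then be handled in a single changeset, so that a way shared by several violations is only edited once.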
The following verifiers are executed:
- all ways are closed,
- all ways have the same nodes (direction of way digitization and starting node can be different),
- the duplicating way should have tags*,
- the duplicating way is not a member of any relation,
- inner ring ways should have no tags,
- inner ring ways are a member of only one relation (the one from the Osmose violation),
- optional: relation and duplicating way should have a required "source" tag (e.g., "source=CanVec 8.0 - NRCan")**.
*In case of multiple duplicating ways, these violations will be skipped.
**In the second iteration, the run will be performed without a source check.
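The tag, membership and source verifiers above could look roughly like the following Python sketch. It assumes the closed-way and same-node checks have already passed, and it is not taken from the bot's repository: the `Way` class, `classify_ways`, and the substring match on the "source" value are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    id: int
    tags: dict = field(default_factory=dict)
    relation_ids: list = field(default_factory=list)  # relations the way belongs to


def classify_ways(ways, violating_relation_ids, required_source=None, relation_tags=None):
    """Split geometrically identical ways into (duplicating way, inner ring ways).

    Returns None when a verifier fails, meaning the violation is skipped.
    """
    tagged = [w for w in ways if w.tags]
    untagged = [w for w in ways if not w.tags]

    # Exactly one tagged way is expected; with several tagged duplicates it is
    # unclear which tags are correct, so the violation is skipped.
    if len(tagged) != 1 or not untagged:
        return None
    duplicating = tagged[0]

    # The duplicating way must not be a member of any relation.
    if duplicating.relation_ids:
        return None

    # Each inner ring way must belong to exactly one relation, and that
    # relation must be one reported in the violation.
    for way in untagged:
        if len(way.relation_ids) != 1 or way.relation_ids[0] not in violating_relation_ids:
            return None

    # Optional source check (switched off by passing required_source=None).
    if required_source is not None:
        relation_tags = relation_tags or {}
        sources = [duplicating.tags.get("source", "")]
        sources += [relation_tags.get(rid, {}).get("source", "") for rid in violating_relation_ids]
        if any(required_source not in s for s in sources):
            return None

    return duplicating, untagged
```

Returning `None` rather than raising keeps skipped violations cheap to count, which matches how the outcome tables report rejected cases.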
When a violation is confirmed by the bot, data modification is performed.
Data modification is understood as:
Basic scenario:
- copying tags from the way which does not belong to any relation to the way which is a member of the violating relation,
- removing the way which does not belong to any relation.
Three or more duplicated ways scenario:
- removing inner ring ways which belong to relations,
- assigning a duplicating way to those relations.
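Both scenarios can be illustrated on plain in-memory data. The following Python sketch assumes ways are stored as `{way_id: {"tags": {...}}}` and relations as `{relation_id: [(way_id, role), ...]}`; these structures and the function name `apply_fix` are hypothetical and only meant to show the order of operations, not how the bot talks to the OSM API.

```python
def apply_fix(ways, relations, duplicating_id, inner_ids):
    """Apply the basic or the three-or-more scenario to in-memory data."""
    if len(inner_ids) == 1:
        # Basic scenario: copy the tags onto the inner ring way, then
        # remove the way that does not belong to any relation.
        ways[inner_ids[0]]["tags"] = dict(ways[duplicating_id]["tags"])
        del ways[duplicating_id]
    else:
        # Three or more duplicated ways: keep the tagged way, assign it to
        # every affected relation, then remove the untagged inner ring ways.
        inner = set(inner_ids)
        for members in relations.values():
            for i, (way_id, role) in enumerate(members):
                if way_id in inner:
                    members[i] = (duplicating_id, role)
        for way_id in inner_ids:
            del ways[way_id]


# Three duplicated ways: two untagged inner rings and one tagged duplicate.
ways = {1: {"tags": {}}, 2: {"tags": {}}, 3: {"tags": {"natural": "water"}}}
relations = {10: [(1, "inner")], 11: [(2, "inner")]}
apply_fix(ways, relations, duplicating_id=3, inner_ids=[1, 2])
print(sorted(ways))   # [3]
print(relations)      # {10: [(3, 'inner')], 11: [(3, 'inner')]}
```

In the basic scenario the relation membership is left untouched, which keeps the relation's history smaller; in the multi-way scenario a single tagged way ends up shared by all affected relations.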
Phase 1
The bot takes violations from Osmose (rule id 1170, class 1) as input data. For each violation, data from OSM is fetched and violations are verified one more time.
The following verifiers are executed:
- both ways are closed,
- both ways have the same nodes (direction of way digitization and starting node can be different),
- duplicating way should have tags,
- duplicating way is not a member of any relation,
- inner ring way should have no tags,
- inner ring way is a member of only 1 relation (the one from the Osmose violation),
- optional: relation and duplicating way should have a required "source" tag (e.g., "source=CanVec 8.0 - NRCan").
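The "closed" and "same nodes" verifiers amount to comparing two rings regardless of direction and starting node. One possible way to express that is shown below as a Python sketch; the function names and the representation of a way as a list of node ids (first id repeated at the end) are assumptions, and the bot may implement the check differently.

```python
def is_closed(nodes):
    """A closed way repeats its first node at the end and has at least 3 distinct nodes."""
    return len(nodes) >= 4 and nodes[0] == nodes[-1]


def same_ring(a, b):
    """True if two closed ways use the same node sequence, ignoring
    the digitization direction and the starting node."""
    if not (is_closed(a) and is_closed(b)):
        return False
    ring_a, ring_b = a[:-1], b[:-1]  # drop the repeated closing node
    if len(ring_a) != len(ring_b):
        return False
    n = len(ring_a)
    doubled = ring_a + ring_a  # every rotation of ring_a is a slice of this

    def is_rotation(seq):
        return any(doubled[i:i + n] == seq for i in range(n))

    return is_rotation(ring_b) or is_rotation(ring_b[::-1])


print(same_ring([1, 2, 3, 4, 1], [3, 2, 1, 4, 3]))  # True: reversed, different start
print(same_ring([1, 2, 3, 4, 1], [1, 3, 2, 4, 1]))  # False: same nodes, different order
```

The comparison is quadratic in the number of nodes, which is acceptable for a sketch; a production check would more likely normalize each ring once and compare the normalized forms.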
When a violation is confirmed by the bot, data modification is performed.
Data modification is understood as:
- copying tags from the way which does not belong to any relation to the way which is a member of the violating relation,
- removing the way which does not belong to any relation.
Link to the GitHub repository: https://github.com/tomtom-international/osm-bots/tree/main/bot-double-inner-ring
Test Run
Before running the bot on the whole of Canada, we will run the same automated updates on a smaller area. For this, we've selected Southwestern Ontario, where we have about 800 cases logged by the Osmose rule.
Bot runs
Phase 2
As in Phase 1, the bot will be executed in multiple iterations so as not to overwhelm the system.
Phase 1
To make sure that the system is not overloaded, we plan to run the bot in parts based on Osmose regions. Below you can see the proposed approach:
Order | Osmose region | Count of issues from Osmose |
---|---|---|
1 | canada_ontario_southwestern_ontario - TEST RUN | 812 |
2 | canada_quebec_montreal, canada_quebec_laval, canada_quebec_centre_du_quebec, canada_prince_edward_island, canada_ontario_golden_horseshoe, canada_quebec_estrie, canada_quebec_monteregie | 2176 |
3 | canada_quebec_chaudiere_appalaches | 1127 |
4 | canada_quebec_gaspesie_iles_de_la_madeleine | 2116 |
5 | canada_nunavut | 2561 |
6 | canada_quebec_bas_saint_laurent | 2662 |
7 | canada_ontario_eastern_ontario | 5194 |
8 | canada_british_columbia | 6082 |
9 | canada_quebec_lanaudiere | 6144 |
10 | canada_quebec_capitale_nationale | 6433 |
11 | canada_quebec_laurentides | 6806 |
12 | canada_ontario_northwestern_ontario | 7562 |
13 | canada_saskatchewan | 8798 |
14 | canada_quebec_outaouais | 9093 |
15 | canada_nova_scotia | 9133 |
16 | canada_yukon | 9646 |
17 | canada_new_brunswick | 10037 |
18 | canada_ontario_central_ontario | 10335 |
19 | canada_quebec_abitibi_temiscamingue | 11895 |
20 | canada_newfoundland_and_labrador | 19094 |
21 | canada_quebec_mauricie | 19518 |
22 | canada_ontario_northeastern_ontario | 23560 |
23 | canada_alberta | 39736 |
24 | canada_quebec_nord_du_quebec | 42628 |
25 | canada_quebec_saguenay_lac_saint_jean | 48749 |
26 | canada_northwest_territories | 63542 |
27 | canada_quebec_cote_nord | 70243 |
28 | canada_manitoba | 126317 |
Discussion
Phase 2
The initial phase of the bot was intended to fix data primarily coming from CanVec imports. The main reasons were the following:
- the tag "source" containing "CanVec" represented the majority of corrupted cases (this is shown precisely with counts in the table in the "Phase 1" section below)
- thanks to the explanation from OSM users (https://lists.openstreetmap.org/pipermail/talk-ca/2021-December/010185.html) and the information stored on wiki pages (https://wiki.openstreetmap.org/wiki/CanVec#Issues_found_in_OSM), the team was sure those are indeed errors.
However, further analysis has shown that cases with other source values are also incorrect (as they are also logged in Osmose). Moreover, a great majority of them can be solved automatically as well.
Phase 1
The problem was found mainly in data coming from CanVec imports, but it was also verified on a sample that data without any value in the "source" tag follows exactly the same pattern. Should we consider running the bot only on cases where we have data imported from CanVec? Or can data without any source be updated as well?
The values of the "source" tag, with their counts, proposed for the first iteration of the bot (shown in bold below):
source of relation (tag: source=*) | count |
---|---|
NRCan-CanVec-10.0 | 299568 |
NRCan-CanVec-8.0 | 109732 |
NRCan-CanVec-7.0 | 102282 |
CanVec 6.0 - NRCan | 53093 |
NRCan-CanVec-10.0 + Bing aerial | 1103 |
NRCan-CanVec-10.0;NRCan-CanVec-8.0 | 187 |
NRCan-CanVec-10.0 + Bing aerial + DigitalGlobe | 166 |
CanVec_Import_2009 | 145 |
CanVec 4.0 - NRCan | 122 |
NRCan-CanVec-10.0 + Bing aerial + DigitalGlobe | 115 |
(blank) | 3214 |
The announcement and discussion were initiated on the Canada mailing list (Dec 2021): https://lists.openstreetmap.org/pipermail/talk-ca/2021-December/010184.html
Opt out
To opt out of this automated update, please write an e-mail (in English) to TTmechanicalupdates@groups.tomtom.com describing which area or source version should be excluded from the update scope and why.
When
Phase 2
Runs were performed between 1 Feb 2022 and 4 Feb 2022.
Phase 1
Runs and further analysis were performed between 27 Dec 2021 and 12 Jan 2022.
Test run
The test run (Southwestern Ontario) was done on 13 Dec 2021.
Outcome
We will populate this section as we run the bot.
General summary (phase 1 and 2)
About 96.9% of the issues logged by Osmose at the beginning of December 2021 were fixed (554,258 of 572,148).
Total violations | Fixed violations |
---|---|
572148 | 554258 |
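The overall figure can be re-derived from the per-phase totals reported in the Phase 1 and Phase 2 summaries on this page (393411 and 160847 fixed violations). The short Python check below is only an arithmetic illustration; the helper name `fix_rate` is made up for this example.

```python
def fix_rate(total_violations, *fixed_per_phase):
    """Return (total fixed, share of violations fixed)."""
    fixed = sum(fixed_per_phase)
    return fixed, fixed / total_violations


# Phase 1 fixed 393411 violations, Phase 2 fixed 160847, out of 572148 in total.
fixed, rate = fix_rate(572148, 393411, 160847)
print(f"{fixed} of 572148 fixed ({rate:.1%})")  # 554258 of 572148 fixed (96.9%)
```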
Phase 2 details
Scope: Canada
Start Date: 1 Feb 2022
General summary of second iteration:
Opened changesets | Total violations | Fixed violations | Found duplicates* | Fixed duplicates | Filtered out by verifiers** | Others rejected*** |
---|---|---|---|---|---|---|
4234 | 176384 | 160847 | 150012 | 139716 | 176 | 5649 |
Below you can see the results of the second bot iteration per region:
Run No. | Region | Opened changesets | Total violations | Fixed violations | Found duplicates* | Fixed duplicates | Filtered out by verifiers** | Others rejected*** |
---|---|---|---|---|---|---|---|---|
41 | canada_ontario_southwestern_ontario | 2 | 64 | 24 | 40 | 2 | 0 | 0 |
49 | canada_quebec_gaspesie_iles_de_la_madeleine | 4 | 63 | 56 | 28 | 24 | 0 | 4 |
50 | canada_new_brunswick | 61 | 1009 | 887 | 865 | 770 | 1 | 50 |
51 | canada_quebec_chaudiere_appalaches | 6 | 136 | 92 | 72 | 44 | 0 | 29 |
52 | canada_quebec_bas_saint_laurent | 10 | 149 | 83 | 72 | 62 | 1 | 55 |
53 | canada_british_columbia | 5 | 241 | 223 | 32 | 0 | 0 | 34 |
42 | canada_quebec_montreal | 1 | 6 | 5 | 0 | 0 | 0 | 0 |
43 | canada_quebec_laval | 1 | 9 | 9 | 0 | 0 | 0 | 0 |
44 | canada_quebec_centre_du_quebec | 3 | 28 | 27 | 8 | 6 | 0 | 1 |
45 | canada_prince_edward_island | 2 | 36 | 11 | 24 | 4 | 0 | 25 |
46 | canada_ontario_golden_horseshoe | 2 | 59 | 55 | 4 | 0 | 0 | 4 |
47 | canada_quebec_estrie | 3 | 99 | 74 | 24 | 4 | 0 | 25 |
48 | canada_quebec_monteregie | 2 | 38 | 18 | 24 | 4 | 0 | 20 |
54 | canada_newfoundland_and_labrador | 20 | 287 | 215 | 186 | 118 | 0 | 4 |
55 | canada_quebec_capitale_nationale | 12 | 349 | 320 | 108 | 88 | 0 | 10 |
56 | canada_ontario_eastern_ontario | 5 | 370 | 43 | 322 | 14 | 0 | 43 |
57 | canada_yukon | 10 | 376 | 279 | 328 | 250 | 0 | 19 |
58 | canada_nunavut | 10 | 394 | 374 | 388 | 370 | 0 | 12 |
59 | canada_quebec_lanaudiere | 13 | 591 | 552 | 60 | 44 | 1 | 22 |
60 | canada_nova_scotia | 22 | 906 | 493 | 426 | 138 | 0 | 195 |
61 | canada_saskatchewan | 29 | 1229 | 866 | 890 | 798 | 1 | 270 |
62 | canada_ontario_central_ontario | 24 | 1444 | 297 | 1246 | 114 | 0 | 16 |
63 | canada_ontario_northeastern_ontario | 223 | 6815 | 5708 | 6078 | 5146 | 4 | 171 |
64 | canada_quebec_abitibi_temiscamingue | 44 | 2042 | 1829 | 602 | 442 | 2 | 33 |
65 | canada_quebec_mauricie | 47 | 2128 | 1909 | 746 | 570 | 0 | 43 |
66 | canada_quebec_saguenay_lac_saint_jean | 102 | 2827 | 1284 | 2570 | 1214 | 0 | 182 |
67 | canada_quebec_cote_nord | 127 | 5539 | 3537 | 3407 | 2138 | 0 | 1099 |
68 | canada_quebec_laurentides | 126 | 6211 | 5973 | 204 | 64 | 1 | 15 |
70 | canada_quebec_outaouais | 143 | 7130 | 6964 | 206 | 46 | 0 | 4 |
71 | canada_quebec_nord_du_quebec | 475 | 16157 | 15821 | 15722 | 15722 | 0 | 118 |
72 | canada_northwest_territories | 619 | 21573 | 17303 | 20368 | 17282 | 162 | 1202 |
73 | canada_alberta | 692 | 31269 | 29133 | 29106 | 28916 | 2 | 1944 |
74 | canada_manitoba | 1389 | 66810 | 66383 | 65856 | 65322 | 1 | 0 |
*Found duplicates - cases where there are more than 2 duplicated ways
**Filtered out by verifiers - cases which are rejected because they do not pass all the algorithm criteria
***Others rejected - other reasons for rejection, usually incomplete data that cannot be resolved automatically
Phase 1 details
Scope: Canada
Start Date: 27 Dec 2021
General summary of the first bot iteration:
Opened changesets | Total violations | Fixed violations | Filtered duplicates* | Filtered out by verifiers** | Others rejected*** |
---|---|---|---|---|---|
8460 | 572148 | 393411 | 151339 | 21628 | 5770 |
Below you can see the results of the first bot iteration per region:
Run No. | Region | Opened changesets | Total violations | Fixed violations | Filtered duplicates* | Filtered out by verifiers** | Others rejected*** |
---|---|---|---|---|---|---|---|
1 | canada_southwesternontario (test run) | 21 | 1149 | 748 | 80 | 311 | 10 |
2 | canada_quebec_montreal | 1 | 6 | 1 | 0 | 5 | 0 |
2 | canada_quebec_laval | 1 | 10 | 1 | 0 | 9 | 0 |
2 | canada_quebec_centre_du_quebec | 5 | 211 | 183 | 6 | 21 | 1 |
2 | canada_prince_edward_island | 5 | 244 | 206 | 14 | 7 | 17 |
2 | canada_ontario_golden_horseshoe | 8 | 360 | 301 | 4 | 55 | 0 |
2 | canada_quebec_estrie | 14 | 675 | 577 | 24 | 71 | 3 |
2 | canada_quebec_monteregie | 13 | 672 | 598 | 28 | 36 | 10 |
3 | canada_quebec_chaudiere_appalaches | 22 | 1127 | 991 | 68 | 48 | 20 |
4 | canada_quebec_gaspesie_iles_de_la_madeleine | 42 | 2116 | 2053 | 26 | 33 | 4 |
5 | canada_nunavut | 44 | 2560 | 2162 | 388 | 4 | 6 |
6 | canada_quebec_bas_saint_laurent | 52 | 2662 | 2550 | 72 | 20 | 20 |
7 | canada_ontario_eastern_ontario | 98 | 5194 | 4822 | 324 | 29 | 19 |
8 | canada_british_columbia | 245 | 6086 | 5845 | 16 | 223 | 2 |
9 | canada_quebec_lanaudiere | 121 | 6144 | 5551 | 60 | 511 | 22 |
10 | canada_quebec_capitale_nationale | 127 | 6433 | 6077 | 108 | 236 | 12 |
11 | canada_quebec_laurentides | 79 | 6806 | 585 | 206 | 5914 | 101 |
12 | canada_ontario_northwestern_ontario | 133 | 7562 | 5973 | 919 | 666 | 4 |
13 | canada_saskatchewan | 159 | 8798 | 7566 | 890 | 72 | 270 |
14 | canada_quebec_outaouais | 119 | 9093 | 1951 | 206 | 6930 | 6 |
15 | canada_nova_scotia | 174 | 9133 | 8170 | 460 | 368 | 135 |
16 | canada_yukon | 187 | 9644 | 9266 | 330 | 29 | 19 |
17 | canada_new_brunswick | 184 | 10033 | 8994 | 853 | 142 | 44 |
18 | canada_ontario_central_ontario | 181 | 10350 | 8886 | 1279 | 171 | 14 |
19 | canada_quebec_abitibi_temiscamingue | 220 | 11894 | 9851 | 622 | 1387 | 34 |
20 | canada_newfoundland_and_labrador | 379 | 19093 | 18806 | 186 | 97 | 4 |
21 | canada_quebec_mauricie | 376 | 19518 | 17388 | 748 | 1339 | 43 |
22 | canada_ontario_northeastern_ontario | 350 | 23560 | 16718 | 6090 | 581 | 171 |
23 | canada_alberta | 212 | 39730 | 8469 | 29106 | 211 | 1944 |
24 | canada_quebec_nord_du_quebec | 534 | 42627 | 26470 | 15940 | 99 | 118 |
25 | canada_quebec_saguenay_lac_saint_jean | 924 | 48749 | 45919 | 2574 | 74 | 182 |
26 | canada_northwest_territories | 864 | 63542 | 41789 | 20368 | 183 | 1202 |
27 | canada_quebec_cote_nord | 1356 | 70055 | 64442 | 3488 | 1545 | 580 |
28 | canada_manitoba | 1210 | 126312 | 59502 | 65856 | 201 | 753 |
*Filtered duplicates - cases where there are more than 2 duplicated ways
**Filtered out by verifiers - cases which are rejected because they do not pass all the algorithm criteria
***Others rejected - other reasons for rejection, usually incomplete data that cannot be resolved automatically
Test run details
Scope: Canada, Region: Canada_SouthWesternOntario
Date: 13 Dec 2021
Violations source date: 2021.12.14
Total Violations count: 812
Uploaded fixes: 748
Source verifier rejected: 20 (due to absence of source tag on relation and/or way)
Filtered out due to duplicated way id: 40
Incomplete data (inner and duplicated had tags): 4
Total opened changesets: 21 (114877687, 114877723, 114877771, 114877830, 114880554, 114912615, 114912679, 114912722, 114912759, 114912829, 114912911, 114912962, 114913049, 114913121, 114913138, 114966317, 114966347, 114966387, 114966424, 114966446, 114966479)
Total time of run: approx. 26 minutes
Additional comment: During the first run we encountered issues with more than 2 duplicated ways. Such situations shouldn't be fixed automatically (the intention was to handle "double" polygons; other cases might require manual review), so for now we have modified the code to find and skip such cases.