Super-relation/Implementation
Implementing super- and sub-relations ideally results in one or more trees or the flat representation thereof. As the nesting of relations is not restricted in Osm, there is no highest relation and any relation can contain primitive data as well as other relations, complete processing routines should contain error trapping.
Procedure
- Tree assembling: The first aim of processing super- and sub-relations is getting correct hierarchy trees. To order a given collection of relations, usually many trees 'grow', for a given relation it can be one tree or also more of them. If just the sub-tree of a given relation is examined, you get one tree.
- Division: When this is done, the 'wood' can be divided. A tree usually contains different relations. It can be split up to form trees of just same relations. That requires to know what is the same and what isn't. Without restriction, all tags of the relation count. Are they the same, they belong to the same tree. Depending on the purpose of the application, those relevant tags can be resticted. If you want to collect geometries of all relations and sub-relations containing wikipedia tags, just this tag makes a difference. In this step, sub-relations are copied, if they belong to more than one relation. This makes data redundant.
- Hierarchy-simplification: When this is also done, the trees can be simplified. That is, same relations as parent and child are merged together. Same applies to relations that contain a 'blank' child without any tags but still belonging to a parent that has them. And also child-relations that are foreign to the tree as they were copied in the step before, merge like children without tags. As these types are all that the trees in this step are made of, each tree can be collapsed to a single relation.
- Non-hierarchy simplification: Finally hierarchies are gone, but still same relations exist side-by-side, so to say. They can be merged in this last step.
Note not all of these steps need to be done for all purposes. Also simplification can be done before division without going on further. For a straightforward approach, also the tree establishing can be done without error correction, that is without closed loop detection mainly. Note also that relations in this context are nodes in hierarchy context and the relations between them are called connections here for clarity.
To illustrate the procedure described above and to proove possible algorithms against, there is a collection of examples with assumption and desired result.
Tree assembling
To form a tree and to eliminate closed loops is a graph problem. Data source are the graph-relations between graph-nodes, in this case connections between relations. Osmosis' relation_members table for example is a simple adjacency list (without transitive closure). Its an unstructured arbitrary graph representation with possible loops, which prevent the transformation to a tree. It connects graph-nodes locally to their neighbors aka parent- and child-graph-nodes, that is, their global position in the tree is not provided. Hierarchical queries and cycle detection can be done with these processing method:
They base on directed graph hierarchies represented in a list or database. The following list combines a specific storage method whith a specific implementation of the above processing methods:
- Adjacency list, can be processed with non standard SQL extensions like WITH RECURSIVE in SQL Server, Oracle and PostgreSQL 8.4 usually in a Breadth-first search.
- A tree encoding system (more in Troels' links, Bill Karwin's examples, Quassnoi's pgsql examples, Broersma's examples):
- Transitive closure tables (special adjacency list). The triggered way by Depesz.
- Path enumerations / materialized paths. Is enhanceable with pgsql's ltree.
- Nested sets. Invalid connections, like back edges, can be detected. Depth first search is related to nested sets.
- A combination of different storage- and processing-methods are programming language modules like Django treebeard. Incorporates 3 storage ways to choose from.
Example of Adjacency list using SQL extensions
A Breadth-first search to delete cycles.
- For each edge e
- For each subedge
- If start node (of e) found in the aggregated path == endnode
- Delete current subedge
- For each subedge and each of their subedges: repeat this procedure until no more subedges are found or endnode is in path (including start node (of e)).
- If start node (of e) found in the aggregated path == endnode
- For each subedge
Untested variation of the loop-prone query cycle detection query of the postgresql doku under the open PostgreSQL license, with many sub iterations:
WITH RECURSIVE search_graph (relation_id, member_id, path, hasCycle, isCycle) AS (
SELECT relation_id, member_id,
ARRAY[relation_id],
false, false
FROM relation_members g
WHERE member_role = 'R'
UNION ALL
SELECT g.relation_id, g.member_id,
path || relation_id,
g.member_id = ANY(path),
g.member_id = path[0]
FROM relation_members g, search_graph sg
WHERE g.relation_id = sg.member_id AND NOT hasCycle AND g.member_role = 'R'
)
DELETE FROM relation_members
WHERE relation_id, member_id IN
( SELECT relation_id, member_id FROM search_graph WHERE isCycle );
Finding a criterion to distinguish nested relations
Prior to division and simplification of any sort, a way to distinguish nested relations has to be found.
Example
Relation A Eurovelo6 name=Riverway ref=EV6 member Way 51 some way highway=cycleway v Relation A Dutch part of Eurovelo6. name=Riverway ref=EV6 member Way 56
Merge? possible. Sub-relations with same or no relevant tags have no additional information. Member way 56 can be treated like way 51.
Relation A Eurovelo6 name=Riverway ref=EV6 member Way 51 v Relation X Greek part of Eurovelo6 that is also used by Eurovelo12. no tags member Way 58
Merge? possible. Same. Member way 58 can be treated like way 51.
Relation A Eurovelo6 name=Riverway ref=EV6 member Way 51 v Relation B Spanish part of Eurovelo6. name=Riverway ref=EV6 wikipedia=es:EV6_in_España member Way 57
Merge? information lost or missinterpreted. Sub-relation has an additional tag valid in the subrelation only. Copying the way members to both relation gives 2 EV6 cycleways in Spain where there is just one. Merging them drops or missinterprets the wikipedia tag.
Relation A Eurovelo6 name=Riverway ref=EV6 member Way 51 v Relation B German part of EV6 is actually D6. name=Danubway ref=D6 member Way 72
Merge? information lost. Sub-relation holds information about a different cycleway. Merging the relation to the parent would drop the D6 cycleway, and just keep it as EV6, so copying the way members to both relations is an alternative for further processing, as well as leaving the structure might be.
So deciding if the relations in a super-relation construction belong together or not, that is, should live seperately or not, is important for division and simplification. If you are just interested in the ways corresponding to wikipedia-tags then the wikipedia tags make the difference, in contrast to comparing all tags like the name and ref tags in this example. Is the wikipedia tag in the sub-relation the same than in the parent or is it missing in the sub relation then merge, is it different divide. And for completeness, would the tag be missing in the parent relation, then the parent wouldn't be examined.
The hiking and cycling map Lonvia obviously uses the network tag to distuingish between parent and child relations that match or those that are different from each other, the later are differently labelled then.[1]
Algorithm
- Imagine two relations that can be combined to one and still fit your requirements. Imagine two that don't. Find the difference.
Division and simplification
Example
Now assuming a method for distinguishing is found then lets say matching relations are called with the same capital, like A.
Relation A' W1 v Relation A" W2 |
=> |
Relation A W1, W2 |
FrTo A'A" |
ReCl A'A A"A |
=> |
ClMem A A' A A" |
Ways or other content of two of the same kind can be merged, as stated above. Usually the parent relation A' has no or few way members, but you never know. The sketched tables are
- The many to many relations connection table FrTo with From and To rows from the tree assembling step, containing the edges in graph terminology.
- A table called ReCl here that adds classification to each relation. It can be an extension to an existing relations or graph-nodes table.
- A classification's members table, ClMem, listing their relation members, for now.
In this example:
- FrTo lists just one connection as there is just one arrow down from A' to A" in the graph.
- The relations both are classified as A in ReCl, hence the name of A' and A" in this example. Of course the real identifier of the relations is their id number and their classification is probably a combination of tags, like 'wikipedia=es:EV6_in_España' or a reference thereof.
- Result: Members of classification A in ClMem are of course relations A' and A".
Relation A W1 v Relation B W2 |
=> |
Relation A W1, W2 |
Relation B W2 |
FrTo A B |
ReCl A A B B |
=> |
ClMem A A A B B B |
Different are split like above.
Relation A' W1 v v Relation A" Relation A° W2 W3 |
=> |
Relation A W1 v Relation A W3 |
=> |
Relation A W1, W2, W3 |
Relation A Relation A W1, W2 W3 |
=> * |
Relation A W1, W2, W3 |
FrTo A'A" A'A° |
ReCl A'A A"A A°A |
=> |
ClMem A A' A A" A A° |
Both transformations possible, the * indicating non-hierarchical relation consolidation, no more hierarchy here.
Example with walkthrough
Relation A' W1 v Relation X W6 v v Relation C Relation A" W3 W2 |
=> |
Relation A W1, W6, W3, W2 |
Relation C W3 |
, |
FrTo A'X X C X A" |
ReClSubs A'A X,C,A" A"A X C,A" C C |
=> |
ClMem A A' A X A C A A" C C |
X has no relevant tags and gets dropped, but not its ways. Process to build up classification members table ClMem starts (and ends) at ReClSubs, the graph-nodes table.
- Start with classification A and member A'. A' and all subs should appear in the result. As it isn't there already since we just started, add A' together with the current classification A to ClMem. A' has 3 sub-relations as the tree data model applied in step tree assembling knows or as it can be found in a recursive lookup on 'FrTo'..
- It's first sub-relation is X. As they aren't there already, add X together with the current classification A to ClMem.
- Do the same for the rest, C and A". Go back to classification A in ClRel.
- Continue with classification A and member A". A" and all subs should appear in the result. As it's there already, don't add it to ClMem.
- A" has no sub-relations. Go back to ReClSubs.
- Continue with classification C and member C. C and all subs should appear in the result. As it isn't there already, add C together with the current classification C to ClMem.
- C has no sub-relations. Go back to ReClSubs.
- ReClSubs has no more classifications.
An extra connection from A' to C would be already removed in step tree assembling.
Algorithm
The same in general speech:
- Classify matching relations with the test criterion found in the step #Finding a criterion to distinguish nested relations.
- For each classifications (A, B, C).
- For each of their relations (various As).
- Collect the relation and all of their sub relations, using hierarchy scheme created in #Tree assembling. Avoid duplicates that arise. They always do because the trees overlap.
- For each of their relations (various As).
Alternative without a priori duplicates:
- Classify matching relations with the test criterion found in the step #Finding a criterion to distinguish nested relations.
- For each classification (A, B, C)
- Find the highest tree node(s) / relation(s). For each top tree node (various As)
- Collect the relation and all of their sub relations, using hierarchy scheme created in #Tree assembling.
- Find the highest tree node(s) / relation(s). For each top tree node (various As)
Now all the unique groups with their belonging relations are in place. A certain group need not just have belonging relations of the matching type but also 'blank' ones without relevant tags and 'foreign' ones with other tags if they are referred to as a sub relation. At least 'foreign' relations are mentioned somewhere else, at least in the group where they belong to. Thats why the collection or table is redundant.
- Find the collected relation's primitive members (ways etc.)
Example (continued)
As mentioned already, two relations 'side-by-side', without hierarchical connection, cannot be merged hierarchically, but still they are the same according to the match-criterion. So they can be merged non-hierarchically. Demonstration is done on a closed loop or what's left of it: A closed loop shouldn'd occur at this stage or the step before any more. Connections have been removed already. Hierarchies of relations connected to a closed loop are removed in the previous step. All that's left are single relations, that may match:
Relation A' W1 |
Relation A" W2 |
Relation C W3 |
=>* |
Relation A W1, W2 |
Relation C W3 |
FrTo |
ReCl A'A A"A C C |
=> |
ClMem A A' A A" C C |
If so, they can be merged in a * non-hierarchical relation consolidation, no more hierarchy here. This is also done by the algorithm above.