TfL Cycling Infrastructure Database

From OpenStreetMap Wiki

Transport for London (TfL) has created a database of cycling infrastructure, containing 240,000 assets and covering all of Greater London. It was released as open data on 1st August 2019.

[Image: TfL CID - cycle track.png]

This groundbreaking database contains the key cycle infrastructure assets within Greater London, both on and off the carriageway.

A map of the data has been made available (see below).

TfL's wish was to conflate this database with OSM to make this data more accessible for the benefit of cyclists.

About the CID database

TfL’s official press release stated:

“The world’s first Cycling Infrastructure Database will be the most comprehensive database of cycling infrastructure ever collected in London. [...] TfL has amassed data on every street in London, cataloguing almost 146,000 cycle parking spaces, 2,000 km of cycle lanes and more than 58,000 cycle signs and street markings. This information will be released as open data alongside a new digital map of cycle routes, will make journey planning and cycle parking much easier, as well as offering valuable information to TfL and the boroughs for planning future investment in cycling.”

Each asset is accompanied by two photos illustrating it, which will considerably enhance the ability of OSM mappers to merge the data remotely.

The data is a snapshot in time, collected between January 2017 and May 2018 by a team of professional surveyors.

TfL is keen to make this available to the OpenStreetMap community under a compatible open license, to ensure maximum use of the CID. TfL is also potentially willing to consider tool development to help facilitate sensitive merging in of this data.

Goals

To conflate the TfL CID database with OSM in order to add additional cycling-related assets and give richer attributes to existing cycling assets within the Greater London area. The conflation process will make the detailed CID dataset more readily available to the OSM and cycling communities, with the hope that it will benefit current and future cycling users of OpenStreetMap, for example through improvements to cycle routing.

CID schema

TfL has published the CID Schema.

TfL has a version of the database with a further field, derived by GIS analysis, which associates each asset with the relevant nearby OSM way.

Two images accompany each asset. These have been processed to meet data and privacy regulation.

Licensing

The OSMF Licensing Working Group has confirmed that they "believe that it is unproblematic to use this data in or as a source for OpenStreetMap", as noted in their minutes and as posted on talk-gb.

Earlier detail:

TfL has made the data available under its open data license. This is the Transport Data Service, which is "based on version 2.0 of the Open Government Licence with specific amendments for Transport for London".

The OSMF Licensing Working Group has been contacted for their view on the compatibility of this license with the OSM Contributor Terms. We note the LWG's comments about Open Government Licence (OGL) based licences.

Discussions during the summer have established the following, as noted on talk-gb:

  • The license is indeed the TfL open data license described above, which is based on Open Government Licence v2 with some changes.
  • As of 17th July 2019, the license mentions that the data contains Geomni UKMap data.
  • The data was collected by the surveyors using UKMap as a background map, and then checking was later performed using aerial imagery from the same supplier.
  • Geomni have confirmed they do not regard themselves as having residual data rights in the released data, because TfL "haven't simply copied features from our data".
  • There is no use of Ordnance Survey data at all.
  • TfL are happy with commercial / non-commercial use of the released data.

Work to Date

CycleStreets were commissioned by TfL to create a report aimed at facilitating re-use of this data within OSM. This was delivered to TfL on 18th November 2019 and may be available from them.

The deliverables of this report were:

  1. Establish a mapping between the CID schema and geography types and the OpenStreetMap tagging system and geography types.
  2. Review the TfL open data licence and provide recommendations on licence compatibility with OpenStreetMap in regards to adding the CID data into OpenStreetMap.
  3. Identify options (e.g. tools, other arrangements) whereby TfL can utilise crowdsourcing to keep the CID up to date without introducing licence restrictions which are incompatible with their own open data licence.
  4. Undertake a comprehensive review of existing specialist OpenStreetMap data import (conflation) and data collection/data update tools. Provide recommendations as to which tool(s) are most suitable and whether any further tool development is required.
  5. If further tool development is required, outline the scope and engage with the relevant parties to provide TfL an estimate of the range of potential cost.
  6. Commence community engagement and report initial findings to TfL. In particular, is the OpenStreetMap community supportive of the data being added and are they likely to engage with the process of adding and maintaining the data? This will give TfL a better view how to proceed (e.g. is it worth proceeding to tool development and will the tools be used by the community or should TfL plan for the tools to be used by paid mappers?)

[Image: TfL CID - cycle parking.png]

CycleStreets undertook a full analysis of the CID data and of how each asset and field might be converted to OSM, and invited comments from the community on the proposed mapping between CID fields and OSM tags.

A demonstrator map was created by CycleStreets, and comments were sought from the OSM community on the quality and usefulness of this data. CycleStreets' analysis found that the data is of excellent quality and very suitable for conflation into OSM, to increase both comprehensiveness and metadata quality.

Demonstrator Map Link

Usage notes: The controls on the right of the map allow the different feature types to be selected. The OSM layer (available at zoom level 19+) also provides a live feed from the OSM API, to enable quick comparisons. The two photos of each asset are shown, which will be particularly useful to OSM mappers for verification; all of the roughly half a million photos have been cleared for GDPR purposes.

In February 2022, Sweco and GHD were commissioned by TfL to undertake a programme completing the migration of the CID to OSM.

A suite of scripts developed by CycleStreets compares the CID against existing OSM data; the outstanding assets are then conflated through manual validation and uploaded using JOSM.

Optimisations to this process have been identified for certain types of asset using the OpenStreetMap API. This has been assessed as low risk and determined to meet the 'acceptable usage' threshold, as it is only applicable for amending asset tags without geometry change. Manual validation is carried out whenever an inconsistent tag value is spotted.

The programme is coordinated closely with TfL who are also supporting through additional quality checks.

During July 2022, Sweco commissioned CycleStreets[1] for a small piece of work to resolve the remaining conversion definition issues in the GitHub repo. On behalf of CycleStreets, User:Richard will implement any determined changes in the conversion script.

This conflation effort was abandoned in January 2023, with the work incomplete.

Process

Import Data

Background

  • Data source site and Data license
  • Type of license: Based on version 2.0 of the Open Government License with specific amendments for Transport for London
  • ODbL Compliance verified: yes (see above)

OSM Data Files

  • OSM files are generated directly from CID JSON data, then manually or semi-automatically conflated with OSM (see below)

Import Type

  • One-time import
  • Majority of assets conflated using a manual process involving JOSM
  • Some simple assets to be conflated using a semi-automated script

Data Preparation

Data Reduction & Simplification

Only CID data that can be readily conflated with OSM will be imported. This means that the following CID asset types will not be conflated as part of this project:

  • Advance stop lines
  • Restricted points
  • Signage
  • Signals

Tagging Plans

CID attributes will be converted to OSM-compatible tags as described on the project attribute conversion page.

Changeset Tags

Changesets will contain conflated data related to a specific asset type and London Borough. The Comment tag will describe the feature type being conflated, e.g. “add traffic calming”. Changesets will be published from individual OSM accounts, one for each individual involved in the conflation process. These accounts have been created solely for, and will only be used for, the conflation of TfL CID data. These are:

Data Transformation

To transform the CID data to OSM, a Ruby script has been produced. This script is run on a daily basis and generates candidate OSM entities to be manually inspected and conflated in JOSM. The basic process is as below:

  • Download current TfL CID data
  • Download current OSM data
  • Upload to a local PostgreSQL/PostGIS database
  • Apply a range of tests to compare CID assets with OSM data. Each asset type is then classified for manual conflation:
    • New – Not identified in OSM
    • To Check – Requires further manual checking against OSM
    • Full – Asset is matched to an existing OSM feature for the OSM feature's entire length
    • Partial – Asset is matched to an existing OSM feature for part of the OSM feature's length
    • Unmatched – Asset is not matched to an existing OSM feature
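
The tests the Ruby script applies are not described on this page, so the following is only a minimal Python sketch of how the Full / Partial / Unmatched distinction for linear assets could be expressed. The function name `classify_linear_match`, its parameters and both thresholds are hypothetical and chosen purely for illustration.

```python
def classify_linear_match(
    osm_length_m: float,
    matched_length_m: float,
    full_threshold: float = 0.95,     # assumed: coverage treated as "entire length"
    minimum_threshold: float = 0.05,  # assumed: coverage below this counts as no match
) -> str:
    """Classify a CID asset by how much of an OSM feature's length it covers."""
    if osm_length_m <= 0:
        raise ValueError("osm_length_m must be positive")

    # Fraction of the OSM feature that the CID asset was matched against.
    coverage = matched_length_m / osm_length_m

    if coverage >= full_threshold:
        return "Full"
    if coverage >= minimum_threshold:
        return "Partial"
    return "Unmatched"
```

For example, a CID cycle lane matched along 40 m of a 100 m OSM way would be classified as Partial under these assumed thresholds.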

Data Transformation Results

The latest OSM output files from the script can be found here.

Data Merge Workflow

Team Approach

Initial conflation work will be conducted by an individual supported by GHD/Sweco/TfL. Once the process is suitably advanced, multiple resources will be brought in to assist conflation. All team members will take part in weekly reviews to consider the progress made and discuss any issues discovered during the previous week's conflation process.

Process Workflow

Conflation to be conducted asset by asset

  • Obtain the latest osm.xml files from here
  • Import into JOSM
  • Obtain latest OSM data as a JOSM layer
  • Manually parse through CID features and determine if CID feature is either new, existing, or not relevant:
    • New Features - Copy feature from CID layer to OSM layer and inspect tagging for OSM compliance
    • Existing Feature – Check existing OSM tagging against CID and add additional or more detailed tagging where required and where OSM compliant
    • Not Relevant – Mark CID feature so that it will not appear in later CID exports

Changeset Size Policy

  • Changesets will be geographically local to a London Borough
  • Changesets will be limited to a maximum size of 100 conflated assets per commit
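
As an illustration of this policy, the sketch below groups conflated assets by asset type and borough and splits each group into batches of at most 100. It is not the project's tooling: the function name `plan_changesets` and the `asset_type` and `borough` dictionary keys are assumptions made for the example.

```python
from collections import defaultdict

MAX_ASSETS_PER_CHANGESET = 100  # limit stated in the changeset size policy


def plan_changesets(assets, max_size=MAX_ASSETS_PER_CHANGESET):
    """Return a list of (asset_type, borough, batch) tuples.

    Each asset is assumed to be a dict with hypothetical "asset_type"
    and "borough" keys; no batch mixes types or boroughs.
    """
    groups = defaultdict(list)
    for asset in assets:
        groups[(asset["asset_type"], asset["borough"])].append(asset)

    batches = []
    for asset_type, borough in sorted(groups):
        group = groups[(asset_type, borough)]
        # Slice each group into consecutive chunks of at most max_size.
        for start in range(0, len(group), max_size):
            batches.append((asset_type, borough, group[start:start + max_size]))
    return batches
```

With 250 traffic calming assets in one borough, this yields three batches of 100, 100 and 50 assets, each of which would become its own changeset.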

Modification to Process (November 2022)

It was recognised that there will be instances where there have been upgrades to cycle infrastructure in recent years and the survey data - which evidences the TfL CID database (captured in 2017-18) - may often no longer accurately reflect the current status. Feedback from the OSM community has highlighted that whilst there is a lot of valuable detail in TfL CID, the conflation of these assets should not overwrite edits to OSM that have been made since 2018 (as the latter are likely to have considered these infrastructure upgrades).

Therefore, the table below outlines the enhanced filtering that has been developed to ensure that the conflation meets these objectives.

Columns: Asset Class | Previous Process | Observations on Previous Process | Update Approach | Full Description of New Process | Import Completed

Asset class: cycle_parking

The Ruby script readers have compared the TfL CID database against the current OSM data.

This produces the following two categories of cycle assets which are candidates for import:

  • New, nearby: locations that are in the TfL CID, not in OSM, but have OSM parking nearby
  • New, isolated: locations that are in the TfL CID, not in OSM, and have no OSM parking nearby


These assets were then manually imported using JOSM

A sampling approach involving a practicable volume of site inspection determined the effectiveness of the Ruby scripts in isolating cycle parking assets not already in OSM. This examined a range of buffers to reliably exclude TfL CID assets which were already in OSM, but often at a slightly incorrect location. The optimal buffer is 30 metres. The outputs of the Ruby script readers will now be imported into QGIS for additional pre-filtering using the described buffer to maximise confidence in the final import.


As the cycle_parking dataset has a high volume of assets and each of these is a single node not joined to a way, it is proposed that the pre-filtered dataset is suitably low risk to be imported using the OSM API. All scripts will be tested in the sandbox environment and made available for community review.

1. Source latest files:

2. Pre-filtering Using QGIS:

  • Load the layers into QGIS
  • Filter the OSM data: “Other Tags” LIKE ‘%bicycle_parking%’
  • Create a 30m radius buffer around these existing (in OSM) bicycle parking assets
  • Filter to those TfL CID assets we can be confident are not already in OSM, by selecting all parking_nearby and parking_isolated assets which are not within range of the buffered OSM assets – this subset forms the ‘to-be-imported-assets’
  • Create a 350m square grid, a spatial aggregation which ensures that no grid cell contains more than 50 assets. Join the grid to a vector layer which contains London Ward detail, to add context for the changeset label
  • Spatially group the to-be-imported-assets using this 350m square grid
  • Export the to-be-imported-assets to CSV, each with its grid id and the London ward name(s) associated with its grid cell.

3. OpenStreetMap API Import using sandbox tested scripts
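
The pre-filtering itself is carried out in QGIS. Purely to illustrate the logic of the 30 m exclusion buffer and the 350 m grid, here is a plain-Python stand-in. It assumes that all points are (x, y) tuples in a projected coordinate system measured in metres (for example British National Grid), and the function names are hypothetical. It does not reproduce the ward join or verify the 50-assets-per-cell limit.

```python
import math

BUFFER_M = 30.0   # exclusion radius around existing OSM cycle parking
GRID_M = 350.0    # side length of the square grouping grid


def is_near_existing(candidate, existing_points, radius_m=BUFFER_M):
    """True if the candidate lies within radius_m of any existing OSM point."""
    cx, cy = candidate
    return any(
        math.hypot(cx - ex, cy - ey) <= radius_m for ex, ey in existing_points
    )


def grid_cell(point, cell_size_m=GRID_M):
    """Return the (column, row) index of the grid cell containing the point."""
    x, y = point
    return (math.floor(x / cell_size_m), math.floor(y / cell_size_m))


def prefilter_and_group(cid_points, osm_points):
    """Drop CID points inside the buffer, then group the rest by grid cell."""
    groups = {}
    for point in cid_points:
        if is_near_existing(point, osm_points):
            continue  # probably already mapped in OSM, so skip it
        groups.setdefault(grid_cell(point), []).append(point)
    return groups
```

Testing each candidate against every OSM point is quadratic in the worst case; for a London-wide dataset a spatial index, as QGIS or PostGIS would use, is the more realistic choice.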

Import completed: 23-12-2022
Asset class: traffic_calming

The Ruby script readers have compared the TfL CID database against the current OSM data, to produce the following three categories of assets which are candidates for import:

  • On-road or otherwise very easily matched
  • On cycleways
  • Do not readily match to an existing OSM feature
A sampling approach was taken to determine the effectiveness of the Ruby scripts in isolating speed bumps not already in OSM. This analysis included a practicable volume of site inspection and determined that a simple additional threshold supported confidence in the robustness of the identification of these ‘new’ speed bumps. The threshold is:
  • The bump should either be located on a way, or otherwise be within 5 metres of a way

Edits by other OSM members since May 2018 may be overwritten using this approach.

The outputs of the Ruby script readers will now be imported into QGIS for additional pre-filtering based on the described condition. If an asset fails this condition, further manual checks are required.
  • Where the CID asset is inferred to correspond to an existing OSM asset, the Overpass API is used to check whether it has been edited since May 2018. If so, it will be excluded from this import; if not, its OSM tags are adjusted using the OpenStreetMap API
  • Otherwise assets will be imported into JOSM in batches, visually inspected, snapped to the nearest way and imported.
1. Source latest files:

2. Load the layers into QGIS

  • Filter the OSM data: “Other Tags” LIKE ‘%”traffic_calming”=>”hump”%’
  • Create a 1 metre spatial buffer around the road network ways - assets within it are inferred to belong to the associated way
  • Create a 5 metre spatial buffer around the road network ways - assets within it which are associated with only a single way are inferred to belong to that way

3. For assets which the reader has associated with an existing OSM object, the OSM Overpass API is queried to identify whether the OSM object has been edited since 31st May 2018 (the end of the TfL survey period); if so, the asset is excluded. The remaining assets are conflated via the OpenStreetMap API using sandbox-tested scripts:

  • Compare the TfL CID tags with the existing OSM tags; append any TfL CID tags that are not already present.

5. For assets which the reader has not associated with an existing OSM object, load these batches into JOSM:

Visually inspect, and ensure these are all snapped to their nearest way

  • Merge the batch assets into OSM live layer
  • Create filter to find the traffic calming assets on OSM layer that are from TfL CID database
  • Select all (ctrl + shift + A) the filtered items and use Tools -> Join Node To Way to join traffic calming assets to their nearest way
  • Visually inspect, and ensure there are no floating nodes
  • Commit changeset with descriptive changeset label
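
The two automated checks for assets matched to an existing OSM object (exclude anything edited after the survey period, then append only missing tags) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the project's script: the function names are hypothetical, the timestamp is assumed to be in the UTC format returned by the OSM and Overpass APIs (e.g. "2019-03-02T10:15:00Z"), and "append" is interpreted as adding only keys that OSM does not already have, so existing values are never overwritten.

```python
from datetime import datetime, timezone

# End of the TfL survey period (31st May 2018), taken as end of day UTC.
SURVEY_END = datetime(2018, 5, 31, 23, 59, 59, tzinfo=timezone.utc)


def edited_since_survey(osm_timestamp: str) -> bool:
    """True if the OSM object was last edited after the survey period."""
    edited = datetime.strptime(osm_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return edited.replace(tzinfo=timezone.utc) > SURVEY_END


def append_missing_tags(osm_tags: dict, cid_tags: dict) -> dict:
    """Return a copy of osm_tags with CID tags added for absent keys only."""
    merged = dict(osm_tags)
    for key, value in cid_tags.items():
        if key not in merged:
            merged[key] = value
    return merged


def conflicting_keys(osm_tags: dict, cid_tags: dict) -> list:
    """Keys present in both sources with different values (for manual review)."""
    return sorted(
        key for key in cid_tags
        if key in osm_tags and osm_tags[key] != cid_tags[key]
    )
```

Returning the conflicting keys separately keeps the automated step limited to additions and leaves any disagreement between CID and OSM values for a person to resolve.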
Import completed: 23-12-2022
Further asset classes to follow

Revert Plans

Existing checks on the quality of CID data and ensuring the conflation task is conducted by well trained and supervised staff should reduce the risk for reversion of commits. However, the need for reverting commits may be generated by:

  • Comments made by OSM users against commits (we will regularly check for comments)
  • Issues encountered during the conflation process
  • Issues identified during QA checks

Should the reversion of commits be required, the Reverter plugin within JOSM will be used on the selected changesets.

QA

Quality Assurance will be conducted by both GHD/Sweco and TfL during the lifetime of the conflation process:

  • A random sample of 5% of CID conflated features will be independently checked against OSM
  • Checking will confirm asset location and tagging matches original CID data
  • Checking will confirm that existing relevant OSM tagging has not been lost or overly simplified during conflation
  • Checking will confirm that conflation follows OSM guidelines for feature tagging
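
Drawing the 5% random sample could look like the following Python sketch. The function name `qa_sample`, the rounding-up rule and the optional seed are assumptions for illustration; the page does not say how the sample is drawn.

```python
import random


def qa_sample(feature_ids, percent=5, seed=None):
    """Pick a random sample of conflated features for independent checking.

    The sample size is percent% of the input, rounded up, with at least
    one feature whenever the input is non-empty.
    """
    ids = list(feature_ids)
    if not ids:
        return []
    # Integer ceiling division avoids floating-point rounding surprises.
    size = max(1, (len(ids) * percent + 99) // 100)
    # A dedicated generator keeps the draw reproducible when a seed is given.
    return random.Random(seed).sample(ids, size)
```

Passing a fixed seed lets a second reviewer reproduce exactly the same sample from the same list of conflated features.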

Feedback

Feedback is very strongly encouraged, as soon as possible, in the repo. We aim to resolve feedback and questions about the approach and process as quickly as possible.

Please do discuss the data and related aspects noted above on the talk-gb mailing list.

We are happy to provide any clarifications, which will be added to this page, as a central repository of information about the project.