TfL Cycling Infrastructure Database
Transport for London (TfL) has created a database of cycling infrastructure containing 240,000 assets and covering all of Greater London. It was released as open data on 1st August 2019.
This groundbreaking database contains key cycle infrastructure assets within Greater London, including assets both on and off the carriageway.
A map of the data has been made available (see below).
TfL's wish was to conflate this database with OSM to make this data more accessible for the benefit of cyclists.
About the CID database
TfL’s official press release stated:
“The world’s first Cycling Infrastructure Database will be the most comprehensive database of cycling infrastructure ever collected in London. [...] TfL has amassed data on every street in London, cataloguing almost 146,000 cycle parking spaces, 2,000 km of cycle lanes and more than 58,000 cycle signs and street markings. This information will be released as open data alongside a new digital map of cycle routes, will make journey planning and cycle parking much easier, as well as offering valuable information to TfL and the boroughs for planning future investment in cycling.”
Each asset is accompanied by two photos illustrating it, which will considerably enhance the ability of OSM mappers to merge data in remotely.
The collected data is a snapshot in time ranging between January 2017 and May 2018. The data was professionally surveyed by a team of surveyors.
TfL is keen to make this available to the OpenStreetMap community under a compatible open license, to ensure maximum use of the CID. TfL is also potentially willing to consider tool development to help facilitate sensitive merging in of this data.
Goals
The goal is to conflate the TfL CID with OSM in order to add further cycling-related assets and to give richer attributes to existing cycling assets within the Greater London area. The conflation process will make the detailed CID dataset more readily available to the OSM and cycling communities, with the hope that it will benefit new and future cycling users of OpenStreetMap, for example through improvements to cycle routing.
CID schema
TfL has published the CID Schema.
TfL also holds a version of the database with a further field that associates each asset with the relevant nearby OSM way, derived using a GIS analysis.
Two images accompany each asset. These have been processed to meet data and privacy regulation.
Licensing
The OSMF Licensing Working Group has confirmed that they "believe that it is unproblematic to use this data in or as a source for OpenStreetMap", as noted in their minutes and as posted on talk-gb.
Earlier detail:
TfL has made the data available under its open data license. This is the Transport Data Service, which is "based on version 2.0 of the Open Government Licence with specific amendments for Transport for London".
The OSMF Licensing Working Group has been contacted for their view on the compatibility of this license with the OSM Contributor Terms. We note the LWG's comments about Open Government Licence (OGL) based licences.
Discussions during the summer have established the following, as noted on talk-gb:
- The license is indeed the TfL open data license, which is based on Open Government Licence v2 with some changes.
- The license now contains mention of containing Geomni UKMap data, as of 17th July 2019.
- The data was collected by the surveyors using UKMap as a background map, and then checking was later performed using aerial imagery from the same supplier.
- Geomni have confirmed they do not regard themselves as having residual data rights in the released data, because TfL "haven't simply copied features from our data".
- There is no use of Ordnance Survey data at all.
- TfL are happy with commercial / non-commercial use of the released data.
Work to Date
CycleStreets were commissioned by TfL to create a report aimed at facilitating re-use of this data within OSM. This was delivered to TfL on 18th November 2019 and may be available from them.
The deliverables of this report were:
- Establish a mapping between the CID schema and geography types and the OpenStreetMap tagging system and geography types.
- Review the TfL open data licence and provide recommendations on licence compatibility with OpenStreetMap in regards to adding the CID data into OpenStreetMap.
- Identify options (e.g. tools, other arrangements) whereby TfL can utilise crowdsourcing to keep the CID up to date without introducing licence restrictions which are incompatible with their own open data licence.
- Undertake a comprehensive review of existing specialist OpenStreetMap data import (conflation) and data collection/data update tools. Provide recommendations as to which tool(s) are most suitable and whether any further tool development is required.
- If further tool development is required, outline the scope and engage with the relevant parties to provide TfL an estimate of the range of potential cost.
- Commence community engagement and report initial findings to TfL. In particular, is the OpenStreetMap community supportive of the data being added and are they likely to engage with the process of adding and maintaining the data? This will give TfL a better view how to proceed (e.g. is it worth proceeding to tool development and will the tools be used by the community or should TfL plan for the tools to be used by paid mappers?)
CycleStreets undertook a full analysis of the CID data and how each asset and field might be converted to OSM and invited comments from the community on the proposed mapping of CID fields <> OSM tags.
A demonstrator map was created by CycleStreets, and comments on the quality and usefulness of this data were sought from the OSM community. The CycleStreets analysis concluded that the data is of excellent quality and very suitable for conflation into OSM, where it would increase both comprehensiveness and metadata quality.
Usage notes: The controls on the right of the map allow the different feature types to be selected. The OSM layer (available at zoom level 19+) also provides a live feed from the OSM API, to enable quick comparisons. The two photos of each asset are shown, which will be particularly useful for OSM mappers verifying the data; all c. half-million photos have been cleared for GDPR purposes.
In February 2022, Sweco and GHD were commissioned by TfL to undertake a programme completing the migration of the CID to OSM.
Using a suite of scripts developed by CycleStreets to compare the CID against existing OSM data, the outstanding assets are conflated through manual validation and uploaded using JOSM.
Optimisations to this process have been identified for certain types of asset using the OpenStreetMap API. This has been assessed as low risk and determined to meet the 'acceptable usage' threshold, as it applies only to amending asset tags without any geometry change. Manual validation is carried out whenever an inconsistent tag value is spotted.
The programme is coordinated closely with TfL who are also supporting through additional quality checks.
During July 2022, Sweco commissioned CycleStreets[1] for a small piece of work to resolve the remaining conversion definition issues in the GitHub repo. On behalf of CycleStreets, User:Richard will implement any agreed changes in the conversion script.
This conflation effort was abandoned in January 2023, with the work incomplete.
Process
Import Data
Background
- Data source site and Data license
- Type of license: Based on version 2.0 of the Open Government License with specific amendments for Transport for London
- ODbL compliance verified: yes (see above)
OSM Data Files
- OSM files are generated directly from the CID JSON data, then conflated with OSM manually or semi-automatically (see below)
Import Type
- One-time import
- Majority of assets conflated using a manual process involving JOSM
- Some simple assets to be conflated using a semi-automated script
Data Preparation
Data Reduction & Simplification
Only CID data that can be readily conflated with OSM will be imported. This means that the following CID asset types will not be conflated as part of this project:
- Advance stop lines
- Restricted points
- Signage
- Signals
Tagging Plans
CID attributes will be converted to OSM compatible tags as described on the project attribute conversion page
Changeset Tags
Changesets will contain conflated data relating to a specific asset type and London Borough. The Comment tag will describe the feature type being conflated, e.g. “add traffic calming”. Changesets will be published from individual OSM accounts, one for each person involved in the conflation process. These accounts have been created solely for, and will only be used for, the conflation of TfL CID data. These are:
Data Transformation
To transform the CID data to OSM, a Ruby script has been produced. This script is run daily and generates candidate OSM entities to be manually inspected and conflated in JOSM. The basic process is as follows:
- Download current TfL CID data
- Download current OSM data
- Upload to a local PostgreSQL/PostGIS database
- Apply a range of tests to compare CID assets with OSM data. Each asset is then classified for manual conflation:
  - New – Not identified in OSM
  - To Check – Requires further manual checking against OSM
  - Full – Asset is matched to an existing OSM feature for the OSM feature's entire length
  - Partial – Asset is matched to an existing OSM feature for part of the OSM feature's length
  - Unmatched – Asset is not matched to an existing OSM feature
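The Full, Partial and Unmatched classes above can be illustrated with a minimal Python sketch. This is not the project's Ruby script: the function name `classify_line_match` and the `tolerance` parameter are hypothetical, and the sketch assumes the matched length has already been computed elsewhere (for example in PostGIS).

```python
def classify_line_match(matched_length_m: float,
                        feature_length_m: float,
                        tolerance: float = 0.95) -> str:
    """Classify how much of an OSM feature's length a CID asset covers.

    Hypothetical helper: the 0.95 tolerance is an illustrative assumption,
    not a value taken from the project's conversion script.
    """
    if feature_length_m <= 0:
        raise ValueError("feature_length_m must be positive")
    coverage = matched_length_m / feature_length_m
    if coverage <= 0:
        return "Unmatched"   # no overlap with the OSM feature
    if coverage >= tolerance:
        return "Full"        # (almost) the whole OSM feature is covered
    return "Partial"         # only part of the OSM feature is covered
```

A tolerance slightly below 1.0 is used so that small geometric differences between the two datasets do not turn a full match into a partial one; the appropriate value would need to be established by sampling.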
Data Transformation Results
The latest OSM output files from the script can be found here.
Data Merge Workflow
Team Approach
Initial conflation work will be conducted by an individual supported by GHD/Sweco/TfL. Once the process is suitably advanced, additional staff will be brought in to assist with conflation. All team members will take part in weekly reviews to consider the progress made and discuss any issues discovered during the previous week's conflation work.
Process Workflow
Conflation to be conducted asset by asset
- Obtain the latest osm.xml files from here
- Import into JOSM
- Obtain latest OSM data as a JOSM layer
- Manually work through the CID features and determine whether each is new, existing, or not relevant:
  - New Feature – Copy the feature from the CID layer to the OSM layer and inspect the tagging for OSM compliance
  - Existing Feature – Check the existing OSM tagging against the CID and add additional or more detailed tagging where required and where OSM compliant
  - Not Relevant – Mark the CID feature so that it will not appear in later CID exports
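For the Existing Feature case, the rule of adding detail without losing existing OSM tagging can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the project's tooling: the function name `merge_tags` is hypothetical, and the policy that an existing OSM value always wins is one reading of the workflow above.

```python
def merge_tags(osm_tags: dict, cid_tags: dict):
    """Return (merged, conflicts) for one existing OSM feature.

    Assumed policy: existing OSM values are never overwritten. CID tags are
    added only where the key is absent; differing values are reported so
    that a person can validate them manually.
    """
    merged = dict(osm_tags)          # start from what is already in OSM
    conflicts = {}
    for key, cid_value in cid_tags.items():
        if key not in merged:
            merged[key] = cid_value  # new detail supplied by the CID
        elif merged[key] != cid_value:
            conflicts[key] = (merged[key], cid_value)  # needs a human check
    return merged, conflicts


osm = {"amenity": "bicycle_parking", "capacity": "10"}
cid = {"amenity": "bicycle_parking", "capacity": "12", "covered": "yes"}
merged, conflicts = merge_tags(osm, cid)
print(merged)
print(conflicts)
```

Here `covered=yes` is added, while the differing `capacity` values are left untouched and surfaced for manual validation.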
Changeset Size Policy
- Changesets will be geographically local to a London Borough
- Changesets will be limited to a maximum size of 100 conflated assets per commit
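The changeset policy (one asset type and one borough per changeset, at most 100 assets each) can be sketched in Python as below. The function name `plan_changesets`, the `borough` key on each asset, and the shape of the returned dictionaries are assumptions made for illustration; only the comment wording and the limit of 100 come from this page.

```python
from collections import defaultdict

MAX_ASSETS_PER_CHANGESET = 100  # limit stated in the changeset size policy


def plan_changesets(assets, asset_type, max_size=MAX_ASSETS_PER_CHANGESET):
    """Group assets by borough, then split each group into batches.

    Each asset is assumed to be a dict with a 'borough' key (hypothetical
    structure). Returns a list of planned changesets.
    """
    by_borough = defaultdict(list)
    for asset in assets:
        by_borough[asset["borough"]].append(asset)

    plans = []
    for borough in sorted(by_borough):
        group = by_borough[borough]
        for start in range(0, len(group), max_size):
            plans.append({
                "comment": f"add {asset_type}",   # e.g. "add traffic calming"
                "borough": borough,               # planning metadata only
                "assets": group[start:start + max_size],
            })
    return plans
```

For example, 250 assets in one borough would be planned as three changesets of 100, 100 and 50 assets.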
Modification to Process (November 2022)
It was recognised that cycle infrastructure has been upgraded in places in recent years, so the survey data underpinning the TfL CID (captured in 2017-18) may no longer accurately reflect the current status. Feedback from the OSM community has highlighted that, whilst there is a lot of valuable detail in the TfL CID, the conflation of these assets should not overwrite edits made to OSM since 2018, as those edits are likely to have taken these infrastructure upgrades into account.
Therefore, the table below outlines the enhanced filtering that has been developed to ensure that the conflation meets these objectives.
Asset Class | Previous Process | Observations on Previous Process | Update Approach | Full Description of New Process | Import Completed |
---|---|---|---|---|---|
cycle_parking | The Ruby script readers have compared the TfL CID database against the current/existing OSM data. This produces the following two categories of cycle assets which are candidates for import: | A sampling approach involving a practicable volume of site inspection determined the effectiveness of the Ruby scripts in isolating cycle parking assets not already in OSM. This examined a range of buffers to reliably exclude TfL CID assets which were already in OSM, but often at a slightly incorrect location. The optimal buffer is 30 metres. | The outputs of the Ruby script readers will now be imported into QGIS for additional pre-filtering using the described buffer to maximise confidence in the final import. | 1. Source latest files: 2. Pre-filtering using QGIS: | 23-12-2022 |
traffic_calming | The Ruby script readers have compared the TfL CID database against the current/existing OSM data, to produce the following three categories of assets which are candidates for import: | A sampling approach was taken to determine the effectiveness of the Ruby scripts in isolating speed bumps not already in OSM. This analysis included a practicable volume of site inspection and determined that a simple additional threshold supported confidence in the robustness of the identification of these ‘new’ speed bumps. These were: Edits by other OSM members since May 2018 may be overwritten using this approach. | The outputs of the Ruby script readers will now be imported into QGIS for additional pre-filtering based on the described condition. If an asset fails these conditions, further manual checks are required. | 1. Source latest files: 2. Load the layers into QGIS 3. For assets which the reader has associated with an existing OSM object, the OSM Overpass API is queried to identify whether the OSM object has been edited since 31st May 2018 (the end of the TfL survey period); if so, the asset is excluded. The remaining assets are conflated through the OpenStreetMap API using sandbox-tested scripts: 5. For assets which the reader has not associated with an existing OSM object, load these batches into JOSM: visually inspect, and ensure these are all snapped to their nearest way | 23-12-2022 |
Further asset classes to follow |
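The two filters described in the table can be approximated in a short Python sketch. The project performed the pre-filtering in QGIS and via the Overpass API, so this is only a stand-in to show the logic: the function names, the use of a haversine distance, and the exact cut-off instant for "edited since 31st May 2018" are all assumptions.

```python
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt

BUFFER_M = 30.0  # buffer distance stated in the table
# Assumption: any edit timestamped after the start of 31 May 2018 (UTC) counts.
SURVEY_END = datetime(2018, 5, 31, tzinfo=timezone.utc)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


def is_new_cycle_parking(cid_point, osm_points, buffer_m=BUFFER_M):
    """True if no existing OSM cycle parking lies within the buffer.

    Points are (lat, lon) tuples; this structure is hypothetical.
    """
    return all(haversine_m(*cid_point, *p) > buffer_m for p in osm_points)


def edited_since_survey(osm_timestamp: str) -> bool:
    """True if an OSM object's last-edit timestamp is after the survey end.

    Expects an ISO 8601 string such as '2019-03-02T10:15:00Z'.
    """
    edited = datetime.fromisoformat(osm_timestamp.replace("Z", "+00:00"))
    return edited > SURVEY_END
```

A CID cycle parking asset would be kept as a candidate only if `is_new_cycle_parking` is true, and a matched traffic calming asset would be excluded from automated tagging if `edited_since_survey` is true for the matched OSM object.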
Revert Plans
Existing checks on the quality of CID data and ensuring the conflation task is conducted by well trained and supervised staff should reduce the risk for reversion of commits. However, the need for reverting commits may be generated by:
- Comments made by OSM users against commits (we will regularly check for comments)
- Issues encountered during the conflation process
- Issues identified during QA checks
Should the reversion of commits be required, the Reverter plugin within JOSM will be used on the selected changesets.
QA
Quality Assurance will be conducted by both GHD/Sweco and TfL during the lifetime of the conflation process:
- A random sample of 5% of CID conflated features will be independently checked against OSM
- Checking will confirm asset location and tagging matches original CID data
- Checking will confirm that existing relevant OSM tagging has not been lost or overly simplified during conflation
- Checking will confirm that conflation follows OSM guidelines for feature tagging
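Drawing the 5% random sample could be done along the following lines. This is a minimal Python sketch: the function name `qa_sample`, the `seed` parameter and the choice to round the sample size up are assumptions, not documented parts of the QA process.

```python
import random


def qa_sample(asset_ids, percent=5, seed=None):
    """Pick a random sample of conflated asset IDs for independent checking.

    The sample size is rounded up so that small batches still get at least
    one check (an assumption). Passing a seed makes the draw repeatable,
    which may help when the sample has to be audited later.
    """
    ids = list(asset_ids)
    if not ids:
        return []
    size = (len(ids) * percent + 99) // 100  # integer ceiling of percent%
    rng = random.Random(seed)
    return rng.sample(ids, size)
```

Using integer arithmetic for the sample size avoids floating-point rounding surprises at batch boundaries.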
Feedback
Feedback is very strongly encouraged and should be raised in the repo as soon as possible. We are seeking to resolve feedback and questions about the approach and process as quickly as possible.
Please do discuss the data and related aspects noted above on the talk-gb mailing list.
We are happy to provide any clarifications, which will be added to this page, as a central repository of information about the project.