Import/All the Places US data
The All the Places data import is a one-time import of All the Places (ATP) data in the US onto existing OSM objects. All the Places scrapes data from companies' public directories and parses it into a rough approximation of OSM tagging. This method of gathering information was recently given an "it depends" green light for use in OSM, which appears to grant permission for use in at least the US, per this Licensing Working Group recommendation and this OSM Community thread. Apart from correcting wrong brand:wikidata=* values (see Tagging plans), this import will not overwrite any data, even if it is incorrect or out of date; it will only add tags that are missing on the OSM object but present in the reference data.
This import uses the Atlus API to parse raw address strings.
As of mid-February 2024, the import is ongoing.
Goals
Primary goal:
- enrich existing OSM POIs in the US with additional information, such as address, website, and phone data
Secondary goal:
- identify locations of chains that are new and unmapped (to be added), or stale and shuttered (to be removed). I will put both types of non-matches into MapRoulette challenges for editors to investigate (see below).
Schedule
This import will be conducted in stages and almost certainly will not include all All the Places data. I plan to work by consumer sector, starting with grocery stores and hotels, then moving on to other sectors if I'm feeling ambitious and things go well. The import for each new sector will be announced in the US OSM Community forum.
Import Data
Background
Data source site: https://www.alltheplaces.xyz/
Data license: https://creativecommons.org/publicdomain/zero/1.0/
Type of license: Creative Commons’ CC0-1.0
ODbL Compliance verified: yes, see Licensing Working Group recommendation
Import type
This will be a rolling, one-time import. Any non-matches will be uploaded as separate challenges to MapRoulette (see below) for users to check individually. I will seek to maintain ref values to facilitate future semi-automated syncing if such tooling becomes available.
Data preparation
Data prep will primarily be handled by a custom Python script, along with JOSM validation rules that the community has developed. These tools will split the address field (which All the Places usually provides as a single addr:street_address or addr:full tag) into its components, expand abbreviated street directions and types, format phone numbers, check ref and image tags, fix improper name and street capitalization, and perform other QA checks. The files will be saved and processed as GeoJSON, since that is the format All the Places provides and it is easy to manipulate natively in Python.
Tagging plans
How I handle tags will depend on the All the Places spider, as each spider scrapes different content and each brand provides data of differing quality. I will always defer to existing OSM data, except for overwriting brand:wikidata=* when it is incorrect (some large hotel chains, for example, are tagged with the parent company's Wikidata item). I provide greater detail for each brand I plan to import below. All the Places-specific tags, like @spider and nsi_id, will always be removed.
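The "defer to existing OSM data" rule can be expressed as a small merge function. This is a hypothetical sketch, not the script used for the import: the key sets are assumptions drawn from this section, per-brand drops (such as image or addr:country in the tables below) would be layered on top, and whether a brand:wikidata value is actually wrong remains a human decision, modelled here as an explicit flag.

```python
# Hypothetical key sets based on the tagging rules described above.
ATP_ONLY_KEYS = {"@spider", "nsi_id"}  # always dropped
OVERWRITABLE_KEYS = {"brand:wikidata"}  # replaced only after manual review


def merge_tags(osm_tags: dict, atp_tags: dict, allow_overwrite: bool = False) -> dict:
    """Return a new tag dict: existing OSM values win; ATP only fills gaps."""
    merged = dict(osm_tags)  # never mutate the caller's OSM tags
    for key, value in atp_tags.items():
        if key in ATP_ONLY_KEYS or value in (None, ""):
            continue
        if key not in merged:
            merged[key] = value  # missing in OSM: add it
        elif key in OVERWRITABLE_KEYS and allow_overwrite and merged[key] != value:
            merged[key] = value  # reviewed correction of a wrong value
    return merged
```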
Changeset tags
I will upload changesets from my standard OSM account, whammo, as this import should not result in huge changesets (by number of objects changed).
Key | Value |
---|---|
comment | [description of the type of information conflated] |
import | yes |
source | All the Places |
source:url | https://www.alltheplaces.xyz/ |
import:page | Import/All the Places US data |
source:license | CC0-1.0 |
Data transformation and cleaning
Most of the data transformation and cleaning will be handled by a custom Python script, available to review here. In addition, ad hoc cleaning will be performed as needed, for example to split branch information out of the name or to remove ref information from the branch, using both one-off Python code and JOSM validation rules.
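As an illustration of the branch and ref cleanup, here is a hypothetical helper that turns names like the Trader Joe's and Whole Foods examples below into separate name, branch, and ref tags. The function and its heuristics (a trailing number in parentheses is the ref, the last " - " segment is the branch, address-like leftovers are dropped) are assumptions for this sketch; the actual cleanup is done ad hoc per brand.

```python
import re


def split_branch(name: str, brand: str) -> dict:
    """Split an ATP name into name, branch and (optionally) ref tags."""
    tags = {"name": brand}
    rest = name[len(brand):] if name.startswith(brand) else name
    rest = rest.strip()
    # A trailing store number in parentheses becomes ref: "... (640)".
    ref_match = re.search(r"\((\d+)\)$", rest)
    if ref_match:
        tags["ref"] = ref_match.group(1)
        rest = rest[:ref_match.start()].strip()
    # Keep only the most specific part: "Arlington - Clarendon" -> "Clarendon".
    branch = rest.split(" - ")[-1].strip()
    # Skip leftovers that look like a street address, e.g. "ALDI 425 E Monroe Ave".
    if branch and not branch[0].isdigit():
        tags["branch"] = branch
    return tags
```

With the example rows from the brand tables, `split_branch("Trader Joe's Arlington - Clarendon (640)", "Trader Joe's")` gives name "Trader Joe's", branch "Clarendon", and ref "640".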
Workflow
Team approach
I plan on taking this import on alone, but if anyone is interested in helping, please let me know. Community assistance with the MapRoulette challenges for the non-matching objects will be critical, as that will not be my focus.
Uploading process
All upload steps will be taken in JOSM after initial cleaning in Python unless stated otherwise. I will work brand by brand within the same JOSM layer, and upload changes after finishing several brands from the same market segment (all supermarkets, all hotels, etc.).
- Overpass query (sample) to download objects within the US based on the primary tag and brand:wikidata=*. I will also download objects that may lack a brand:wikidata=* tag, using a narrow query that combines a regex on the brand name with the primary tag.
- JOSM Conflation plugin to find matches (with a distance threshold of 500 to 1,000 m, depending on how densely the locations are concentrated), adding data tag by tag to ensure there are no conflicts and no overwriting of existing data.
- Merge "Reference only" non-matches into a separate GeoJSON file for upload to a MapRoulette challenge.
- Mark "Subject only" non-matches with a descriptive fixme:atp comment that will feed into a MapRoulette challenge via an Overpass query.
- Address lingering validation errors.
- Upload in regional chunks (perhaps six to eight across the US) to keep changeset sizes manageable.
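The matching step is done by the JOSM Conflation plugin, but the idea of a distance threshold and the resulting "Reference only" and "Subject only" sets can be sketched in Python. This is not the plugin's algorithm: it is a simple greedy, one-to-one nearest match using great-circle distance, where the reference side stands for ATP points and the subject side for OSM objects, both given as hypothetical (id, lat, lon) tuples. Its pairwise comparison is quadratic, so a national-scale run would need a spatial index.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def conflate(reference, subject, threshold_m=500.0):
    """Greedy one-to-one matching of (id, lat, lon) tuples within threshold_m.

    Returns (matches, reference_only, subject_only).
    """
    candidates = []
    for ref_id, rlat, rlon in reference:
        for subj_id, slat, slon in subject:
            d = haversine_m(rlat, rlon, slat, slon)
            if d <= threshold_m:
                candidates.append((d, ref_id, subj_id))
    candidates.sort()  # closest pairs claim each other first
    matched_ref, matched_subj, matches = set(), set(), []
    for d, ref_id, subj_id in candidates:
        if ref_id in matched_ref or subj_id in matched_subj:
            continue
        matches.append((ref_id, subj_id, d))
        matched_ref.add(ref_id)
        matched_subj.add(subj_id)
    reference_only = [r[0] for r in reference if r[0] not in matched_ref]
    subject_only = [s[0] for s in subject if s[0] not in matched_subj]
    return matches, reference_only, subject_only
```

In this framing, `reference_only` would feed the "potentially missing" challenge and `subject_only` would receive the fixme:atp tag.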
MapRoulette challenges
Potentially missing POIs
The MapRoulette challenge for POIs that exist in All the Places data but not in OSM is available here. Please use the provided data, including the website if there is one, and/or local knowledge to verify whether the POI actually exists or if it is just a location that has since closed or is "coming soon."
Because this is a cooperative challenge in which the data has already been pre-processed, edits can only be made in JOSM.
Check the state-by-state status of this MR challenge at the atp-count GitHub page.
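The "Reference only" features end up in a GeoJSON file that is uploaded to MapRoulette. A minimal sketch of preparing such a file, assuming each ATP feature carries a top-level "id" and that a plain GeoJSON FeatureCollection is an acceptable upload; the function names are hypothetical.

```python
import json

ATP_ONLY_KEYS = ("@spider", "nsi_id")  # internal ATP tags not useful to mappers


def build_challenge_collection(atp_features, reference_only_ids):
    """Keep only unmatched ATP features, stripped of ATP-internal properties."""
    wanted = set(reference_only_ids)
    kept = []
    for feature in atp_features:
        if feature.get("id") not in wanted:
            continue
        props = {k: v for k, v in feature.get("properties", {}).items()
                 if k not in ATP_ONLY_KEYS}
        kept.append({"type": "Feature", "id": feature["id"],
                     "geometry": feature["geometry"], "properties": props})
    return {"type": "FeatureCollection", "features": kept}


def write_challenge_file(collection, path):
    """Write the collection as UTF-8 GeoJSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(collection, fh, ensure_ascii=False)
```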
Potentially stale POIs
The MapRoulette challenge for POIs that exist in OSM but not in All the Places is available here. Please use your research skills, including any tags on the object, and/or local knowledge to determine whether the location is still operational. If it is, please remove the fixme:atp tag, which MapRoulette uses in an Overpass query to build the challenge. If it isn't, please remove the POI from the database. This challenge works in any standard OSM editor, including JOSM, iD, and Rapid.
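The challenge's actual Overpass query is not reproduced on this page. The following hypothetical helper shows what a query for objects carrying fixme:atp in the US could look like, optionally narrowed to one brand; a nationwide query may need a longer timeout or splitting by state.

```python
from typing import Optional


def fixme_atp_query(brand_wikidata: Optional[str] = None, timeout_s: int = 600) -> str:
    """Build an Overpass QL query for US objects tagged fixme:atp."""
    tag_filter = '["fixme:atp"]'
    if brand_wikidata:
        tag_filter += f'["brand:wikidata"="{brand_wikidata}"]'
    return "\n".join([
        f"[out:json][timeout:{timeout_s}];",
        'area["ISO3166-1"="US"][admin_level=2]->.us;',  # the United States
        f"nwr{tag_filter}(area.us);",  # nodes, ways and relations
        "out center;",  # ways/relations reported by their centre point
    ])
```

For instance, `fixme_atp_query("Q41171672")` restricts the result to ALDI objects.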
Tagging scheme by brand
This import will not touch keys that already have a value. As such, if an object has incorrect or out-of-date data, it will remain that way. The following schema and data only come into play where an object does not already have a value for the given key (the brand:wikidata corrections marked "overwrite" are the exception).
All cleaned reference data is available in the project's GitHub repository.
Grocery
A topic on the community forum about grocery store data was posted on 29 January 2024 and can be found here.
Albertsons
The file for Albertsons contains POIs for 17 different brands under the Albertsons umbrella, the largest of which is Albertsons. Removed amenity=fuel and amenity=pharmacy POIs.
ATP | example | processed | example |
---|---|---|---|
@spider | albertsons | delete | |
addr:city | Easton | Easton | |
addr:country | US | delete | |
addr:street_address | 210 W Marlboro Ave | addr:street | West Marlboro Avenue |
addr:street_address | 210 W Marlboro Ave | addr:housenumber | 210 |
addr:street_address | 210 W Marlboro Ave | addr:unit | |
addr:postcode | 21601 | 21601 | |
addr:state | MD | MD | |
brand | ACME Markets | ACME Markets | |
brand:wikidata | Q341975 | overwrite | Q341975 |
image | https://dynl.mktgcdn.com/p/... | delete | |
name | ACME Markets | ACME Markets | |
nsi_id | -1 | delete | |
opening_hours | Mo-Sa 07:00-22:00; Su 06:00-22:00 | Mo-Sa 07:00-22:00; Su 06:00-22:00 | |
phone | +1 410-822-7073 | +1 410-822-7073 | |
ref | https://local.acmemarkets.com/#5603075 | delete | |
shop | supermarket | supermarket | |
website | https://local.acmemarkets.com/md/easton/210-w-marlboro-ave.html | https://local.acmemarkets.com/md/easton/210-w-marlboro-ave.html |
Item | Source | Processed |
---|---|---|
Download | ATP direct download link | cleaned .geojson file |
Count | 1,325 |
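Removing the fuel and pharmacy POIs from the Albertsons file (and, below, the Safeway file) amounts to a filter over the GeoJSON features. A minimal sketch; the helper names and the exclusion set are illustrative assumptions.

```python
import json

# Tag combinations whose features are dropped before conflation.
EXCLUDED = {("amenity", "fuel"), ("amenity", "pharmacy")}


def load_features(path):
    """Read the feature list from a GeoJSON FeatureCollection file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["features"]


def drop_excluded(features):
    """Keep features that carry none of the excluded key=value pairs."""
    kept = []
    for feature in features:
        props = feature.get("properties", {})
        if any(props.get(key) == value for key, value in EXCLUDED):
            continue
        kept.append(feature)
    return kept
```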
ALDI
ATP | example | processed | example |
---|---|---|---|
@spider | aldi_sud_us | delete | |
addr:city | Alexandria | Alexandria | |
addr:country | US | delete | |
addr:street_address | 425 E Monroe Ave | addr:street | East Monroe Avenue |
addr:street_address | 425 E Monroe Ave | addr:housenumber | 425 |
addr:street_address | 425 E Monroe Ave | addr:unit | |
addr:postcode | 22301 | 22301 | |
addr:state | VA | VA | |
brand | ALDI | ALDI | |
brand:wikidata | Q41171672 | overwrite | Q41171672 |
contact:facebook | https://www.facebook.com/ALDI.USA | delete | |
contact:twitter | AldiUSA | delete | |
image | https://dynl.mktgcdn.com/p/... | delete | |
name | ALDI 425 E Monroe Ave | ALDI | |
nsi_id | aldi-68f0e3 | delete | |
opening_hours | Mo-Su 09:00-20:30 | Mo-Su 09:00-20:30 | |
phone | +1 833-471-7067 | +1 833-471-7067 | |
ref | https://stores.aldi.us/#4420173 | delete | https://stores.aldi.us/#4420173 |
shop | supermarket | supermarket | |
website | https://stores.aldi.us/va/alexandria/425-e-monroe-ave | https://stores.aldi.us/va/alexandria/425-e-monroe-ave |
Item | Source | Processed |
---|---|---|
Download | ATP direct download link | cleaned .geojson file |
Count | 2,357 |
Whole Foods
ATP | example | processed | example |
---|---|---|---|
@spider | whole_foods | delete | |
addr:city | Arlington | Arlington | |
addr:country | US | delete | |
addr:full | 520 12th St South | addr:street | 12th Street South |
addr:full | 520 12th St South | addr:housenumber | 520 |
addr:full | 520 12th St South | addr:unit | |
addr:postcode | 22202 | 22202 | |
addr:state | VA | VA | |
brand | Whole Foods | Whole Foods | |
brand:wikidata | Q1809448 | overwrite | Q1809448 |
name | Pentagon City | branch | Pentagon City |
name | Pentagon City | name | Whole Foods Market |
nsi_id | wholefoodsmarket-90050a | delete | |
opening_hours | Mo-Su 07:00-22:00 | Mo-Su 07:00-22:00 | |
phone | +1 571-777-3948 | +1 571-777-3948 | |
ref | pentagoncity | pentagoncity | |
shop | supermarket | supermarket | |
website | https://www.wholefoodsmarket.com/stores/pentagoncity | https://www.wholefoodsmarket.com/stores/pentagoncity |
Item | Source | Processed |
---|---|---|
Download | ATP direct download link | cleaned .geojson file |
Count | 528 | 528 |
Safeway
Removed amenity=fuel and amenity=pharmacy POIs.
ATP | example | processed | example |
---|---|---|---|
@spider | safeway | delete | |
addr:city | Arlington | Arlington | |
addr:country | US | delete | |
addr:street_address | 1525 Wilson Blvd | addr:street | Wilson Boulevard |
addr:street_address | 1525 Wilson Blvd | addr:housenumber | 1525 |
addr:street_address | 1525 Wilson Blvd | addr:unit | |
addr:postcode | 22209 | 22209 | |
addr:state | VA | VA | |
brand | Safeway | Safeway | |
brand:wikidata | Q1508234 | overwrite | Q1508234 |
contact:facebook | https://www.facebook.com/safeway | delete | |
contact:twitter | Safeway | delete | |
image | https://dynl.mktgcdn.com/p/... | delete | |
name | Safeway | Safeway | |
nsi_id | N/A | delete | |
opening_hours | Mo-Su 06:00-23:00 | Mo-Su 06:00-23:00 | |
phone | +1 703-276-9315 | +1 703-276-9315 | |
ref | https://local.safeway.com/#5603910 | delete | |
shop | supermarket | supermarket | |
website | https://local.safeway.com/safeway/va/arlington/1525-wilson-blvd.html | https://local.safeway.com/safeway/va/arlington/1525-wilson-blvd.html |
Item | Source | Processed |
---|---|---|
Download | ATP direct download link | cleaned .geojson file |
Count | 1,935 | 915 |
Trader Joe's
ATP | example | processed | example |
---|---|---|---|
@spider | trader_joes_us | delete | |
addr:city | Arlington | Arlington | |
addr:country | US | delete | |
addr:street_address | 1109 N Highland St | addr:street | North Highland Street |
addr:street_address | 1109 N Highland St | addr:housenumber | 1109 |
addr:street_address | 1109 N Highland St | addr:unit | |
addr:postcode | 22201 | 22201 | |
addr:state | VA | VA | |
brand | Trader Joe's | Trader Joe's | |
brand:wikidata | Q688825 | overwrite | Q688825 |
name | Trader Joe's Arlington - Clarendon (640) | Trader Joe's | |
name | Trader Joe's Arlington - Clarendon (640) | branch | Clarendon |
nsi_id | traderjoes-dde59d | delete | |
opening_hours | Mo-Su 08:00-21:00 | Mo-Su 08:00-21:00 | |
phone | +1 703-351-8015 | +1 703-351-8015 | |
ref | 640 | 640 | |
shop | supermarket | supermarket | |
website | https://locations.traderjoes.com/va/arlington/640/ | https://locations.traderjoes.com/va/arlington/640/ |
Item | Source | Processed |
---|---|---|
Download | ATP direct download link | cleaned .geojson file |
Count | 564 | 564 |
Kroger
The file for Kroger contains POIs for 25 different brands under the Kroger umbrella, the largest of which is Harris Teeter.
ATP | example | processed | example |
---|---|---|---|
@spider | kroger_us | delete | |
addr:city | Arlington | Arlington | |
addr:country | US | delete | |
addr:street_address | 900 Army Navy Dr | addr:street | Army Navy Drive |
addr:street_address | 900 Army Navy Dr | addr:housenumber | 900 |
addr:street_address | 900 Army Navy Dr | addr:unit | |
addr:postcode | 22202 | 22202 | |
addr:state | VA | VA | |
branch | Pentagon Row | Pentagon Row | |
brand | Harris Teeter | Harris Teeter | |
brand:wikidata | Q5665067 | overwrite | Q5665067 |
name | Harris Teeter | Harris Teeter | |
nsi_id | harristeeter-dde59d | delete | |
opening_hours | Mo-Su 06:00-23:00 | Mo-Su 06:00-23:00 | |
operator | Harris Teeter Supermarkets, Inc. | delete | |
phone | +1 703-413-7112 | +1 703-413-7112 | |
ref | 09700083 | 09700083 | |
shop | supermarket | supermarket | |
website | https://www.harristeeter.com/stores/grocery/va/arlington/pentagon-row/097/00083 | https://www.harristeeter.com/stores/grocery/va/arlington/pentagon-row/097/00083 |
Item | Source | Processed |
---|---|---|
Download | ATP direct download link | cleaned .geojson file |
Count | 6,796 | 2,857 |
IGA
The file for IGA contains POIs for 2 different brands under the IGA umbrella, IGA and IGA Express.
ATP | example | processed | example |
---|---|---|---|
@spider | iga | delete | |
addr:city | Urbanna | Urbanna | |
addr:country | US | delete | |
addr:full | 335 Virginia Street, Urbanna, VA, 23175 | delete | |
addr:street_address | 335 Virginia Street | addr:street | Virginia Street |
addr:street_address | 335 Virginia Street | addr:housenumber | 335 |
addr:street_address | 335 Virginia Street | addr:unit | |
addr:postcode | 23175 | 23175 | |
addr:state | VA | VA | |
branch | Pentagon Row | Pentagon Row | |
brand | IGA | IGA | |
brand:wikidata | Q3146662 | overwrite | Q3146662 |
name | Urbanna Market IGA | Urbanna Market IGA | |
nsi_id | iga-166dbe | delete | |
phone | +1 803-854-5165 | +1 803-854-5165 | |
ref | Urbanna Market IGA | Urbanna Market IGA | |
shop | supermarket | supermarket | |
website | https://urbannamarket.iga.com | https://urbannamarket.iga.com |
Item | Source | Processed |
---|---|---|
Download | ATP direct download link | cleaned .geojson file |
Count | 714 | 712 |
Regional chains
These grocery chains have significantly smaller distribution footprints.
Chain | Source | Processed | Count |
---|---|---|---|
Giant Eagle | direct download | cleaned file | 480 |
Hannaford | direct download | cleaned file | 187 |
Key Food | direct download | cleaned file | 411 |
Piggly Wiggly | direct download | cleaned file | 493 |
Publix | direct download | cleaned file | 1,423 |
Shoprite | direct download | cleaned file | 282 |
Sprouts | direct download | cleaned file | 413 |
Stater Bros | direct download | cleaned file | 169 |
Tops | direct download | cleaned file | 148 |
Winn Dixie | direct download | cleaned file | 546 |
Wegmans | direct download | cleaned file | 110 |
Hotels
The import of hotel data will be almost identical to that of grocery store POIs. I do not plan to overwrite any values except for brand and brand:wikidata where they are clearly outdated or wrong. The post about this stage of the import in the OSM Community forum is here.
The following are the hotel chains I plan to import, with multiple sub-brands in each:
Chain | Processed | Count | Sub-brands (not exhaustive) |
---|---|---|---|
Best Western | cleaned file | 2,283 | 'Best Western': 1083, 'Best Western Plus': 902, 'Surestay Plus': 118, 'Surestay': 117 |
Choice Hotels | cleaned file | 6,360 | 'Quality Inn': 1036, 'Econo Lodge': 697, 'Comfort Inn': 627, 'Comfort Inn & Suites': 605, 'Quality Inn & Suites': 577 |
Hilton | cleaned file | 5,534 | 'Hampton': 1411, 'Hampton Inn & Suites': 1022, 'Hilton Garden Inn': 777, 'Home2 Suites': 596 |
Hyatt | cleaned file | 771 | 'Hyatt Place': 353, 'Hyatt': 271, 'Hyatt House': 120, 'Destination': 27 |
IHG | cleaned file | 1,757 | 'Holiday Inn Express': 994, 'Candlewood Suites': 354 |
Marriott | cleaned file | 5,381 | 'Fairfield': 1170, 'Courtyard': 1057, 'Residence Inn': 863, 'SpringHill Suites': 553, 'Townplace Suites': 515 |
Wyndham | cleaned file | 6,519 | 'Super 8': 1534, 'Days Inn': 1355, 'La Quinta Inn & Suites': 750, 'Baymont': 537, 'Travelodge': 438, 'Ramada': 350 |
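One way to produce sub-brand tallies like those in the table is to count brand values across a cleaned file. A minimal sketch, assuming each feature's sub-brand is stored in its brand property; the helper name is hypothetical.

```python
from collections import Counter


def brand_counts(features, top=5):
    """Return the `top` most common brand values as (brand, count) pairs."""
    counts = Counter(
        feature.get("properties", {}).get("brand")
        for feature in features
        if feature.get("properties", {}).get("brand")  # skip features without a brand
    )
    return counts.most_common(top)
```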
Fast Food and Cafes
The import of fast food and cafe data will be almost identical to that of other POIs. I will not overwrite any values except for brand and brand:wikidata where they are clearly outdated or wrong. Some brands' POIs include fast-food-specific tags, such as takeaway=* or drive_through=*; otherwise, it's mostly the usual address tags, phone, website, etc. The post about this stage of the import in the OSM Community forum is here.
The following are the chains I plan to import:
Chain | Count | File |
---|---|---|
Carl's Jr. | 1067 | cleaned file |
Domino's | 6909 | cleaned file |
In-N-Out Burger | 402 | cleaned file |
Peet's Coffee | 289 | cleaned file |
Cook Out | 359 | cleaned file |
Five Guys | 1476 | cleaned file |
Subway | 21218 | cleaned file |
Dairy Queen | 4269 | cleaned file |
Long John Silver's | 523 | cleaned file |
Dunkin' | 9241 | cleaned file |
Wingstop | 1998 | cleaned file |
Wendy's | 6031 | cleaned file |
El Pollo Loco | 501 | cleaned file |
Pizza Hut | 6780 | cleaned file |
Shake Shack | 331 | cleaned file |
Popeyes | 3034 | cleaned file |
Burger King | 6713 | cleaned file |
Panda Express | 2147 | cleaned file |
Chipotle | 3387 | cleaned file |
Papa John's | 3129 | cleaned file |
Arby's | 3318 | cleaned file |
Potbelly | 422 | cleaned file |
Jamba | 721 | cleaned file |
Qdoba | 745 | cleaned file |
Bojangles' | 822 | cleaned file |
Tim Hortons | 640 | cleaned file |
Church's Chicken | 641 | cleaned file |
Baskin-Robbins | 2197 | cleaned file |
Jimmy John's | 2678 | cleaned file |
Starbucks | 16425 | cleaned file |
KFC | 4280 | cleaned file |
Jack in the Box | 2196 | cleaned file |
Chick-fil-A | 2970 | cleaned file |
Whataburger | 1025 | cleaned file |
Quizno's | 147 | cleaned file |
Hardee's | 1621 | cleaned file |
Taco Bell | 7947 | cleaned file |
Moe's Southwest Grill | 618 | cleaned file |
Culver's | 964 | cleaned file |
Zaxby's | 924 | cleaned file |
Panera Bread | 2127 | cleaned file |
A&W | 448 | cleaned file |
Dutch Bros. Coffee | 880 | cleaned file |
MOD Pizza | 526 | cleaned file |
Einstein Bros. Bagels | 688 | cleaned file |
The Habit Burger Grill | 381 | cleaned file |
Scooter's Coffee | 744 | cleaned file |
Big Box Stores
The import of big box stores will be almost identical to that of other POIs. This "category" is broader than the others I've imported thus far: I am including any store brand that is often an anchor or dominant occupant of a mall. I will not overwrite any values except for brand and brand:wikidata where they are clearly outdated or wrong. The data will be mostly the usual address tags, phone, website, etc. The post about this stage of the import in the OSM Community forum is forthcoming.
The following are the chains I plan to import:
Chain | Count | File |
---|---|---|
Party City | 750 | cleaned file |
JOANN Fabrics and Crafts | 804 | cleaned file |
Ace Hardware | 4574 | cleaned file |
Mattress Firm | 2409 | cleaned file |
Michaels | 1205 | cleaned file |
Walmart | 4520 | cleaned file |
Harbor Freight Tools | 1556 | cleaned file |
Staples | 967 | cleaned file |
TJ Maxx | 3520 | cleaned file |
HomeGoods | 911 | cleaned file |
DSW | 491 | cleaned file |
Menards | 341 | cleaned file |
Dick's Sporting Goods | 762 | cleaned file |
True Value | 1181 | cleaned file |
Office Depot | 892 | cleaned file |
Burlington | 1001 | cleaned file |
Total Wine | 257 | cleaned file |
Best Buy | 1048 | cleaned file |
Big Lots | 1392 | cleaned file |
Target | 1962 | cleaned file |
Sam's Club | 597 | cleaned file |
Kohl's | 1174 | cleaned file |
Tractor Supply Company | 2232 | cleaned file |
The Home Depot | 1731 | cleaned file |
Lowe's | 1678 | cleaned file |
Ross | 1772 | cleaned file |
Ulta Beauty | 1406 | cleaned file |
Costco | 603 | cleaned file |
PetSmart | 1528 | cleaned file |
Ashley HomeStore | 749 | cleaned file |