Import of the Brazilian National Register of Health Facilities
Goal
Import health facility data from the Brazilian Ministry of Health into OpenStreetMap
The primary goal is to adapt the National Register of Health Facilities (CNES, from the Portuguese Cadastro Nacional de Estabelecimentos de Saúde) data source to the OpenStreetMap standard and to assess whether the data can be imported into OSM.
The secondary goal is to import the unknown health facility records that have coordinates, following the Import Guidelines and using dedicated import accounts, to increase the availability of Brazilian health data in OSM.
To accomplish these goals, we plan to follow two approaches:
Approach 1 - MapRoulette:
The first step is to add or improve the known health facility records in OSM from the available directories, with the help of individual local users.
Approach 2 - Import for a small number of records:
In the next step, the unknown health facility records that have coordinates will be added to OSM through small imports split by district/block.
Schedule
- Planning: late 2020 to early 2021
- Import: Second quarter of 2021
- QA: Post-import
- Announce Import
Import Data
- Data source site: https://opendatasus.saude.gov.br/dataset/cadastro-nacional-de-estabelecimentos-de-saude-cnes
- Data downloads: http://cnes.saude.gov.br/pages/downloads/arquivosBaseDados.jsp
- Data license: https://opendefinition.org/od/2.1/en/
- Type of License: Public Domain with attribution.
- ODbL Compliance verified: yes
Background
Several health information systems are available in Brazil, most of them publicly accessible and administered by the Ministry of Health through the Department of Informatics of the Unified Health System (DATASUS). Their data have guided studies that analyze epidemiological, health, and service-provision structure and infrastructure parameters. One example is the National Registry of Health Facilities (CNES), a database that contains data on all Brazilian health facilities.
A health facility is included in the CNES by filling in specific forms with data on its physical area, human resources, equipment, and the outpatient and hospital services in operation, regardless of whether it provides care to users of the public health system. Once a facility is registered, the Ministry of Health generates a numerical code for it, the CNES code. CNES data are important for health planning, control and evaluation, and should reflect the real situation of the health system.
As part of the efforts to tackle COVID-19, DATASUS released a geolocated version of the CNES comprising all 400,000 Brazilian health care facilities. The data was released under an open data Creative Commons license, creating conditions to support the OpenStreetMap and Healthsites.io initiatives.
OSM Data Files
The OSM files derived from the raw CNES data source can be found here: brazil_OSM_updated.xlsx. This file is the result of the pre-processing steps described below. The spreadsheet follows the OSM standard in terms of columns and labels.
Import Type
This is a one-time import driven by the Brazilian OSM community. There are currently no plans to take in or process subsequent updates that DATASUS might provide. This would be a useful capability, but it is outside the scope of this immediate effort.
Method of import: all imports will be done using JOSM with dedicated import accounts.
Data Preparation
Data reduction & simplification
The original government datasets are in .csv format and contain many attributes in each directory. To cover the basic variables needed for the OSM standard, we selected the information characterizing each health facility: its location, address, type of facility, operator, municipality and number of beds. All raw data from the National Register of Health Facilities is available on the FTP server ftp://ftp.datasus.gov.br/cnes/, where it can be checked. We applied the cleaning steps described below to this data to obtain the most accurate possible result for the registered Brazilian facilities.
We ran descriptive analyses to validate the attributes of each facility. The URL column of the spreadsheet (brazil_OSM_updated.xlsx) links to the official government web page for each facility. On the CNES lookup page (http://cnes.datasus.gov.br/pages/estabelecimentos/consulta.jsp), any user can enter a facility's CNES code in the red search box to retrieve its official record, which mirrors the corresponding row of brazil_OSM_updated.xlsx. The CNES code is the last 7 digits of each URL in the "URL" column of brazil_OSM_updated.xlsx.
All data obtained from the DATASUS warehouse was pre-processed to clean inconsistent attributes and to retain the attributes useful for describing health care facilities. The attributes retained from the open government data for the import are listed below:
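Because the CNES code is the last 7 digits of each URL, it can be recovered mechanically before looking a facility up on the CNES page. The sketch below assumes only that rule; the example URL is a hypothetical placeholder, since the exact URL pattern is not given here.

```python
import re

# Matches the last 7 digits at the very end of a URL (an optional trailing
# slash is tolerated). Only the "last 7 digits" rule is taken from the text.
_TRAILING_CNES_CODE = re.compile(r"(\d{7})/?$")


def cnes_code_from_url(url: str) -> str:
    """Return the 7-digit CNES code found at the end of a facility URL."""
    match = _TRAILING_CNES_CODE.search(url.strip())
    if match is None:
        raise ValueError(f"no trailing 7-digit CNES code in {url!r}")
    return match.group(1)


if __name__ == "__main__":
    # Hypothetical URL, used only to illustrate the extraction.
    print(cnes_code_from_url("http://example.org/estabelecimento/2077485"))
```

Raising an error on malformed URLs, rather than returning an empty string, makes rows with broken links easy to spot during validation.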
Attributes | Description |
OBJECTID | ID of each facility |
osm_id | Field to register a future OSM id |
amenity | Type of amenity considering the match between Portuguese tags and the OSM standard |
healthcare | Type of facility according to the National Register of Health Facilities - CNES codebook (Codebook.pdf ) |
name | Name of the facility according to the National Register of Health Facilities - CNES |
operator | Operator responsible for the management of the facility |
source | Link to the open data source of the facilities |
speciality | Health care specialty of each facility |
operator_t | Operator type of the facility (public/private or government) |
contact_nu | Phone number of each facility |
operationa | Operational status of the facility |
opening_ho | Opening hours |
beds | Number of beds available |
staff_doct | Number of physicians in the facility |
staff_nurs | Number of nurses in the facility |
health_ame | Type of equipment available at the facility |
dispensing | Existence of dispensing pharmacy in the facility |
wheelchair | Whether the facility is suitable for wheelchair use |
emergency | Existence of emergency services |
insurance | Type of health insurance accepted by the facility |
water_sour | Source of water |
electricit | Source of power |
is_in_heal | Health area containing the facility |
is_in_he_1 | Health zone containing the facility |
URL | Url describing the facility register |
addr_house | Address - house |
addr_stree | Address - street |
addr_postc | Address - postcode |
addr_city | Address - city |
changeset_ | Versioning control |
changeset1 | Versioning control |
changeset2 | Versioning control |
changeset3 | Versioning control |
latitude | Latitude of the facility |
longitude | Longitude of the facility |
CNTRY_TERR | Country/territory |
SOVEREIGN | Sovereign state |
ISO_3_CODE | ISO country code (3 letters) |
ISO_2_CODE | ISO country code (2 letters) |
UN_CODE | United Nations country code |
WHO_CODE | World health organization country code |
WHO_STATUS | WHO membership status |
The CNES dataset comprises hundreds of variables, covering physical structure, professionals, beds, emergency capacity, lists of equipment and dozens of other data categories. All of this information is published consistently on a monthly basis. Our initial approach is to import the bed counts and geolocation into OSM; future efforts could incorporate the other available information.
Input Data Cleaning:
The input data has a few quality issues, which were addressed by cleaning the values before the import. The steps performed during the cleaning phase are:
- Removal of invalid values (N/A, 0, /N, etc.);
- Removal of duplicate records;
- Conversion of source values into OSM-compatible values (e.g. separating multiple values with ";" and changing the opening hours syntax);
- Removal of facilities without geolocation coordinates;
- Addition of a link to each facility's complete record, so anyone can help add new information in the future.
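The cleaning steps above can be sketched as a single pass over the rows read from the CSV. This is a minimal illustration, not the project's actual script: the set of placeholder values, the choice of `speciality` as the multi-valued column and the assumption that source values are comma-separated are all hypothetical. The opening-hours conversion and the link-adding step are omitted because their exact rules and URL template are not stated here.

```python
import math

# Placeholder values treated as invalid (assumed from the examples listed above).
INVALID_VALUES = {"", "N/A", "0", "/N"}
# Hypothetical: columns whose values may hold several comma-separated entries.
MULTI_VALUE_COLUMNS = ("speciality",)


def _is_coordinate(value):
    """True if value parses as a finite float."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def clean_records(rows):
    """Apply the cleaning steps to a list of dicts read from the CNES CSV."""
    cleaned, seen = [], set()
    for raw in rows:
        # Removal of invalid values: placeholders become None.
        row = {}
        for key, value in raw.items():
            text = "" if value is None else str(value).strip()
            row[key] = None if text in INVALID_VALUES else text
        # Removal of facilities without geolocation coordinates.
        if not (_is_coordinate(row.get("latitude")) and _is_coordinate(row.get("longitude"))):
            continue
        # OSM-compatible multi-values: "a, b" becomes "a;b".
        for key in MULTI_VALUE_COLUMNS:
            if row.get(key):
                parts = [p.strip() for p in row[key].split(",") if p.strip()]
                row[key] = ";".join(parts)
        # Removal of duplicate records (exact duplicates after normalisation).
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned
```

Deduplicating after normalisation means two rows that differ only in placeholder spelling or whitespace are treated as the same record.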
After cleaning: the final data, with all these issues removed from the raw data, can be found here: brazil_OSM_updated.xlsx
Tagging plans
Here we list all the original tags with their corresponding translation into the OSM tagging schema, plus the additional tags applied to all segments. The conversion dictionary from the facility type in Portuguese to the corresponding label in the OSM key standard is described below:
Health Facility Related tags converted | |
Government Facility Type | OSM Key |
CENTRAL DE GESTAO EM SAUDE | amenity = government |
CENTRAL DE NOTIFICACAO,CAPTACAO E DISTRIB DE ORGAOS ESTADUAL | amenity = government |
CENTRAL DE REGULACAO DE SERVICOS DE SAUDE | amenity = government |
CENTRAL DE REGULACAO DO ACESSO | amenity = government |
CENTRAL DE REGULACAO MEDICA DAS URGENCIAS | amenity = government |
CENTRO DE APOIO A SAUDE DA FAMILIA | amenity = clinic |
CENTRO DE ATENCAO HEMOTERAPIA E OU HEMATOLOGICA | amenity = clinic |
CENTRO DE ATENCAO PSICOSSOCIAL | amenity = social_facility |
CENTRO DE PARTO NORMAL - ISOLADO | amenity = hospital |
CENTRO DE SAUDE/UNIDADE BASICA | amenity = clinic |
CLINICA/CENTRO DE ESPECIALIDADE | amenity = clinic |
CONSULTORIO ISOLADO | amenity = clinic |
COOPERATIVA OU EMPRESA DE CESSAO DE TRABALHADORES NA SAUDE | amenity = social_facility |
FARMACIA | amenity = pharmacy |
HOSPITAL ESPECIALIZADO | amenity = hospital |
HOSPITAL GERAL | amenity = hospital |
HOSPITAL/DIA - ISOLADO | amenity = hospital |
LABORATORIO CENTRAL DE SAUDE PUBLICA LACEN | amenity = clinic |
LABORATORIO DE SAUDE PUBLICA | amenity = government |
OFICINA ORTOPEDICA | amenity = clinic |
POLICLINICA | amenity = clinic |
POLO ACADEMIA DA SAUDE | amenity = social_facility |
POLO DE PREVENCAO DE DOENCAS E AGRAVOS E PROMOCAO DA SAUDE | amenity = social_facility |
POSTO DE SAUDE | amenity = clinic |
PRONTO ATENDIMENTO | amenity = hospital |
PRONTO SOCORRO ESPECIALIZADO | amenity = hospital |
PRONTO SOCORRO GERAL | amenity = hospital |
SERVICO DE ATENCAO DOMICILIAR ISOLADO(HOME CARE) | amenity = clinic |
TELESSAUDE | amenity = clinic |
UNIDADE DE APOIO DIAGNOSE E TERAPIA (SADT ISOLADO) | amenity = clinic |
UNIDADE DE ATENCAO A SAUDE INDIGENA | amenity = clinic |
UNIDADE DE ATENCAO EM REGIME RESIDENCIAL | amenity = clinic |
UNIDADE DE VIGILANCIA EM SAUDE | amenity = clinic |
UNIDADE MISTA | amenity = clinic |
UNIDADE MOVEL DE NIVEL PRE-HOSPITALAR NA AREA DE URGENCIA | amenity = clinic |
UNIDADE MOVEL FLUVIAL | amenity = clinic |
UNIDADE MOVEL TERRESTRE | amenity = clinic |
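Applying the conversion table is a plain dictionary lookup. The sketch below copies only a subset of the rows above for brevity (a real script would contain every row), and the normalisation of case and spacing is an assumption about how messy the source strings might be.

```python
# Subset of the conversion table above; the full mapping would list every row.
FACILITY_TYPE_TO_AMENITY = {
    "CENTRAL DE REGULACAO DO ACESSO": "government",
    "CENTRO DE ATENCAO PSICOSSOCIAL": "social_facility",
    "CENTRO DE SAUDE/UNIDADE BASICA": "clinic",
    "FARMACIA": "pharmacy",
    "HOSPITAL GERAL": "hospital",
    "POSTO DE SAUDE": "clinic",
    "PRONTO ATENDIMENTO": "hospital",
}


def amenity_tag(facility_type):
    """Return {"amenity": value} for a CNES facility type, or None if unmapped."""
    # Normalise case and collapse repeated whitespace before the lookup.
    key = " ".join(facility_type.upper().split())
    amenity = FACILITY_TYPE_TO_AMENITY.get(key)
    return {"amenity": amenity} if amenity else None
```

Returning None for unmapped types, instead of a default tag, keeps unknown facility types out of the import until someone reviews them.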
Other general attributes:
The address and contact values are taken from the addr_house, addr_stree, addr_postc, addr_city, URL and contact_nu columns and tagged with the corresponding global OSM keys. These general attributes provide additional details about each facility's address, a contact phone number, and the URL with detailed information about each processed facility.
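The column names are truncated to 10 characters, so they must be mapped to full OSM keys before export. The target keys in the sketch below (`addr:housenumber`, `addr:street`, `addr:postcode`, `addr:city`, `phone`, `url`) are common OSM keys chosen here as an assumption; the proposal does not spell out the exact mapping.

```python
# Assumed mapping from the truncated spreadsheet columns to common OSM keys.
COLUMN_TO_OSM_KEY = {
    "addr_house": "addr:housenumber",
    "addr_stree": "addr:street",
    "addr_postc": "addr:postcode",
    "addr_city": "addr:city",
    "contact_nu": "phone",
    "URL": "url",
}


def general_tags(row):
    """Build address/contact tags from one cleaned record, skipping empty values."""
    tags = {}
    for column, osm_key in COLUMN_TO_OSM_KEY.items():
        value = row.get(column)
        if value is not None and str(value).strip():
            tags[osm_key] = str(value).strip()
    return tags
```

Skipping empty values avoids uploading tags such as `addr:street=` with no content.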
Data transformation
A total of 354,805 health facilities were processed. The result of the processing step is saved here: https://drive.google.com/file/d/1V70Fjhg3sza_z3kgg2uQ4fdR7NzSldOv/view?usp=sharing
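Since the imports will be done in JOSM, the processed records eventually need to become an OSM XML file. The sketch below assumes each record is a dict with `latitude`, `longitude` and a ready-made `tags` dict (a hypothetical structure, not the format of the file linked above). New objects get negative ids, which JOSM treats as nodes that do not yet exist on the server.

```python
import xml.etree.ElementTree as ET


def records_to_osm_xml(records):
    """Serialise records as an OSM XML (API 0.6) document string.

    Each record is a dict with 'latitude', 'longitude' and 'tags' (assumed).
    """
    root = ET.Element("osm", version="0.6", generator="cnes-import-sketch")
    for index, record in enumerate(records, start=1):
        node = ET.SubElement(
            root,
            "node",
            id=str(-index),  # negative id: new, not yet uploaded object
            lat=f"{float(record['latitude']):.7f}",
            lon=f"{float(record['longitude']):.7f}",
        )
        for key, value in sorted(record["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=str(value))
    return ET.tostring(root, encoding="unicode")
```

Writing a small file per batch keeps each JOSM upload reviewable before it reaches the server.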
Data transformation results
Changeset tags
We will use the following changeset tags:
- comment=Brazilian National Registry of Health Facilities - CNES
- source=datasus.gov.br
- source:date=YYYY[-MM[-DD]]..YYYY[-MM[-DD]]
- import=yes
- url=https://wiki.openstreetmap.org/wiki/Import_of_the_Brazilian_National_Register_of_Health_Facilities
Element Tags
- source=OpenGovernmentData
Data merge workflow
Team Approach
The import will be done manually by experienced JOSM users with dedicated usernames.
References
Each user will consider the following sources of information when importing data through:
MapRoulette:
- Local knowledge
- Ground Truth Verification
- Contacting the facilities
- Satellite Imagery
- Existing OSM data
- Building outline
- Health facility data from the MapRoulette challenge
Imports:
- Satellite Imagery
- Existing OSM data
- Building outline
- Health facility data JSON file
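Checking existing OSM data before adding a facility amounts to looking for nearby features that may already represent it. The sketch below uses the haversine great-circle distance; the 50 m radius and the dict structure of the existing features (for example, health features previously downloaded for the area) are hypothetical choices, not values set by this proposal.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def possible_duplicates(candidate, existing, radius_m=50.0):
    """Return existing features within radius_m of the candidate facility.

    candidate and existing items are dicts with 'lat' and 'lon' keys (assumed);
    the default radius is a hypothetical threshold.
    """
    return [
        feature
        for feature in existing
        if haversine_m(candidate["lat"], candidate["lon"], feature["lat"], feature["lon"]) <= radius_m
    ]
```

A match is only a hint for the mapper: nearby features still need to be compared by name, satellite imagery or local knowledge before merging.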
Workflow
We will start with the Manaus data set and split it by city or district before assigning it to local OSM users.
Following on from this data import, we will perform the same process for priority cities before moving on to the Brazilian states.
The cleaned up data sets will be divided into the 27 states of Brazil and assigned to users based on region.
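Splitting the cleaned data into small, area-based batches can be sketched as grouping followed by chunking. The grouping column (`addr_city` by default, since no state column appears in the attribute list) and the batch size of 100 are hypothetical parameters for illustration.

```python
from collections import defaultdict


def split_into_batches(records, group_key="addr_city", max_size=100):
    """Group records by an area column, then cut each group into small batches.

    group_key and max_size are hypothetical defaults, not values from the plan.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record.get(group_key) or "UNKNOWN"].append(record)
    batches = {}
    for area, items in sorted(groups.items()):
        batches[area] = [items[i:i + max_size] for i in range(0, len(items), max_size)]
    return batches
```

Keeping records without an area value in a separate "UNKNOWN" group makes them easy to review instead of silently merging them into another batch.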