ColOpenData can be used to access open geospatial data from Colombia. This data is retrieved from the National Geostatistical Framework (MGN), published by the National Administrative Department of Statistics (DANE). The MGN contains the political-administrative division and is used to reference census statistical information.
This package contains the 2018 version of the MGN, which also includes a summarized version of the National Population and Dwelling Census (CNPV) at different aggregation levels. Each level is stored in a different dataset, which can be retrieved using the download_geospatial() function. It takes four arguments:
- spatial_level: character with the spatial level to be consulted.
- simplified: logical indicating whether the downloaded spatial data should be a simplified version of the geometries. Simplified versions are lighter but less precise, and are recommended for simpler applications like plots.
- include_geom: logical for including (or not) the geometry. Default is TRUE.
- include_cnpv: logical for including (or not) CNPV demographic and socioeconomic information. Default is TRUE.
Available levels of aggregation come from the official spatial division provided by DANE, with their names corresponding to:
In this vignette you will learn:
We will use geospatial data at the department level (“dpto”) and calculate the percentage of dwellings with an internet connection in each department. We will then build static and dynamic plots of the result.
We will start by importing the needed libraries.
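The set of packages below is a sketch inferred from the functions used in the rest of this vignette; it is an assumption, not a list stated in the text. The sf package also needs to be installed for geom_sf() to draw the geometries, even if it is not attached.

```r
library(ColOpenData) # download_geospatial(), geospatial_dictionary()
library(dplyr)       # data manipulation and the %>% pipe
library(ggplot2)     # static maps with geom_sf()
library(leaflet)     # interactive maps and colorNumeric()
```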
Disclaimer: all data is loaded into the environment of the user’s R session, but is not downloaded to the user’s computer. Spatial datasets can be very large and might take a while to load.
First, we download the data using the download_geospatial() function, including the geometries and the census-related information. The simplified parameter is used to download a lighter version, since simple plots do not require precise spatial information.
dpto <- download_geospatial(
  spatial_level = "dpto",
  simplified = TRUE,
  include_geom = TRUE,
  include_cnpv = TRUE
)
head(dpto)
To find out which column contains the internet-related information, we need the corresponding dataset dictionary, which can be downloaded with the geospatial_dictionary() function. This function takes as parameters the dataset name, to download the associated information, and the language of that information. For further information please refer to the documentation on dictionaries.
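A call along the following lines would retrieve the dictionary for the department level. The argument names spatial_level and language, and the value "EN" for English, are assumptions based on the description above; check the function's help page before relying on them. The object name dict is only illustrative.

```r
library(ColOpenData)

# Assumed argument names and language code; see ?geospatial_dictionary.
dict <- geospatial_dictionary(spatial_level = "dpto", language = "EN")

# Inspect the first entries to locate the internet-related variables.
head(dict)
```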
To calculate the percentage of dwellings with internet connection, we need the number of dwellings with internet connection and the total number of dwellings in each department. From the dictionary, the number of dwellings with internet connection is viv_internet and the total number of dwellings is viviendas. The percentage is viv_internet divided by viviendas, multiplied by 100; we store it in a column named internet of a new object, internet_cov, which the plots below use.
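The calculation described above can be sketched on a small stand-in table. The counts, the dpto_id column and the object names ending in _toy are made up for illustration, and rounding to one decimal is an assumption rather than something the text specifies.

```r
# Toy stand-in for the downloaded `dpto` object; all values are invented.
dpto_toy <- data.frame(
  dpto_id = c("A", "B", "C"),
  viv_internet = c(250, 40, 0),
  viviendas = c(1000, 160, 50)
)

# Percentage of dwellings with internet connection in each row.
internet_cov_toy <- transform(
  dpto_toy,
  internet = round(100 * viv_internet / viviendas, 1)
)
print(internet_cov_toy)

# With the real data and dplyr attached, the same expression would be:
# internet_cov <- dpto %>%
#   mutate(internet = round(100 * viv_internet / viviendas, 1))
```

Using mutate() on the downloaded object keeps the geometry column in place, so the result can be passed directly to the plotting functions.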
ggplot2 can be used to generate static plots of spatial data with the geom_sf() geometry. Color palettes and themes can be defined for each plot using aesthetics and scales, which can be consulted in the ggplot2 documentation.
We will use a two-color gradient, from a light blue to a bright yellow-green, to make the differences more visible.
ggplot(data = internet_cov) +
  geom_sf(mapping = aes(fill = internet), color = NA) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white", colour = "white"),
    panel.background = element_rect(fill = "white", colour = "white"),
    panel.grid = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank()
  ) +
  scale_fill_gradient("Percentage", low = "#10bed2", high = "#deff00") +
  ggtitle(
    label = "Internet coverage",
    subtitle = "Colombia"
  )
For dynamic plots, we can use leaflet, an open-source library for interactive maps. To create the same plot, we will first create the color palette.
colfunc <- colorRampPalette(c("#10bed2", "#deff00"))
pal <- colorNumeric(
  palette = colfunc(100),
  domain = internet_cov[["internet"]]
)
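To see what colfunc() returns before it is handed to colorNumeric(), the short sketch below uses only base R (colorRampPalette() comes from grDevices). The object name shades is introduced here purely for illustration.

```r
# Same two end colors as the palette above.
colfunc <- colorRampPalette(c("#10bed2", "#deff00"))

# Five evenly spaced colors between the two ends, as hex strings.
shades <- colfunc(5)
print(shades)
```

colorNumeric() then maps each value in the domain onto this ramp, so a department with low coverage gets a color near #10bed2 and one with high coverage a color near #deff00.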
With the previous color palette we can generate the interactive plot. The package also includes open-source base maps such as OpenStreetMap and CartoDB. For further details on leaflet, please refer to the package’s documentation.
leaflet(internet_cov) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(
    stroke = TRUE,
    weight = 0,
    color = NA,
    fillColor = ~ pal(internet_cov[["internet"]]),
    fillOpacity = 1,
    popup = paste0(internet_cov[["internet"]])
  ) %>%
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~ internet_cov[["internet"]],
    opacity = 1,
    title = "Internet Coverage"
  )