How to download climate data using ColOpenData

ColOpenData can be used to access open climate data from Colombia. This climate data is retrieved from the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM). The climate module allows the user to consult climate data for any Region of Interest (ROI) inside the country and retrieve the information for each station contained inside.

The available information from IDEAM can be accessed using specific internal tags as follows:

Tags Variable
TSSM_CON Dry-bulb Temperature
THSM_CON Wet-bulb Temperature
TMN_CON Minimum Temperature
TMX_CON Maximum Temperature
TSTG_CON Dry-bulb Temperature (Thermograph)
HR_CAL Relative Humidity
HRHG_CON Relative Humidity (Hygrograph)
TV_CAL Vapour Pressure
TPR_CAL Dew Point
PTPM_CON Precipitation (Daily)
PTPG_CON Precipitation (Hourly)
EVTE_CON Evaporation
FA_CON Atmospheric Phenomenon
NB_CON Cloudiness
RCAM_CON Wind Trajectory
BSHG_CON Sunshine Duration
VVAG_CON Wind Speed
DVAG_CON Wind Direction
VVMXAG_CON Maximum Wind Speed
DVMXAG_CON Maximum Wind Direction

Each observation is subject to the availability of stations in the ROI and the stations’ status (active, maintenance or suspended), as well as quality filters implemented by IDEAM.

In this vignette you will learn:

  1. How to download climate data using ColOpenData.
  2. How to aggregate climate data by different frequencies.
  3. How to plot downloaded climate data.

For this example we will retrieve data for the municipality of Espinal in Colombia. We will download Dry-bulb Temperature (TSSM_CON) from 2013 to 2016, to observe the increase in average temperature during 2015 and 2016 caused by El Niño (ENSO).

ColOpenData offers three methods to do this, using different functions:

- download_climate_stations() to download climate data from previously selected stations.
- download_climate_geom() to download climate data from a specified geometry (ROI).
- download_climate() to download climate data from municipalities’ or departments’ already loaded geometries.

In this example, we will follow the three methods to get the same results, exploring the included functions. We will start by loading the needed libraries.

library(ColOpenData)
library(dplyr)
library(sf)
library(leaflet)
library(ggplot2)

Disclaimer: all data is loaded into the environment of the user’s R session, but is not saved to the user’s computer.

Retrieving climate data for a ROI using stations’ data

For this example, we need to create a spatial polygon around the municipality of Espinal and use it as our ROI to retrieve the climate data. To create the spatial polygon we provide the coordinates of the geometry. For simplicity, we will build a bounding box from the four corner points that bound the municipality (repeating the first point to close the ring), and transform the created geometry into an sf object (see the sf library for further details).

lat <- c(4.263744, 4.263744, 4.078156, 4.078156, 4.263744)
lon <- c(-75.042067, -74.777022, -74.777022, -75.042067, -75.042067)
polygon <- st_polygon(x = list(cbind(lon, lat))) %>% st_sfc()
roi <- st_as_sf(polygon)
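As a possible shortcut, sf can also build the same rectangle directly from its bounding coordinates with st_bbox() and st_as_sfc(). The sketch below is only an illustration: the name roi_bbox and the choice of EPSG:4326 (WGS 84) as coordinate reference system are assumptions, since this vignette does not state whether stations_in_roi() expects a CRS.

```r
library(sf)

# Bounding coordinates taken from the lat/lon vectors above
bbox <- st_bbox(
  c(xmin = -75.042067, ymin = 4.078156, xmax = -74.777022, ymax = 4.263744),
  crs = st_crs(4326) # assumption: coordinates are WGS 84 longitude/latitude
)

# Convert the bounding box to a polygon geometry, then to an sf object
roi_bbox <- st_as_sf(st_as_sfc(bbox))
```

The result is a one-row sf object holding a single rectangular polygon, which can be inspected or mapped in the same way as roi.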

With our created ROI, we can make a simple visualization using leaflet.

leaflet(roi) %>%
  addProviderTiles("OpenStreetMap") %>%
  addPolygons(
    stroke = TRUE,
    weight = 2,
    color = "#2e6930",
    fillColor = "#2e6930",
    opacity = 0.6
  )

As a first exploration, we can check whether any stations lie inside the ROI, using the function stations_in_roi().

stations <- stations_in_roi(geometry = roi)

head(stations)
#> # A tibble: 6 × 21
#>     codigo nombre     categoria tecnologia estado departamento municipio latitud
#>      <dbl> <chr>      <chr>     <chr>      <chr>  <chr>        <chr>     <chr>  
#> 1 21185090 NATAIMA -… Agromete… Automátic… Activa Tolima       Espinal   4.1881…
#> 2 21170020 DOS AGUAS… Pluviomé… Convencio… Activa Tolima       Suárez (… 4.2582…
#> 3 21180220 AEROPUERT… Pluviomé… Convencio… Suspe… Tolima       Espinal   4.15   
#> 4 21180230 BAMBU EL … Pluviomé… Convencio… Suspe… Tolima       Espinal   4.2    
#> 5 21215090 MARANONES… Climátic… Convencio… Suspe… Tolima       Espinal   4.2166…
#> 6 21215080 CHICORAL … Climátic… Convencio… Activa Tolima       Espinal   4.2315…
#> # ℹ 13 more variables: longitud <chr>, altitud <int>, fecha_instalacion <chr>,
#> #   area_operativa <chr>, corriente <chr>, area_hidrografica <chr>,
#> #   zona_hidrografica <chr>, subzona_hidrografica <chr>, entidad <chr>,
#> #   fecha_suspension <chr>, codigo_municipio <chr>, codigo_departamento <chr>,
#> #   geometry <POINT>

We can see that there are 24 stations in the region. Different stations record different categories of data, which can be checked in the column categoria. Stations under the categories Climática Principal and Climática Ordinaria have temperature records.

Some stations are suspended, which means they are not taking measurements at the moment. This information is found in the column estado, where a suspended station has the value Suspendida. In addition, the column fecha_suspension is different from NA for these stations, since every suspended station has an associated suspension date. However, even if a station is suspended, its historical data (up to the suspension date) can still be accessed.

To keep only the stations that could have recorded temperature during the desired period, we retain the stations that are active or were suspended after the start of 2013, and that belong to one of the two climatic categories.

cw_stations <- stations %>%
  filter(
    as.Date(fecha_suspension) > as.Date("2013-01-01") | estado == "Activa",
    categoria %in% c("Climática Principal", "Climática Ordinaria")
  )

head(cw_stations)
#> # A tibble: 1 × 21
#>     codigo nombre     categoria tecnologia estado departamento municipio latitud
#>      <dbl> <chr>      <chr>     <chr>      <chr>  <chr>        <chr>     <chr>  
#> 1 21215080 CHICORAL … Climátic… Convencio… Activa Tolima       Espinal   4.2315…
#> # ℹ 13 more variables: longitud <chr>, altitud <int>, fecha_instalacion <chr>,
#> #   area_operativa <chr>, corriente <chr>, area_hidrografica <chr>,
#> #   zona_hidrografica <chr>, subzona_hidrografica <chr>, entidad <chr>,
#> #   fecha_suspension <chr>, codigo_municipio <chr>, codigo_departamento <chr>,
#> #   geometry <POINT>
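The filter above depends on how R combines missing values with a logical OR: an active station has NA in fecha_suspension, so the date comparison yields NA, but NA | TRUE evaluates to TRUE and the station is kept. The sketch below illustrates this on made-up records. The tibble toy_stations is hypothetical and not real IDEAM data, and the ISO date format is an assumption.

```r
library(dplyr)

# Hypothetical station records (not real IDEAM data) with the two
# columns used in the filter above
toy_stations <- tibble(
  codigo = c(1, 2, 3),
  estado = c("Activa", "Suspendida", "Suspendida"),
  fecha_suspension = c(NA, "2010-05-01", "2015-03-01") # assumed ISO format
)

# Station 1: NA | TRUE      -> kept (active, no suspension date)
# Station 2: FALSE | FALSE  -> dropped (suspended before 2013)
# Station 3: TRUE | FALSE   -> kept (suspended after the start of 2013)
kept <- toy_stations %>%
  filter(as.Date(fecha_suspension) > as.Date("2013-01-01") | estado == "Activa")
```

Here kept contains stations 1 and 3. Note that a station suspended in 2015 is retained even though it only covers part of the period.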

Of the original 24 stations, only 1 was working for some or all of the period of interest and collected Dry-bulb Temperature (TSSM_CON). Keep in mind that some collected records might still be missing from the download, because of the quality filters applied by IDEAM.

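With the stations, we can access TSSM_CON from 2013 to 2016. To do so, we can use the function download_climate_stations(). As the call below shows, it takes the selected stations (stations), the start and end of the period (start_date and end_date) and the tag of the variable to download (tag).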

tssm_stations <- download_climate_stations(
  stations = cw_stations,
  start_date = "2013-01-01",
  end_date = "2016-12-31",
  tag = "TSSM_CON"
)
#> Original data is retrieved from the Institute of Hydrology, Meteorology
#> and Environmental Studies (Instituto de Hidrología, Meteorología y
#> Estudios Ambientales - IDEAM).
#> Reformatted by package authors.
#> Stored by Universidad de Los Andes under the Epiverse TRACE iniative.

head(tssm_stations)
#> # A tibble: 6 × 7
#>    station longitude    latitude   date       hour     tag      value
#>      <dbl> <chr>        <chr>      <chr>      <chr>    <chr>    <dbl>
#> 1 21215080 -74.99536111 4.23152778 2013-01-01 07:00:00 TSSM_CON  23.2
#> 2 21215080 -74.99536111 4.23152778 2013-01-01 13:00:00 TSSM_CON  32  
#> 3 21215080 -74.99536111 4.23152778 2013-01-01 18:00:00 TSSM_CON  27.2
#> 4 21215080 -74.99536111 4.23152778 2013-01-02 07:00:00 TSSM_CON  22.6
#> 5 21215080 -74.99536111 4.23152778 2013-01-02 13:00:00 TSSM_CON  32  
#> 6 21215080 -74.99536111 4.23152778 2013-01-02 18:00:00 TSSM_CON  27

The returned tidy data.frame includes the unique station code, longitude, latitude, date, hour, the requested tag and the value recorded at the specified time. The tidy structure has one row per observation, which makes subsetting and plotting easier.
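Because date and hour are returned as character columns, it can be convenient to combine them into a single timestamp before working with sub-daily values. The following is a minimal sketch on the first two rows of the output above, typed by hand. The column name datetime and the time zone "America/Bogota" are assumptions; the time zone of the IDEAM records is not stated here.

```r
library(dplyr)

# First two rows of the output above, typed by hand
toy_obs <- tibble(
  date = c("2013-01-01", "2013-01-01"),
  hour = c("07:00:00", "13:00:00"),
  value = c(23.2, 32)
)

# Paste date and hour together and parse them as a date-time
toy_obs <- toy_obs %>%
  mutate(
    datetime = as.POSIXct(
      paste(date, hour),
      format = "%Y-%m-%d %H:%M:%S",
      tz = "America/Bogota" # assumption: records are in Colombian local time
    )
  )
```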

To plot a time series of the stations’ data we can use the ggplot() function from the ggplot2 package as follows:

ggplot(data = tssm_stations) +
  geom_line(aes(x = date, y = value, group = station), color = "#106ba0") +
  ggtitle("Dry-bulb Temperature in Espinal by station") +
  xlab("Date") +
  ylab("Temperature [°C]") +
  facet_grid(rows = vars(station)) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white", colour = "white"),
    panel.background = element_rect(fill = "white", colour = "white"),
    plot.title = element_text(hjust = 0.5)
  )

As we can see, only one station has data for the selected period. However, because the data is recorded at several hours per day, changes in the temperature pattern over time are hard to see. To address this, we will use the function aggregate_climate(), which aggregates climate data by time. It takes the desired aggregation frequency as a parameter.

tssm_month <- tssm_stations %>% aggregate_climate(frequency = "month")

ggplot(data = tssm_month) +
  geom_line(aes(x = date, y = value, group = station), color = "#106ba0") +
  ggtitle("Dry-bulb Temperature in Espinal by station") +
  xlab("Date") +
  ylab("Dry-bulb temperature [°C]") +
  facet_grid(rows = vars(station)) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white", colour = "white"),
    panel.background = element_rect(fill = "white", colour = "white"),
    plot.title = element_text(hjust = 0.5)
  )
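To build intuition for what a monthly aggregation does, the sketch below computes a monthly mean by hand with dplyr on made-up values. It is not the implementation of aggregate_climate(), whose internal rules (for example, how missing values or different tags are handled) are not described in this vignette. The names toy_obs and toy_month are hypothetical.

```r
library(dplyr)

# Made-up observations in the same shape as the download output
toy_obs <- tibble(
  station = 1,
  date = c("2013-01-01", "2013-01-01", "2013-01-02", "2013-02-01"),
  hour = c("07:00:00", "13:00:00", "07:00:00", "07:00:00"),
  tag = "TSSM_CON",
  value = c(22, 32, 24, 25)
)

# Label each row with its year-month, then average within each month
toy_month <- toy_obs %>%
  mutate(month = format(as.Date(date), "%Y-%m")) %>%
  group_by(station, tag, month) %>%
  summarise(value = mean(value), .groups = "drop")
```

The three January values average to 26 and the single February value stays at 25, so four sub-daily rows collapse into two monthly rows.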

Other methods

To retrieve climate data for any ROI in the country, without manually extracting the stations’ data, we can use the function download_climate_geom(). As the call further below shows, it takes the ROI as an sf object (geometry), the start and end of the period (start_date and end_date) and the tag of the variable to download (tag).

To replicate the previous example, we can reuse the previously created ROI and add the aggregation by month. We can add the aggregation function to the workflow using the pipe operator %>%. The following code should retrieve the same results as the previous one.

tssm_roi <- download_climate_geom(
  geometry = roi,
  start_date = "2013-01-01",
  end_date = "2016-12-31",
  tag = "TSSM_CON"
) %>% aggregate_climate(frequency = "month")

To make the download process even easier, and avoid the creation of already known geometries like municipalities or departments, ColOpenData offers an extra function to download data using the areas’ DIVIPOLA code.

DIVIPOLA codification is standardized for the whole country, and contains departments’ and municipalities’ codes. For further details on DIVIPOLA codification and functions please refer to Documentation and Dictionaries. We will look up the code for the municipality of Espinal in the department of Tolima.

espinal_code <- name_to_code_mun("Tolima", "Espinal")
espinal_code
#> [1] "73268"
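The structure of the returned code can be inspected with base R. In a five-digit DIVIPOLA municipality code, the first two digits identify the department and the last three identify the municipality within it. The sketch below hard-codes the value printed above so that it runs without the package; the variable names are illustrative only.

```r
# Value printed above, hard-coded so the sketch runs on its own
espinal_code <- "73268"

# First two digits: department (Tolima)
department_code <- substr(espinal_code, 1, 2)

# Last three digits: municipality within the department
municipality_part <- substr(espinal_code, 3, 5)
```

Keeping the code as a character string, as the package does, avoids losing leading zeros in departments whose codes start with 0.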

The function download_climate() requires almost the same arguments as download_climate_geom(), but instead of an sf object it takes a character string containing the DIVIPOLA code (code), together with start_date, end_date and tag.

The code below can be used to get the same results as the previous two examples, without the need to create a geometry or filter individual stations.

tssm_mpio <- download_climate(
  code = espinal_code,
  start_date = "2013-01-01",
  end_date = "2016-12-31",
  tag = "TSSM_CON"
) %>% aggregate_climate(frequency = "month")

Disclaimer