Quick start guide

Martin Westgate & Dax Kellie

2024-11-19

galah is an R interface to biodiversity data hosted by the Global Biodiversity Information Facility (GBIF) and its subsidiary node organisations. GBIF and its partner nodes collate and store observations of individual life forms using the ‘Darwin Core’ data standard.

Installation

To install from CRAN:

install.packages("galah")

Or install the development version from GitHub:

install.packages("remotes")
remotes::install_github("AtlasOfLivingAustralia/galah")

Load the package

library(galah)

Configuration

By default, galah downloads information from the Atlas of Living Australia (ALA). To show the full list of organisations currently supported by galah, use show_all(atlases).

show_all(atlases)
## # A tibble: 10 × 4
##    region         institution                                                             acronym url                    
##    <chr>          <chr>                                                                   <chr>   <chr>                  
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au 
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityat…
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br   
##  4 France         Portail français d'accès aux données d'observation sur les espèces      OpenObs https://openobs.mnhn.fr
##  5 Global         Global Biodiversity Information Facility                                GBIF    https://gbif.org       
##  6 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob…
##  7 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt    
##  8 Spain          GBIF Spain                                                              GBIF.es https://gbif.es        
##  9 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversityda…
## 10 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk

Use galah_config() to set the node organisation using its region, name, or acronym. Once set, galah automatically populates the server configuration for the selected GBIF node. To download occurrence records from a node, you first need to register an account on that node's website, then provide your registration email to galah. To download from GBIF itself, you need to provide your email, username, and password.

galah_config(atlas = "GBIF",
             username = "user1",
             email = "email@email.com",
             password = "my_password")

You can find a full list of configuration options by running ?galah_config.
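
For nodes other than GBIF, the text above indicates that the registration email alone is required for downloads. A minimal sketch, assuming you have registered with the ALA; the email address shown is a placeholder, not a working account:

```r
library(galah)

# Select the node by region and supply the email registered with that node.
# "your-email@example.com" is a placeholder; replace it with your own address.
galah_config(atlas = "Australia",
             email = "your-email@example.com")
```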

Basic syntax

The standard method to construct queries in {galah} is via piped functions. Pipes in galah start with the galah_call() function, and typically end with collect(), though collapse() and compute() are also supported. The development team use the base pipe by default (|>), but the {magrittr} pipe (%>%) should work too.

galah_config(atlas = "ALA",
             verbose = FALSE)
galah_call() |>
  count() |>
  collect()
## # A tibble: 1 × 1
##       count
##       <int>
## 1 146185520

To pass more complex queries, you can use additional {dplyr} functions such as filter(), select(), and group_by().

galah_call() |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 40200358

Each GBIF node allows you to query using its own set of in-built fields. You can investigate which fields are available using show_all() and search_all():

search_all(fields, "australian states")
## # A tibble: 2 × 3
##   id     description                            type  
##   <chr>  <chr>                                  <chr> 
## 1 cl2013 ASGS Australian States and Territories fields
## 2 cl22   Australian States and Territories      fields
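
Once you have found a field, it can help to see which values it contains before using it in filter(). The sketch below assumes the ALA configuration set earlier and passes the result of search_all() to show_values(); the object name state_values is only illustrative, and the values returned depend on the node you query:

```r
library(galah)

# Look up the "cl22" field, then list the values it can take.
# Requires an internet connection; results depend on the configured node.
state_values <- search_all(fields, "cl22") |>
  show_values()

state_values
```

A value returned here can then be used on the right-hand side of a filter() condition for the same field.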

Taxonomic searches

To narrow your search to a particular taxonomic group, use identify(). Note that this function only accepts scientific names and is not case sensitive. It’s good practice to first use search_taxa() to check that the names you provide return the correct taxonomic results.

search_taxa("reptilia") # Check whether taxonomic info is correct
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                               rank  match_type kingdom phylum class issues
##   <chr>       <chr>           <chr>                                          <chr> <chr>      <chr>   <chr>  <chr> <chr> 
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228… class exactMatch Animal… Chord… Rept… noIss…
galah_call() |>
  identify("reptilia") |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 338434

If you want to query something other than the number of records, modify the type argument in galah_call(). Here we’ll query the number of species:

galah_call(type = "species") |>
  identify("reptilia") |> 
  filter(year >= 2020) |> 
  count() |>
  collect()
## # A tibble: 1 × 1
##   count
##   <int>
## 1   883
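
The group_by() function mentioned under Basic syntax can be combined with count() to split a total into categories. A minimal sketch, assuming the ALA configuration used above; the object name counts_by_year is illustrative, and the counts change as new records are added, so no output is shown:

```r
library(galah)

# Count reptile records since 2020, with one row per year.
# Requires an internet connection; counts will vary over time.
counts_by_year <- galah_call() |>
  identify("reptilia") |>
  filter(year >= 2020) |>
  group_by(year) |>
  count() |>
  collect()

counts_by_year
```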

Download

To download records, rather than count how many are available, remove the count() function from your pipe.

result <- galah_call() |>
  identify("Litoria") |>
  filter(year >= 2020, cl22 == "Tasmania") |>
  select(basisOfRecord, group = "basic") |>
  collect()
## Retrying in 1 seconds.
result |> head()
## # A tibble: 6 × 9
##   recordID            scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##   <chr>               <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>           
## 1 00052544-d943-42e9… Litoria ewing… https://biodi…           -42.9             147. 2022-09-19 00:00:00 PRESENT         
## 2 00168ca6-84d0-4af1… Litoria ranif… https://biodi…           -41.2             146. 2023-12-21 10:20:19 PRESENT         
## 3 001a43fe-8586-4064… Litoria ewing… https://biodi…           -43.0             147. 2021-08-07 00:00:00 PRESENT         
## 4 00250163-ec50-4eda… Litoria ranif… https://biodi…           -41.2             147. 2023-08-23 11:49:28 PRESENT         
## 5 003e0f63-9f95-4af9… Litoria ewing… https://biodi…           -42.9             148. 2022-12-24 06:27:00 PRESENT         
## 6 0070521f-bb45-46fb… Litoria ewing… https://biodi…           -43.1             147. 2023-12-20 14:29:23 PRESENT         
## # ℹ 2 more variables: dataResourceName <chr>, basisOfRecord <chr>

Check out our other vignettes for more detail on how to use these functions.