galah is an R interface to biodiversity data hosted by the Global Biodiversity Information Facility (GBIF) and its subsidiary node organisations. GBIF and its partner nodes collate and store observations of individual life forms using the ‘Darwin Core’ data standard.
To install from CRAN:
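The usual CRAN call, using the package name given above, would be:

```r
# Install the released version of galah from CRAN
install.packages("galah")
```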
Or install the development version from GitHub:
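A sketch of the GitHub route, assuming the {remotes} package as the installer and AtlasOfLivingAustralia/galah-R as the repository path (neither is stated in this document, so check the project's GitHub page if the path has moved):

```r
# Assumption: {remotes} is used as the installer
install.packages("remotes")

# Assumption: the development version lives at AtlasOfLivingAustralia/galah-R
remotes::install_github("AtlasOfLivingAustralia/galah-R")
```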
Load the package:
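Loading works like any other R package:

```r
# Attach galah for the current session
library(galah)
```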
By default, galah downloads information from the Atlas of Living
Australia (ALA). To show the full list of organisations currently
supported by galah, use show_all(atlases).
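The call named above can be run as follows. The variable name supported_atlases is only illustrative; the result is a tibble with one row per organisation, like the listing that follows.

```r
library(galah)

# List every organisation (atlas) that galah can query
supported_atlases <- show_all(atlases)
supported_atlases
```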
## # A tibble: 10 × 4
##    region         institution                                                             acronym url
##    <chr>          <chr>                                                                   <chr>   <chr>
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityat…
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br
##  4 France         Portail français d'accès aux données d'observation sur les espèces      OpenObs https://openobs.mnhn.fr
##  5 Global         Global Biodiversity Information Facility                                GBIF    https://gbif.org
##  6 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob…
##  7 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt
##  8 Spain          GBIF Spain                                                              GBIF.es https://gbif.es
##  9 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversityda…
## 10 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk
Use galah_config() to set the node organisation using its region,
name, or acronym. Once set, galah will automatically populate the
server configuration for your selected GBIF node. To download
occurrence records from your chosen GBIF node, you will need to
register an account with that node (on its website), then provide
your registration email to galah. To download from GBIF itself, you
will need to provide your email, username, and password.
galah_config(atlas = "GBIF",
             username = "user1",
             email = "email@email.com",
             password = "my_password")
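For a node other than GBIF, the registration email alone should be enough, as described above. A minimal sketch that selects the ALA by acronym; the email address is a placeholder, not a working account:

```r
library(galah)

# Select a node by acronym; its region or full name should work equally well
galah_config(atlas = "ALA",
             email = "your-email@example.com")  # placeholder address
```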
You can find a full list of configuration options by running
?galah_config.
The standard method to construct queries in {galah} is via piped
functions. Pipes in galah start with the galah_call() function, and
typically end with collect(), though collapse() and compute() are
also supported. The development team use the base pipe by default
(|>), but the {magrittr} pipe (%>%) should work too.
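A minimal pipe of this shape, sketched here as a count of all occurrence records in the currently configured atlas, returns a one-row tibble with a count column. The variable name total_records is only illustrative, and the number returned will change as records are added.

```r
library(galah)

# Start a query, ask for a record count, then execute it
total_records <- galah_call() |>
  count() |>
  collect()

total_records
```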
## # A tibble: 1 × 1
## count
## <int>
## 1 146185520
To build more complex queries, you can use additional {dplyr}
functions such as filter(), select(), and group_by().
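As a sketch, adding filter() to the pipe restricts the count to records matching a condition. The year threshold below is an assumption chosen to match the later examples in this guide.

```r
library(galah)

# Count only records observed in 2020 or later
recent_records <- galah_call() |>
  filter(year >= 2020) |>
  count() |>
  collect()

recent_records
```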
## # A tibble: 1 × 1
## count
## <int>
## 1 40200358
Each GBIF node lets you query using its own set of in-built fields.
You can investigate which fields are available using show_all() and
search_all():
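A sketch of a field search follows. The search string "australian states" is an assumption, chosen because it is consistent with the two state and territory layers listed next; the variable name state_fields is only illustrative.

```r
library(galah)

# Search the available fields for a text match
state_fields <- search_all(fields, "australian states")
state_fields
```

Calling show_all(fields) instead lists every available field without filtering.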
## # A tibble: 2 × 3
##   id     description                            type
##   <chr>  <chr>                                  <chr>
## 1 cl2013 ASGS Australian States and Territories fields
## 2 cl22   Australian States and Territories      fields
To narrow your search to a particular taxonomic group, use
identify(). Note that this function only accepts scientific names
and is not case sensitive. It is good practice to first use
search_taxa() to check that the names you provide return the
taxonomic matches you expect.
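For example, a name can be checked before it is used in a query. This sketch uses "reptilia", the search term that appears in the output that follows; the variable name reptile_match is only illustrative.

```r
library(galah)

# Check how the name resolves against the atlas's taxonomy
reptile_match <- search_taxa("reptilia")
reptile_match
```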
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                               rank  match_type kingdom phylum class issues
##   <chr>       <chr>           <chr>                                          <chr> <chr>      <chr>   <chr>  <chr> <chr>
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228… class exactMatch Animal… Chord… Rept… noIss…
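Once the name resolves as expected, it can be passed to identify() inside a pipe. A sketch that counts reptile records, assuming the same year filter used elsewhere in this guide:

```r
library(galah)

# Count records for the matched taxon from 2020 onwards
reptile_records <- galah_call() |>
  identify("reptilia") |>
  filter(year >= 2020) |>
  count() |>
  collect()

reptile_records
```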
## # A tibble: 1 × 1
## count
## <int>
## 1 338434
If you want to query something other than the number of records,
modify the type argument in galah_call(). Here we'll query the
number of species:
galah_call(type = "species") |>
  identify("reptilia") |>
  filter(year >= 2020) |>
  count() |>
  collect()
## # A tibble: 1 × 1
## count
## <int>
## 1 883
To download records, rather than find how many records are
available, simply remove the count() function from your pipe.
result <- galah_call() |>
  identify("Litoria") |>
  filter(year >= 2020, cl22 == "Tasmania") |>
  select(basisOfRecord, group = "basic") |>
  collect()

# Preview the first six rows of the download
result |> head()
## # A tibble: 6 × 9
##   recordID            scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##   <chr>               <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>
## 1 00052544-d943-42e9… Litoria ewing… https://biodi…           -42.9             147. 2022-09-19 00:00:00 PRESENT
## 2 00168ca6-84d0-4af1… Litoria ranif… https://biodi…           -41.2             146. 2023-12-21 10:20:19 PRESENT
## 3 001a43fe-8586-4064… Litoria ewing… https://biodi…           -43.0             147. 2021-08-07 00:00:00 PRESENT
## 4 00250163-ec50-4eda… Litoria ranif… https://biodi…           -41.2             147. 2023-08-23 11:49:28 PRESENT
## 5 003e0f63-9f95-4af9… Litoria ewing… https://biodi…           -42.9             148. 2022-12-24 06:27:00 PRESENT
## 6 0070521f-bb45-46fb… Litoria ewing… https://biodi…           -43.1             147. 2023-12-20 14:29:23 PRESENT
## # ℹ 2 more variables: dataResourceName <chr>, basisOfRecord <chr>
Check out our other vignettes for more detail on how to use these functions.