Download data

Martin Westgate & Dax Kellie

2024-11-19

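The atlas_ functions are used to return data from the atlas chosen using galah_config(). They are atlas_counts(), atlas_species(), atlas_occurrences(), atlas_media(), atlas_taxonomy() and atlas_citation().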

The final atlas_ function, atlas_citation(), is unusual: it does not return any new data, but instead provides a citation for an existing dataset (downloaded using atlas_occurrences()) that has an associated DOI. The other functions are described below.

It is equally permissible to use the type argument of galah_call() to specify the kind of data you want, and then retrieve the data using collect(). Here we use the atlas_ prefix for consistency with earlier versions of galah, and because some atlas_ functions include shortcuts that make life easier.
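As a sketch of that alternative, the request below should return the same total record count as atlas_counts(). It assumes galah 2.x, where galah_call() accepts type = "occurrences-count"; the object name total_records is illustrative.

```r
library(galah)
galah_config(atlas = "Australia")

# Describe the kind of data wanted with `type` (assumes galah 2.x accepts
# "occurrences-count"), then send the query and retrieve the result with collect()
total_records <- galah_call(type = "occurrences-count") |>
  collect()

total_records
```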

Record counts

atlas_counts() provides summary counts of records in the specified atlas without needing to download all the records first.

galah_config(atlas = "Australia")
# Total number of records in the ALA
atlas_counts()
## # A tibble: 1 × 1
##       count
##       <int>
## 1 146185520

Group and summarise record counts by specific fields using galah_group_by().

galah_call() |>
  galah_group_by(kingdom) |>
  atlas_counts()
## # A tibble: 12 × 2
##    kingdom           count
##    <chr>             <int>
##  1 Animalia      113408280
##  2 Plantae        27572183
##  3 Fungi           2448600
##  4 Chromista       1057157
##  5 Protista         316541
##  6 Bacteria         113480
##  7 Archaea            4120
##  8 Virus              2382
##  9 Bamfordvirae        210
## 10 Orthornavirae       138
## 11 Viroid              104
## 12 Shotokuvirae         41
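galah_group_by() can be combined with galah_identify() and galah_filter() to narrow a query before grouping. The sketch below counts bird (Aves) records per year from 2020 onwards; the taxon, year threshold and object name are illustrative choices, not part of the original example.

```r
library(galah)
galah_config(atlas = "Australia")

# Restrict to birds recorded from 2020 onwards, then count records per year
bird_counts <- galah_call() |>
  galah_identify("Aves") |>
  galah_filter(year >= 2020) |>
  galah_group_by(year) |>
  atlas_counts()

bird_counts
```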

Species lists

A common use case of atlas data is to identify which species occur in a specified region, time period, or taxonomic group. atlas_species() is similar to search_taxa() in that it returns taxonomic information and unique identifiers, but it differs in two ways: it returns information only on species, and it supports filtering, which makes it far more flexible.

species <- galah_call() |>
  galah_identify("Rodentia") |>
  galah_filter(stateProvince == "Northern Territory") |>
  atlas_species()
  
species |> head()
## # A tibble: 6 × 11
##   taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom phylum class order family genus vernacular_name
##   <chr>            <chr>        <chr>                  <chr>      <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>          
## 1 https://biodive… Pseudomys d… (Gould, 1842)          species    Animal… Chord… Mamm… Rode… Murid… Pseu… Delicate Mouse 
## 2 https://biodive… Mesembriomy… (J.E. Gray, 1843)      species    Animal… Chord… Mamm… Rode… Murid… Mese… Black-footed T…
## 3 https://biodive… Zyzomys arg… (Thomas, 1889)         species    Animal… Chord… Mamm… Rode… Murid… Zyzo… Common Rock-rat
## 4 https://biodive… Pseudomys h… (Waite, 1896)          species    Animal… Chord… Mamm… Rode… Murid… Pseu… Sandy Inland M…
## 5 https://biodive… Melomys bur… (Ramsay, 1887)         species    Animal… Chord… Mamm… Rode… Murid… Melo… Grassland Melo…
## 6 https://biodive… Notomys ale… Thomas, 1922           species    Animal… Chord… Mamm… Rode… Murid… Noto… Spinifex Hoppi…
## # ℹ abbreviated name: ¹​scientific_name_authorship
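To check how many species a query matches before downloading the full list, atlas_counts() can count species rather than records. This sketch reuses the rodent query above and assumes the type = "species" option of atlas_counts(); n_species is an illustrative name.

```r
library(galah)
galah_config(atlas = "Australia")

# Same rodent query as above, but returning a species count instead of a list
# (assumes atlas_counts() supports type = "species")
n_species <- galah_call() |>
  galah_identify("Rodentia") |>
  galah_filter(stateProvince == "Northern Territory") |>
  atlas_counts(type = "species")

n_species
```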

Occurrence data

To download occurrence data, you will need to specify an email in galah_config() that is registered to an account with your selected GBIF node. See the Configuring galah section below for more information.

galah_config(email = "your_email@email.com", atlas = "Australia")

Download occurrence records of Eolophus roseicapilla from the Australian Capital Territory, recorded in 2010 or later.

occ <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    stateProvince == "Australian Capital Territory",
    year >= 2010,
    profile = "ALA"
  ) |>
  galah_select(institutionID, group = "basic") |>
  atlas_occurrences()
## Retrying in 1 seconds.
## Retrying in 2 seconds.
## Retrying in 4 seconds.
occ |> head()
## # A tibble: 6 × 9
##   recordID            scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##   <chr>               <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>           
## 1 0000a928-d756-42eb… Eolophus rose… https://biodi…           -35.6             149. 2017-04-19 09:11:00 PRESENT         
## 2 0001bc78-d2e9-48aa… Eolophus rose… https://biodi…           -35.2             149. 2019-08-13 15:13:00 PRESENT         
## 3 0002064f-08ea-425b… Eolophus rose… https://biodi…           -35.3             149. 2014-03-16 06:48:00 PRESENT         
## 4 00022dd2-9f85-4802… Eolophus rose… https://biodi…           -35.3             149. 2022-05-08 08:20:00 PRESENT         
## 5 0002cc35-8d5a-4d20… Eolophus rose… https://biodi…           -35.3             149. 2015-11-01 08:00:00 PRESENT         
## 6 00030a8c-082f-44f0… Eolophus rose… https://biodi…           -35.3             149. 2022-01-06 11:47:00 PRESENT         
## # ℹ 2 more variables: dataResourceName <chr>, institutionID <lgl>
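atlas_citation() needs a download that carries a DOI. A minimal sketch, assuming the mint_doi argument of atlas_occurrences() (which asks the ALA to attach a DOI to the download) and a registered email in galah_config(); occ_doi is an illustrative name.

```r
library(galah)
galah_config(email = "your_email@email.com", atlas = "Australia")

# Same query as above, but ask for a DOI to be minted for the download
# (assumes atlas_occurrences() supports mint_doi = TRUE for the ALA)
occ_doi <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    stateProvince == "Australian Capital Territory",
    year >= 2010
  ) |>
  atlas_occurrences(mint_doi = TRUE)

# Format a citation for the DOI attached to this dataset
atlas_citation(occ_doi)
```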

Media metadata

In addition to text data describing individual occurrences and their attributes, the ALA stores images, sounds and videos associated with a given record. Metadata on these media files can be downloaded using atlas_media().

media_data <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    year == 2020,
    cl22 == "Australian Capital Territory") |>
  atlas_media()
  
media_data |> head()
## # A tibble: 6 × 19
##   media_id   recordID scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##   <chr>      <chr>    <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>           
## 1 ff8322d0-… 003a192… Eolophus rose… https://biodi…           -35.3             149. 2020-09-12 16:11:00 PRESENT         
## 2 c66fc819-… 015ee7c… Eolophus rose… https://biodi…           -35.4             149. 2020-08-09 15:11:00 PRESENT         
## 3 fe6d7b94-… 05e86b7… Eolophus rose… https://biodi…           -35.4             149. 2020-11-13 22:29:00 PRESENT         
## 4 2f4d32c0-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT         
## 5 73407414-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT         
## 6 89171c49-… 063bb0f… Eolophus rose… https://biodi…           -35.6             149. 2020-08-04 11:50:00 PRESENT         
## # ℹ 11 more variables: dataResourceName <chr>, multimedia <chr>, images <chr>, sounds <lgl>, videos <lgl>,
## #   creator <chr>, license <chr>, mimetype <chr>, width <int>, height <int>, image_url <chr>

To download the media files themselves to your computer, use collect_media().

media_data |>
  collect_media()

Taxonomic trees

atlas_taxonomy() builds a taxonomic tree from one clade down to a lower rank, using each GBIF node’s internal taxonomy. Specify the taxonomic level your tree should go down to by filtering on rank with galah_filter().

galah_call() |>
  galah_identify("chordata") |>
  galah_filter(rank == class) |>
  atlas_taxonomy()
## # A tibble: 19 × 4
##    name            rank      parent_taxon_concept_id                                                   taxon_concept_id  
##    <chr>           <chr>     <chr>                                                                     <chr>             
##  1 Chordata        phylum    <NA>                                                                      https://biodivers…
##  2 Cephalochordata subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  3 Tunicata        subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  4 Appendicularia  class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  5 Ascidiacea      class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  6 Thaliacea       class     https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
##  7 Vertebrata      subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
##  8 Agnatha         informal  https://biodiversity.org.au/afd/taxa/5d6076b1-b7c7-487f-9d61-0fea0111cc7e https://biodivers…
##  9 Myxini          informal  https://biodiversity.org.au/afd/taxa/66db22c8-891d-4b16-a1a2-b66feaeaa3e0 https://biodivers…
## 10 Petromyzontida  informal  https://biodiversity.org.au/afd/taxa/66db22c8-891d-4b16-a1a2-b66feaeaa3e0 https://biodivers…
## 11 Gnathostomata   informal  https://biodiversity.org.au/afd/taxa/5d6076b1-b7c7-487f-9d61-0fea0111cc7e https://biodivers…
## 12 Amphibia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 13 Aves            class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 14 Mammalia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 15 Reptilia        class     https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 16 Pisces          informal  https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 17 Actinopterygii  class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
## 18 Chondrichthyes  class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
## 19 Sarcopterygii   class     https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
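The same pattern works for other clades and ranks. For example, this sketch builds a tree from the class Mammalia down to order; the taxon and rank are illustrative choices.

```r
library(galah)
galah_config(atlas = "Australia")

# Build a tree from the class Mammalia down to the rank of order
mammal_orders <- galah_call() |>
  galah_identify("Mammalia") |>
  galah_filter(rank == order) |>
  atlas_taxonomy()

mammal_orders
```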

Configuring galah

Various aspects of the galah package can be customised using galah_config().

Email

To download occurrence records, species lists or media, you will need to provide an email address registered with the service that you want to use (e.g. for the ALA, you can create an account on the ALA website).

Once an email is registered, it should be stored in the config:

galah_config(email = "myemail@gmail.com")

Setting your directory

By default, galah stores downloads in a temporary folder, meaning that the local files are automatically deleted when the R session ends. To keep downloaded files, set the directory to a non-temporary location.

galah_config(directory = "example/dir")
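One way to make sure the chosen path exists is to create it first with base R's dir.create() and then pass the same path to galah_config(). This is a sketch: the folder name galah_downloads is hypothetical, and creating it explicitly avoids relying on whether galah_config() creates missing folders itself.

```r
library(galah)

# Hypothetical folder name; create it if it does not already exist
download_dir <- "galah_downloads"
dir.create(download_dir, showWarnings = FALSE)

# Store future downloads in this persistent folder instead of a temporary one
galah_config(directory = download_dir)
```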

Setting the download reason

The ALA requires that you provide a reason when downloading occurrence data (via the galah atlas_occurrences() function). The reason is set to “scientific research” by default, but you can change it with the download_reason_id argument of galah_config(). See show_all(reasons) for valid download reasons.

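# List valid download reasons and their IDs
show_all(reasons)
# Replace your_reason_id with an ID from the list above
galah_config(download_reason_id = your_reason_id)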

Debugging

If things aren’t working as expected, more detail (particularly about web requests and caching behaviour) can be obtained by setting verbose = TRUE.

galah_config(verbose = TRUE)
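Because the setting applies to every subsequent request in the session, it can help to switch verbose output on for one query and off again afterwards. A minimal sketch; the query itself is an arbitrary illustration.

```r
library(galah)
galah_config(atlas = "Australia", verbose = TRUE)

# Any request made now prints extra detail about web requests and caching
counts_2020 <- galah_call() |>
  galah_filter(year == 2020) |>
  atlas_counts()

# Switch the extra output off again once debugging is done
galah_config(verbose = FALSE)
```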