Currently, there are 6 functions associated with the
sample
verb in the sgsR
package:
sample_srs()
- simple random sampling
sample_systematic()
- systematic sampling in a grid
or hexagon tessellation
sample_strat()
- stratified sampling within a
sraster
sample_clhs()
- Latin hypercube sampling
sample_balanced()
- see BalancedSampling
sample_ahels()
- adapted hypercube evaluation of a
legacy sample (ahels)
One key feature of some sample_* functions is their ability to define access corridors. Users can supply a road access network (must be sf line objects) and define buffers around access that determine where samples are excluded and included.
The relevant parameters when access is defined are:
buff_inner - Inner buffer parameter that defines the distance from access within which samples cannot be taken (e.g. if you don’t want samples within 50 m of your access layer, set buff_inner = 50). Can be left as NULL (default).
buff_outer - Outer buffer parameter that defines the maximum distance that samples can be located from access (e.g. if you don’t want samples more than 200 m from your access layer, set buff_outer = 200).
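The effect of the two buffers can be sketched in base R. This is a minimal illustration under stated assumptions, not how sgsR implements buffering: the road is a single straight segment, the candidate locations are made-up random points, and dist_to_segment() is a hypothetical helper written for this example.

```r
# Hypothetical helper (not part of sgsR): Euclidean distance from points
# (px, py) to the line segment running from (x1, y1) to (x2, y2).
dist_to_segment <- function(px, py, x1, y1, x2, y2) {
  dx <- x2 - x1
  dy <- y2 - y1
  # position of the closest point along the segment, clamped to [0, 1]
  t <- ((px - x1) * dx + (py - y1) * dy) / (dx^2 + dy^2)
  t <- pmin(pmax(t, 0), 1)
  sqrt((px - (x1 + t * dx))^2 + (py - (y1 + t * dy))^2)
}

set.seed(1)
# candidate sample locations in a 1000 m x 1000 m area (made-up data)
cand <- data.frame(x = runif(500, 0, 1000), y = runif(500, 0, 1000))

# distance from each candidate to an assumed road along y = 500
d <- dist_to_segment(cand$x, cand$y, x1 = 0, y1 = 500, x2 = 1000, y2 = 500)

buff_inner <- 50   # exclude candidates closer than 50 m to the road
buff_outer <- 200  # exclude candidates farther than 200 m from the road

# keep only candidates inside the band between the two buffers
eligible <- cand[d >= buff_inner & d <= buff_outer, ]
range(abs(eligible$y - 500)) # distances of retained candidates from the road
```

Only candidates in the band between the inner and outer buffer remain eligible, which is the behaviour the two parameters describe.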
sample_srs
We have demonstrated a simple example of using the
sample_srs()
function in vignette("sgsR")
. We
will demonstrate additional examples below.
The input required for sample_srs()
is a
raster
. This means that sraster
and
mraster
are supported for this function.
#--- perform simple random sampling ---#
sample_srs(raster = sraster, # input sraster
nSamp = 200, # number of desired samples
plot = TRUE) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431150 ymin: 5337710 xmax: 438510 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (434130 5342370)
#> 2 POINT (434130 5342370)
#> 3 POINT (435830 5340750)
#> 4 POINT (435910 5340630)
#> 5 POINT (435490 5339090)
#> 6 POINT (436970 5343030)
#> 7 POINT (438110 5341770)
#> 8 POINT (432290 5338050)
#> 9 POINT (438070 5339350)
#> 10 POINT (432410 5341450)
sample_srs(raster = mraster, # input mraster
nSamp = 200, # number of desired samples
access = access, # define access road network
mindist = 200, # minimum distance samples must be apart from one another
buff_inner = 50, # inner buffer - no samples within this distance from road
buff_outer = 200, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431170 ymin: 5337810 xmax: 438510 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (436190 5342970)
#> 2 POINT (435150 5342830)
#> 3 POINT (435990 5339610)
#> 4 POINT (434590 5341350)
#> 5 POINT (434990 5342210)
#> 6 POINT (438490 5338930)
#> 7 POINT (435050 5342010)
#> 8 POINT (434790 5339650)
#> 9 POINT (438310 5340850)
#> 10 POINT (436430 5342910)
sample_systematic
The sample_systematic() function applies systematic sampling across an area, with the cellsize parameter defining the resolution of the tessellation. The tessellation shape can be modified using the square parameter: TRUE (default) results in a regular grid and FALSE results in a hexagonal grid. The location of samples can also be adjusted using the location parameter, where centers takes the center, corners takes all corners, and random takes a random location within each tessellation cell.
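To make the three location options concrete, here is a base R sketch for a square tessellation. It is an illustration only, assuming a made-up rectangular extent; it does not reproduce the tessellation code used inside sample_systematic().

```r
# Illustrative only: a square tessellation over a made-up rectangular extent.
# `cellsize` mirrors the parameter described above.
xmin <- 0; xmax <- 3000
ymin <- 0; ymax <- 2000
cellsize <- 1000

# lower-left corner of every grid cell
origins <- expand.grid(
  x = seq(xmin, xmax - cellsize, by = cellsize),
  y = seq(ymin, ymax - cellsize, by = cellsize)
)

# "centers": one point in the middle of each cell
centers <- data.frame(x = origins$x + cellsize / 2,
                      y = origins$y + cellsize / 2)

# "random": one uniformly random point inside each cell
set.seed(42)
random_pts <- data.frame(x = origins$x + runif(nrow(origins), 0, cellsize),
                         y = origins$y + runif(nrow(origins), 0, cellsize))

# "corners": all four corners of each cell, so corners shared by
# neighbouring cells appear more than once
corners <- do.call(rbind, lapply(c(0, cellsize), function(ox) {
  do.call(rbind, lapply(c(0, cellsize), function(oy) {
    data.frame(x = origins$x + ox, y = origins$y + oy)
  }))
}))

nrow(centers)         # one point per cell
nrow(corners)         # four points per cell
nrow(unique(corners)) # distinct corner locations
```

Taking corners yields many more points than taking centers for the same cellsize, and neighbouring cells contribute the same corner coordinates repeatedly.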
#--- perform grid sampling ---#
sample_systematic(raster = sraster, # input sraster
cellsize = 1000, # grid distance
plot = TRUE) # plot
#> Simple feature collection with 40 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431600 ymin: 5338200 xmax: 437600 ymax: 5343200
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (431600 5338200)
#> 2 POINT (432600 5338200)
#> 3 POINT (433600 5338200)
#> 4 POINT (434600 5338200)
#> 5 POINT (435600 5338200)
#> 6 POINT (436600 5338200)
#> 7 POINT (437600 5338200)
#> 8 POINT (432600 5339200)
#> 9 POINT (433600 5339200)
#> 10 POINT (434600 5339200)
#--- perform grid sampling ---#
sample_systematic(raster = sraster, # input sraster
cellsize = 500, # grid distance
square = FALSE, # hexagonal tessellation
location = "random", # random sample within tessellation
plot = TRUE) # plot
#> Simple feature collection with 172 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431256.3 ymin: 5337713 xmax: 438538 ymax: 5343209
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (431256.3 5339502)
#> 2 POINT (431263.2 5341056)
#> 3 POINT (431400.5 5338328)
#> 4 POINT (431286.7 5339189)
#> 5 POINT (431267.2 5340060)
#> 6 POINT (431282.8 5340762)
#> 7 POINT (431349.8 5341644)
#> 8 POINT (431397.7 5342293)
#> 9 POINT (431523.3 5337713)
#> 10 POINT (431525.2 5339505)
sample_systematic(raster = sraster, # input sraster
cellsize = 500, # grid distance
access = access, # define access road network
buff_outer = 200, # outer buffer - no samples further than this distance from road
square = FALSE, # hexagonal tessellation
location = "corners", # take corners instead of centers
plot = TRUE)
#> Simple feature collection with 645 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431100 ymin: 5337844 xmax: 438350 ymax: 5343185
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (431100 5340875)
#> 2 POINT (431100 5340587)
#> 3 POINT (431100 5342607)
#> 4 POINT (431100 5340587)
#> 5 POINT (431350 5340442)
#> 6 POINT (431100 5340875)
#> 7 POINT (431100 5340875)
#> 8 POINT (431100 5342607)
#> 9 POINT (431350 5342752)
#> 10 POINT (431100 5342607)
sample_strat
The sample_strat()
function contains a hierarchical
sampling algorithm, which was originally developed by Martin
Queinnec.
Queinnec, M., White, J. C., & Coops, N. C. (2021). Comparing airborne and spaceborne photon-counting LiDAR canopy structural estimates across different boreal forest types. Remote Sensing of Environment, 262(August 2020), 112510.
This algorithm uses a moving window (wrow and wcol parameters) to filter the input sraster and prioritize sample locations where stratum pixels are spatially grouped, rather than dispersed individually across the landscape.
Sampling is performed using 2 rules:
Rule 1 - Sample within spatially grouped stratum
pixels. Moving window defined by wrow
and
wcol
.
Rule 2 - If Rule 1 cannot supply enough samples to satisfy the desired sample count, individual stratum pixels are sampled.
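The distinction between the two rules can be sketched with a small matrix in base R. This is only an illustration of the idea of "spatially grouped" pixels, assuming a pixel counts as grouped when every pixel in its wrow x wcol window shares its stratum; the matrix and the is_grouped() helper are hypothetical and are not taken from sgsR.

```r
# Illustrative only: flag pixels whose full wrow x wcol neighbourhood
# belongs to the same stratum. The matrix and helper are made up.
strata <- matrix(c(1, 1, 1, 2, 2,
                   1, 1, 1, 2, 1,
                   1, 1, 1, 2, 2,
                   2, 2, 1, 2, 2,
                   2, 2, 2, 2, 2),
                 nrow = 5, byrow = TRUE)

is_grouped <- function(m, wrow = 3, wcol = 3) {
  hr <- (wrow - 1) / 2 # half-window height
  hc <- (wcol - 1) / 2 # half-window width
  out <- matrix(FALSE, nrow(m), ncol(m))
  # edge pixels without a complete window stay FALSE
  for (i in seq(1 + hr, nrow(m) - hr)) {
    for (j in seq(1 + hc, ncol(m) - hc)) {
      win <- m[(i - hr):(i + hr), (j - hc):(j + hc)]
      out[i, j] <- all(win == m[i, j])
    }
  }
  out
}

grouped <- is_grouped(strata)
rule1_candidates <- which(grouped, arr.ind = TRUE)  # spatially grouped pixels
rule2_candidates <- which(!grouped, arr.ind = TRUE) # remaining pixels
rule1_candidates
```

In this toy matrix only the pixel at row 2, column 2 sits in a homogeneous 3 x 3 window, so it is the only Rule 1 candidate; every other pixel could only be reached under Rule 2.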
The rule applied to select a particular sample is recorded in the rule attribute of the output samples. We give a few examples below:
#--- perform stratified random sampling ---#
sample_strat(sraster = sraster, # input sraster
nSamp = 200, # desired sample number
plot = TRUE) # plot
#> Simple feature collection with 200 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431210 ymin: 5337710 xmax: 438510 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> strata type rule geometry
#> x 1 new rule1 POINT (434550 5342510)
#> x1 1 new rule2 POINT (433910 5342930)
#> x2 1 new rule2 POINT (436610 5339890)
#> x3 1 new rule2 POINT (434290 5340870)
#> x4 1 new rule2 POINT (433410 5340990)
#> x5 1 new rule2 POINT (437510 5338010)
#> x6 1 new rule2 POINT (434350 5340910)
#> x7 1 new rule2 POINT (432010 5341550)
#> x8 1 new rule2 POINT (434250 5340270)
#> x9 1 new rule2 POINT (433330 5341390)
In some cases, users might want to include existing samples within the algorithm. To adjust the total number of samples needed per stratum to reflect those already present in existing, we can use the intermediate function extract_strata().
This function takes the sraster and existing samples and extracts the stratum for each sample. These samples can then be included within sample_strat(), which adjusts the total samples required per class based on representation in existing.
#--- extract strata values to existing samples ---#
e.sr <- extract_strata(sraster = sraster, # input sraster
existing = existing) # existing samples to add strata value to
e.sr
#> Simple feature collection with 200 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337730 xmax: 438550 ymax: 5343210
#> Projected CRS: UTM Zone 17, Northern Hemisphere
#> First 10 features:
#> strata strata.1 type rule geometry
#> 1 1 1 new rule1 POINT (433670 5340630)
#> 2 1 1 new rule2 POINT (436570 5340290)
#> 3 1 1 new rule2 POINT (433730 5342950)
#> 4 1 1 new rule2 POINT (434630 5341570)
#> 5 1 1 new rule2 POINT (437870 5338770)
#> 6 1 1 new rule2 POINT (435370 5339170)
#> 7 1 1 new rule2 POINT (435990 5339650)
#> 8 1 1 new rule2 POINT (434150 5342690)
#> 9 1 1 new rule2 POINT (437230 5338090)
#> 10 1 1 new rule2 POINT (434330 5341730)
Notice that e.sr now has an attribute named strata. If that attribute is not present, sample_strat() will give an error.
sample_strat(sraster = sraster, # input sraster
nSamp = 200, # desired sample number
access = access, # define access road network
existing = e.sr, # existing samples with strata values
mindist = 200, # minimum distance samples must be apart from one another
buff_inner = 50, # inner buffer - no samples within this distance from road
buff_outer = 200, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
#> Simple feature collection with 400 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337710 xmax: 438550 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> strata type rule geometry
#> 1 1 existing existing POINT (433670 5340630)
#> 2 1 existing existing POINT (436570 5340290)
#> 3 1 existing existing POINT (433730 5342950)
#> 4 1 existing existing POINT (434630 5341570)
#> 5 1 existing existing POINT (437870 5338770)
#> 6 1 existing existing POINT (435370 5339170)
#> 7 1 existing existing POINT (435990 5339650)
#> 8 1 existing existing POINT (434150 5342690)
#> 9 1 existing existing POINT (437230 5338090)
#> 10 1 existing existing POINT (434330 5341730)
As shown in the code example above, the mindist parameter specifies the minimum Euclidean distance that samples must be apart from one another.
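A minimum-distance constraint of this kind can be sketched in base R with a simple greedy filter. This is an assumption-laden illustration of the concept using made-up coordinates; it is not the procedure sgsR uses internally.

```r
# Illustrative only: greedily keep points that are at least `mindist`
# away from every point already kept. Coordinates are made up.
set.seed(7)
pts <- cbind(x = runif(300, 0, 5000), y = runif(300, 0, 5000))
mindist <- 200

keep <- 1L # start with the first point
for (i in 2:nrow(pts)) {
  # Euclidean distance from candidate i to all points kept so far
  d <- sqrt((pts[keep, "x"] - pts[i, "x"])^2 +
            (pts[keep, "y"] - pts[i, "y"])^2)
  if (all(d >= mindist)) keep <- c(keep, i)
}

thinned <- pts[keep, , drop = FALSE]
nrow(thinned)      # number of points retained
min(dist(thinned)) # smallest pairwise distance among the retained points
```

A larger mindist leaves fewer eligible locations, so a requested sample size may become harder to reach when mindist is combined with narrow access buffers.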
Notice that the sample outputs have type and rule attributes, which indicate whether the samples are existing or new and whether rule1 or rule2 was used to select each sample.
sample_strat(sraster = sraster, # input
nSamp = 200, # desired sample number
access = access, # define access road network
existing = e.sr, # existing samples with strata values
include = TRUE, # include existing plots in nSamp total
buff_outer = 200, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
#> Simple feature collection with 200 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337730 xmax: 438550 ymax: 5343210
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> strata type rule geometry
#> 1 1 existing existing POINT (433670 5340630)
#> 2 1 existing existing POINT (436570 5340290)
#> 3 1 existing existing POINT (433730 5342950)
#> 4 1 existing existing POINT (434630 5341570)
#> 5 1 existing existing POINT (437870 5338770)
#> 6 1 existing existing POINT (435370 5339170)
#> 7 1 existing existing POINT (435990 5339650)
#> 8 1 existing existing POINT (434150 5342690)
#> 9 1 existing existing POINT (437230 5338090)
#> 10 1 existing existing POINT (434330 5341730)
The include
parameter determines whether
existing
samples should be included in the total count of
samples defined by nSamp
. By default, the
include
parameter is set as FALSE
.
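The effect of include on the size of the output can be summarised with simple arithmetic, using the counts from the two examples above (200 existing samples, nSamp = 200). The n_new_samples() helper is hypothetical and only mirrors the overall totals; the per-stratum allocation performed by sample_strat() is not reproduced here.

```r
# Illustrative arithmetic only; n_new_samples() is a hypothetical helper,
# not an sgsR function.
n_new_samples <- function(nSamp, n_existing, include = FALSE) {
  # with include = TRUE, existing samples count toward nSamp
  if (include) max(nSamp - n_existing, 0) else nSamp
}

n_existing <- 200

# include = FALSE (default): nSamp new samples are added to the existing ones
total_default <- n_new_samples(200, n_existing, include = FALSE) + n_existing

# include = TRUE: existing samples are part of the nSamp total
total_include <- n_new_samples(200, n_existing, include = TRUE) + n_existing

c(default = total_default, include = total_include)
```

These totals match the 400 and 200 features reported in the outputs of the two sample_strat() calls above.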
sample_clhs
The sample_clhs() function implements the conditioned Latin hypercube (clhs) sampling methodology from the clhs package. A number of other functions in the sgsR package help to provide guidance on clhs sampling, including calculate_pop() and calculate_lhsOpt(). Check out these functions to better understand how sample numbers could be optimized.
The syntax for this function is similar to others shown above, although parameters like iter, which defines the number of iterations within the Metropolis-Hastings process, are important to consider. In these examples we use a low iter value because it takes less time to run. The default value for iter within the clhs package is 10,000.
sample_clhs(mraster = mraster, # input
nSamp = 200, # desired sample number
plot = TRUE, # plot
iter = 100) # number of iterations
sample_clhs(mraster = mraster, # input
nSamp = 300, # desired sample number
iter = 100, # number of iterations
existing = existing, # existing samples
access = access, # define access road network
buff_inner = 100, # inner buffer - no samples within this distance from road
buff_outer = 300, # outer buffer - no samples further than this distance from road
plot = TRUE) # plot
The cost parameter defines the mraster covariate that is used to constrain the clhs sampling. Many variables could serve as the cost. Examples include the distance of a pixel from road access (e.g. from calculate_distance(); see the example below), terrain slope, or the output from calculate_coobs(), among many others.
#--- cost constrained examples ---#
#--- calculate distance to access layer for each pixel in mr ---#
mr.c <- calculate_distance(raster = mraster, # input
access = access, # define access road network
plot = TRUE) # plot
sample_clhs(mraster = mr.c, # input
nSamp = 250, # desired sample number
iter = 100, # number of iterations
cost = "dist2access", # cost parameter - name defined in calculate_distance()
plot = TRUE) # plot
sample_balanced
The sample_balanced() algorithm performs a balanced sampling methodology from the BalancedSampling / SamplingBigData packages.
sample_balanced(mraster = mraster, # input
nSamp = 200, # desired sample number
plot = TRUE) # plot
#> Simple feature collection with 200 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337710 xmax: 438550 ymax: 5343190
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (437770 5343190)
#> 2 POINT (435730 5343170)
#> 3 POINT (431590 5343110)
#> 4 POINT (438210 5343090)
#> 5 POINT (431290 5343070)
#> 6 POINT (434790 5343070)
#> 7 POINT (437950 5343050)
#> 8 POINT (437710 5343030)
#> 9 POINT (438110 5342990)
#> 10 POINT (434070 5342950)
sample_balanced(mraster = mraster, # input
nSamp = 100, # desired sample number
algorithm = "lcube", # algorithm type
access = access, # define access road network
buff_inner = 50, # inner buffer - no samples within this distance from road
buff_outer = 200) # outer buffer - no samples further than this distance from road
#> Simple feature collection with 100 features and 0 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431670 ymin: 5337770 xmax: 438470 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> geometry
#> 1 POINT (432730 5341170)
#> 2 POINT (437010 5341530)
#> 3 POINT (434850 5337970)
#> 4 POINT (437930 5342410)
#> 5 POINT (435850 5342330)
#> 6 POINT (432450 5341410)
#> 7 POINT (436150 5339550)
#> 8 POINT (435670 5340050)
#> 9 POINT (432850 5340190)
#> 10 POINT (434350 5338790)
sample_ahels
The sample_ahels() function performs the adapted Hypercube Evaluation of a Legacy Sample (ahels) algorithm using existing sample data and an mraster. New samples are allocated based on quantile ratios between the existing sample and the mraster covariate dataset.
This algorithm was adapted from that presented in the paper below, which we highly recommend.
Malone BP, Minasny B, Brungard C. 2019. Some methods to improve the utility of conditioned Latin hypercube sampling. PeerJ 7:e6451 DOI 10.7717/peerj.6451
This algorithm:
Determines the quantile distributions of existing
samples and mraster
covariates.
Determines quantiles where there is a disparity between samples and covariates.
Prioritizes sampling within those quantiles to improve representation.
To use this function, the user must first specify the number of quantiles (nQuant), followed by either the nSamp (total number of desired samples to be added) or the threshold (sampling ratio vs. covariate coverage ratio for quantiles; default is 0.9) parameter. We recommend setting the threshold value at or below 0.9.
sample_ahels(mraster = mraster,
existing = existing, # existing samples
plot = TRUE) # plot
#> Simple feature collection with 276 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337730 xmax: 438550 ymax: 5343210
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> type zq90 pzabove2 zsd geometry
#> 1 existing 6.66 21.8 1.62 POINT (433670 5340630)
#> 2 existing 7.59 75.0 1.90 POINT (436570 5340290)
#> 3 existing 5.20 45.5 1.17 POINT (433730 5342950)
#> 4 existing 7.41 26.5 2.13 POINT (434630 5341570)
#> 5 existing 2.66 10.7 0.42 POINT (437870 5338770)
#> 6 existing 3.00 5.0 0.59 POINT (435370 5339170)
#> 7 existing 8.51 45.0 2.20 POINT (435990 5339650)
#> 8 existing 8.06 19.8 2.14 POINT (434150 5342690)
#> 9 existing 7.58 80.2 1.73 POINT (437230 5338090)
#> 10 existing 3.12 9.5 0.56 POINT (434330 5341730)
Notice that no threshold, nSamp, or nQuant values were defined. That is because the defaults are threshold = 0.9 and nQuant = 10.
The first matrix output shows the quantile ratios between the sample and the covariates. A value of 1.0 indicates that samples are proportionally represented relative to the quantile coverage. Values > 1.0 indicate over-representation of samples, while values < 1.0 indicate under-representation.
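The idea of a quantile ratio can be sketched in base R for a single covariate. This is one plausible reading of the description above, using simulated values in place of mraster pixels and a deliberately biased legacy sample; it is not the code used by sample_ahels().

```r
# Illustrative only: quantile ratios for one made-up covariate.
set.seed(123)
population <- rnorm(10000, mean = 10, sd = 3) # stand-in for mraster pixel values
existing_vals <- sample(population[population > 10], 200) # biased legacy sample

nQuant <- 10
# quantile breaks derived from the population
breaks <- quantile(population, probs = seq(0, 1, length.out = nQuant + 1))

# share of the population and of the sample falling in each quantile bin
pop_share <- table(cut(population, breaks, include.lowest = TRUE)) /
  length(population)
samp_share <- table(cut(existing_vals, breaks, include.lowest = TRUE)) /
  length(existing_vals)

# ratio of sample share to population share per quantile
ratio <- as.numeric(samp_share / pop_share)
round(ratio, 2)

threshold <- 0.9
which(ratio < threshold) # under-represented quantiles that would be prioritized
```

Because the simulated legacy sample only contains values above 10, the lower quantiles have ratios of zero and would be the first targets for new samples, while the upper quantiles are over-represented.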
sample_ahels(mraster = mraster,
existing = existing, # existing samples
nQuant = 20, # define 20 quantiles
nSamp = 300) # number of new samples to add
#> Simple feature collection with 500 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 431110 ymin: 5337710 xmax: 438550 ymax: 5343230
#> CRS: +proj=utm +zone=17 +ellps=GRS80 +units=m +no_defs
#> First 10 features:
#> type zq90 pzabove2 zsd geometry
#> 1 existing 6.66 21.8 1.62 POINT (433670 5340630)
#> 2 existing 7.59 75.0 1.90 POINT (436570 5340290)
#> 3 existing 5.20 45.5 1.17 POINT (433730 5342950)
#> 4 existing 7.41 26.5 2.13 POINT (434630 5341570)
#> 5 existing 2.66 10.7 0.42 POINT (437870 5338770)
#> 6 existing 3.00 5.0 0.59 POINT (435370 5339170)
#> 7 existing 8.51 45.0 2.20 POINT (435990 5339650)
#> 8 existing 8.06 19.8 2.14 POINT (434150 5342690)
#> 9 existing 7.58 80.2 1.73 POINT (437230 5338090)
#> 10 existing 3.12 9.5 0.56 POINT (434330 5341730)
Notice that the total number of samples is 500. This value is the sum
of existing samples (200) and number of samples defined by
nSamp = 300
.