Multi-stratum experimental designs: a practical example

The MultiDoE package

The MultiDoE package can be used to construct multi-stratum experimental designs (for any number of strata) that optimize up to six statistical criteria simultaneously. To solve such optimization problems, the innovative MS-Opt and MS-TPLS algorithms are implemented. The former relies on a local search procedure to find a locally optimal experimental design. More precisely, it is an extension of the Coordinate-Exchange (CE) algorithm that allows both the search of experimental designs for any type of nested multi-stratum experiment and the optimization of multiple criteria simultaneously. The latter, by embedding MS-Opt in a Two-Phase Local Search framework, is able to generate a good Pareto front approximation for the optimization problem under study. The package provides different ways to choose the final optimal experimental design among those belonging to the Pareto front.

In what follows, we report a practical example.

The protein extraction experiment

In this experiment, a split-plot design is used. The final aim is to investigate the effect of five factors on protein extraction. More precisely, a mixture containing two valuable proteins, among other components, is considered after fermentation and purification processes.

The experiment is intended to separate the two proteins from the mixture, and the responses were the yields and purities of the two proteins. The factors are: \(x_1\), the feed position for the inflow of a mixture, which is hard to set; \(x_2\) the feed flow rate; \(x_3\) the gas flow rate; \(x_4\) the concentration of the first protein; \(x_5\), the concentration of the second protein. Three levels are used for each factor. The split-plot design was set up as follows: one whole-plot factor, four subplot factors, twenty-one whole plots of size two, and 42 runs.

Let use the package for simultaneously optmize more than one criterion.

First step is to upload the package.

library(multiDoE)

It is time to initialize the main arguments for defining the experimental design.

backup_options <- options()

set.seed(13)
options(digits = 15)

facts <- list(1, 2:5)
units <- list(21, 2)
levels <- 3
etas <- list(1)
criteria <- c('Id', 'Ds')
model <- "quadratic"

facts is a list of vectors representing the distribution of factors across strata. Each item in the list represents a stratum and the first item is the highest stratum of the multi-stratum structure of the experiment. Within the vectors, experimental factors are indicated by progressive integer from 1 (the first factor of the highest stratum) to the total number of experimental factors (the last factor of the lowest stratum). Blocking factors are differently denoted by empty vectors.

units is a list whose \(i\)-th element, \(n_i\), is the number of experimental units within each unit at the previous stratum (\(i-1\)).

levels is the number of available levels for each experimental factor.

etas is used to specify the ratios of error variance between subsequent strata, starting from the highest strata.

criteria is a list specifying the criteria to be optimized among I-, Id-, D-, A-, Ds, and As-optimality.

model is a string which indicates the type of model, among main effects only (“main”), interaction (“interaction”) and full quadratic (“quadratic”).

Now, we need to declare the main arguments for the MS-TPLS algorithms in order to get the final Pareto Front.

iters <- 5 * length(criteria)
restarts <- 10
restInit <- 2

model is an integer indicating the number of iterations of the MS-TPLS algorithm. In this case, we set the iteration as 5 times the number of criteria.

restarts defines the number of times the MS-Opt procedure is altogether called within each iteration of the MS-TPLS algorithm.

restInit determines how many of the iterations of MS-Opt should be used for each criterion in the first step of the MS-TPLS algorithm.

restarts and restInitcould be tuned.

We now simultaneously minimize Id- and Ds-criteria.

tpls <- runTPLS(facts,units, criteria, model, iters, 
                "Restarts", restarts, 
                "RestInit", restInit)
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
#> [1] 10

Given the tpls object, you have now different options. First of all, you can inspect your Pareto front by using the plotPareto function. As input of the plotPareto function, you should use the megaAr output of the runTPLS function. megaAR contains different information, among them it includes a matrix on which every row contains the criteria values for each Pareto front design.

plotPareto(tpls$megaAR)

Now, you would probably like to find the best compromise between Id- and Ds-criteria. For this purpose, the optMultiCrit is the function you are searching for. The optMultiCrit function provides an objective criterion for the selection of the best experimental design among all Pareto front solutions. The selection is based on minimizing the euclidean distance in the criteria space between all the Pareto front points and an approximate utopian point. By default, the coordinates of the utopian point correspond to the minimum value reached by each criterion during the runTPLS optimization procedure. Alternatively, the utopian point can be chosen by the user (please see the documentation).

optMultiCrit(tpls$megaAR)
#> $solution
#> $solution[[1]]
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]    1    1    1    1   -1
#>  [2,]    1   -1   -1    1    0
#>  [3,]    0    0   -1    0    0
#>  [4,]    0   -1    0    1    1
#>  [5,]    1    0    0    0    0
#>  [6,]    1    1   -1   -1    1
#>  [7,]    0    1    0   -1    0
#>  [8,]    0    0    1    0    1
#>  [9,]    1   -1   -1   -1    1
#> [10,]    1    1    1   -1    1
#> [11,]   -1   -1    1    1    1
#> [12,]   -1   -1   -1   -1    0
#> [13,]    0    0    1    1   -1
#> [14,]    0    1    0    0    0
#> [15,]   -1    1   -1   -1   -1
#> [16,]   -1   -1    0    1    0
#> [17,]    0    1    0    0    1
#> [18,]    0    0    1   -1    0
#> [19,]    1   -1    1    1    0
#> [20,]    1    1   -1    1   -1
#> [21,]    1   -1   -1   -1   -1
#> [22,]    1    0   -1    1    1
#> [23,]   -1    1   -1    1    1
#> [24,]   -1    0    0    0   -1
#> [25,]    0    1   -1    0    0
#> [26,]    0    0    0   -1   -1
#> [27,]    1    1    1   -1   -1
#> [28,]    1   -1    0    1   -1
#> [29,]   -1    1    1    0   -1
#> [30,]   -1   -1    1   -1    1
#> [31,]    0    0    0    0    0
#> [32,]    0   -1   -1    1   -1
#> [33,]   -1    1    1    1    0
#> [34,]   -1   -1   -1    0    1
#> [35,]    1    0   -1   -1   -1
#> [36,]    1   -1    1   -1    1
#> [37,]   -1   -1    1   -1   -1
#> [38,]   -1    0    0   -1    1
#> [39,]   -1    1    1   -1    1
#> [40,]   -1    0   -1    1   -1
#> [41,]    1   -1    1    0   -1
#> [42,]    1    1    1    1    1
#> 
#> 
#> $scores
#>                 Id                 Ds 
#> 0.3327840041018626 0.0784106692294139

Another option is the use of the topsisOpt function. This approach is based on the principle that the best solutions must be near to a positive ideal solution \((I+)\) and far from a negative ideal solution \((I-)\) in the criteria space. Please see the main reference for more details.

M. Méndez, M. Frutos, F. Miguel and R. Aguasca-Colomo. TOPSIS Decision on Approximate Pareto Fronts by Using Evolutionary Algorithms: Application to an Engineering Design Problem. Mathematics, 2020.

You can access the scores of the best compromised design found by the TOPSIS procedure.

topsis_solutions <- topsisOpt(tpls)

topsis_solutions$bestScore
#>                 Id                 Ds 
#> 0.3357919136841818 0.0778816088473012

Then, you can also investigate the best design matrix.

topsis_solutions$bestSol
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]    1    1    0    0    0
#>  [2,]    1   -1   -1    1   -1
#>  [3,]    0    0    0    0    0
#>  [4,]    0   -1    1   -1   -1
#>  [5,]    1    1    1    1    1
#>  [6,]    1   -1   -1    0    1
#>  [7,]   -1   -1    1   -1    1
#>  [8,]   -1    0   -1    0   -1
#>  [9,]    1    1    1   -1    1
#> [10,]    1   -1    0   -1   -1
#> [11,]   -1   -1   -1   -1    1
#> [12,]   -1    1    1    1    1
#> [13,]   -1    1    1    1   -1
#> [14,]   -1   -1   -1   -1   -1
#> [15,]   -1    0    1   -1    0
#> [16,]   -1   -1   -1    1    1
#> [17,]   -1    1    0   -1    0
#> [18,]   -1   -1    1    1   -1
#> [19,]    0    1    1    0    0
#> [20,]    0    0    0    1    1
#> [21,]    1    0   -1   -1    1
#> [22,]    1    1   -1    0    0
#> [23,]    1    0    1   -1   -1
#> [24,]    1    1   -1    1    1
#> [25,]    1   -1    1   -1    1
#> [26,]    1    1    1    1   -1
#> [27,]   -1    1   -1    1   -1
#> [28,]   -1    0    1    0    1
#> [29,]    0    1    0   -1    1
#> [30,]    0   -1    1    1    1
#> [31,]   -1    1   -1   -1    1
#> [32,]   -1    1    1   -1   -1
#> [33,]    1    1   -1   -1   -1
#> [34,]    1   -1    1    1    0
#> [35,]    0   -1   -1   -1    0
#> [36,]    0    0    0    0   -1
#> [37,]    0    1   -1   -1   -1
#> [38,]    0    0    0    0    0
#> [39,]    0    0   -1    1    0
#> [40,]    0   -1    0    0   -1
#> [41,]   -1   -1    0    1    0
#> [42,]   -1    1   -1    0    1

At this point, another interesting analysis is related to the identification of the best design matrices for each criterion in the Pareto front. optSingleCrit function selects he best design from the Pareto Front for each criterion.

optSingleCrit(tpls$megaAR)
#> $Id
#> $Id$scores
#>                 Id                 Ds 
#> 0.3327840041018626 0.0784106692294139 
#> 
#> $Id$solution
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]    1    1    1    1   -1
#>  [2,]    1   -1   -1    1    0
#>  [3,]    0    0   -1    0    0
#>  [4,]    0   -1    0    1    1
#>  [5,]    1    0    0    0    0
#>  [6,]    1    1   -1   -1    1
#>  [7,]    0    1    0   -1    0
#>  [8,]    0    0    1    0    1
#>  [9,]    1   -1   -1   -1    1
#> [10,]    1    1    1   -1    1
#> [11,]   -1   -1    1    1    1
#> [12,]   -1   -1   -1   -1    0
#> [13,]    0    0    1    1   -1
#> [14,]    0    1    0    0    0
#> [15,]   -1    1   -1   -1   -1
#> [16,]   -1   -1    0    1    0
#> [17,]    0    1    0    0    1
#> [18,]    0    0    1   -1    0
#> [19,]    1   -1    1    1    0
#> [20,]    1    1   -1    1   -1
#> [21,]    1   -1   -1   -1   -1
#> [22,]    1    0   -1    1    1
#> [23,]   -1    1   -1    1    1
#> [24,]   -1    0    0    0   -1
#> [25,]    0    1   -1    0    0
#> [26,]    0    0    0   -1   -1
#> [27,]    1    1    1   -1   -1
#> [28,]    1   -1    0    1   -1
#> [29,]   -1    1    1    0   -1
#> [30,]   -1   -1    1   -1    1
#> [31,]    0    0    0    0    0
#> [32,]    0   -1   -1    1   -1
#> [33,]   -1    1    1    1    0
#> [34,]   -1   -1   -1    0    1
#> [35,]    1    0   -1   -1   -1
#> [36,]    1   -1    1   -1    1
#> [37,]   -1   -1    1   -1   -1
#> [38,]   -1    0    0   -1    1
#> [39,]   -1    1    1   -1    1
#> [40,]   -1    0   -1    1   -1
#> [41,]    1   -1    1    0   -1
#> [42,]    1    1    1    1    1
#> 
#> 
#> $Ds
#> $Ds$scores
#>                 Id                 Ds 
#> 0.4112907857518297 0.0741784584427387 
#> 
#> $Ds$solution
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]    1    1   -1   -1    0
#>  [2,]    1   -1   -1    1   -1
#>  [3,]    1    0   -1   -1    1
#>  [4,]    1   -1    1   -1   -1
#>  [5,]    1    1    1    1    1
#>  [6,]    1   -1   -1    1    1
#>  [7,]   -1   -1    1   -1    1
#>  [8,]   -1    0   -1    0   -1
#>  [9,]    1    1    1   -1    1
#> [10,]    1   -1   -1   -1   -1
#> [11,]   -1   -1   -1   -1    1
#> [12,]   -1    1    1    1    1
#> [13,]   -1    1    1    1   -1
#> [14,]   -1   -1   -1   -1   -1
#> [15,]   -1   -1    1   -1   -1
#> [16,]   -1   -1   -1    1    1
#> [17,]   -1    1    0   -1    0
#> [18,]   -1   -1    1    1   -1
#> [19,]    0    1    1    0    0
#> [20,]    0    0    0    1    1
#> [21,]    1   -1    0   -1    1
#> [22,]    1    1   -1    1   -1
#> [23,]    1    0    1   -1   -1
#> [24,]    1    1   -1    1    1
#> [25,]    1   -1    1    0    1
#> [26,]    1    1    1    1   -1
#> [27,]   -1    1   -1    1   -1
#> [28,]   -1    0    1    0    1
#> [29,]    0    1    1   -1    1
#> [30,]    0   -1    1    1    1
#> [31,]   -1    1   -1   -1    1
#> [32,]   -1    1    1   -1   -1
#> [33,]    1    1   -1   -1   -1
#> [34,]    1   -1    1    1    0
#> [35,]    0   -1   -1   -1    0
#> [36,]    0   -1    0    1   -1
#> [37,]   -1    1   -1   -1   -1
#> [38,]   -1   -1    0    0    0
#> [39,]    0    0   -1    1    0
#> [40,]    0    1    0    0   -1
#> [41,]   -1   -1    0    1    0
#> [42,]   -1    1   -1    0    1
options(backup_options)