When working with Generalized Linear Models, it is often useful to create informative and beautiful summaries of the fitted model coefficients. The goal of prettyglm is to provide a set of functions to visualize Generalized Linear Model coefficients and performance in interactive plots, which can easily be embedded in rmarkdown reports or separately exported and shared with stakeholders.
This document introduces prettyglm’s main sets of functions and shows you how to apply them.
Please see the prettyglm website for more detailed documentation with HTML outputs. Some of the outputs have been excluded from this document for publication on CRAN.
If you don’t find the function you are looking for in prettyglm, consider checking out some other great packages which help visualize the output from GLMs:
tidycat
jtools
You can install the latest CRAN release with:
install.packages('prettyglm')
To explore the functionality of prettyglm we will use the Titanic data set to perform logistic regression. This data set was sourced from Kaggle and contains information about passengers aboard the Titanic, and a target variable which indicates whether they survived.
library(dplyr)
library(prettyglm)
data('titanic')
head(titanic) %>%
  select(-c(PassengerId, Name, Ticket)) %>%
  knitr::kable(table.attr = "style='width:10%;'") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Cabin | Embarked | Cabintype |
---|---|---|---|---|---|---|---|---|---|
0 | 3 | male | 22 | 1 | 0 | 7.2500 | Missing | S | Missing |
1 | 1 | female | 38 | 1 | 0 | 71.2833 | C85 | C | C |
1 | 3 | female | 26 | 0 | 0 | 7.9250 | Missing | S | Missing |
1 | 1 | female | 35 | 1 | 0 | 53.1000 | C123 | S | C |
0 | 3 | male | 35 | 0 | 0 | 8.0500 | Missing | S | Missing |
0 | 3 | male | NA | 0 | 0 | 8.4583 | Missing | Q | Missing |
A critical step for this package to work is to set all categorical predictors as factors.
# Easy way to convert multiple columns to a factor.
columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype')
# Replace missing ages with the mean age.
meanage <- base::mean(titanic$Age, na.rm = TRUE)
titanic <- titanic %>%
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  dplyr::mutate(Age = base::ifelse(is.na(Age), meanage, Age))
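To see why the factor conversion matters, the following minimal sketch fits the same predictor twice with base R, once as an integer and once as a factor. The data frame `toy` and its columns `outcome` and `class_code` are made up for illustration and are not part of the Titanic data.

```r
# Toy data: a class code stored as an integer, much like Pclass in the raw data.
toy <- data.frame(
  outcome    = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
  class_code = rep(c(1L, 2L, 3L), times = 4)
)

# Fitted as a number: a single slope is estimated for class_code.
fit_numeric <- stats::glm(outcome ~ class_code, data = toy, family = binomial())

# Fitted as a factor: one coefficient per non-base level.
toy$class_code <- factor(toy$class_code)
fit_factor <- stats::glm(outcome ~ class_code, data = toy, family = binomial())

names(stats::coef(fit_numeric))
names(stats::coef(fit_factor))
```

Only the factor version has per-level coefficients and a base level, which is the structure that level-by-level tables and plots rely on.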
For this vignette we will use stats::glm() to build a logistic regression model. Support for parsnip and workflow model objects which use the glm model engine is currently in development.
survival_model <- stats::glm(Survived ~ Pclass +
                               Sex +
                               Fare +
                               Age +
                               Embarked +
                               SibSp +
                               Parch,
                             data = titanic,
                             family = binomial(link = 'logit'))
pretty_coefficients()
The function pretty_coefficients()
allows you to create
a pretty table of model coefficients, which by default includes
categorical base levels.
The simplest way to call this function is just with the model object.
pretty_coefficients(model_object = survival_model)
You can also complete a type III test on the coefficients by specifying a type_iii argument. Warning: Wald type III tests will fail if there are aliased coefficients in the model.
You can change the significance level highlighted in the table with significance_level.
pretty_coefficients(survival_model, type_iii = 'Wald', significance_level = 0.1)
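Because Wald type III tests fail when coefficients are aliased, it can help to check for aliasing before requesting them. The sketch below uses base R only and a made-up data set (`toy`, `x1`, `x2`, `x3` and `y` are illustrative names); it relies on stats::glm() reporting coefficients it cannot estimate as NA.

```r
# Toy data in which x3 is an exact linear combination of x1 and x2.
set.seed(1)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$x3 <- toy$x1 + toy$x2
toy$y <- rbinom(50, size = 1, prob = 0.5)

fit <- stats::glm(y ~ x1 + x2 + x3, data = toy, family = binomial())

# Aliased coefficients cannot be estimated and are returned as NA.
aliased_terms <- names(stats::coef(fit))[is.na(stats::coef(fit))]
aliased_terms
```

If `aliased_terms` is not empty, dropping or recoding those terms before fitting is one way to avoid the failure.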
By default pretty_coefficients shows “model” variable importance, but vimethod also accepts the “permute” and “firm” methods from the vip package. Additional parameters for these methods should also be passed into pretty_coefficients.
pretty_coefficients(model_object = survival_model,
                    type_iii = 'Wald',
                    significance_level = 0.1,
                    vimethod = 'permute',
                    target = 'Survived',
                    metric = 'auc',
                    pred_wrapper = predict.glm,
                    reference_class = 0)
pretty_relativities()
pretty_relativities() will create a plot of the desired model variable. A different plot will be generated depending on the class of the variable.
A model relativity is a transform of the model estimate. By default pretty_relativities() uses ‘exp(estimate)-1’, which is useful for GLMs which use a log or logit link function.
The term ‘relativity’ is sometimes referred to as “odds ratio” or “likelihood”. You can customize the label with the relativity_label input.
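The default transform can be reproduced by hand. For a logit link, exp(estimate) is the odds ratio against the base level, so exp(estimate) - 1 is the proportional change in the odds. The estimates below are hypothetical values chosen for illustration, not coefficients from the Titanic model.

```r
# Hypothetical coefficient estimates on the logit scale.
# A base level has an estimate of 0 by construction.
estimates <- c(base_level = 0, level_b = 0.25, level_c = -0.40)

# The default transform described above: exp(estimate) - 1.
relativities <- exp(estimates) - 1
round(relativities, 3)
```

A base level therefore always has a relativity of 0, positive estimates give positive relativities, and negative estimates give negative ones.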
For categorical variables pretty_relativities() creates an interactive dual axis plot, which plots the fitted relativity on one y axis, and the number of records in that category on the other y axis.
pretty_relativities(feature_to_plot = 'Embarked',
                    model_object = survival_model,
                    relativity_label = 'Likelihood of Survival'
                    )
For continuous variables pretty_relativities will plot the relativity over the variable’s range, and the density of that variable on a dual axis.
If desired, you can cut off the tail ends of the distribution with upper_percentile_to_cut or lower_percentile_to_cut.
pretty_relativities(feature_to_plot = 'Fare',
                    model_object = survival_model,
                    relativity_label = 'Likelihood of Survival',
                    upper_percentile_to_cut = 0.1)
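One plausible reading of upper_percentile_to_cut = 0.1 is that the top 10% of the variable's values are left out of the plot. The sketch below illustrates that idea in base R on simulated data. It is an assumption about the behaviour, not the package's implementation, and `fare_like` is a made-up variable.

```r
set.seed(42)
# Right-skewed simulated values, loosely resembling a fare.
fare_like <- rexp(1000, rate = 1 / 30)

# Assumed meaning: drop everything above the (1 - 0.1) quantile.
upper_percentile_to_cut <- 0.1
cutoff <- stats::quantile(fare_like, probs = 1 - upper_percentile_to_cut)
trimmed <- fare_like[fare_like <= cutoff]

length(trimmed)
```

Trimming a long right tail in this way keeps a handful of extreme values from stretching the x axis.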
To highlight some more of prettyglm’s functionality we will now build a logistic regression model with some interactions.
survival_model2 <- stats::glm(Survived ~ Pclass:Fare +
                                Age +
                                Embarked:Sex +
                                SibSp +
                                Parch,
                              data = titanic,
                              family = binomial(link = 'logit'))
You can also choose to facet the plots by one of the variables.
pretty_relativities(feature_to_plot = 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'facet',
                    facetorcolourby = 'Sex'
                    )
You can also choose to colour the plots by one of the variables.
pretty_relativities(feature_to_plot = 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'colour',
                    facetorcolourby = 'Embarked'
                    )
You can create these relativity plots as you would for a non-interaction.
pretty_relativities(feature_to_plot = 'Embarked:Sex',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival'
                    )
By default, interaction plots between a continuous variable and a factor are coloured by the factor variable.
pretty_relativities(feature_to_plot = 'Pclass:Fare',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    upper_percentile_to_cut = 0.03
                    )
You can also facet by the factor variable.
pretty_relativities(feature_to_plot = 'Pclass:Fare',
                    model_object = survival_model2,
                    relativity_label = 'Likelihood of Survival',
                    iteractionplottype = 'facet',
                    upper_percentile_to_cut = 0.03,
                    height = 800
                    )
To highlight some more of prettyglm’s functionality we will now build a logistic regression model with a spline.
prettyglm includes a function, splineit, to help construct splines. This can be incorporated into a dplyr workflow as follows.
For splines to work nicely in prettyglm, use the naming convention Variable#Start#End, where # represents your desired separator.
titanic <- titanic %>%
  dplyr::mutate(Age_0_18 = prettyglm::splineit(Age,0,18),
                Age_18_35 = prettyglm::splineit(Age,18,35),
                Age_35_120 = prettyglm::splineit(Age,35,120)) %>%
  dplyr::mutate(Fare_0_55 = prettyglm::splineit(Fare,0,55),
                Fare_55_600 = prettyglm::splineit(Fare,55,600))
survival_model4 <- stats::glm(Survived ~ Pclass +
                                Sex:Fare_0_55 +
                                Sex:Fare_55_600 +
                                Age_0_18 +
                                Age_18_35 +
                                Age_35_120 +
                                Embarked +
                                SibSp +
                                Parch,
                              data = titanic,
                              family = binomial(link = 'logit'))
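The spline columns above follow the Variable, Start, End naming convention with "_" as the separator. As a rough illustration of what such columns can contain, the sketch below builds a piecewise linear basis in base R. The helper `piecewise_basis` is hypothetical and written for this example only; it is an assumption about the general technique, not the implementation of splineit.

```r
# Hypothetical helper, for illustration only: the part of x that falls
# between start and end, measured from start and capped at end - start.
piecewise_basis <- function(x, start, end) {
  pmin(pmax(x - start, 0), end - start)
}

age <- c(5, 18, 27, 40, 80)
segments <- data.frame(
  Age_0_18   = piecewise_basis(age, 0, 18),
  Age_18_35  = piecewise_basis(age, 18, 35),
  Age_35_120 = piecewise_basis(age, 35, 120)
)

# Because the names follow Variable_Start_End, the variable name and
# the knots can be recovered by splitting on the separator.
parts <- strsplit(names(segments), "_", fixed = TRUE)
segments
```

With this construction each segment gets its own slope in the model, and the segments add back up to the original variable.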
For interactions, variables are grouped in the left pane.
pretty_coefficients(survival_model4, significance_level = 0.1, spline_seperator = '_')
You also need to provide a spline_seperator input in pretty_relativities.
pretty_relativities(feature_to_plot = 'Age',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_'
                    )
By default pretty_relativities will colour by the factor variable.
pretty_relativities(feature_to_plot = 'Sex:Fare',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_',
                    upper_percentile_to_cut = 0.03
                    )
If you prefer to facet by the factor variable, change iteractionplottype to “facet”.
pretty_relativities(feature_to_plot = 'Sex:Fare',
                    model_object = survival_model4,
                    relativity_label = 'Likelihood of Survival',
                    spline_seperator = '_',
                    upper_percentile_to_cut = 0.03,
                    iteractionplottype = 'facet'
                    )
one_way_ave()
For continuous variables one_way_ave will bucket the values into 30 buckets by default, and plot the density on a dual axis.
one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1)
one_way_ave(feature_to_plot = 'Cabintype',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic)
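The idea behind a one-way actual versus expected comparison can be sketched in base R: bucket a continuous variable, then compare the mean observed outcome with the mean prediction in each bucket. The simulated data, the column names and the use of 30 equal-width buckets are assumptions made for this illustration; the package may form its buckets differently.

```r
set.seed(123)
n <- 500
toy <- data.frame(age = runif(n, 0, 80))
toy$survived <- rbinom(n, 1, stats::plogis(0.5 - 0.02 * toy$age))

fit <- stats::glm(survived ~ age, data = toy, family = binomial())
toy$predicted <- stats::predict(fit, newdata = toy, type = "response")

# Assumption: 30 equal-width buckets over the range of age.
toy$bucket <- cut(toy$age, breaks = 30)

# Mean actual and mean predicted outcome per bucket.
ave_table <- data.frame(
  bucket    = levels(toy$bucket),
  actual    = as.numeric(tapply(toy$survived, toy$bucket, mean)),
  predicted = as.numeric(tapply(toy$predicted, toy$bucket, mean))
)
head(ave_table)
```

Buckets where the two columns diverge point to ranges of the variable that the model fits poorly.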
You can facet the one_way_ave plot by providing a variable to facet by in facetby.
one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            facetby = 'Sex',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1)
By default one_way_ave uses a built-in prediction function. If you would like to use one_way_ave with another model type (one which is not compatible with predict.glm), or to provide modified predictions, one_way_ave allows a custom prediction function.
This function must return a data.frame with two columns: “Actual_Values” and “Predicted_Values”.
# Custom predict function
a_custom_predict_function <- function(target, model_object, dataset){
  dataset <- base::as.data.frame(dataset)
  Actual_Values <- dplyr::pull(dplyr::select(dataset, tidyselect::all_of(c(target))))
  # is.factor() is safer than comparing class(), which can return several values
  if(base::is.factor(Actual_Values)){
    Actual_Values <- base::as.numeric(as.character(Actual_Values))
  }
  Predicted_Values <- base::as.numeric(stats::predict(model_object, dataset, type='response'))
  to_return <- base::data.frame(Actual_Values = Actual_Values,
                                Predicted_Values = Predicted_Values)
  # Cap predictions at 0.4 as an example of providing modified predictions
  to_return <- to_return %>%
    dplyr::mutate(Predicted_Values = base::ifelse(Predicted_Values > 0.4,0.4,Predicted_Values))
  return(to_return)
}
one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model4,
            target_variable = 'Survived',
            data_set = titanic,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1,
            predict_function = a_custom_predict_function)
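Before passing a custom prediction function to one_way_ave, it may be worth checking that its output has the required shape. The sketch below does this in base R with the built-in mtcars data. Both `check_prediction_output` and `minimal_predict_function` are hypothetical helpers written for this example, and the requirement that both columns be numeric is an assumption based on the conversion in the function above.

```r
# Hypothetical helper: checks the shape a custom prediction function returns.
check_prediction_output <- function(predictions) {
  required <- c("Actual_Values", "Predicted_Values")
  stopifnot(
    is.data.frame(predictions),
    all(required %in% names(predictions)),
    is.numeric(predictions$Actual_Values),     # assumed requirement
    is.numeric(predictions$Predicted_Values)   # assumed requirement
  )
  invisible(TRUE)
}

# Minimal custom prediction function with the same three arguments as above.
minimal_predict_function <- function(target, model_object, dataset) {
  dataset <- as.data.frame(dataset)
  data.frame(
    Actual_Values = as.numeric(as.character(dataset[[target]])),
    Predicted_Values = as.numeric(
      stats::predict(model_object, dataset, type = "response")
    )
  )
}

fit <- stats::glm(am ~ wt, data = mtcars, family = binomial())
result <- minimal_predict_function("am", fit, mtcars)
check_prediction_output(result)
```

The check stops with an error if a column is missing or has the wrong type, which is easier to diagnose than a failure inside the plotting call.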
actual_expected_bucketed()
actual_expected_bucketed(target_variable = 'Survived',
                         model_object = survival_model4,
                         data_set = titanic)
actual_expected_bucketed(target_variable = 'Survived',
                         model_object = survival_model4,
                         data_set = titanic,
                         facetby = 'Sex')