library(elo)
Now that we’ve explored the Elo setup, we turn our attention to other
methodologies implemented in the elo package.
The first model computes teams’ win percentages and feeds the
differences in percentages into a regression. An adjustment specified
with adjust() in the formula is also included in the model. You can
also adjust the intercept for games played on neutral fields by using
the neutral() function.
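To make the idea concrete, here is a minimal base-R sketch of a win-percentage model. It is not the internals of elo.winpct(); the toy games data frame and every variable name in it are made up for illustration, and the sketch assumes a plain logistic regression on the home-minus-visitor difference.

```r
# Toy results (made up for illustration): one row per game,
# home_win is 1 if the home team won and 0 otherwise.
games <- data.frame(
  home     = c("A", "B", "C", "A", "B", "C", "A", "C"),
  visitor  = c("B", "C", "A", "C", "A", "B", "B", "A"),
  home_win = c(1, 0, 1, 1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)
teams <- sort(unique(c(games$home, games$visitor)))

# Win percentage = wins / games played, over home and away games alike.
wins <- sapply(teams, function(tm) {
  sum(games$home_win[games$home == tm]) +
    sum(1 - games$home_win[games$visitor == tm])
})
played <- sapply(teams, function(tm) sum(games$home == tm | games$visitor == tm))
win_pct <- wins / played

# The difference in win percentage (home minus visitor) is the only predictor;
# the intercept plays the role of a home-field advantage.
games$diff <- win_pct[games$home] - win_pct[games$visitor]
fit <- glm(home_win ~ diff, family = binomial, data = games)
coef(fit)
```

In this sketch a neutral-site game would simply have its intercept contribution removed, which is the role neutral() plays in the formula interface.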
e.winpct <- elo.winpct(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + group(week), data = tournament,
                       subset = points.Home != points.Visitor) # to get rid of ties for now
summary(e.winpct)
##
## An object of class 'elo.winpct', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.1566
## AUC: 0.8339
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 7 32
## (tie) 0 0
## FALSE 9 3
rank.teams(e.winpct)
## Athletic Armadillos Blundering Baboons Cunning Cats
## 1 7 4
## Defense-less Dogs Elegant Emus Fabulous Frogs
## 8 5 2
## Gallivanting Gorillas Helpless Hyenas
## 3 6
predict(e.winpct, newdata = data.frame(team.Home = "Athletic Armadillos", team.Visitor = "Blundering Baboons", stringsAsFactors = FALSE))
## 1
## 0.9690678
tournament$neutral <- replace(rep(0, nrow(tournament)), 30:35, 1)
summary(elo.winpct(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + neutral(neutral) + group(week),
                   data = tournament, subset = points.Home != points.Visitor))
##
## An object of class 'elo.winpct', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.1565
## AUC: 0.825
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 6 32
## (tie) 0 0
## FALSE 10 3
The models can be built “running”, where predictions for the next
group of games are made based on past data. Consider using the skip=
argument to skip the first few groups (otherwise the model might have
trouble converging).
Note that predictions from this object use a model fit on all the data.
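The “running” idea can be sketched in base R as a loop that refits on all earlier groups before predicting the current one. This is only an illustration under stated assumptions, not the package's implementation: the simulated games data, the strength_diff predictor, and the choice to leave skipped groups at a neutral prediction of 0.5 are all assumptions of this sketch.

```r
set.seed(1)
# Simulated data (hypothetical): 10 weeks of 4 games each, with a single
# predictor standing in for the difference in team strength.
games <- data.frame(week = rep(1:10, each = 4), strength_diff = rnorm(40))
games$home_win <- rbinom(40, 1, plogis(0.5 + games$strength_diff))

skip <- 5
# Assumption: games in skipped groups keep an uninformative prediction of 0.5.
games$running_pred <- 0.5
for (wk in sort(unique(games$week))) {
  if (wk <= skip) next
  past <- games$week < wk   # only games from earlier groups are used to fit
  now  <- games$week == wk  # the group being predicted
  fit  <- glm(home_win ~ strength_diff, family = binomial, data = games[past, ])
  games$running_pred[now] <- predict(fit, newdata = games[now, ], type = "response")
}

# Predictions for new matchups would come from a fit on all the data.
final_fit <- glm(home_win ~ strength_diff, family = binomial, data = games)
```

Skipping early groups gives the first fit enough games to estimate its coefficients, which is the convergence concern mentioned above.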
e.winpct <- elo.winpct(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + group(week), data = tournament,
                       subset = points.Home != points.Visitor, running = TRUE, skip = 5)
summary(e.winpct)
##
## An object of class 'elo.winpct', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.2141
## AUC: 0.8339
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 5 19
## (tie) 6 13
## FALSE 5 3
predict(e.winpct, newdata = data.frame(team.Home = "Athletic Armadillos", team.Visitor = "Blundering Baboons", stringsAsFactors = FALSE)) # the same thing
## 1
## 0.9690678
It’s also possible to compare teams’ skills using logistic
regression. This is essentially the Bradley-Terry model. A matrix of
dummy variables is constructed, one for each team, where a value of 1
indicates a home team and -1 indicates a visiting team. The intercept
then indicates a home-field advantage. To denote games played in a
neutral setting (that is, without home-field advantage), use the
neutral() function. In short, the intercept will then be set to
1 - neutral(). An adjustment specified with adjust() in the formula is
also included in the model.
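The dummy-variable construction described above can be sketched in base R as follows. The toy data and all names are hypothetical, and dropping the last team as the reference level is an assumption of this sketch rather than a statement about how elo.glm() is implemented.

```r
# Toy results (made up for illustration).
games <- data.frame(
  home     = c("A", "B", "C", "A", "B", "C", "A", "C"),
  visitor  = c("B", "C", "A", "C", "A", "B", "B", "A"),
  home_win = c(1, 0, 1, 1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)
teams <- sort(unique(c(games$home, games$visitor)))

# One column per team: +1 for the home team, -1 for the visiting team.
X <- matrix(0, nrow(games), length(teams), dimnames = list(NULL, teams))
X[cbind(seq_len(nrow(games)), match(games$home, teams))] <- 1
X[cbind(seq_len(nrow(games)), match(games$visitor, teams))] <- -1

# home.field is a column of ones acting as the intercept; for a neutral-site
# game it would be 0 instead (that is, 1 - neutral).
# The last team is dropped so the remaining skills are identifiable.
dat <- data.frame(home_win = games$home_win, home.field = 1,
                  X[, -length(teams), drop = FALSE])
fit <- glm(home_win ~ . - 1, family = binomial, data = dat)
coef(fit)
```

Each team coefficient is then a skill estimate relative to the dropped team, and home.field is the home-field advantage on the log-odds scale.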
results <- elo.glm(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + group(week), data = tournament,
                   subset = points.Home != points.Visitor) # to get rid of ties for now
summary(results)
##
## Call:
## stats::glm(formula = wins.A ~ . - 1, family = family, data = dat.qr,
## weights = wts, subset = NULL, na.action = stats::na.pass)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0108 -0.8255 0.4050 0.6560 2.1217
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## home.field 1.0307 0.3871 2.663 0.00775 **
## `Athletic Armadillos` 1.4289 0.9546 1.497 0.13442
## `Blundering Baboons` -0.9637 0.9043 -1.066 0.28659
## `Cunning Cats` 0.5377 0.9483 0.567 0.57074
## `Defense-less Dogs` -1.7413 1.0356 -1.681 0.09268 .
## `Elegant Emus` 0.3931 0.8818 0.446 0.65576
## `Fabulous Frogs` 0.8489 0.8807 0.964 0.33509
## `Gallivanting Gorillas` 0.3994 0.9500 0.420 0.67417
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 70.701 on 51 degrees of freedom
## Residual deviance: 48.037 on 43 degrees of freedom
## AIC: 64.037
##
## Number of Fisher Scoring iterations: 5
##
## Mean Square Error: 0.1566
## AUC: 0.8375
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 7 32
## (tie) 0 0
## FALSE 9 3
rank.teams(results)
## Athletic Armadillos Blundering Baboons Cunning Cats
## 1 7 3
## Defense-less Dogs Elegant Emus Fabulous Frogs
## 8 5 2
## Gallivanting Gorillas Helpless Hyenas
## 4 6
predict(results, newdata = data.frame(team.Home = "Athletic Armadillos", team.Visitor = "Blundering Baboons", stringsAsFactors = FALSE))
## 1
## 0.9684256
summary(elo.glm(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + neutral(neutral) + group(week),
data = tournament, subset = points.Home != points.Visitor))
##
## Call:
## stats::glm(formula = wins.A ~ . - 1, family = family, data = dat.qr,
## weights = wts, subset = NULL, na.action = stats::na.pass)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0349 -0.7789 0.3933 0.7618 2.2148
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## home.field 1.0886 0.4229 2.574 0.0101 *
## `Athletic Armadillos` 1.6006 0.9750 1.642 0.1007
## `Blundering Baboons` -0.8541 0.8930 -0.956 0.3389
## `Cunning Cats` 0.5801 0.9446 0.614 0.5391
## `Defense-less Dogs` -1.8507 1.0449 -1.771 0.0765 .
## `Elegant Emus` 0.5762 0.8994 0.641 0.5218
## `Fabulous Frogs` 0.8470 0.8804 0.962 0.3360
## `Gallivanting Gorillas` 0.5777 0.9279 0.623 0.5335
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 70.701 on 51 degrees of freedom
## Residual deviance: 48.405 on 43 degrees of freedom
## AIC: 64.405
##
## Number of Fisher Scoring iterations: 5
##
## Mean Square Error: 0.1556
## AUC: 0.8375
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 7 32
## (tie) 0 0
## FALSE 9 3
The models can be built “running”, where predictions for the next
group of games are made based on past data. Consider using the skip=
argument to skip the first few groups (otherwise the model might have
trouble converging).
Note that predictions from this object use a model fit on all the data.
results <- elo.glm(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + group(week), data = tournament,
                   subset = points.Home != points.Visitor, running = TRUE, skip = 5)
summary(results)
##
## Call:
## stats::glm(formula = wins.A ~ . - 1, family = family, data = dat.qr,
## weights = wts, subset = NULL, na.action = stats::na.pass)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0108 -0.8255 0.4050 0.6560 2.1217
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## home.field 1.0307 0.3871 2.663 0.00775 **
## `Athletic Armadillos` 1.4289 0.9546 1.497 0.13442
## `Blundering Baboons` -0.9637 0.9043 -1.066 0.28659
## `Cunning Cats` 0.5377 0.9483 0.567 0.57074
## `Defense-less Dogs` -1.7413 1.0356 -1.681 0.09268 .
## `Elegant Emus` 0.3931 0.8818 0.446 0.65576
## `Fabulous Frogs` 0.8489 0.8807 0.964 0.33509
## `Gallivanting Gorillas` 0.3994 0.9500 0.420 0.67417
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 70.701 on 51 degrees of freedom
## Residual deviance: 48.037 on 43 degrees of freedom
## AIC: 64.037
##
## Number of Fisher Scoring iterations: 5
##
## Mean Square Error: 0.2098
## AUC: 0.8375
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 6 19
## (tie) 6 13
## FALSE 4 3
predict(results, newdata = data.frame(team.Home = "Athletic Armadillos", team.Visitor = "Blundering Baboons", stringsAsFactors = FALSE)) # the same thing
## 1
## 0.9684256
It’s also possible to compare teams’ skills using a Markov-chain-based
model, as outlined in Kvam and Sokol (2006). In short, imagine a judge
who randomly picks one of two teams in a matchup, where the winner gets
chosen with probability p (here, for convenience, ‘k’) and the loser
with probability 1-p (1-k). In other words, we assume that the
probability that the winning team is better than the losing team given
that it won is k, and the probability that the losing team is better
than the winning team given that it lost is (1-k). This forms a
transition matrix, whose stationary distribution gives a ranking of
teams. The differences in ranking are then fed into a logistic
regression model to predict win status. Any adjustments made using
adjust() are also included in this logistic regression. You could also
adjust the intercept for games played on neutral fields by using the
neutral() function.
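A minimal base-R sketch of the transition matrix and its stationary distribution is shown below. The toy results are made up, and normalizing each row by the number of games that team played is an assumption of this sketch; elo.markovchain() may construct its matrix differently.

```r
# Toy results (made up for illustration): one row per game.
games <- data.frame(
  winner = c("A", "C", "C", "A", "A", "C", "B", "A"),
  loser  = c("B", "B", "A", "C", "B", "B", "A", "C"),
  stringsAsFactors = FALSE
)
teams <- sort(unique(c(games$winner, games$loser)))
p <- 0.7  # probability that the judge sides with the winner of a game

# From either team in a game, the judge moves to (or stays with) the winner
# with probability p and the loser with probability 1 - p.
P <- matrix(0, length(teams), length(teams), dimnames = list(teams, teams))
for (g in seq_len(nrow(games))) {
  w <- games$winner[g]
  l <- games$loser[g]
  P[w, w] <- P[w, w] + p
  P[w, l] <- P[w, l] + (1 - p)
  P[l, w] <- P[l, w] + p
  P[l, l] <- P[l, l] + (1 - p)
}
P <- P / rowSums(P)  # each row now sums to 1

# Stationary distribution by power iteration: repeat stat <- stat %*% P.
stat <- rep(1 / length(teams), length(teams))
for (i in 1:500) stat <- as.vector(stat %*% P)
names(stat) <- teams
sort(stat, decreasing = TRUE)
```

Teams with more stationary mass are ranked higher; the differences in these values are what would then be passed to the logistic regression.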
mc <- elo.markovchain(score(points.Home, points.Visitor) ~ team.Home + team.Visitor, data = tournament,
                      subset = points.Home != points.Visitor, k = 0.7)
summary(mc)
##
## An object of class 'elo.markovchain', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.1688
## AUC: 0.8
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 10 29
## (tie) 0 0
## FALSE 6 6
rank.teams(mc)
## Athletic Armadillos Blundering Baboons Cunning Cats
## 1 7 3
## Defense-less Dogs Elegant Emus Fabulous Frogs
## 8 4 2
## Gallivanting Gorillas Helpless Hyenas
## 6 5
predict(mc, newdata = data.frame(team.Home = "Athletic Armadillos", team.Visitor = "Blundering Baboons", stringsAsFactors = FALSE))
## 1
## 0.9594476
summary(elo.markovchain(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + neutral(neutral),
data = tournament, subset = points.Home != points.Visitor, k = 0.7))
##
## An object of class 'elo.markovchain', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.1732
## AUC: 0.7857
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 10 28
## (tie) 0 0
## FALSE 6 7
These models can also be built “running”, where predictions for the
next group of games are made based on past data. Consider using the
skip= argument to skip the first few groups (otherwise the model might
have trouble converging).
Note that predictions from this object use a model fit on all the data.
mc <- elo.markovchain(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + group(week), data = tournament,
                      subset = points.Home != points.Visitor, k = 0.7, running = TRUE, skip = 5)
summary(mc)
##
## An object of class 'elo.markovchain', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.229
## AUC: 0.8
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 6 19
## (tie) 6 13
## FALSE 4 3
predict(mc, newdata = data.frame(team.Home = "Athletic Armadillos", team.Visitor = "Blundering Baboons", stringsAsFactors = FALSE)) # the same thing
## 1
## 0.9594476
Note that by assigning probabilities in the right way, this function
can reproduce the Logistic Regression Markov Chain (LRMC) model. Use
the in-formula function k() for this. IMPORTANT: note that k() denotes
the probability assigned to the winning team, not the home team (for
instance). If rH(x) denotes the probability that the home team is
better given that they scored x points more than the visiting team
(allowing for x to be negative), then an LRMC model might look
something like this:
elo.markovchain(floor(wins.home) ~ team.home + team.visitor + k(ifelse(x > 0, rH(x), 1 - rH(x))))
Why do we use floor() here? This takes care of the odd case where
teams tie. In this case, rH(x) < 0.5 because we expected the home team
to win by virtue of being home. By default, elo.markovchain() will
split any ties down the middle (i.e., 0.5 and 0.5 instead of p and
1-p), which isn’t what we want; we want the visiting team to get a
larger share than the home team. Telling elo.markovchain() that the
visiting team “won” gives the visiting team its whole share of p.
Alternatively, if h denotes a home-field advantage (in terms of
score), the model becomes:
elo.markovchain(ifelse(home.points - visitor.points > h, 1, 0) ~ team.home + team.visitor + k(pmax(rH(x), 1 - rH(x))))
In this case, the home team “won” if it scored more than h points more
than the visiting team. Since rH(x) > 0.5 if x > h, pmax() assigns the
proper probability to the pseudo-winning team.
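The pseudo-win and pmax() logic can be checked on a few margins in base R. The function rH() below is purely hypothetical: a logistic curve centered at h with a made-up slope, chosen only so that rH(x) > 0.5 exactly when x > h. A real LRMC fit would estimate it from data.

```r
h <- 3  # assumed home-field advantage, in points (made up for illustration)

# Hypothetical rH(): probability that the home team is better given that it
# outscored the visitor by x points. Equals 0.5 at x = h.
rH <- function(x, slope = 0.1) plogis(slope * (x - h))

x <- c(-7, 0, 2, 5, 14)  # home margin of victory, ties and losses included

# The home team "wins" only if it beats the visitor by more than h points.
home_pseudo_win <- ifelse(x > h, 1, 0)

# Probability handed to whichever team is the pseudo-winner.
k_winner <- pmax(rH(x), 1 - rH(x))

data.frame(x, home_pseudo_win, rH = round(rH(x), 3), k = round(k_winner, 3))
```

Note that the tie (x = 0) and the narrow home win (x = 2) both count as pseudo-wins for the visiting team, which receives 1 - rH(x) > 0.5.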
Finally, do note that using neutral() isn’t sufficient for adjusting
for games played on neutral ground, because the adjustment is only
taken into account in the logistic regression that produces
probabilities, not in the building of the transition matrix. Therefore,
you’ll also want to account for neutral wins/losses in k().
It’s also possible to compare teams’ skills using the Colley Matrix
method, as outlined in Colley (2002). The coefficients of the Colley
matrix formulation give a ranking of teams. The differences in ranking
are then fed into a logistic regression model to predict win status.
Here ‘k’ denotes how convincing a win is; it represents the fraction of
the win assigned to the winning team and the fraction of the loss
assigned to the losing team. Setting ‘k’ = 1 yields the bias-free
method presented by Colley. Any adjustments made using adjust() are
also included in this logistic regression. You could also adjust the
intercept for games played on neutral fields by using the neutral()
function.
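For reference, the standard Colley system (the ‘k’ = 1 case) can be sketched in base R as below. The toy results are made up, and the sketch follows the textbook formulation (matrix entries 2 + games played on the diagonal, minus games between each pair off the diagonal; right-hand side 1 + (wins - losses) / 2) rather than the internals of elo.colley().

```r
# Toy results (made up for illustration): one row per game.
games <- data.frame(
  winner = c("A", "C", "C", "A", "A", "C", "B", "A"),
  loser  = c("B", "B", "A", "C", "B", "B", "A", "C"),
  stringsAsFactors = FALSE
)
teams <- sort(unique(c(games$winner, games$loser)))
n <- length(teams)

C <- diag(2, n)                    # start from 2 on the diagonal
dimnames(C) <- list(teams, teams)
b <- setNames(rep(1, n), teams)    # start from 1 on the right-hand side

for (g in seq_len(nrow(games))) {
  w <- games$winner[g]
  l <- games$loser[g]
  C[w, w] <- C[w, w] + 1           # one more game played by each team
  C[l, l] <- C[l, l] + 1
  C[w, l] <- C[w, l] - 1           # one more game between this pair
  C[l, w] <- C[l, w] - 1
  b[w] <- b[w] + 0.5               # full win credited to the winner (k = 1)
  b[l] <- b[l] - 0.5
}

rating <- solve(C, b)              # Colley ratings; they average 0.5
names(rating) <- teams
sort(rating, decreasing = TRUE)
```

The differences in these ratings between the home and visiting teams are the kind of quantity that would then be fed into the logistic regression.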
co <- elo.colley(score(points.Home, points.Visitor) ~ team.Home + team.Visitor, data = tournament,
                 subset = points.Home != points.Visitor)
summary(co)
##
## An object of class 'elo.colley', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.1565
## AUC: 0.8339
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 7 32
## (tie) 0 0
## FALSE 9 3
rank.teams(co)
## Athletic Armadillos Blundering Baboons Cunning Cats
## 1 7 4
## Defense-less Dogs Elegant Emus Fabulous Frogs
## 8 5 2
## Gallivanting Gorillas Helpless Hyenas
## 3 6
predict(co, newdata = data.frame(team.Home = "Athletic Armadillos", team.Visitor = "Blundering Baboons", stringsAsFactors = FALSE))
## 1
## 0.9687583
summary(elo.colley(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + neutral(neutral),
data = tournament, subset = points.Home != points.Visitor))
##
## An object of class 'elo.colley', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.1565
## AUC: 0.8268
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 6 32
## (tie) 0 0
## FALSE 10 3
These models can also be built “running”, where predictions for the
next group of games are made based on past data. Consider using the
skip= argument to skip the first few groups (otherwise the model might
have trouble converging).
Note that predictions from this object use a model fit on all the data.
co <- elo.colley(score(points.Home, points.Visitor) ~ team.Home + team.Visitor + group(week), data = tournament,
                 subset = points.Home != points.Visitor, running = TRUE, skip = 5)
summary(co)
##
## An object of class 'elo.colley', containing information on 8 teams and 51 matches.
##
## Mean Square Error: 0.2173
## AUC: 0.8339
## Favored Teams vs. Actual Wins:
## Actual
## Favored 0 1
## TRUE 4 19
## (tie) 6 13
## FALSE 6 3
predict(co, newdata = data.frame(team.Home = "Athletic Armadillos", team.Visitor = "Blundering Baboons", stringsAsFactors = FALSE)) # the same thing
## 1
## 0.9687583
elo.glm(), elo.markovchain(), and elo.winpct() all allow for modeling
of margins of victory instead of simple win/loss using the mov()
function. Note that one must set the family="gaussian" argument to get
linear regression instead of logistic regression.
summary(elo.glm(mov(points.Home, points.Visitor) ~ team.Home + team.Visitor, data = tournament,
family = "gaussian"))
##
## Call:
## stats::glm(formula = wins.A ~ . - 1, family = family, data = dat.qr,
## weights = wts, subset = NULL, na.action = stats::na.pass)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -10.6339 -2.8996 -0.0402 2.7879 12.9286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## home.field 2.6964 0.6941 3.885 0.000313 ***
## `Athletic Armadillos` 3.1250 1.8363 1.702 0.095263 .
## `Blundering Baboons` -2.4375 1.8363 -1.327 0.190655
## `Cunning Cats` 0.3125 1.8363 0.170 0.865584
## `Defense-less Dogs` -3.5000 1.8363 -1.906 0.062646 .
## `Elegant Emus` -0.9375 1.8363 -0.511 0.612014
## `Fabulous Frogs` 0.6875 1.8363 0.374 0.709759
## `Gallivanting Gorillas` 0.2500 1.8363 0.136 0.892277
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 26.97582)
##
## Null deviance: 2161.0 on 56 degrees of freedom
## Residual deviance: 1294.8 on 48 degrees of freedom
## AIC: 352.81
##
## Number of Fisher Scoring iterations: 2
##
## Mean Square Error: 23.1221
## AUC: NA
## Favored Teams vs. Actual Wins:
## Actual
## Favored TRUE (tie) FALSE
## TRUE 31 3 10
## (tie) 0 0 0
## FALSE 4 2 6