Alea Jacta Est

This is the very last analysis before the election. A far-right nationalist candidate, Jair Messias Bolsonaro, is leading the polls with about 40% of the intentions, while the runner up candidate, Fernando Haddad, of a leftist coalition has about 25%; all the others have 35% in total.

A very intriguing debate put forward in the press in the last days was if the next president would be elected right way in the primary election round. So, will the Bandwagon, shy Tory, and something else effect help electing the far-right nationalist candidate by the absolute-majority criterion vote? Following a similar strategy presented before here and here, Bayes says don't worry about Bolsonaro’s victory by now.

Although polling houses are showing Bolsonaro’s support augmenting systematically over the last weeks, it’s fair to remember that pollsters did a very poor job in fielding the true vote share last elections. For instance, in 2014 the main polling firms mis-predicted both Dilma Rousseff’s and Aecio Neves’ true positions by saying Dilma was to win a majority with a margin, but the decision went to a instant runoff between these two candidates.

The following numbers represent the forecast with polling data made available over the last three days. Since there is a considerable number of swing voters in these polls, I did some math by distributing these undecideds before computing the final likely results. It’s a simplified simulation exercice as I do not account for time trends, house effects etc. I’m only accounting for the sample sizes.

The data

           Bolsonaro Haddad  Ciro Others Swing Wasting     N
Datafolha      0.360  0.220 0.130  0.200 0.040   0.050 19552
Ibope          0.360  0.220 0.110  0.180 0.050   0.080  3010
MDA            0.367  0.240 0.099  0.155 0.060   0.078  2002
Ipespe         0.360  0.220 0.110  0.290 0.020   0.000  2000
VoxPopuli      0.340  0.270 0.110  0.130 0.070   0.080  2000
ParanaPesq     0.349  0.218 0.094  0.171 0.046   0.120  1080

Poll of polls

Here is where the magic begins. I weigh polls so to reflect their sample sizes. The new results are shown in last line (7) of the table.

options(digits=3)

wtd.polls <- rbind(data, c(apply(data[,1:6],2, weighted.mean, data$N), sum(data$N)))

print(wtd.polls)
           Bolsonaro Haddad  Ciro Others  Swing Wasting     N
Datafolha      0.360  0.220 0.130  0.200 0.0400  0.0500 19552
Ibope          0.360  0.220 0.110  0.180 0.0500  0.0800  3010
MDA            0.367  0.240 0.099  0.155 0.0600  0.0780  2002
Ipespe         0.360  0.220 0.110  0.290 0.0200  0.0000  2000
VoxPopuli      0.340  0.270 0.110  0.130 0.0700  0.0800  2000
ParanaPesq     0.349  0.218 0.094  0.171 0.0460  0.1200  1080
7              0.359  0.225 0.122  0.195 0.0433  0.0561 29644

Adjusting for the undecideds

Adjusting for swing voters, the new results are now the line (8) of the table.

options(digits=3)

wtd.polls[8,] <- data.frame(wtd.polls[7,1:4] +
                              wtd.polls[7,1:4] / 
                              sum(wtd.polls[7,1:4]) * 
                              wtd.polls[7,5], 
                              Swing=0, 
                              Wasting=wtd.polls[7,6], 
                              N=wtd.polls[7,7])
print(wtd.polls)
           Bolsonaro Haddad  Ciro Others  Swing Wasting     N
Datafolha      0.360  0.220 0.130  0.200 0.0400  0.0500 19552
Ibope          0.360  0.220 0.110  0.180 0.0500  0.0800  3010
MDA            0.367  0.240 0.099  0.155 0.0600  0.0780  2002
Ipespe         0.360  0.220 0.110  0.290 0.0200  0.0000  2000
VoxPopuli      0.340  0.270 0.110  0.130 0.0700  0.0800  2000
ParanaPesq     0.349  0.218 0.094  0.171 0.0460  0.1200  1080
7              0.359  0.225 0.122  0.195 0.0433  0.0561 29644
8              0.376  0.235 0.128  0.205 0.0000  0.0561 29644

Adjusting for the wasting votes

Adjusting for wasting votes, follows the same principle. The last line of the following tbale (9) has the new adjusted preference distribution, with correct sample size.

options(digits=3)

wtd.polls[9,] <- data.frame(wtd.polls[8,1:4] +
                              wtd.polls[8,1:4] / 
                              sum(wtd.polls[8,1:4]) * 
                              wtd.polls[8,6], 
                              Swing=0, 
                              Wasting=0, 
                              N=(wtd.polls[8,7] - (wtd.polls[8,6] * wtd.polls[8,7])))

print(wtd.polls)
           Bolsonaro Haddad  Ciro Others  Swing Wasting     N
Datafolha      0.360  0.220 0.130  0.200 0.0400  0.0500 19552
Ibope          0.360  0.220 0.110  0.180 0.0500  0.0800  3010
MDA            0.367  0.240 0.099  0.155 0.0600  0.0780  2002
Ipespe         0.360  0.220 0.110  0.290 0.0200  0.0000  2000
VoxPopuli      0.340  0.270 0.110  0.130 0.0700  0.0800  2000
ParanaPesq     0.349  0.218 0.094  0.171 0.0460  0.1200  1080
7              0.359  0.225 0.122  0.195 0.0433  0.0561 29644
8              0.376  0.235 0.128  0.205 0.0000  0.0561 29644
9              0.398  0.249 0.135  0.217 0.0000  0.0000 27980
wtd.polls$N[9] *
  c(wtd.polls$Bolsonaro[9], (wtd.polls$Haddad[9] + wtd.polls$Ciro[9] + wtd.polls$Others[9]), 1 - wtd.polls$Bolsonaro[9] - (wtd.polls$Haddad[9] - wtd.polls$Ciro[9] - wtd.polls$Others[9]))+1
[1] 11146 16832 19708

Draw 1 million samples

Finally I draw a lot os samples from the posterior distribution using the weighted polls and uninformative priors to keep it simple.

poll <- c(4910, 3011, 1756, 2718)

library(SciencesPo)
library(MCMCpack)


### draw samples from the posterior
set.seed(1234)
MC <- 1000000

### Using uninformative prior (1,1,1,1)
#samples <- getDirichletSamples(MC, alpha = poll + rep(1,4))  


row= 9
prob2win = function(row, export=1){
  p=rdirichlet(100000,
  wtd.polls$N[row] *
  c(wtd.polls$Bolsonaro[row], wtd.polls$Haddad[row] + wtd.polls$Ciro[row] + wtd.polls$Others[row], 1 - wtd.polls$Bolsonaro[row] - wtd.polls$Haddad[row] - wtd.polls$Ciro[row] - wtd.polls$Others[row])+1)
  if(export==1){
    mean(p[,1]<p[,2]) ## No exceeds Yes?
  } else {
    return(p)
  }
}

Here we want to look at the margins of Bolsonaro over the combined opposition candidates. The more candidates contesting for the seat, the greater the probability that the winning candidate will receive only a minority of the votes cast.

We can also use the middle 95% range to represent the uncertainty. The numbers say about 20% of times in 1 million elections Bolsonaro appears ahead the opposition formula. Therefore, it’s very unlikely he could win the election in the primary election round.

samples = prob2win(row= 9, export=0)

combinedOpposition <- (samples[,2])
frontRunner <- (samples[,1])

margin <- (combinedOpposition - frontRunner)

quantile(margin, probs = c(0.025, 0.5, 0.975))
 2.5%   50% 97.5% 
0.192 0.203 0.215 

Finally, we can plot the posterior distribution of simulated elections where Bolsonaro is greater than the combined opposing votes. Based on the polling data at hands, and very little effort, we can believe the far-right nationalist candidate won’t make it this Sunday as quite a few press pundits are suggesting.

 hist(margin, 
      col="gray",
      prob = FALSE, # posterior distribution
      breaks = "FD", xlab = expression(p[Bolsonaro] > p[Opposition]),
      main = expression(paste(bold("Posterior Distribution of Elections With "),  p[Bolsonaro] > p[Opposition])));
# Bayes estimate (middle 95%)
abline(v=mean(margin), col='red', lwd=3, lty=3);

center

#reproducible