Chris Hanretty, University of East Anglia

(building on work by)

Benjamin Lauderdale, London School of Economics

Nick Vivyan, Durham University

Our model combines data provided by the British Election Study with all publicly released national polls, historical election results, and historical polling. To read commentary on the election using these forecasts, follow Election4castUK on Twitter. If you would like to give us feedback on this forecast, please email us at c.hanretty@uea.ac.uk

Party | Lo | Seats | Hi | Swing |
---|---|---|---|---|
Conservatives | 318 | 366 | 412 | 36 |
Labour | 162 | 207 | 257 | -25 |
Liberal Democrats | 2 | 7 | 14 | -1 |
SNP | 35 | 46 | 54 | -10 |
Plaid Cymru | 1 | 3 | 4 | 0 |
Greens | 0 | 1 | 2 | 0 |
UKIP | 0 | 1 | 5 | 0 |
Other | 1 | 1 | 1 | 0 |

Party | Lo | Votes | Hi | Swing |
---|---|---|---|---|
Conservatives | 38.6% | 43.8% | 49.0% | 6.2% |
Labour | 28.0% | 33.1% | 38.1% | 2.0% |
Liberal Democrats | 4.3% | 8.6% | 12.8% | 0.5% |
SNP | 3.5% | 4.1% | 4.7% | -0.8% |
Plaid Cymru | 0.5% | 0.7% | 0.9% | 0.1% |
Greens | 0.0% | 2.2% | 6.0% | -1.7% |
UKIP | 2.5% | 6.9% | 11.5% | -6.0% |
Other | 0.0% | 1.1% | 4.8% | 0.1% |

Scenario | Probability |
---|---|
Conservative Majority | 0.95 |
Hung Parliament | 0.05 |
Labour Majority | 0.00 |

Will any party have 326 or more seats?

Scenario | Probability |
---|---|
Conservative Plurality | 1.00 |
Labour Plurality | 0.00 |

Which party will have the most seats?

Our current prediction is that there will be a majority for the Conservatives, who will have 366 seats. The sidebar at the right includes predicted probabilities of the key outcomes of the election, as well as vote and seat forecasts for each party with 95% uncertainty intervals. This forecast was last updated Thursday 08 Jun 2017.

And now the party forecast...

- Conservatives. Seat gain almost certain. Majority almost certain. Plurality almost certain.
- Labour. Seat loss very likely. Majority very unlikely. Plurality very unlikely.
- Liberal Democrats. Seat loss probable.
- SNP. Seat loss almost certain.
- Plaid Cymru. Seat loss possible.
- Greens. Seat loss possible.
- UKIP. Seat loss moderately unlikely.

When reading our seat predictions, please keep in mind that our model may not know as much about your specific seat of interest as you do. The model knows how the general patterns of support across the UK have changed in constituencies with different kinds of political, geographic and demographic characteristics. In particular, the model does not know whether your MP is beloved by constituents or embroiled in scandal, nor what the implications of that might be.

Some of this might be picked up in the polls, but not all of it will be, and we do not have much polling data to go on when it comes to constituencies. In the aggregate, these aspects of constituency-specific competition tend to average out across parties, but they certainly matter in individual constituencies. Think of our seat-level projections as a baseline for what you might expect from past election results, geography and demography, plus a little bit of polling data.

The following tables focus on potential seat gains and losses for each of the parties, including only those seats for which the probability of a change of control is estimated at over 10%. If a table is blank, there are currently no such seats.

Conservatives: | Gains | Losses |

Labour: | Gains | Losses |

Liberal Democrats: | Gains | Losses |

SNP: | Gains | Losses |

Plaid Cymru: | Gains | Losses |

Greens: | Gains | Losses |

UKIP: | Gains | Losses |

The following table provides the individual seat predictions, aggregated up to England, Scotland and Wales. Please note that these may not exactly match the totals in the main forecast table, as they are based on the individual seat forecasts.

Region | Con | Lab | LD | SNP | PC | GRN | UKIP | Oth |
---|---|---|---|---|---|---|---|---|
England | 356 | 175 | 0 | 0 | 0 | 1 | 0 | 1 |
Scotland | 6 | 1 | 1 | 51 | 0 | 0 | 0 | 0 |
Wales | 6 | 30 | 1 | 0 | 3 | 0 | 0 | 0 |

The following table provides the individual seat predictions (columns), aggregated by the party that won the seat at the 2015 general election (rows). Please note that these may not exactly match the totals in the main forecast table, as they are based on the individual seat forecasts.

2015 winner | Con | Lab | LD | SNP | PC | GRN | UKIP | Oth |
---|---|---|---|---|---|---|---|---|
2015 Con | 323 | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
2015 Lab | 35 | 197 | 0 | 0 | 0 | 0 | 0 | 0 |
2015 LD | 4 | 2 | 2 | 0 | 0 | 0 | 0 | 0 |
2015 SNP | 5 | 0 | 0 | 51 | 0 | 0 | 0 | 0 |
2015 PC | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
2015 GRN | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2015 UKIP | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2015 Oth | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

- Who are you?
- Were you wrong in 2015?
- Will you be wrong again?
- What information do you use in this forecast?
- Why do the forecasts change so slowly?
- If you are wrong, how are you most likely to be wrong?
- What does it mean when you say a party has a 95% chance of between X and Y seats?
- Why is there so much uncertainty in your forecasts?
- What about Northern Ireland?
- What about Wales?
- How many seats is a majority?
- What are the "house effects" for the pollsters?
- Why do your individual seat predictions not add up to your aggregate seat predictions?
- Why do you only use historical data back to 1979?
- What do statements like "moderately unlikely" mean?
- What software do you use?
- How did you choose these lovely/garish website colours?
- Do you have any conflicts of interest?
- Do you have any acknowledgements?

Electionforecast.co.uk is run by Chris Hanretty from the University of East Anglia, Ben Lauderdale at the LSE, and Nick Vivyan at Durham University. Chris Hanretty is responsible for the 2017 forecasts.

In 2015 we predicted that the Conservatives would be the largest party, but we categorically ruled out a Conservative majority.

You can see what our 2015 forecast looked like here.

It's hard for probabilistic forecasts to be wrong -- but if you say that the probability of something happening is close to zero, and it happens, then you're wrong.

We've learnt from what went wrong in 2015. Some of the modelling choices that we've made reflect things that went wrong in that election. Hopefully in this election we'll do better than we did in the last election. But if the polls in 2017 are as wrong as they were in 2015 (or in 1992), then our forecasts will also be inaccurate.

This forecast is based on several different sources of information. These include *past election results*, *current* and *historic national polling*, *individual polling*, and *information about constituencies*.

We use information on election results from 1979 onwards to help us model the outcome of the 2017 election. This information is useful in two ways. First, it helps us set bounds on likely outcomes. The Conservative party is unlikely to get less than 10% of the vote, or more than 60% of the vote. Second, past election results help us calibrate the relationship between polls and the outcome. If we know how informative polling was in previous elections, that helps us when we use current polling to predict this year's election.

Many pollsters poll GB voting intention continuously, whether there is an election soon or not. You can see lists of polls on UK Polling Report or Wikipedia. If all polling companies produced a poll every day with the same methods and the same sample size, we could take a simple average of these polls, and use this as our best guess of the true support for each party. Unfortunately, polls are carried out using different methods by different companies at varying intervals and with smaller or larger samples. We therefore *pool the polls* to get an estimate of relative party support across Great Britain for every day during the year before the election, using an assumption that relative party support is changing slowly to smooth out the gaps between the polls.
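To illustrate why sample sizes matter when polls are pooled, the sketch below combines several polls of one party's share into an inverse-variance weighted average. It is not the model used for this forecast, which also handles house effects and smoothing over time. The poll figures and the function name `pool_polls` are made up for illustration.

```python
from math import sqrt


def pool_polls(polls):
    """Pool (share, sample_size) pairs using inverse-variance weights.

    Each share is a proportion strictly between 0 and 1. The sampling
    variance is approximated by p * (1 - p) / n, as for a simple random
    sample. Returns the pooled share and its approximate standard error.
    """
    weights = []
    for share, n in polls:
        variance = share * (1.0 - share) / n
        weights.append(1.0 / variance)
    total = sum(weights)
    pooled = sum(w * share for w, (share, _) in zip(weights, polls)) / total
    return pooled, sqrt(1.0 / total)


# Hypothetical polls: (party share, sample size)
example_polls = [(0.44, 1000), (0.42, 2000), (0.46, 500)]
pooled_share, pooled_se = pool_polls(example_polls)
print(f"pooled share: {pooled_share:.3f} (s.e. {pooled_se:.3f})")
```

Larger polls pull the pooled figure towards their own estimate, and the pooled standard error is smaller than that of any single poll in the list.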

We use a variant of an idea developed by Stephen Fisher following Erikson and Wlezien for determining how to use current pooled polling to predict the election day vote share for each party nationwide. The basic principle is that polling has some systematic biases, in particular a tendency to overstate changes from the previous election. We used historical polling data starting with the 1979 election compiled by UK Polling Report to calibrate how much weight we should put on past electoral performance relative to current polling performance, and how those weights should change as we approach the election.
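The weighting idea can be sketched as a simple shrinkage of the polled change since the last election. This is only an illustration: the linear weight schedule, its `floor` and `ceiling` values, and both function names are hypothetical, whereas the weights in the forecast itself are estimated from historical polling.

```python
def blend_forecast(last_share, polled_share, poll_weight):
    """Shrink the polled change since the last election by poll_weight."""
    if not 0.0 <= poll_weight <= 1.0:
        raise ValueError("poll_weight must be between 0 and 1")
    return last_share + poll_weight * (polled_share - last_share)


def illustrative_weight(days_to_election, horizon=365, floor=0.5, ceiling=0.9):
    """Hypothetical linear schedule: the weight on polls rises as the election nears."""
    clipped = min(max(days_to_election, 0), horizon)
    fraction_elapsed = 1.0 - clipped / horizon
    return floor + (ceiling - floor) * fraction_elapsed


# Hypothetical numbers: 37% at the last election, 45% in current pooled polls.
for days in (365, 180, 30, 0):
    weight = illustrative_weight(days)
    forecast = blend_forecast(0.37, 0.45, weight)
    print(f"{days:>3} days out: weight {weight:.2f}, forecast {forecast:.3f}")
```

Because the weight stays below 1, the forecast change is always smaller than the polled change, which reflects the tendency of polls to overstate movement since the previous election.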

Aggregate polling helps us forecast parties' national vote shares. What matters, though, is how many *seats* each party wins. In order to forecast seat shares, we need to know how well each party will do in each constituency.

We use individual polling responses to the December 2016 British Election Study as the basis for our seat predictions. We model how individuals respond as a function of the characteristics of the constituency they live in.

This gives us a model-based prediction for each seat as of December 2016. On the basis of tests on 2015, we know that the model-based prediction on its own can perform poorly. We therefore blend these model-based predictions with the results of applying a uniform national swing based on how the parties polled in December 2016. We then take this blended estimate, and bring it in line with our forecast national vote shares.
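A minimal sketch of the uniform national swing step and the blending step follows. The equal blending weight, the example shares and the function names are assumptions made for illustration. The final step of bringing the blended estimates in line with the national forecast is not shown.

```python
def uniform_national_swing(previous_shares, national_previous, national_now):
    """Add each party's national change in share to its previous constituency share."""
    projected = {}
    for party, share in previous_shares.items():
        swing = national_now[party] - national_previous[party]
        projected[party] = max(share + swing, 0.0)  # shares cannot go negative
    total = sum(projected.values())
    return {party: share / total for party, share in projected.items()}


def blend(model_shares, uns_shares, model_weight=0.5):
    """Weighted average of model-based and swing-based constituency shares."""
    return {
        party: model_weight * model_shares[party]
        + (1.0 - model_weight) * uns_shares[party]
        for party in model_shares
    }


# Hypothetical constituency and national figures.
previous = {"Con": 0.38, "Lab": 0.47, "Oth": 0.15}
national_previous = {"Con": 0.37, "Lab": 0.31, "Oth": 0.32}
national_now = {"Con": 0.44, "Lab": 0.33, "Oth": 0.23}

uns = uniform_national_swing(previous, national_previous, national_now)
model = {"Con": 0.50, "Lab": 0.44, "Oth": 0.06}
blended = blend(model, uns)
print(blended)
```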

Our model of constituency outcomes is based on constituency characteristics. By constituency characteristics, we mean things like past vote and incumbent party, as well as region and how the constituency voted in the 2016 EU referendum. The more strongly these characteristics are related to individuals' vote choice, the more confident we can be in estimating constituency vote shares, even for constituencies where we only have a few observations in the raw data. For the purposes of prediction, we don't need these characteristics to *cause* people to vote in any particular way: correlation is enough.

The forecasts change slowly because the forecast has a lot of inertia, as it should. Polls have sampling error. Pollsters also have systematic biases, because surveying a random sample of the people who will choose to turn out to vote *at some point in the future* is very difficult. Different pollsters make different choices about how to best approximate this, which is why our model includes *house effects*. So the estimate of where current polling puts the parties will only change noticeably if changes are evident across multiple polls from multiple pollsters. In addition to requiring many polls to show a shift in party support, the forecast puts weight on both past vote share and current polling, with the weight on the latter increasing as the election approaches. We estimated the optimal weighting of past vote share and current polling based on polling leading up to elections from 1979 forward. This means that even when all the polls show a change, if it is far from the election, the change in our forecast vote share will be substantially smaller than the change in pooled polls.

In the last election, we were worried about modelling Lib Dem losses and UKIP gains. In the end, we got UKIP exactly right, but predicted too many Lib Dem seats.

This time, we're still worried about the Liberal Democrats and UKIP. We're worried about the Liberal Democrats because our model may not be sensitive enough to pick up pockets of Liberal Democrat strength. (This applies generally to all smaller parties.)

We're worried about UKIP for different reasons. UKIP has decided to stand only in certain seats in the country. Where UKIP does not stand, we have to make assumptions about what happens to their vote. These assumptions may be wrong, or not detailed enough. We'll say more about these assumptions when the final lists of candidacies are published.

At the level of individual seats, there are lots of factors that may matter, that we are not measuring. We don't know whether we'll see a particularly strong performance for the Bus Pass Elvis Party, or unduly heavy rain in that region on election day, or whether the local MP is embroiled in a scandal. If there is something systematic that might affect the results across a range of constituencies, and which can be measured, let us know.

Our forecast is based on a Bayesian model that incorporates the various sources of information described above. The model reflects what we believe are reasonable assumptions about how to combine these sources of information, but we could be wrong. The intervals we report are the central 95% of the posterior distribution for seats or for vote share. This means that our model, given the data we have so far, indicates that there is a 2.5% chance that the quantity in question will fall below the lower bound (Lo) and a 2.5% chance that the quantity in question will fall above the upper bound (Hi). Thus, there is a 95% chance that the true figure will fall between the numbers marked Lo and Hi.

These intervals, as well as the *mean posterior* estimates that we report as our primary prediction, are derived from an MCMC estimate of the entire distribution of possible outcomes for each of the parties.
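As a sketch of how a central 95% interval and a mean can be read off a set of posterior draws, the example below uses simulated draws in place of real MCMC output. The normal distribution, its parameters and the function name `central_interval` are placeholders, not properties of the actual model.

```python
import random
import statistics


def central_interval(draws):
    """Return (lo, mean, hi): the 2.5% and 97.5% quantiles and the mean of the draws."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points, one every 2.5%
    return cuts[0], statistics.mean(draws), cuts[-1]


# Simulated stand-in for posterior draws of one party's seat total.
rng = random.Random(2017)
simulated_seats = [rng.gauss(366, 24) for _ in range(10_000)]

lo, mean, hi = central_interval(simulated_seats)
share_inside = sum(lo <= d <= hi for d in simulated_seats) / len(simulated_seats)
print(f"Lo {lo:.0f}, mean {mean:.0f}, Hi {hi:.0f}, share inside {share_inside:.3f}")
```

About 2.5% of the draws fall below Lo and about 2.5% fall above Hi, leaving roughly 95% between the two bounds.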

Most of the uncertainty in our predictions comes from the fact that *even immediately before election day* general election polls in the UK have not been very accurate.

One consequence of this is that even on election day, we will have substantial uncertainty in our estimates. The forecasts will get more precise, but not until very close to election day.

This year we are not producing forecasts for Northern Ireland. There is very limited (aggregate) political polling in Northern Ireland, and we do not have access to any individual polling on which basis to make seat forecasts.

At the moment, the forecast is very pessimistic about Plaid Cymru's chances of holding on to the seats it won in the 2015 General Election. This doesn't match predictions based on uniform national swing, which would see Plaid fall back, but not by so much that they would lose seats.

We suspect this results from a limitation of the data we have. We have information on far fewer Welsh respondents to wave 10 of the BES, and Plaid Cymru supporters are a small proportion of those respondents. Consequently, it's difficult to tell whether strong Plaid support in one region is the result of genuine support or sampling error. We suspect we will under-estimate Plaid's support absent better (=more abundant) data.

We use 326 as the standard for a majority, even though the non-voting Speaker plus the abstaining Sinn Fein MPs reduce the number of votes required to survive a confidence vote to 323 (given the current number of Sinn Fein MPs).
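The arithmetic behind the two thresholds can be checked with a short sketch. The figures of 650 Commons seats and 4 Sinn Fein MPs (the number elected in 2015) are not stated in the text above and are supplied here as background assumptions.

```python
TOTAL_SEATS = 650        # seats in the House of Commons (assumption, not stated above)
NON_VOTING_SPEAKER = 1
SINN_FEIN_MPS = 4        # seats won in 2015; assumed to be the "current number"


def majority_threshold(voting_members):
    """Smallest number of votes that exceeds half of the voting members."""
    return voting_members // 2 + 1


nominal = majority_threshold(TOTAL_SEATS)
effective = majority_threshold(TOTAL_SEATS - NON_VOTING_SPEAKER - SINN_FEIN_MPS)
print(nominal, effective)  # prints "326 323"
```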

The house effects describe systematic differences in support for the various parties that do not reflect sampling variability, but instead appear to reflect the different decisions that pollsters make about how to ask about support for smaller parties, about weighting, and about modelling voter turnout. Here are the current estimates of the house effects for each polling company, for each party.

Imagine there are 3 constituencies that we estimate each have a 2/3 chance of going to Labour, with the remaining 1/3 for the Conservatives. If we want to make our best guess for each constituency individually, we would predict Labour in all three constituencies. However, if we wanted to make our best guess as to the total number of Labour seats, we would predict 2 total Labour seats rather than 3. The discrepancy between our individual seat predictions and our aggregate seat predictions arises from this kind of difference, across many constituencies, with varying and non-independent probabilities, across many parties.
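The three-constituency example can be worked through in a few lines. The sketch assumes the seats are independent, which keeps the arithmetic simple but does not hold in the real forecast. The function names are illustrative.

```python
from itertools import product


def seat_summaries(labour_probs):
    """Return (number of seats called for Labour one by one, expected Labour seats)."""
    seat_by_seat_calls = sum(1 for p in labour_probs if p > 0.5)
    expected_seats = sum(labour_probs)
    return seat_by_seat_calls, expected_seats


def seat_distribution(labour_probs):
    """Exact distribution of the Labour seat total, assuming independent seats."""
    dist = [0.0] * (len(labour_probs) + 1)
    for outcome in product([0, 1], repeat=len(labour_probs)):
        prob = 1.0
        for won, p in zip(outcome, labour_probs):
            prob *= p if won else (1.0 - p)
        dist[sum(outcome)] += prob
    return dist


probs = [2 / 3, 2 / 3, 2 / 3]
calls, expected = seat_summaries(probs)
dist = seat_distribution(probs)
most_likely_total = max(range(len(dist)), key=dist.__getitem__)
print(calls, round(expected, 2), most_likely_total)
```

Calling each seat separately gives Labour all 3, while both the expected total and the single most likely total are 2.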

We use data starting in 1979 for two reasons. First, in 1974 there were UK general elections in both February and October, due to a hung parliament after the February election and the inability of any set of parties to form a majority coalition. Having two elections in 1974 makes studying the trajectories of the polls in the run-up to the October election difficult. Second, the further we go back, the greater the risk that polling performance has changed fundamentally, and so it makes sense to stop at some point.

Probability | Name |
---|---|
0-10% | very unlikely |
11-25% | unlikely |
26-40% | moderately unlikely |
41-59% | possible |
60-74% | probable |
75-89% | very likely |
90-100% | almost certain |

This scale comes from "Quantitative meanings of verbal probability expressions" by Reagan, Mosteller and Youtz.
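A small lookup shows how a probability could be turned into one of these phrases. Treating each band's upper limit as inclusive, so that values falling between two whole percentages go to the higher band, is an assumption of this sketch, as is the function name `verbal_label`.

```python
# (inclusive upper limit of the band, phrase), in ascending order
LABELS = [
    (0.10, "very unlikely"),
    (0.25, "unlikely"),
    (0.40, "moderately unlikely"),
    (0.59, "possible"),
    (0.74, "probable"),
    (0.89, "very likely"),
    (1.00, "almost certain"),
]


def verbal_label(probability):
    """Map a probability in [0, 1] to the matching verbal expression."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, label in LABELS:
        if probability <= upper:
            return label
    return LABELS[-1][1]  # unreachable for valid input; kept as a safeguard


print(verbal_label(0.95))  # prints "almost certain"
print(verbal_label(0.05))  # prints "very unlikely"
```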

The core of our system for estimation and reporting of our forecasts is the R programming language. Our pooling-the-polls model is implemented in JAGS, called directly from R scripts. Our reports are generated using ggplot2 and pandoc. The pipeline is automated: each day we drop in new data, and then a master script re-estimates the model, re-generates the report you are currently reading, and uploads it to this web site.

We thank YouGov for data access, the University of East Anglia for funding, and Stephen Fisher, Simon Hix and Jouni Kuha for helpful conversations.

We also thank Roger Scully for passing on data from the Welsh Political Barometer.

A number of polling companies have now moved to constituency-specific prompts. These prevent or reduce the chances that respondents will say they intend to vote for UKIP in seats in which UKIP is not standing. Accordingly, we have removed the adjustment for the UKIP (and Green) vote share, with knock-on consequences for other parties.

Additionally, we have incorporated new constituency-level data from ICM, generously supplied by Martin Boon. Thanks Martin!

We changed the model for predicting seat level outcomes. The model component is now based on a Dirichlet multinomial model, which allows for some overdispersion. The uniform national swing component is now stochastic. Both of these technical changes mean that the constituency level outcomes are more variable, and the 95% forecast intervals are wider. This is as it should be.

We updated our forecasts to include data from January's Welsh Political Barometer, kindly donated by the most fashionable of psephologists, Prof. Roger Scully. Including this data has moved our forecast for Plaid Cymru from 1 seat (range: 0 to 2) to 2 seats (range: 1 to 4).

We updated our forecasts to take account of the fact that not all parties are standing in all constituencies. We've relied a lot on the data crowd-sourced by Democracy Club, who are amazing. If you see a non-zero prediction for a party that's not standing in a constituency, please let us know.

We published the first forecasts for the 2017 general election.