Why do UK election polls differ so much?

The UK election campaign has confounded expectations, most notably with Labour's surge in the opinion polls. From an average vote share of around 26% at the start of the campaign, just six weeks later they now stand at 36% in the polls conducted over the past seven days.

Few, if any, commentators expected Labour's support to be at this level as we head into the final week of the campaign.

One of the theories advanced to explain Labour's unexpectedly dizzying rise is that the opinion polls are, once again, wrong. Their historical tendency to over-state Labour support has not been adequately addressed by the pollsters since the debacle of 2015.

Key to this line of thinking is that Labour's support appears to be 'soft', in the sense that those who say they will vote Labour in the polls are more likely to also report that they may change their mind before election day, compared to Conservative 'intenders'.

Labour's core support is also concentrated within demographic groups that are, historically at least, less likely to cast a ballot, in particular younger voters.

Patterns of turnout across demographic groups will, of course, be key to determining the outcome of the election. But might turnout – and how pollsters deal with it – also be the cause of another polling miss on 8 June?

Who will turn out and who won't?

Adjusting for turnout is one of the most difficult tasks a pollster has. Polls work by collecting samples of individuals and weighting them to match the general public on characteristics such as age, gender, region and education for which the population distribution is known.

But around a third of any representative sample of eligible voters will not vote, so an additional adjustment has to be made to filter out likely non-voters from the vote share estimate. The problem here is that there is no entirely satisfactory way of doing this.

The most obvious approach to determining whether poll respondents will vote or not is to ask them. This is indeed the way that the vast majority of polls in the UK have approached turnout weighting in previous elections. In order to allow respondents to express some level of uncertainty, pollsters usually ask them to rate their probability of voting on a 1 to 10 scale (where 1 = certain not to vote and 10 = certain to vote).

The problem with this approach is that, for a variety of reasons, people are not very good at estimating their probability of voting. So predicted turnout data based on self-report questions tend to have high error rates, mainly of the 'false-positive' variety. Some pollsters use additional questions on turnout at previous elections to produce a turnout probability, but these also suffer from problems of recall.

A second approach is to use historical survey data containing a measure of actual turnout (either self-reported after the election or via validation of actual votes using the electoral register).

Such data is used to build a statistical model which predicts turnout on the basis of demographic characteristics of respondents. This 'historical' model can then be applied to current polling data in order to produce turnout probabilities based on actual patterns from the previous election.

While this gets round the problems with faulty reporting by respondents, we must believe with this approach that patterns of turnout haven't changed very much since the previous election, an assumption which cannot be tested at the time the estimates are required. And, as the EU referendum showed, sharp changes in patterns of turnout from one election to another can and do arise.

In sum, turnout weighting is an essential component of accurate polling, but there is no fail-safe way of doing it.

The inquiry into the 2015 election polling concluded that, although the turnout probabilities used by the pollsters in that election were not very accurate, there was little evidence to suggest these were the cause of the polling errors. Might inaccuracies in the turnout weights be more consequential in 2017?

Effect of turnout weighting on vote intention estimates

We can get some handle on this by comparing the poll estimates of the Conservative-Labour margin before and after turnout weights have been applied. The table below shows estimated Conservative and Labour vote shares before and after turnout weighting for eleven recently published polls.

It is clear that the turnout weights have a substantial effect on the size of the Conservative lead. Without the turnout weight (but including demographic and past-vote weights), the average Conservative lead over Labour is 5 percentage points. This doubles to 10 points after turnout weights have been applied.

Vote estimates with turnout weight			Vote estimates without turnout weight
Pollster	Fieldwork End Date	CON	LAB	CON	CON	LAB	CON
Pollster	Fieldwork End Date	(%)	(%)	lead	(%)	(%)	lead
ORB/Sunday Telegraph	4th June	46	37	9	44	38	6
IpsosMORI/Standard	1st June	45	40	5	40	43	-3
Panelbase	1st June	44	36	8	40	39	1
YouGov/Times	31st May	42	39	3	41	39	2
Kantar	30th May	43	33	10	40	34	6
ICM/Guardian	29th May	45	33	12	41	38	3
Survation (phone)	27th May	43	37	6	43	37	6
ComRes/Independent	26th May	46	34	12	43	38	5
Opinium	24th May	45	35	10	42	36	6
Survation (internet)	20th May	46	34	12	43	33	10
GfK	14th May	48	28	20	45	29	16
Mean	= 10	Mean	= 5
S.D.	= 4.5	S.D.	= 4.9

Particularly notable are the Ipsos-MORI estimates, which change a 3-point Labour lead into a 5-point lead for the Conservatives. Similarly, ICM's turnout adjustment turns a 3-point Conservative lead into a 12-point one. It is also evident that pollsters using some form of demographic modelling to produce turnout probabilities tend to have somewhat higher estimates of the Conservative lead.

For this group (Kantar, ICM, ORB, Opinium, ComRes), the turnout weight increases the Conservative lead by an average 5.4 points compared to 3.7 points for those relying on self-report questions only.

It is also worth noting that the standard deviation of the Conservative lead is actually slightly lower with the turnout weights (4.5) than without (4.9). So, the turnout weighting would not appear to be the main cause of the volatility between the polls that has been evident in this campaign.

This pattern represents a substantial change in the effect of the turnout weights compared to polls during the 2015 campaign, where the increase in the Conservative lead due to turnout weighting was less than one percentage point (for the nine penultimate published polls conducted by members of the British Polling Council).

Why is turnout weighting having a bigger effect now than it did in 2015? One reason is that many pollsters are applying more aggressive procedures than they did in 2015, with the aim of producing an implied turnout in their samples that is closer to what it will actually be on election day.

While there is a logic to this approach it seems, in effect, to rely on getting the turnout probabilities wrong in order to correct for over-representation of likely voters in the weighted samples.

A second reason turnout weighting matters more in this election is that the age gap in party support has increased since 2015, with younger voters even more likely to support Labour and older voters the Conservatives. Thus, any adjustment that down-weights younger voters will have a bigger effect on the Conservative lead now than it did in 2015.

Corbyn-mania among younger voters?

Another idea that has been advanced in some quarters is that young voters are over-stating their likelihood to vote in this election even more than they did in 2015. Come election day, these younger voters will end up voting at their recent historical levels and Labour will underperform as a result.

We can obtain some leverage on this by comparing the distributions of self-reported likelihood to vote for young voters, aged 18-24, in 2015 and 2017 (the 2017 figures are from the polls in the table above, the 2015 estimates are taken from the penultimate published polls in the campaign). We also present these estimates for the oldest age category (65+).

There is no evidence here that younger voters are especially enthused in 2017 compared to 2015. And, while the implied level of turnout is substantially too high for both age groups, the 20-point gap between them is broadly reflective of actual turnout in recent elections.

The inquiry into the 2015 polling miss found that representative sampling was the primary cause of the under-statement of the Conservative lead. The fact that implied turnout is still so high in the current polls suggests that the representativeness of samples remains a problem in 2017, on this measure at least. Turnout weighting is having a much bigger effect on poll estimates now than it did in 2015.

This may be because the pollsters have improved their methods of dealing with the tricky problem of turnout weighting. However, it also suggests that getting turnout weighting right in 2017 is likely to be both more difficult and more consequential than it was in 2015.

This article originally appeared on a University of Southampton blog.

General election