Did Wikipedia hint at the Labour surge?

It's fair to say that most people did not predict the result of the election. A few pollsters were correct – Survation and YouGov, take a bow – but most thought that even if May didn't get the landslide she wanted, at least she'd gain seats and Labour would lose them, right?

Then came the shocks, as ultra-safe Tory seats like Kensington and Canterbury – both Tory-held for decades – fell to Labour. The result was a hung parliament. Who saw it coming?

Maybe Wikipedia.

You can view the page view counts for any Wikipedia page with the Pageviews tool. This has a couple of limitations (it struggles if a page's name is changed, and sometimes pageviews are inflated by malfunctioning web crawlers) but it's generally pretty good. (The Wikipedia Top 25 Report is always a good way of keeping up with what's interesting the English-speaking world. Usually celebrity deaths, wrestling, politics, crazy facts on Reddit, and Bollywood.)
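If you'd rather pull the same numbers programmatically, the Wikimedia Pageviews REST API has a per-article endpoint that backs these tools. A minimal sketch (the article title and dates here are just illustrative; note that `agent="user"` filters out most of the crawler traffic mentioned above):

```python
from urllib.parse import quote

def pageviews_url(article, start, end, project="en.wikipedia.org"):
    """Build a Wikimedia Pageviews REST API URL for one article.

    start/end are YYYYMMDD strings. Using the "user" agent class
    excludes most spider/bot traffic from the counts.
    """
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    # Article titles use underscores; percent-encode everything else
    title = quote(article.replace(" ", "_"), safe="")
    return f"{base}/{project}/all-access/user/{title}/daily/{start}/{end}"

# Two weeks up to polling day, for Theresa May's seat
url = pageviews_url("Maidenhead (UK Parliament constituency)",
                    "20170525", "20170607")
```

Fetching that URL returns daily view counts as JSON, one item per day.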

According to this tool, in the two weeks running up to the election, the two most popular constituencies on Wikipedia were Maidenhead and Islington North – home to Theresa May and Jeremy Corbyn respectively – with over 25,000 views each. The two least popular were Motherwell and Wishaw (an SNP seat, narrowly held) and Merthyr Tydfil and Rhymney (ultra-safe Welsh Labour), with 758 and 687 views respectively. In between, there's huge variation.

There are some interesting blips in there: East Yorkshire got a huge boost when its MP* Greg Knight had his weird campaign advert go viral, while South Thanet got ridiculously high views on the day MP Craig Mackinlay was charged over election expenses. In general, marginal seats were more popular than safe seats, but there are also some interesting hints at what was to come. Kensington, considered a safe Tory seat, was the 48th most popular constituency with 4,575 views, putting it 8 places above the fiercely contested Derby North, the number 2 Con target, and just 3 places below Gower, their number 1 (it was also 20th most popular in London). Canterbury, with 3,948 views, was as popular as Ilford North (Con target 7) and NE Derbyshire (Con target 18). Could it be that tactical voters were doing their homework?

This raises an interesting question – can you work out where political upheaval is possible, by looking at which constituencies people are researching? Let’s have a look!


In this analysis, I’ve excluded some seats. The pattern of exclusions may have biased the data slightly (since a disproportionate number of held seats are excluded), but I believe they are fair, and similar patterns are visible without these exclusions – they’re just much weaker. When I say “All constituencies”, I mean, “All constituencies except the following exclusions”.

The set of results I’m using comes from the good people at Britain Elects. They only do mainland GB, so Northern Ireland is excluded. The data also has a few errors – as this is just a quick analysis, I simply deleted obviously wrong entries (for example, Leicester West which showed Tories having more votes than Labour despite losing the seat). More rigorous analysis would fix these – and search the data more thoroughly for other mistakes.

Seats belonging to (ex-)party leaders and other big names got the highest views, so I excluded these as pages that would be popular no matter what. Most of these were holds, but Sheffield Hallam (Nick Clegg), Gordon (Alex Salmond) and Moray (Angus Robertson) were gains. I therefore excluded any seat won at the last election by a current or former party leader, or by a holder of one of the great offices of state or their shadows.

A handful of seats in Newcastle and Sunderland (all holds) were also ridiculously popular. Why? Because they declared before 1 AM (Wikipedia logs in UTC – midnight UTC is equivalent to 1 AM BST), and people looked them up to find the results, the candidates, or just where they were. For this reason, I've done most of the work excluding polling day. Where I did include polling day, I excluded all seats that declared before 1 AM.

Finally, I excluded the Speaker’s seat, for obvious reasons. For the mess in Rochdale, I treated the Labour candidate as the incumbent, and treated Danczuk as an independent challenger.

View counts come from the Massviews tool on Wikimedia Foundation Labs. I had to get the results for East Yorkshire manually, because someone renamed the page "Yorkshire East" and messed everything up. (Maybe I should have excluded it anyway, since its popularity was driven by external circumstances, but eh – excluding it would only have made my results even stronger.)

To work out the correlation between page views and the gaining/holding of seats, I used the point-biserial coefficient.
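For reference, the point-biserial coefficient is just Pearson's r computed between a binary variable (here: gained = 1, held = 0) and a continuous one (here: page views). A minimal sketch, using made-up toy numbers rather than the real constituency data:

```python
from math import sqrt

def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 variable (seat gained?)
    and a continuous one (page views). Equivalent to Pearson's r."""
    n = len(values)
    ones = [v for b, v in zip(binary, values) if b == 1]
    zeros = [v for b, v in zip(binary, values) if b == 0]
    m1 = sum(ones) / len(ones)    # mean views in gained seats
    m0 = sum(zeros) / len(zeros)  # mean views in held seats
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)  # population SD
    return (m1 - m0) / sd * sqrt(len(ones) * len(zeros) / n ** 2)

# Toy example: gained seats tend to have higher view counts
gained = [1, 1, 0, 0]
views = [4000, 3000, 2000, 1000]
r = point_biserial(gained, views)  # ≈ 0.89
```

(In practice `scipy.stats.pointbiserialr` does the same job and also returns a p-value.)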


At the most basic level, there is a definite connection between the number of page views a constituency gets, and whether or not it changed hands. On average, seats that changed hands got more views (3173 mean, 2784 median) than those that didn’t (2084 mean, 1706 median). Arrange the seats by popularity, and seats that changed hands are mostly in the upper half (Figure 1).

Figure 1: All constituencies ranked by popularity. Gains are highlighted.**

This held up across all regions (Figure 2) although it was stronger in London and the South of England, and weaker in Scotland. If we just look at England, only 3 of the 39 seats studied that changed hands were in the lower half (Figure 3). Case closed?

Figure 2: Mean views per constituency by region.

Figure 3: English constituencies ranked by popularity. Gains are highlighted.

Not quite. The trouble is, it's not very predictive – the correlation between page views and whether a seat changed hands is a pathetic r=0.15. There's huge variation in the number of page views, and there are lots of seats with high views that were held. Plot standard deviation error bars, and there is a lot of overlap (Figure 4).

Figure 4: Mean views in held and gained seats, with error bars of 1 standard deviation.

Besides, there is a very obvious reason why gained seats might be more popular on Wikipedia. You can’t realistically win a seat that isn’t marginal, and marginal seats are just more interesting. Who looks up Hull East, or North East Cambridgeshire, when we all know exactly what the result will be there? By contrast, ultra-tight marginals get a lot more interest.

So let’s narrow it down a bit, and look solely at marginals. These are the ones pollsters, politicians, pundits and the press really care about anyway.

The top 75 Labour targets, and the top 75 Conservative targets. From Great Yarmouth to Slough, pivoting on Gower and Chester, these were the battleground of this election. It was expected that the Conservatives would sweep these – in the end, Labour ended up with most of them. The Tories won very few seats from that list (especially outside Scotland), and those they did win don't seem to have been especially impressive in terms of Wikipedia views – in fact, the correlation is a junky r=0.08. But for Labour, there does seem to have been a quite strong link between Wikipedia views and their odds of taking a seat. The correlation coefficient is r=0.48 (rising to r=0.52 for England alone, and slightly to r=0.49 for Lab/Con marginals). The correlation for all marginal seats*** is r=0.25.

What's really interesting, though, is that in the "long tail" of Labour marginals there's a strong divide between holds and gains. Labour did not win any seat where they got fewer than 1,500 views – not even Waveney, 20th on the list of targets, which saw a huge 6% swing to the Tories. By contrast, Labour usually won any marginal constituency that got more than around 3,500 views – even relatively difficult seats like Reading East, which saw a 6% swing in Labour's favour.

Figure 5: Seats on Lab target list, with holds and gains marked. (Not all gains were gained by Labour – the Conservatives took Southport, for instance).
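The pattern above amounts to a crude decision rule. A sketch, with the thresholds taken straight from the figures in the text – purely illustrative, not a fitted model:

```python
def predict_labour_marginal(views):
    """Crude rule from the observed pattern: under ~1,500 views Labour
    won nothing; over ~3,500 they usually won; in between, anyone's guess."""
    if views < 1500:
        return "likely hold"
    if views > 3500:
        return "likely gain"
    return "too close to call"

# e.g. Waveney-level interest vs Kensington-level interest
predict_labour_marginal(1200)  # "likely hold"
predict_labour_marginal(4575)  # "likely gain"
```

With only one election's worth of data, the middle band is wide and the rule would need out-of-sample testing before anyone relied on it.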

So this looks like a fairly decent correlation. What’s the explanation?

First may be that I've inadvertently p-hacked the data. The exclusions seem mostly reasonable, but my hypothesis only held for Labour seats, not really for all marginals as I'd initially expected. Is it fair to say "Wikipedia views suggest upsets in Labour target marginals, but not Tory targets"? I also looked at swing, but this was totally uncorrelated. The complicated dynamics of Lab-Con-LD-UKIP-SNP/PC competition mean that this year, constituency swing is a total mess. Is it fair to exclude that?

Second may be the fact that even among top 75 marginals, some are more marginal than others, and are still more likely to be gained and more “interesting”. The correlation disappears if you only look at the top 20 seats, but this is too small a sample to meaningfully analyse anyway.

Third may be that there's an underlying factor. The seats Labour gained were generally young and educated – full of people who are engaged with politics and internet-literate. Those they failed to take, or even lost to the Tories (yes, 5 Labour seats actually went blue – working-class Leave constituencies like Middlesbrough South and Mansfield), tend to be older and less educated, with voters less likely to look up election information on the internet. Demographic analysis would tease this detail out.

Fourth is that this is a genuine, interesting effect. People who want to vote tactically are looking up their constituencies on Wikipedia to see who came second last time, or trying to find information about their local candidates. It could be an early predictor (albeit a weak one) of a surprising race.

If you're running a political campaign, and you see a constituency is strangely popular on Wikipedia, maybe have a look into it. Who knows, you might be spotting the early days of a surge.

*Yes, I know they aren’t technically MPs while Parliament is dissolved. You’re very clever.

**I messed up and included the NI seats in the ranking, which is why the scale runs to 650, not 632. They aren’t plotted, though.

***Not quite 150, because some SNP and Lib Dem seats were targets for both Labour and Tories.

