Polling failure, Big Data and the Battle for Warsaw
As in the UK, US and Australia, pre-election polling was found wanting during the recent European Parliament election in Poland. Could big data predict the next general election, due in November, more accurately?
The 2019 European Parliament election in Poland will be remembered as an election of surprises. A record voter turnout of 45.68% was registered this year, nearly double the 2014 figure. The vote was polarised between the two major parties along metro-rural lines, with the Law and Justice party (Prawo i Sprawiedliwość: PiS) winning 56.3% of the vote in towns and villages, while the opposition European Coalition (Koalicja Europejska: KE) took 51% of the vote in the larger cities.
But perhaps the result that most had the pollsters scratching their heads was the overwhelming victory of PiS over Koalicja. PiS won 45.38% of the vote to Koalicja’s 38.47%, an almost seven-point gap. This stood in stark contrast to the May polling averages, which had PiS on 37.9% and Koalicja on 36.9%, although the final IBRiS and IPSOS surveys had pushed the gap out to 3.6 and 5 percentage points respectively. In fact, five surveys had even placed Koalicja ahead of PiS in the lead-up to the election.
The size of the victory is perhaps not as significant as the overall distribution of votes. A number of smaller parties, such as Konfederacja and Kukiz, failed to reach the 5% apiece that polls had predicted for them. Only Robert Biedroń’s new liberal party, Wiosna (Spring), cleared the 5% electoral threshold, with a return of 6.06%. While PiS might have trimmed a few percentage points from its rivals on the right, Koalicja lost some of its traditional support to Wiosna. A rural defection from coalition partner the Polish People’s Party (Polskie Stronnictwo Ludowe: PSL) did not help Koalicja’s cause either.
Polish pollsters are not alone
The issue of survey reliability has recently plagued the polling industry all around the world. First, Brexit left the pollsters red-faced in June 2016, then Donald Trump defied the pre-polls to take the White House. A similar outcome occurred at the Australian federal election held just one week before the European Parliament elections, where the embattled government pulled off an unlikely victory.
The methodology used by the polling industry has since come into question. The view is that the science has not kept up with changes in technology and our lifestyles.
The decline of the landline phone has created issues. Once, a respondent’s fixed location (and therefore electorate) could be inferred automatically from their phone number; with mobile phones this is far more problematic. Survey response rates have also fallen with the advent of robo-calling. Furthermore, despite the extra anonymity on offer, respondents may be more inclined to give rushed or disingenuous answers to a robot than to a discerning human being. There is also the claim that sample sizes have dwindled.
What now? Is it time to turn to Citta the elephant, the resident football psychic of Kraków Zoo?
The rise of Big Data
A new method has arisen out of Big Data to complement traditional survey research. Data analysts have begun to mine the mass of social media data freely available online to predict the outcomes of elections. Take Professor Bela Stantic, director of the Big Data and Smart Analytics lab at Griffith University in Australia. He accurately predicted the trifecta of Brexit, Trump’s victory and the more recent election in Australia. In the Australian case, his algorithm predicted the result by cross-referencing two million social media comments from 500,000 unique accounts against 50 key terms.
“In a time where we have this huge amount of data in social media, it is obviously easier to extract correct information,” Professor Stantic told SBS News in Australia. “For example, if someone posts I’m in a rally for climate change – I can immediately calculate the sentiment – that it’s positive – and identify which party supports climate change.”
Apart from the obvious benefit of locating respondents by IP address, the main premise, and the key difference from traditional polling, is that these algorithms infer voter sentiment from unconscious behaviour rather than stated opinion, which is open to self-censorship.
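The mechanics of this keyword-and-sentiment approach can be sketched in a few lines. The following is a toy illustration only, not Professor Stantic’s actual algorithm: the sentiment word lists and the mapping of key terms to parties are invented for the example.

```python
# Toy sketch of keyword-driven sentiment polling over social media posts.
# Word lists and the term-to-party mapping below are illustrative assumptions.

POSITIVE = {"support", "love", "great", "rally", "win"}
NEGATIVE = {"hate", "terrible", "corrupt", "lose", "against"}

# Hypothetical mapping of key issue terms to the party most identified with them.
KEY_TERMS = {
    "climate": "Koalicja",
    "pension": "PiS",
}

def score_post(text: str) -> dict:
    """Attribute a post's net sentiment to any party whose key term it mentions."""
    words = text.lower().split()
    sentiment = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    scores = {}
    for term, party in KEY_TERMS.items():
        if term in words:
            scores[party] = scores.get(party, 0) + sentiment
    return scores

def aggregate(posts: list) -> dict:
    """Sum per-party sentiment across a corpus of posts."""
    totals = {}
    for post in posts:
        for party, s in score_post(post).items():
            totals[party] = totals.get(party, 0) + s
    return totals

posts = [
    "I'm at a rally for climate action, great turnout",
    "The pension promise is terrible and corrupt",
]
print(aggregate(posts))  # {'Koalicja': 2, 'PiS': -2}
```

A production system would replace the word lists with a trained sentiment model and handle negation, sarcasm and bot accounts, but the core idea — attaching observed sentiment to issues, and issues to parties — is the same.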
With the stockpiles of data increasing at the same pace as machine learning technology, this big-data approach to polling should only grow. There are, of course, hurdles for the technology to overcome. Ethical questions must be addressed, as seen with Cambridge Analytica. Issues around demographics and online representation are also bound to cause problems.
Namely, how will Professor Stantic’s algorithm gauge the political sentiment of babcia (grandma) at the bazaar?