People and #Brexit: an analysis of public opinion at the time of Twitter

February 28, 2020


The fateful day has come: on the night of 31st January, the United Kingdom officially left the European Union. Since the 2016 referendum, many political experts have speculated about the possible consequences of the UK’s withdrawal from the EU and the measures that should be adopted in response. However, most of the actual outcomes will not become visible in the short run; in particular, the effective implementation of post-Brexit policies by both the UK and the EU will take some time.


In the meantime, there are other ways to obtain valuable knowledge from the world around us. Nowadays, social networks are the epicentre of information exchange. For this reason, fields of data analysis that extract information from social networks, such as natural language processing and social network analysis, are gaining momentum. I decided to use some of these tools to analyse how the public’s views on Brexit evolved in the days preceding and following the event.



The dataset

I collected approximately 100,000 English-language tweets containing the hashtag #Brexit from almost 60,000 distinct users around the globe between 27th January and 5th February 2020, that is, five days before and five days after the event. The location was retrieved for 75% of these tweets. The analysis was carried out on a worldwide basis: the users come from 206 different countries, including all 27 member states of the European Union. Retweets and tweets containing media (GIF, image or video) were filtered out. Roughly 20% of the tweets are quotes of other tweets (they refer to another tweet but add some personal content), while approximately 22% are replies to other tweets.



Temporal and spatial analysis

The first step consists of analysing how the tweets are distributed over time and across countries.

The first graph shows the number of tweets published each day. The majority were sent before the withdrawal (61,849 before against 37,825 after), and the 31st January alone accounts for more than one third of all posts. Broadly speaking, more than half of the tweets were sent on the two days immediately preceding and following the event. Apart from the anomalous volume on these two days, the number of tweets increased just before the day of Brexit and decreased gently right afterwards.



Looking at the spatial distribution in the choropleth map below, we notice that more than half of all tweets were sent from the United Kingdom, followed by the United States (15%) and Ireland (3.5%), then by some of the major European countries such as Germany, France, Belgium, Austria, Spain and Italy, and by other large countries such as Canada and India. These data, however, are not an optimal proxy for the actual geographical distribution of tweets, as only tweets written in English were taken into account. Furthermore, these are raw numbers: to measure the actual impact of Brexit at country level, they should be set against the population of each country or, even better, its active Twitter population. In fact, after normalising these data by the Twitter population figures obtained from Datareportal, we see that Brexit was discussed even more in Ireland than in the United Kingdom (0.14% against 0.13% of each country’s Twitter users).
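The normalisation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the country labels and all figures below are hypothetical placeholders, not the values used in the analysis.

```python
def share_of_twitter_users(active_users, twitter_population):
    """Return, per country, the percentage of Twitter users who tweeted on the topic."""
    shares = {}
    for country, users in active_users.items():
        population = twitter_population.get(country)
        if not population:
            # Skip countries whose Twitter audience is unknown or zero.
            continue
        shares[country] = 100.0 * users / population
    return shares


# Hypothetical placeholder figures (NOT taken from the dataset).
active_users = {"A": 500, "B": 60, "C": 10}
twitter_population = {"A": 400_000, "B": 40_000, "C": 0}

shares = share_of_twitter_users(active_users, twitter_population)
# Countries ordered from the highest to the lowest normalised share.
ranking = sorted(shares, key=shares.get, reverse=True)
print(ranking, shares)
```

With these made-up numbers, country A has far more tweeting users in absolute terms, yet country B ranks first once the size of its Twitter audience is taken into account.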




Words matter

The next phase consists of taking a closer look at the content of the tweets. The most common words seem to belong to the semantic fields of time (Now, Day, After,…), politics (Brexitday, Boris Johnson, Voted,…) and socioeconomic factors (People, Trade,…). I found it quite interesting that the word People occurred so often, even more than many words from political jargon, suggesting that the event was perceived as a phenomenon directly connected to the population rather than to the government alone.


Further insights can be extracted by studying the correlation between words through a co-occurrence network rather than looking at them individually. For the sake of clarity, only the words that appeared in at least 1% of the tweets were taken into consideration (roughly one-third of the words used in the tweets). In a co-occurrence network, each word is represented by a node and is linked to other words by edges; the more central a node is, the more connections the word has. Had the word Brexit not been removed, it would have appeared at the centre of the network. The words eu and uk, which are strongly correlated with Brexit, are also very central and, not surprisingly, have many connections to other words. In particular, eu shows the highest degree centrality, with 900 links, approximately 15% more connections than uk.

The other central terms fall into the semantic fields mentioned above: people and trade in society and economy, day in time and borisjohnson in politics. The fact that name and surname are joined suggests that it is a hashtag, which means that the figure of Boris Johnson is presumably even more central in reality, as the co-occurrence of “Boris Johnson” written as two words with other terms is not captured in the network.
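As a rough illustration of how such a co-occurrence network can be built, the sketch below counts the pairs of words that appear in the same tweet and derives the number of links (degree) of each word. It is a minimal example using only the standard library; the function name, the whitespace tokenisation and the sample tweets are assumptions made for the example, not the pipeline actually used for the analysis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_degrees(tweets, min_share=0.01, excluded=("brexit",)):
    """Build word co-occurrence edges and return (edges, degree)."""
    # One set of lower-cased tokens per tweet, without the excluded words.
    token_sets = [set(t.lower().split()) - set(excluded) for t in tweets]

    # Keep only the words that appear in at least `min_share` of the tweets.
    doc_freq = Counter(w for tokens in token_sets for w in tokens)
    threshold = min_share * len(tweets)
    kept = {w for w, n in doc_freq.items() if n >= threshold}

    # An edge links two kept words that occur in the same tweet;
    # its weight is the number of tweets in which they co-occur.
    edges = Counter()
    for tokens in token_sets:
        for a, b in combinations(sorted(tokens & kept), 2):
            edges[(a, b)] += 1

    # The degree of a word is the number of distinct words it is linked to.
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return edges, degree


# Toy tweets invented for the example.
tweets = ["brexit eu uk trade", "eu people day", "uk people", "eu trade deal"]
edges, degree = cooccurrence_degrees(tweets, min_share=0.5)
print(degree.most_common())
```

Here the threshold is raised to 50% only because the toy sample is tiny; on a real corpus the 1% threshold mentioned above would be the natural choice, and a proper tokeniser would be needed to handle punctuation and hashtags.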



Sentiment and emotion analysis

Finally, I decided to analyse the sentiment and the emotions of these tweets in order to understand the Twitter users’ thoughts on Brexit.


Both the sentiment and the emotion of each tweet were assessed by assigning a score based on the co-occurrence of certain words and expressions. For the sentiment analysis, the score ranged from -1 to 1, where -1 stands for completely negative sentiment, 1 for completely positive sentiment and 0 for neutral. For the emotion analysis, the classification was based on Ekman’s theory, which identifies six basic emotions (anger, sadness, joy, surprise, disgust and fear), and the relative weight of each of them was assessed.
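To give an idea of what a word-based score between -1 and 1 looks like, here is a deliberately simplified sketch. It is a toy stand-in, not the scoring method used for this analysis: the function name, the tiny lexicon and its weights are all invented for the example.

```python
def lexicon_score(text, lexicon):
    """Average the weights of the known words in `text`, clipped to [-1, 1]."""
    words = text.lower().split()
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        # No known word: the text is treated as neutral.
        return 0.0
    return max(-1.0, min(1.0, sum(hits) / len(hits)))


# Hypothetical mini-lexicon: word -> weight in [-1, 1].
toy_lexicon = {"great": 0.8, "happy": 0.6, "disaster": -0.9, "sad": -0.5}

positive = lexicon_score("A great and happy day", toy_lexicon)
neutral = lexicon_score("Nothing to see here", toy_lexicon)
negative = lexicon_score("What a sad disaster", toy_lexicon)
print(positive, neutral, negative)
```

Real sentiment tools are considerably more refined (they handle negation, intensifiers and multi-word expressions, for instance), but the output has the same shape: one number per tweet, with its sign giving the polarity and its magnitude the intensity.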

For the sentiment analysis, neutral tweets were removed to better highlight the contrast, which restricted the dataset to approximately 64,000 tweets. In general, negative and positive sentiments are quite evenly distributed in terms of the number of tweets, with a slightly higher number of negative tweets (a difference of 3.64 percentage points). The stark divergence concerns intensity rather than quantity: on average, the sentiment score of positive tweets is considerably higher in absolute terms than that of negative tweets, by roughly 20 percentage points. That is to say, according to this overview, people’s feelings about Brexit are rather mixed, but those who are satisfied with the event express themselves much more emphatically.
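The distinction between quantity and intensity can be made concrete with a small sketch. It assumes each tweet already has a sentiment score between -1 and 1; the function name and the sample scores are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def summarise_polarity(scores):
    """Drop neutral scores, then compare share and average intensity by polarity."""
    positive = [s for s in scores if s > 0]
    negative = [s for s in scores if s < 0]
    kept = len(positive) + len(negative)
    if kept == 0:
        raise ValueError("no non-neutral scores to summarise")
    return {
        # Quantity: percentage of the non-neutral tweets on each side.
        "positive_share": 100.0 * len(positive) / kept,
        "negative_share": 100.0 * len(negative) / kept,
        # Intensity: average absolute score on each side (None if the side is empty).
        "positive_intensity": mean(positive) if positive else None,
        "negative_intensity": mean([abs(s) for s in negative]) if negative else None,
    }


# Made-up scores: two positive, three negative and two neutral tweets.
scores = [0.5, 0.5, -0.25, -0.25, -0.25, 0.0, 0.0]
summary = summarise_polarity(scores)
print(summary)
```

In this invented sample the negative tweets are more numerous, while the positive ones are stronger on average, which is the same kind of pattern described above.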


Another graph, another story, and this one presents several facts at once. The bars represent the average sentiment score for a given time slot on a given date (each slot covers 6 hours, for a total of 4 bars per day). In general, the average fluctuated widely up until the day of Brexit, after which it settled into a persistently positive trend. The most negative score was reached at the end of 29th January, when the European Parliament approved the withdrawal agreement, while the positive peak came at dawn on 4th February, the day after the publication of the British government’s main policy objectives for the post-Brexit era.
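The grouping into 6-hour slots behind the bars can be sketched as follows. This is a minimal example using only the standard library; the function name and the sample records are hypothetical, and the actual analysis may have been done differently.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean


def average_by_slot(records, slot_hours=6):
    """Average the scores per (date, slot), where a slot spans `slot_hours` hours."""
    buckets = defaultdict(list)
    for timestamp, score in records:
        # Slot 0 covers 00:00-05:59, slot 1 covers 06:00-11:59, and so on.
        slot = timestamp.hour // slot_hours
        buckets[(timestamp.date(), slot)].append(score)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Invented (timestamp, sentiment score) pairs.
records = [
    (datetime(2020, 1, 31, 1, 0), 0.5),
    (datetime(2020, 1, 31, 5, 59), -0.5),
    (datetime(2020, 1, 31, 23, 0), 0.25),
    (datetime(2020, 2, 1, 6, 0), 0.75),
]
averages = average_by_slot(records)
for (day, slot), value in averages.items():
    print(day, slot, value)
```

Each key of the result corresponds to one bar of the chart: a date plus a slot index from 0 to 3.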



The line graphs show the number of positive and negative tweets per day. The two trends are roughly aligned, with a slight overall prevalence of negative tweets, particularly in the days preceding Brexit. The largest difference, slightly more than 1,500 tweets, is reached on the night of the 31st. Note that whenever the average score is negative, negative tweets outnumber positive ones, while the converse is not true. In other words, there appears to be some relation between the quantity of negative tweets and the predominance of negative views, whereas no such relation emerges for positive tweets.


The second part of the analysis concerns the distribution of emotions rather than sentiment. The spider charts below show the number of tweets per emotion and the average sentiment score per dominant emotion. The most frequent emotions are surprise, joy and fear, with a marked predominance of fear. However, the emotions prevailing in intensity are joy, followed by sadness and fear. In other words, what can be inferred from these data is that fear is the most widespread reaction to Brexit among users, but the enthusiasm expressed in joyful tweets is stronger in intensity.