Status	:	Completed
Type	:	Deep Learning Model
Contributor(s) ¹	:	Hanzholah Shobri (GitHub, LinkedIn); M Noorosyid Sulaksono (GitHub, LinkedIn)
Repository	:	https://www.github.com/hanzholahs/brexit-polarity-tweets
Analysis	:	Part 1: Exploratory Data Analysis; Part 2: Deep Learning Model

Background

Brexit is a term that refers to the withdrawal of the United Kingdom (UK) from the European Union (EU). It is a combination of "Britain" and "exit". The UK is the only country ever to have left the EU. Because Brexit has significant implications for the people of the UK, diverging opinions (both positive and negative) arose around the event. Consequently, Twitter users are polarised on the issue, and this polarity can be observed in their tweets.

Dataset

The Brexit Polarity Tweets dataset was collated by Visalakshi Iyer as part of a Master’s dissertation project. It covers the period from January to March 2022 and comprises tweets relating to Brexit or Europe, posted by Twitter accounts that publicly state a Brexit position in their bio.

The Boolean search for pro-Brexit tweets is:

(bio:“Brexit support” OR bio:“pro-brexit” OR bio:“pro brexit” OR bio:“Pro #Brexit” OR bio:brexiteer OR bio:probrexit) AND (EU OR Brexit OR CUSTOMS OR EUROPEAN OR EUROPE OR #Remain OR *Brexit OR #rejoinEU)

The Boolean search for anti-Brexit tweets is:

(bio:“anti brexit” OR bio:“anti-brexit” OR bio:“antibrexit” OR bio:“Pro remain” OR bio:“pro-remain” OR bio:remainer) AND (EU OR BREXIT OR CUSTOMS OR EUROPEAN OR EUROPE OR #Remain OR *Brexit)
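To make the logic of the two searches concrete, the following is a minimal Python sketch that approximates them locally on a (bio, text) pair. It is not the query engine used to collect the dataset. The function name `label_tweet` is hypothetical, bio phrases are matched as naive case-insensitive substrings, and `*Brexit` is assumed to mean "any word ending in brexit".

```python
import re

# Bio phrases copied from the two searches above, lowercased.
BIO_PHRASES = {
    "pro": ["brexit support", "pro-brexit", "pro brexit", "pro #brexit",
            "brexiteer", "probrexit"],
    "anti": ["anti brexit", "anti-brexit", "antibrexit", "pro remain",
             "pro-remain", "remainer"],
}

# Topic terms from the searches. Assumption: "*Brexit" is read as
# "any word ending in brexit". "#rejoinEU" appears only in the pro search.
SHARED_TOPICS = r"\b(?:eu|customs|european|europe|\w*brexit)\b|#remain\b"
TOPIC_RE = {
    "pro": re.compile(SHARED_TOPICS + r"|#rejoineu\b", re.IGNORECASE),
    "anti": re.compile(SHARED_TOPICS, re.IGNORECASE),
}


def label_tweet(bio, text):
    """Return "pro", "anti", or None for one (bio, text) pair."""
    bio_lower = bio.lower()
    for label, phrases in BIO_PHRASES.items():
        # Both conditions must hold: a stance phrase in the bio AND a topic term in the text.
        bio_match = any(phrase in bio_lower for phrase in phrases)
        if bio_match and TOPIC_RE[label].search(text):
            return label
    return None


print(label_tweet("Proud Brexiteer and dad", "The EU customs rules are absurd"))
print(label_tweet("Pro-remain, anti-Brexit", "Brexit has failed"))
```

Because the substring check is naive, a bio such as "anti-brexiteer" would be labelled pro; a real pipeline would need stricter matching.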

Objectives

This project aims to achieve two objectives:

Explore how British people perceive Brexit based on their tweets.
Develop a deep learning model to predict the sentiment (pro or anti) of tweets in the Brexit dataset.

Technology

The analysis was conducted in Python. The numpy and pandas libraries were used to process the tabular data (the dataset's original format), and visualisations were created with matplotlib and seaborn. The re and nltk libraries were essential for processing the text data. The deep learning models were trained on a GPU in Google Colab, with the Kaggle API used to access the data and Google Drive used to save the models. Sentences were embedded into vector representations using pre-trained GloVe embeddings.
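As an illustration of the text-processing and embedding steps, the sketch below cleans a tweet with re and averages GloVe-style word vectors with numpy. It is not the project's actual pipeline. The helper names (`clean_tweet`, `load_glove`, `embed`) are hypothetical, the cleaning rules and the averaging strategy are assumptions, and the toy vectors are made-up numbers standing in for a real pre-trained GloVe file.

```python
import io
import re

import numpy as np


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions, and non-letters, then tokenise."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only (also removes '#')
    return text.split()


def load_glove(lines):
    """Parse GloVe's text layout: one token per line, followed by its vector."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def embed(tokens, vectors, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0)


# Toy 3-dimensional vectors in GloVe's layout (made-up values, not real GloVe).
toy_glove = io.StringIO("brexit 1.0 0.0 2.0\nfailed 3.0 2.0 0.0\n")
vectors = load_glove(toy_glove)

tokens = clean_tweet("@user #Brexit has FAILED! https://example.com/x")
sentence_vector = embed(tokens, vectors, dim=3)
print(tokens, sentence_vector)
```

With a real GloVe file, `load_glove` could be given the opened file object instead of the in-memory sample. Averaging is only one simple way to get a sentence vector; a model could instead feed the per-token vectors into an embedding layer.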

Footnotes

¹ Contributors are ordered alphabetically.