Status: Completed
Type: Deep Learning Model
Contributor(s)¹: Hanzholah Shobri; M Noorosyid Sulaksono
Repository: https://www.github.com/hanzholahs/brexit-polarity-tweets
Analysis: Part 1: Exploratory Data Analysis; Part 2: Deep Learning Model
Background
Brexit is the term for the withdrawal of the United Kingdom (UK) from the European Union (EU). The word is a blend of "Britain" and "exit". The UK is the only country ever to have left the EU. Brexit has significant implications for people in the UK, so it has drawn sharply divided opinions, both for and against. This polarity also exists among Twitter users and can be observed in their tweets.
Dataset
The Brexit Polarity Tweets dataset was collated by Visalakshi Iyer as part of a Master’s dissertation project. It covers January to March 2022 and comprises tweets about Brexit or Europe from Twitter accounts whose bios publicly state a Brexit position.
The Boolean search query for pro-Brexit tweets is:
(bio:“Brexit support” OR bio:“pro-brexit” OR bio:“pro brexit” OR bio:“Pro #Brexit” OR bio:brexiteer OR bio:probrexit) AND (EU OR Brexit OR CUSTOMS OR EUROPEAN OR EUROPE OR #Remain OR *Brexit OR #rejoinEU)
The Boolean search query for anti-Brexit tweets is:
(bio:“anti brexit” OR bio:“anti-brexit” OR bio:“antibrexit” OR bio:“Pro remain” OR bio:“pro-remain” OR bio:remainer) AND (EU OR BREXIT OR CUSTOMS OR EUROPEAN OR EUROPE OR #Remain OR *Brexit)
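The two queries share the same structure: a bio condition AND a topic condition on the tweet text. The sketch below re-expresses that logic locally so it is easier to read. It is an illustration only. How the queries were actually executed is not described here, and case-insensitive substring and regex matching is an assumption. The function name `label_tweet` is hypothetical. The sketch also approximates the `*Brexit` wildcard by matching "brexit" anywhere in the text. Note that `#rejoinEU` appears only in the pro-Brexit topic terms, as in the original queries.

```python
import re
from typing import Optional

# Bio phrases from the two queries above, lower-cased for matching.
PRO_BIO = ("brexit support", "pro-brexit", "pro brexit", "pro #brexit",
           "brexiteer", "probrexit")
ANTI_BIO = ("anti brexit", "anti-brexit", "antibrexit", "pro remain",
            "pro-remain", "remainer")

# Topic terms the tweet text must contain (assumed case-insensitive).
# Matching "brexit" anywhere approximates both "Brexit" and "*Brexit".
_TOPIC_CORE = r"\b(?:eu|customs|european|europe)\b|#remain|brexit"
PRO_TOPIC = re.compile(_TOPIC_CORE + r"|#rejoineu", re.IGNORECASE)  # pro query also has #rejoinEU
ANTI_TOPIC = re.compile(_TOPIC_CORE, re.IGNORECASE)


def label_tweet(bio: str, text: str) -> Optional[str]:
    """Hypothetical helper: return 'pro', 'anti', or None.

    None means neither query matches, or both match (an ambiguous bio).
    """
    bio = bio.lower()
    is_pro = any(p in bio for p in PRO_BIO) and PRO_TOPIC.search(text) is not None
    is_anti = any(p in bio for p in ANTI_BIO) and ANTI_TOPIC.search(text) is not None
    if is_pro != is_anti:
        return "pro" if is_pro else "anti"
    return None


print(label_tweet("Proud Brexiteer", "The EU is overreaching"))  # pro
```

A bio such as "Anti-Brexiteer" matches phrases from both lists. The sketch returns `None` for such cases instead of guessing a label.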
Objectives
This project aims to achieve two objectives:
- Explore how the British perceive Brexit, based on their tweets.
- Develop a deep learning model to predict the sentiment (pro- or anti-Brexit) of tweets in the dataset.
Technology
The analysis was conducted in Python. The tabular data (the dataset's original format) was processed with numpy and pandas, and visualisations were created with matplotlib and seaborn. In addition, the re and nltk libraries were essential for processing the text data. Model training used GPU computation in Google Colab, supplemented by the Kaggle API for accessing the data and Google Drive for saving the models. Sentences were embedded into vector representations using pre-trained GloVe word embeddings.
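The exact preprocessing steps are not listed here. The sketch below shows one common way to clean tweet text with `re` before embedding, assuming the following steps: lower-casing, then stripping URLs and @mentions, then removing everything except ASCII letters and whitespace. The function name `clean_tweet` and the tiny stop-word set are hypothetical. With nltk, a real English stop-word list is available through `nltk.corpus.stopwords.words("english")` after running `nltk.download("stopwords")`. That step is left out so the example runs without downloads.

```python
import re
from typing import FrozenSet, List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str, stopwords: FrozenSet[str] = frozenset()) -> List[str]:
    """Hypothetical helper: normalise a tweet and split it into tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @user mentions
    # Replace anything that is not an ASCII letter or whitespace: this drops
    # '#' (keeping the hashtag word), digits, punctuation and emoji.
    text = NON_LETTER_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in stopwords]


print(clean_tweet("Brexit is DONE! https://t.co/abc @user #rejoinEU",
                  stopwords=frozenset({"is"})))
# ['brexit', 'done', 'rejoineu']
```

Removing non-letters keeps the vocabulary small, but it also splits contractions ("don't" becomes "don", "t"). A real pipeline might tokenise more carefully.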
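The text does not say how GloVe was wired into the model. One common approach is to build an embedding matrix whose row *i* holds the GloVe vector of the word with index *i*. That matrix can then initialise the weights of a model's embedding layer. The sketch below assumes the standard GloVe text format, where each line is `word v1 v2 ... v_dim`. It also assumes a 1-based word index, with row 0 reserved for padding. A toy in-memory file stands in for a real one such as `glove.6B.100d.txt`. The vocabulary is hypothetical.

```python
import io
from typing import Dict, Iterable

import numpy as np


def load_glove(lines: Iterable[str], dim: int) -> Dict[str, np.ndarray]:
    """Parse GloVe text lines of the form 'word v1 v2 ... v_dim'."""
    vectors: Dict[str, np.ndarray] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # Skip lines whose token count does not match the expected dimension.
        if len(parts) != dim + 1:
            continue
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def build_embedding_matrix(word_index: Dict[str, int],
                           vectors: Dict[str, np.ndarray],
                           dim: int) -> np.ndarray:
    """Row i holds the GloVe vector of the word with index i.

    Assumes 1-based indices (row 0 reserved for padding); words missing
    from GloVe keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix


# Toy stand-in for a real GloVe file (normally opened with encoding="utf-8").
toy_glove = io.StringIO("brexit 0.1 0.2 0.3\neu 0.4 0.5 0.6\n")
vectors = load_glove(toy_glove, dim=3)
word_index = {"brexit": 1, "eu": 2, "remain": 3}  # hypothetical vocabulary
emb = build_embedding_matrix(word_index, vectors, dim=3)
print(emb.shape)  # (4, 3)
```

Here "remain" is absent from the toy file, so its row stays zero. With real GloVe files, the share of out-of-vocabulary words is worth checking after cleaning.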
Footnotes
¹ Contributors are ordered alphabetically.