Skip to content
This repository has been archived by the owner on Jan 16, 2024. It is now read-only.

Sentiment Analysis of COVID-19 Tweets

Hello there! The proposed approach is divided into four phases: 1) pre-processing, 2) keyword trend analysis, 3) word embeddings for feature extraction, and 4) classification methods. The COVIDSenti dataset is split into two parts, a training set and a testing set. We account for several properties of the dataset, such as noisy data, small and large dataset sizes, and the risk of over-fitting. The main objective of this study is to evaluate the classification performance of state-of-the-art classifiers on the COVIDSenti dataset and then to improve on that performance by extracting key features from the tweets. The proposed technique classifies the COVIDSenti dataset, which consists of COVID-19-related Twitter posts, with higher accuracy and greater effectiveness.
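
The split into training and testing chunks can be sketched in Python as below. This is a minimal illustration, not the repository's code: the helper name `split_train_test`, the 80/20 default ratio, the fixed seed, and the `pos`/`neg`/`neu` label strings are assumptions, since the original does not state how the split was made.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(records: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `records` and split it into (train, test) lists.

    Hypothetical helper; the 0.2 default and seed 42 are illustrative assumptions.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)                # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded RNG -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled tweets: (text, label).
tweets = [
    ("stay home and stay safe", "pos"),
    ("cases are rising again", "neg"),
    ("new guidelines published today", "neu"),
    ("hospitals are overwhelmed", "neg"),
]
train, test = split_train_test(tweets, test_fraction=0.25)
print(len(train), len(test))  # 3 1
```

Seeding a dedicated `random.Random` instance keeps the split reproducible without touching Python's global random state.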

About Us

Development Phase

Model training and validation used the fastai approach along with Keras (TensorFlow 2.0) and PyTorch. The development phase of the project is divided into four phases:

  1. Data Collection and Cleaning
  2. Exploratory Data Analysis and Preprocessing
  3. Model Training and Sentiment Extractor
  4. Create a Web File

Website Development

Website development is divided into four phases:

  1. Public Sentiment Analysis
  2. Real-Time Sentiment Analysis
  3. Twitter Live Feed Analysis
  4. Live Case Count

Notebooks

Data Collection and Data Cleaning

Collecting data to train the ML model is the first step in the machine learning pipeline. An ML system's predictions can only be as good as the data it was trained on. The following problems can arise during data collection:

  1. Inaccurate data: the collected data may be unrelated to the problem statement.
  2. Missing data: parts of the data may be missing, for example empty values in columns or missing images for some prediction classes.
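
Both problems can be checked for mechanically before training. The sketch below assumes each tweet is a dict with `tweet` and `label` fields and uses a hypothetical keyword list to flag tweets that may be unrelated to COVID-19; the field names, the keywords, and both helper names are illustrative assumptions, not part of the original project.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical relevance keywords; a real list would come from the problem statement.
COVID_KEYWORDS = ("covid", "coronavirus", "pandemic", "lockdown", "vaccine")


def missing_value_report(rows: Iterable[Dict[str, Optional[str]]],
                         columns: List[str]) -> Dict[str, int]:
    """Count, per column, the rows where the value is absent, None, or blank."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                counts[col] += 1
    return counts


def is_relevant(text: str, keywords=COVID_KEYWORDS) -> bool:
    """Crude relevance check using substring matching, so '#COVID19' counts as relevant."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


rows = [
    {"tweet": "Lockdown extended for two more weeks", "label": "neg"},
    {"tweet": "   ", "label": "neu"},
    {"tweet": "Great match last night", "label": None},
]
print(missing_value_report(rows, ["tweet", "label"]))  # {'tweet': 1, 'label': 1}
print([is_relevant(r["tweet"]) for r in rows])         # [True, False, False]
```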

Data cleaning is an important part of machine learning and plays a significant role in building a model. It is not the most glamorous part of the pipeline, and there are no hidden tricks or secrets to uncover, but the success or failure of a project depends on proper data cleaning.
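
A typical tweet-cleaning pass might look like the following. The specific steps (lowercasing, dropping a leading "RT" retweet marker, URLs, and @mentions, stripping punctuation, and removing empty or duplicate tweets) are common choices assumed here for illustration; the original does not list the exact rules it applied. The character filter keeps only ASCII letters and digits, which assumes English-language tweets.

```python
import re
from typing import Iterable, List

RETWEET_RE = re.compile(r"^rt\s+")               # leading "RT " retweet marker
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # http(s) and bare www links
MENTION_RE = re.compile(r"@\w+")                 # @user mentions
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")        # punctuation, emoji, '#' symbols
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize one tweet; assumes English text (non-ASCII letters are dropped)."""
    text = text.lower()
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)   # "#covid19" -> " covid19": keeps the hashtag word
    return SPACE_RE.sub(" ", text).strip()


def clean_corpus(tweets: Iterable[str]) -> List[str]:
    """Clean every tweet, then drop empty results and exact duplicates (keeping order)."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        result = clean_tweet(tweet)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned


print(clean_tweet("RT @WHO: Wash your hands! #COVID19 https://t.co/abc"))
# wash your hands covid19
print(clean_corpus(["Stay home!", "stay home", "http://x.co"]))
# ['stay home']
```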

Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach to analyzing a dataset by summarizing its main characteristics. These summaries can be of two types:

  1. Numerical summary: a summary expressed as numbers, for example the mean (average) or the median. A numerical summary can be either a) univariate, where the measure depends on a single variable, or b) bivariate, where the measure depends on two variables.
  2. Graphical summary: a summary expressed as a graph, for example a histogram or a box plot.
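
The two kinds of numerical summary can be computed with Python's standard library. In this sketch the first variable (tweet length in words) and the second variable (hashtag count) are hypothetical choices made for illustration. `pearson_r` is written out by hand so that it runs on Python versions older than 3.10, where `statistics.correlation` is unavailable.

```python
from math import sqrt
from statistics import mean, median
from typing import Dict, Sequence


def univariate_summary(values: Sequence[float]) -> Dict[str, float]:
    """Numerical summary of a single variable."""
    return {"mean": mean(values), "median": median(values)}


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Bivariate summary: Pearson correlation between two equal-length variables."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


# Hypothetical cleaned tweets (hashtags kept so they can be counted).
tweets = [
    "stay home stay safe #covid19",
    "vaccine drive starts today in the city #vaccine #covid19",
    "lockdown again",
    "cases rising",
]
lengths = [len(t.split()) for t in tweets]   # words per tweet, hashtags included
hashtags = [t.count("#") for t in tweets]    # hypothetical second variable
print(univariate_summary(lengths))           # {'mean': 4.5, 'median': 3.5}
print(pearson_r(lengths, hashtags))
```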

After cleaning, the dataset was subjected to Exploratory Data Analysis (EDA): we plotted various types of graphs based on the sentiments and sentiment triggers to gain valuable insights from the data. The frequency distribution graphs give a good overview of the dataset and also hint at how well the model is likely to generalize (for example, by revealing class imbalance). The graphs of sentiments and sentiment triggers showed little difference between the sentiment trends of tweets from India and those from the rest of the world.
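
The per-region comparison behind those frequency distributions can be sketched with `collections.Counter`. The sketch assumes each cleaned tweet has already been tagged with a region (here just `"India"` or `"Rest"`) and a sentiment label; the region tagging, the labels, the sample records, and the helper names are all hypothetical. Comparing label shares rather than raw counts keeps regions with very different tweet volumes comparable.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def sentiment_distribution(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each region to the share of its tweets carrying each sentiment label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for region, label in records:
        counts[region][label] += 1
    distribution = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        distribution[region] = {label: n / total for label, n in counter.items()}
    return distribution


def max_share_gap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Largest absolute difference in label share between two regions."""
    labels = set(a) | set(b)
    return max(abs(a.get(label, 0.0) - b.get(label, 0.0)) for label in labels)


# Hypothetical (region, label) pairs.
records = [
    ("India", "neg"), ("India", "neu"), ("India", "neg"), ("India", "pos"),
    ("Rest", "neg"), ("Rest", "neu"), ("Rest", "neu"), ("Rest", "pos"), ("Rest", "neg"),
]
dist = sentiment_distribution(records)
print(dist["India"])                                     # {'neg': 0.5, 'neu': 0.25, 'pos': 0.25}
print(round(max_share_gap(dist["India"], dist["Rest"]), 2))  # 0.15
```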

Model Training & Sentiment Extractor

A training dataset is the data used to train an ML algorithm. It consists of sample output data and the corresponding sets of input data that influence the output. During training, the input data is run through the algorithm and the processed output is compared against the sample output; the result of this comparison is used to modify the model. This iterative process is called “model fitting”. The accuracy of the training and validation datasets is critical to how accurate the model can become.
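
The fitting loop described above can be illustrated with a tiny logistic-regression classifier trained by stochastic gradient descent on bag-of-words counts. This is a deliberately simple, swapped-in technique for illustration, not the project's fastai/Keras/PyTorch model; it assumes binary labels (1 = positive, 0 = negative), and the example tweets, learning rate, and epoch count are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Weights = Dict[str, float]


def featurize(text: str) -> Counter:
    """Bag-of-words features: word -> count."""
    return Counter(text.lower().split())


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: Weights, bias: float, text: str) -> float:
    """Model output: probability that the tweet is positive."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in featurize(text).items())
    return sigmoid(z)


def fit(examples: List[Tuple[str, int]], epochs: int = 100,
        lr: float = 0.5) -> Tuple[Weights, float]:
    """Repeatedly compare the model output with each sample label and adjust the weights."""
    weights: Weights = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            # Deviation of the processed output from the sample output
            # (this is also the log-loss gradient with respect to the logit).
            error = predict_proba(weights, bias, text) - label
            bias -= lr * error
            for word, count in featurize(text).items():
                weights[word] = weights.get(word, 0.0) - lr * error * count
    return weights, bias


train = [
    ("we will get through this together", 1),
    ("great news the vaccine is approved", 1),
    ("this lockdown is terrible", 0),
    ("cases rising and fear everywhere", 0),
]
weights, bias = fit(train)
print(predict_proba(weights, bias, "great news") > 0.5)     # True
print(predict_proba(weights, bias, "terrible fear") < 0.5)  # True
```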

Model training in machine learning is the process of feeding an ML algorithm with data so that it can learn good values for all of the parameters involved. There are several types of machine learning, of which the most common are supervised and unsupervised learning. Supervised learning is possible when the training data contains both the input and the output values. Each training example pairs a set of inputs with its expected output, and that expected output is called the supervisory signal. Training adjusts the model based on how far its output deviates from the expected output when the inputs are fed into the model.
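
For a three-class sentiment task, the deviation between the model's output and the supervisory signal is commonly measured with cross-entropy loss. The sketch below assumes the labels `neg`, `neu`, and `pos` and raw class scores (logits) produced by some model; the label names, the scores, and the helper names are illustrative assumptions.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - m) for label, score in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def cross_entropy(logits: Dict[str, float], expected: str) -> float:
    """Deviation of the model's output from the supervisory signal `expected`."""
    return -math.log(softmax(logits)[expected])


confident_right = {"neg": 0.1, "neu": 0.2, "pos": 2.5}
confident_wrong = {"neg": 2.5, "neu": 0.2, "pos": 0.1}
print(cross_entropy(confident_right, "pos") < cross_entropy(confident_wrong, "pos"))  # True
```

The loss is small when the model puts high probability on the expected label and grows as that probability shrinks, which is the signal gradient-based training uses to adjust the model.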

Website

The website provides:

  1. Public Sentiment Analysis
  2. Real-Time Sentiment Analysis
  3. Twitter Live Feed Analysis
  4. Live Case Count

Thanks for checking out the repo!
