Multiclass classification task performing emotion analysis of Indonesian tweets using NLP
Objectives:
- Identify effective ways to detect and classify emotions in Indonesian tweets.
- Obtain a model with good performance and accuracy for classifying emotions in Indonesian tweets.
- Predict the emotions expressed in tweets that have already been published.
Credits:
- Annisa Fitria Anwar Damanik
- Nasywa Safira Ardanty
- Kamal Muftie Yafi
- Rifa Nayaka Utami
- Dataset: Emotion dataset of Indonesian tweets obtained from the Data Analysis Competition (DAC), Informatics Festival (IFest) 2023
- Slang: Modified Kamus Alay based on Kamus Alay (Colloquial Indonesian Lexicon)
- Feature Extraction: Bag-of-Words, TF-IDF
- Classifier: Naive Bayes, SVM, Logistic Regression, Decision Tree
This dataset was built from Indonesian tweets and covers five emotion labels: fear, joy, love, sadness, and anger. It contains 5,153 tweets in total, of which the last 1,000 are unlabeled and reserved for prediction. The labeled classes are unevenly distributed: 654 tweets for fear, 1,002 for joy, 498 for love, 1,123 for sadness, and 876 for anger.
Emotion Label | Fear | Joy | Love | Sadness | Anger |
---|---|---|---|---|---|
Total Data | 654 | 1,002 | 498 | 1,123 | 876 |
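
The labeled and unlabeled portions can be separated as soon as the data is loaded. A minimal pandas sketch, assuming the competition file is a CSV with `tweet` and `label` columns (the file name and column names are assumptions):

```python
import pandas as pd

# Load the DAC IFest 2023 tweets (file/column names are assumptions).
df = pd.read_csv("dac_ifest_2023_tweets.csv")

# The last 1,000 rows carry no label and are reserved for prediction.
labeled = df[df["label"].notna()].copy()    # 4,153 labeled tweets
unlabeled = df[df["label"].isna()].copy()   # 1,000 tweets to predict

# Class distribution: fear, joy, love, sadness, anger.
print(labeled["label"].value_counts())
```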
- Text cleaning/preprocessing
- Non-standard word replacement
- Feature extraction: BoW, TF-IDF
- Classification: Naive Bayes, SVM, Logistic Regression, Decision Tree
- Prediction on the 1,000 unlabeled tweets
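
A rough sketch of the cleaning and non-standard word (slang) replacement steps, continuing the loading sketch above. It assumes the modified Kamus Alay is a two-column CSV mapping slang words to their standard forms (file and column names are assumptions, and the exact cleaning rules may differ from the notebook):

```python
import html
import re
import string

import emoji  # emoji >= 2.0 assumed for replace_emoji
import pandas as pd
from unidecode import unidecode

# Slang dictionary built from the modified Kamus Alay (file/column names are assumptions).
slang_df = pd.read_csv("kamus_alay_modified.csv")
slang_map = dict(zip(slang_df["slang"], slang_df["formal"]))

def clean_text(text: str) -> str:
    """Basic tweet cleaning: unescape HTML, lowercase, drop mentions/URLs/emoji/punctuation."""
    text = html.unescape(text)
    text = unidecode(text).lower()
    text = re.sub(r"@\w+|#\w+|https?://\S+", " ", text)  # mentions, hashtags, URLs
    text = emoji.replace_emoji(text, replace=" ")          # remove emoji
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def normalize_slang(text: str) -> str:
    """Replace non-standard (alay) words with their formal equivalents."""
    return " ".join(slang_map.get(token, token) for token in text.split())

labeled["clean"] = labeled["tweet"].apply(clean_text).apply(normalize_slang)
```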
Libraries:
- pandas
- numpy
- matplotlib
- seaborn
- re
- string
- emoji
- unicodedata
- unidecode
- html
- sklearn
- tqdm
- random
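
The confusion matrices and classification reports below are standard scikit-learn output. A minimal sketch for one feature-extractor/classifier pair, continuing the sketches above; the split settings and hyperparameters are assumptions, `LinearSVC` stands in for the SVM, and the other combinations follow by swapping `TfidfVectorizer` for `CountVectorizer` and `LogisticRegression` for `MultinomialNB`, `LinearSVC`, or `DecisionTreeClassifier`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hold out part of the labeled tweets for evaluation (split settings are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    labeled["clean"], labeled["label"],
    test_size=0.2, stratify=labeled["label"], random_state=42,
)

# One of the eight feature/classifier combinations (LR + TF-IDF shown here).
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```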
NB-BoW (Naive Bayes + Bag-of-Words)

Confusion matrix

Actual \ Predicted | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 158 | 23 | 2 | 8 | 23 |
1 | 61 | 115 | 4 | 5 | 12 |
2 | 29 | 8 | 57 | 2 | 3 |
3 | 41 | 5 | 0 | 76 | 9 |
4 | 41 | 9 | 0 | 3 | 114 |

Classification report

Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.48 | 0.74 | 0.58 | 214 |
1 | 0.72 | 0.58 | 0.64 | 197 |
2 | 0.90 | 0.58 | 0.70 | 99 |
3 | 0.81 | 0.58 | 0.68 | 131 |
4 | 0.71 | 0.68 | 0.70 | 167 |
accuracy | | | 0.64 | 808 |
macro avg | 0.72 | 0.63 | 0.66 | 808 |
weighted avg | 0.69 | 0.64 | 0.65 | 808 |
SVM-BoW (SVM + Bag-of-Words)

Confusion matrix

Actual \ Predicted | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 142 | 41 | 6 | 1 | 24 |
1 | 47 | 127 | 7 | 3 | 13 |
2 | 15 | 15 | 64 | 1 | 4 |
3 | 42 | 10 | 1 | 70 | 8 |
4 | 50 | 16 | 0 | 1 | 100 |

Classification report

Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.48 | 0.66 | 0.56 | 214 |
1 | 0.61 | 0.64 | 0.63 | 197 |
2 | 0.82 | 0.65 | 0.72 | 99 |
3 | 0.92 | 0.53 | 0.68 | 131 |
4 | 0.67 | 0.60 | 0.63 | 167 |
accuracy | | | 0.62 | 808 |
macro avg | 0.70 | 0.62 | 0.64 | 808 |
weighted avg | 0.66 | 0.62 | 0.63 | 808 |
LR-BoW (Logistic Regression + Bag-of-Words)

Confusion matrix

Actual \ Predicted | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 137 | 33 | 9 | 9 | 26 |
1 | 36 | 125 | 10 | 7 | 19 |
2 | 16 | 13 | 66 | 2 | 2 |
3 | 20 | 10 | 2 | 90 | 9 |
4 | 40 | 16 | 1 | 7 | 103 |

Classification report

Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.55 | 0.64 | 0.59 | 214 |
1 | 0.63 | 0.63 | 0.63 | 197 |
2 | 0.75 | 0.67 | 0.71 | 99 |
3 | 0.78 | 0.69 | 0.73 | 131 |
4 | 0.65 | 0.62 | 0.63 | 167 |
accuracy | | | 0.64 | 808 |
macro avg | 0.67 | 0.65 | 0.66 | 808 |
weighted avg | 0.65 | 0.64 | 0.65 | 808 |
DT-BoW (Decision Tree + Bag-of-Words)

Confusion matrix

Actual \ Predicted | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 83 | 43 | 16 | 16 | 56 |
1 | 38 | 91 | 17 | 12 | 39 |
2 | 18 | 10 | 65 | 5 | 1 |
3 | 15 | 10 | 4 | 92 | 10 |
4 | 36 | 31 | 4 | 9 | 87 |

Classification report

Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.44 | 0.39 | 0.41 | 214 |
1 | 0.49 | 0.46 | 0.48 | 197 |
2 | 0.61 | 0.66 | 0.63 | 99 |
3 | 0.69 | 0.70 | 0.69 | 131 |
4 | 0.45 | 0.52 | 0.48 | 167 |
accuracy | | | 0.52 | 808 |
macro avg | 0.54 | 0.55 | 0.54 | 808 |
weighted avg | 0.52 | 0.52 | 0.52 | 808 |
NB-TfIdf (Naive Bayes + TF-IDF)

Confusion matrix

Actual \ Predicted | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 186 | 18 | 0 | 0 | 10 |
1 | 94 | 101 | 0 | 0 | 2 |
2 | 73 | 19 | 6 | 0 | 1 |
3 | 93 | 3 | 0 | 31 | 4 |
4 | 83 | 8 | 0 | 0 | 76 |

Classification report

Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.35 | 0.87 | 0.50 | 214 |
1 | 0.68 | 0.51 | 0.58 | 197 |
2 | 1.00 | 0.06 | 0.11 | 99 |
3 | 1.00 | 0.24 | 0.38 | 131 |
4 | 0.82 | 0.46 | 0.58 | 167 |
accuracy | | | 0.50 | 808 |
macro avg | 0.77 | 0.43 | 0.43 | 808 |
weighted avg | 0.71 | 0.50 | 0.47 | 808 |
SVM-TfIdf (SVM + TF-IDF)

Confusion matrix

Actual \ Predicted | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 154 | 32 | 6 | 0 | 22 |
1 | 45 | 134 | 4 | 2 | 12 |
2 | 20 | 12 | 65 | 1 | 1 |
3 | 38 | 10 | 0 | 74 | 9 |
4 | 50 | 16 | 0 | 1 | 100 |

Classification report

Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.50 | 0.72 | 0.59 | 214 |
1 | 0.66 | 0.68 | 0.67 | 197 |
2 | 0.87 | 0.66 | 0.75 | 99 |
3 | 0.95 | 0.56 | 0.71 | 131 |
4 | 0.69 | 0.60 | 0.64 | 167 |
accuracy | | | 0.65 | 808 |
macro avg | 0.73 | 0.64 | 0.67 | 808 |
weighted avg | 0.70 | 0.65 | 0.66 | 808 |
LR-TfIdf (Logistic Regression + TF-IDF)

Confusion matrix

Actual \ Predicted | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 144 | 34 | 6 | 5 | 25 |
1 | 36 | 139 | 4 | 3 | 15 |
2 | 17 | 11 | 67 | 1 | 3 |
3 | 29 | 10 | 0 | 83 | 9 |
4 | 41 | 16 | 0 | 2 | 108 |

Classification report

Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.54 | 0.67 | 0.60 | 214 |
1 | 0.66 | 0.71 | 0.68 | 197 |
2 | 0.87 | 0.68 | 0.76 | 99 |
3 | 0.88 | 0.63 | 0.74 | 131 |
4 | 0.68 | 0.65 | 0.66 | 167 |
accuracy | | | 0.67 | 808 |
macro avg | 0.73 | 0.67 | 0.69 | 808 |
weighted avg | 0.69 | 0.67 | 0.67 | 808 |
DT-TfIdf (Decision Tree + TF-IDF)

Confusion matrix

Actual \ Predicted | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 98 | 44 | 21 | 16 | 35 |
1 | 51 | 77 | 17 | 13 | 39 |
2 | 14 | 11 | 64 | 5 | 5 |
3 | 20 | 9 | 3 | 92 | 7 |
4 | 63 | 22 | 4 | 10 | 68 |

Classification report

Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 | 0.40 | 0.46 | 0.43 | 214 |
1 | 0.47 | 0.39 | 0.43 | 197 |
2 | 0.59 | 0.65 | 0.62 | 99 |
3 | 0.68 | 0.70 | 0.69 | 131 |
4 | 0.44 | 0.41 | 0.42 | 167 |
accuracy | | | 0.49 | 808 |
macro avg | 0.52 | 0.52 | 0.52 | 808 |
weighted avg | 0.49 | 0.49 | 0.49 | 808 |
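
One way to arrive at the summary table below is to loop over every vectorizer/classifier pair on the same split and collect the scores. A hedged sketch, continuing the snippets above (hyperparameters are assumptions, and micro-averaged F1 is used because it matches the reported figures, where F1 equals accuracy):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

vectorizers = {"BoW": CountVectorizer(), "TfIdf": TfidfVectorizer()}
classifiers = {
    "NB": MultinomialNB(),
    "SVM": LinearSVC(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=42),
}

# Evaluate every feature/classifier combination on the same train/test split.
for vec_name, vectorizer in vectorizers.items():
    for clf_name, classifier in classifiers.items():
        pipeline = make_pipeline(vectorizer, classifier)
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        print(f"{clf_name}-{vec_name}: "
              f"F1 {100 * f1_score(y_test, y_pred, average='micro'):.2f}, "
              f"Acc {100 * accuracy_score(y_test, y_pred):.2f}")
```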
Model | F1 Score (%) | Accuracy (%) |
---|---|---|
NB-BoW | 64.36 | 64.36 |
SVM-BoW | 62.25 | 62.25 |
LR-BoW | 64.48 | 64.48 |
DT-BoW | 51.73 | 51.73 |
NB-TfIdf | 49.50 | 49.50 |
SVM-TfIdf | 65.22 | 65.22 |
LR-TfIdf | 66.96 | 66.96 |
DT-TfIdf | 49.38 | 49.38 |
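
The F1 and accuracy columns coincide because, for single-label multiclass classification, the micro-averaged F1 score equals accuracy. LR-TfIdf scores highest, so a natural final step is to retrain that combination on all labeled tweets and label the 1,000 held-out ones. A sketch continuing the snippets above (output file and column names are assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Refit the best-scoring combination (LR + TF-IDF) on all labeled tweets.
best = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
best.fit(labeled["clean"], labeled["label"])

# Clean the 1,000 unlabeled tweets the same way, then predict their emotions.
unlabeled["clean"] = unlabeled["tweet"].apply(clean_text).apply(normalize_slang)
unlabeled["predicted_emotion"] = best.predict(unlabeled["clean"])
unlabeled[["tweet", "predicted_emotion"]].to_csv("predictions.csv", index=False)
```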