Spam Detection by Generative Learning

Who likes spam? No one! This project implements a spam detection system in Java using generative learning: a Naïve Bayes classifier, which models how spam and ham each generate words. It preprocesses raw email data, extracts features, trains the classifier, and evaluates its performance. A simple GUI is included to check that the model works as expected.

Dataset

Source: SpamAssassin Public Corpus
Format: CSV with HTML-formatted email content and labels (0 = ham, 1 = spam)

Workflow

Preprocessing:
- Removes headers and metadata from emails.
- Strips HTML tags, special characters, and stopwords.
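
The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class name `PreprocessSketch`, the method names, the regular expressions, and the tiny stopword list are all assumptions made for the example, and headers are assumed to end at the first blank line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PreprocessSketch {
    // Tiny illustrative stopword list (assumption); a real list would be much longer.
    static final Set<String> STOPWORDS =
            Set.of("a", "an", "and", "the", "to", "of", "is", "in", "for", "you");

    /** Drops the header block, assumed to be everything up to the first blank line. */
    static String stripHeaders(String rawEmail) {
        String normalized = rawEmail.replace("\r\n", "\n");
        int bodyStart = normalized.indexOf("\n\n");
        return bodyStart >= 0 ? normalized.substring(bodyStart + 2) : normalized;
    }

    /** Returns lowercase word tokens with HTML tags, special characters, and stopwords removed. */
    static List<String> clean(String rawEmail) {
        String text = stripHeaders(rawEmail)
                .replaceAll("<[^>]*>", " ")        // strip HTML tags
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z\\s]", " ");     // drop digits, punctuation, special characters
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String raw = "From: a@b.com\nSubject: Hi\n\n<html><b>WIN</b> a FREE prize!!!</html>";
        System.out.println(clean(raw)); // prints [win, free, prize]
    }
}
```
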
Feature Extraction:
- Converts email content into a bag-of-words representation.
- Optionally uses TF-IDF for weighting.
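
As a rough sketch of the feature extraction step: a bag-of-words is a map from each token to its count, and TF-IDF rescales those counts by how rare a token is across the corpus. The class `FeatureSketch` and its methods are hypothetical, and the weighting used here (tf = count / document length, idf = ln(N / df)) is only one of several common variants; the project may use a different one.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class FeatureSketch {
    /** Bag-of-words: maps each token to the number of times it occurs in the document. */
    static Map<String, Integer> bagOfWords(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    /** Document frequency: the number of documents that contain each token. */
    static Map<String, Integer> documentFrequency(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String t : new HashSet<>(doc)) {   // count each token once per document
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    /** TF-IDF with tf = count / docLength and idf = ln(numDocs / df) (assumed variant). */
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : bagOfWords(doc).entrySet()) {
            int d = df.getOrDefault(e.getKey(), 0);
            if (d == 0) continue;                   // token never seen in the corpus; skip it
            double tf = e.getValue() / (double) doc.size();
            double idf = Math.log((double) numDocs / d);
            weights.put(e.getKey(), tf * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("free", "prize", "free"),
                List.of("meeting", "free"));
        Map<String, Integer> df = documentFrequency(corpus);
        System.out.println(tfIdf(corpus.get(0), df, corpus.size()));
    }
}
```

In this toy corpus "free" appears in every document, so its weight is zero, while "prize" keeps a positive weight. That is the effect TF-IDF is meant to have on words that do not discriminate between emails.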
Model:
- Naïve Bayes classifier with Laplace smoothing.
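
To illustrate the model step, here is a minimal sketch of a multinomial Naïve Bayes classifier with Laplace (add-one) smoothing over token counts. It scores each class as log P(class) plus the sum of log((count(word, class) + 1) / (words in class + vocabulary size)). The class `NaiveBayesSketch` and its methods are hypothetical; the repository's `ModelTrainer` may be organised differently. The sketch assumes at least one training email per class.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaiveBayesSketch {
    static final int HAM = 0, SPAM = 1;            // label convention from the dataset

    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final int[] docs = new int[2];          // emails seen per class
    private final int[] words = new int[2];         // total tokens seen per class
    private final Set<String> vocabulary = new HashSet<>();

    private Map<String, Integer> counts(int label) {
        return label == SPAM ? spamCounts : hamCounts;
    }

    /** Adds one tokenised email with its label to the counts. */
    void train(List<String> tokens, int label) {
        docs[label]++;
        for (String t : tokens) {
            counts(label).merge(t, 1, Integer::sum);
            words[label]++;
            vocabulary.add(t);
        }
    }

    /** log P(label) + sum of log P(token | label), with add-one smoothing. */
    double logScore(List<String> tokens, int label) {
        double score = Math.log((double) docs[label] / (docs[HAM] + docs[SPAM]));
        double denominator = words[label] + vocabulary.size();
        for (String t : tokens) {
            int c = counts(label).getOrDefault(t, 0);
            score += Math.log((c + 1) / denominator);   // unseen tokens get count 1, not 0
        }
        return score;
    }

    int predict(List<String> tokens) {
        return logScore(tokens, SPAM) > logScore(tokens, HAM) ? SPAM : HAM;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train(List.of("free", "prize", "win"), SPAM);
        nb.train(List.of("win", "cash", "free"), SPAM);
        nb.train(List.of("meeting", "tomorrow"), HAM);
        nb.train(List.of("lunch", "meeting", "free"), HAM);
        System.out.println(nb.predict(List.of("free", "cash")));     // prints 1
        System.out.println(nb.predict(List.of("meeting", "lunch"))); // prints 0
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen word from driving a class probability to zero.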

Results

Accuracy: 98.79%
Precision: 100%
Recall: 96.31%
F1-Score: 98.12%
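
For reference, the four metrics above follow from the confusion matrix counts (true/false positives and negatives, with spam as the positive class). The sketch below shows the standard formulas; the class `MetricsSketch` is hypothetical and is not taken from the repository. As a consistency check, plugging the reported precision and recall into the F1 formula reproduces the reported F1-score.

```java
import java.util.Locale;

class MetricsSketch {
    /** Fraction of all emails classified correctly. */
    static double accuracy(int tp, int fp, int tn, int fn) {
        return (double) (tp + tn) / (tp + fp + tn + fn);
    }

    /** Of the emails flagged as spam, the fraction that really are spam. */
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    /** Of the real spam emails, the fraction that were flagged. */
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Precision 100% and recall 96.31% are the values reported above.
        System.out.printf(Locale.ROOT, "F1 = %.2f%%%n", 100 * f1(1.0, 0.9631)); // prints F1 = 98.12%
    }
}
```

A precision of 100% means no ham was flagged as spam in the evaluation, so every error counted in the results is a missed spam email.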

How to Run

Clone the repository:

   git clone https://github.com/username/spam-detector.git
   cd spam-detector

Compile the code:

   javac src/main/*.java

Run the training program, or launch the GUI to test the model visually. Both commands request a 4 GB heap because the dataset is large:

   java -Xmx4G -cp src/main ModelTrainer
   java -Xmx4G -cp src/main SpamDetectorGUI

Acknowledgements

Special thanks to the maintainers of the SpamAssassin Public Corpus for providing the email data used in this project.

License

This project is released under the MIT License.


davidomanovic/spam-detector
