
Who likes spam? NO ONE! This project implements a machine learning-based Spam Detection system in Java. The application classifies emails as either spam or ham (non-spam) using supervised learning techniques. It demonstrates data preprocessing, feature extraction, model training, and evaluation.


davidomanovic/spam-detector


Spam Detection by Generative Learning

Who likes spam? No one! This project implements a spam detection system in Java using generative learning with the Naive Bayes algorithm. It preprocesses raw email data, extracts features, trains the classifier, and evaluates its performance. A simple GUI is included for checking interactively that the model behaves as expected.

Example

Dataset

  • Source: SpamAssassin Public Corpus
  • Format: CSV containing HTML-formatted email content and a label for each email (0 = ham, 1 = spam)

Workflow

  1. Preprocessing:

    • Removes headers and metadata from emails.
    • Strips HTML tags, special characters, and stopwords.
  2. Feature Extraction:

    • Converts email content into a bag-of-words representation.
    • Optionally uses TF-IDF for weighting.
  3. Model:

    • Naive Bayes classifier with Laplace smoothing.
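
The workflow above can be sketched in a few dozen lines. This is a minimal illustration and not the repository's actual code: the class name `NaiveBayesSketch`, the tiny stopword list, and the regex-based HTML stripping are all assumptions made for the example. It uses raw bag-of-words counts (no TF-IDF) and add-one (Laplace) smoothing, and it compares log-probabilities to avoid numeric underflow on long emails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class NaiveBayesSketch {
    // Illustrative stopword list (assumption; the project's real list is not shown).
    private static final Set<String> STOPWORDS = Set.of("the", "a", "an", "to", "and", "is", "of");

    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int spamTokens, hamTokens, spamDocs, hamDocs;

    /** Preprocessing: strip HTML tags and non-letters, lowercase, drop stopwords. */
    static List<String> tokenize(String rawEmail) {
        String text = rawEmail.replaceAll("<[^>]*>", " ")
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z]+", " ")
                .trim();
        List<String> tokens = new ArrayList<>();
        for (String t : text.split(" ")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Feature extraction + training: accumulate per-class word counts (label 1 = spam, 0 = ham). */
    void train(String rawEmail, int label) {
        boolean spam = label == 1;
        if (spam) spamDocs++; else hamDocs++;
        for (String t : tokenize(rawEmail)) {
            vocabulary.add(t);
            (spam ? spamCounts : hamCounts).merge(t, 1, Integer::sum);
            if (spam) spamTokens++; else hamTokens++;
        }
    }

    /** log P(class) + sum of log P(word | class), with Laplace (add-one) smoothing. */
    private double logScore(List<String> tokens, Map<String, Integer> counts,
                            int classTokens, int classDocs) {
        double score = Math.log((double) classDocs / (spamDocs + hamDocs));
        int v = vocabulary.size();
        for (String t : tokens) {
            score += Math.log((counts.getOrDefault(t, 0) + 1.0) / (classTokens + v));
        }
        return score;
    }

    /** Returns 1 (spam) or 0 (ham), whichever class has the higher log score. */
    int predict(String rawEmail) {
        List<String> tokens = tokenize(rawEmail);
        double spam = logScore(tokens, spamCounts, spamTokens, spamDocs);
        double ham = logScore(tokens, hamCounts, hamTokens, hamDocs);
        return spam > ham ? 1 : 0;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("<p>Win FREE money now!!!</p>", 1);
        nb.train("Claim your free prize", 1);
        nb.train("Meeting agenda for the project", 0);
        nb.train("Lunch at noon tomorrow?", 0);
        System.out.println(nb.predict("Free money prize"));        // prints 1
        System.out.println(nb.predict("project meeting at noon")); // prints 0
    }
}
```

Without the `+ 1.0` in the numerator, a single word never seen in one class would push that class's probability to zero no matter what the rest of the email says. Smoothing prevents that.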

Results

  • Accuracy: 98.79%
  • Precision: 100%
  • Recall: 96.31%
  • F1-Score: 98.12%
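
For reference, these metrics relate to each other through the standard definitions. The sketch below is not taken from the repository; `MetricsSketch` and its method names are hypothetical. It shows the formulas and checks that the reported F1-score follows from the reported precision and recall.

```java
import java.util.Locale;

public class MetricsSketch {
    // tp/fp/fn/tn = true positives, false positives, false negatives, true negatives,
    // where "positive" means "classified as spam".
    static double accuracy(int tp, int tn, int fp, int fn) {
        int total = tp + tn + fp + fn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Precision and recall as reported in the Results list above.
        double f1 = f1(1.0, 0.9631);
        System.out.println(String.format(Locale.ROOT, "F1 = %.2f%%", f1 * 100)); // prints "F1 = 98.12%"
    }
}
```

A precision of 100% means no ham email was flagged as spam in the evaluation. The recall figure means that a few spam emails were missed.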

How to Run

  1. Clone the repository:
   git clone https://github.com/username/spam-detector.git
   cd spam-detector
  2. Compile the code:
   javac src/main/*.java
  3. Run the model trainer, or launch the GUI to test the model visually. Both commands request a 4 GB heap because the dataset is large:
   java -Xmx4G -cp src/main ModelTrainer
   java -Xmx4G -cp src/main SpamDetectorGUI

Acknowledgements

Special thanks to the SpamAssassin project for making its public corpus of email data available.

License

This project is released under the MIT License.


