This project builds a spam mail detection model using Natural Language Processing (NLP) and machine learning techniques.
The model classifies messages as:
- Spam (0)
- Ham (1)
It converts message text into numeric features with TF-IDF vectorization and classifies them with Logistic Regression.
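To make the TF-IDF step concrete, here is a minimal pure-Python sketch of the weighting. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which matches scikit-learn's `TfidfVectorizer` defaults. The whitespace tokenizer and the three sample messages are illustrative assumptions, not part of this project's notebook.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    # Simplified tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


# Made-up messages, for illustration only.
docs = ["win a free prize now", "see you at lunch", "free lunch for you"]
vectors = tfidf(docs)

for doc, vec in zip(docs, vectors):
    top_term = max(vec, key=vec.get)
    print(f"{doc!r}: highest-weighted term is {top_term!r}")
```

In the first message, "win" appears in only one document while "free" appears in two, so "win" receives the larger weight. This is the property that lets the classifier focus on distinctive words.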
- Python
- NumPy
- Pandas
- Scikit-learn
- Matplotlib
- Seaborn
- Load dataset
- Data cleaning and handling of null values
- Label encoding (Spam = 0, Ham = 1)
- Train-test split (80-20)
- Text feature extraction using TF-IDF
- Model training using Logistic Regression
- Model evaluation (Accuracy, Confusion Matrix)
- Custom message prediction
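The steps above can be sketched end to end with scikit-learn. This is a minimal illustration rather than the notebook's exact code. The column names `Category` and `Message`, the tiny in-memory dataset standing in for `mail_data.csv`, and the `stop_words`, `stratify`, and `random_state=3` settings are all assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# 1. Load dataset. A toy stand-in is used here (assumption); with the real
#    file this would be something like pd.read_csv("mail_data.csv").
spam_texts = [
    "Win a free prize now, click the link",
    "Congratulations, you won a gift card",
    "Claim your free cash reward today",
    "Urgent: click now to claim your prize",
    "Free entry to win a gift card",
    "You have won cash, claim it now",
    "Click the link for a free reward",
    "Winner! Claim your free prize today",
    "Free gift card, click to win now",
    "Claim cash prize now, urgent offer",
]
ham_texts = [
    "Are we still meeting for lunch today",
    "Please send me the report by Monday",
    "See you at the office tomorrow",
    "Can you call me after the meeting",
    "Thanks for the notes from class",
    "I will be home late tonight",
    "Let us meet at the library tomorrow",
    "Did you finish the homework for class",
    "Call me when you reach the office",
    "The meeting is moved to Monday",
]
raw = pd.DataFrame({
    "Category": ["spam"] * len(spam_texts) + ["ham"] * len(ham_texts),
    "Message": spam_texts + ham_texts,
})

# 2. Data cleaning: replace null values with empty strings.
data = raw.fillna("")

# 3. Label encoding: Spam = 0, Ham = 1.
data["label"] = data["Category"].map({"spam": 0, "ham": 1})

# 4. Train-test split (80-20); stratify keeps both classes in each part.
X_train, X_test, y_train, y_test = train_test_split(
    data["Message"], data["label"],
    test_size=0.2, random_state=3, stratify=data["label"],
)

# 5. TF-IDF features: fit on the training text only, then reuse for test text.
vectorizer = TfidfVectorizer(min_df=1, stop_words="english", lowercase=True)
X_train_features = vectorizer.fit_transform(X_train)
X_test_features = vectorizer.transform(X_test)

# 6. Model training.
model = LogisticRegression()
model.fit(X_train_features, y_train)

# 7. Model evaluation.
train_accuracy = accuracy_score(y_train, model.predict(X_train_features))
test_predictions = model.predict(X_test_features)
test_accuracy = accuracy_score(y_test, test_predictions)
matrix = confusion_matrix(y_test, test_predictions, labels=[0, 1])

# 8. Custom message prediction.
message = "Congratulations! You have won a $1000 gift card. Click the link now!"
predicted_label = model.predict(vectorizer.transform([message]))[0]
verdict = "Spam" if predicted_label == 0 else "Ham"

print("Training accuracy:", train_accuracy)
print("Testing accuracy:", test_accuracy)
print("Confusion matrix (rows = true, columns = predicted):")
print(matrix)
print("Prediction:", verdict)
```

The vectorizer is fitted on the training split only so that no information from the test messages leaks into the features. Accuracy figures from this toy dataset are not meaningful and should not be compared with the project's reported results.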
- Training Accuracy: ~97%
- Testing Accuracy: ~96%
(Accuracy may vary slightly depending on the random_state used, for example in the train-test split.)
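Accuracy alone does not show which kind of mistake the model makes, which is why a confusion matrix is reported alongside it. The sketch below computes both by hand using this project's encoding (Spam = 0, Ham = 1). The label lists are made up for illustration and are not results from the notebook, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count the four outcomes under the encoding spam = 0, ham = 1."""
    counts = {"spam_caught": 0, "spam_missed": 0, "ham_flagged": 0, "ham_kept": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 0:
            # True spam: either caught (predicted 0) or missed (predicted 1).
            counts["spam_caught" if pred == 0 else "spam_missed"] += 1
        else:
            # True ham: either kept (predicted 1) or wrongly flagged (predicted 0).
            counts["ham_kept" if pred == 1 else "ham_flagged"] += 1
    return counts


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels only (assumption), not output from the trained model.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(counts)
print("Accuracy:", acc)
```

Because spam is encoded as 0, scikit-learn metrics that assume a positive class of 1 (such as `precision_score` and `recall_score` with default arguments) would treat ham as the positive class unless `pos_label=0` is passed.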
Input: "Congratulations! You have won a $1000 gift card. Click the link now!"
Output: Spam
spam-mail-detection/
│
├── spam_mail_detection.ipynb
├── mail_data.csv
└── README.md
- Text preprocessing
- TF-IDF feature extraction
- Logistic Regression implementation
- Model evaluation techniques
- Real-world text classification workflow
Atiur Rahaman
AIML Student | Machine Learning Enthusiast